- Data
- Conformer model trained using Self-Remixing
(TF-masking)
- Conformer model trained using RemixIT
(TF-masking)
- TF-GridNet model trained using Self-Remixing
(Complex Spectral Mapping)
- TF-GridNet model trained using RemixIT
(Complex Spectral Mapping)
NOTE: Output order of the three separated signals are aligned with the ground-truth audios (two speech and a noise).
Input mixture and clean speeches
Mixture ![]() |
Clean speech1 ![]() |
Clean speech2 ![]() |
Conformer Self-Remixing
Epoch 0 Output 1 (speech1) ![]() |
Epoch 0 Output 2 (speech2) ![]() |
Epoch 0 Output 3 (noise) ![]() |
Epoch 5 Output 1 (speech1) ![]() |
Epoch 5 Output 2 (speech2) ![]() |
Epoch 5 Output 3 (noise) ![]() |
Epoch 10 Output 1 (speech1) ![]() |
Epoch 10 Output 2 (speech2) ![]() |
Epoch 10 Output 3 (noise) ![]() |
Epoch 15 Output 1 (speech1) ![]() |
Epoch 15 Output 2 (speech2) ![]() |
Epoch 15 Output 3 (noise) ![]() |
Conformer RemixIT
Epoch 0 Output 1 (speech1) ![]() |
Epoch 0 Output 2 (speech2) ![]() |
Epoch 0 Output 3 (noise) ![]() |
Epoch 5 Output 1 (speech1) ![]() |
Epoch 5 Output 2 (speech2) ![]() |
Epoch 5 Output 3 (noise) ![]() |
Epoch 10 Output 1 (speech1) ![]() |
Epoch 10 Output 2 (speech2) ![]() |
Epoch 10 Output 3 (noise) ![]() |
Epoch 15 Output 1 (speech1) ![]() |
Epoch 15 Output 2 (speech2) ![]() |
Epoch 15 Output 3 (noise) ![]() |
TF-GridNet Self-Remixing
Epoch 0 Output 1 (speech1) ![]() |
Epoch 0 Output 2 (speech2) ![]() |
Epoch 0 Output 3 (noise) ![]() |
Epoch 5 Output 1 (speech1) ![]() |
Epoch 5 Output 2 (speech2) ![]() |
Epoch 5 Output 3 (noise) ![]() |
Epoch 10 Output 1 (speech1) ![]() |
Epoch 10 Output 2 (speech2) ![]() |
Epoch 10 Output 3 (noise) ![]() |
Epoch 15 Output 1 (speech1) ![]() |
Epoch 15 Output 2 (speech2) ![]() |
Epoch 15 Output 3 (noise) ![]() |
TFGridNet RemixIT
Epoch 0 Output 1 (speech1) ![]() |
Epoch 0 Output 2 (speech2) ![]() |
Epoch 0 Output 3 (noise) ![]() |
Epoch 5 Output 1 (speech1) ![]() |
Epoch 5 Output 2 (speech2) ![]() |
Epoch 5 Output 3 (noise) ![]() |
Epoch 10 Output 1 (speech1) ![]() |
Epoch 10 Output 2 (speech2) ![]() |
Epoch 10 Output 3 (noise) ![]() |
Epoch 15 Output 1 (speech1) ![]() |
Epoch 15 Output 2 (speech2) ![]() |
Epoch 15 Output 3 (noise) ![]() |