- Data
- Conformer model trained using Self-Remixing (TF-masking)
- Conformer model trained using RemixIT (TF-masking)
- TF-GridNet model trained using Self-Remixing (Complex Spectral Mapping)
- TF-GridNet model trained using RemixIT (Complex Spectral Mapping)
NOTE: The output order of the three separated signals is aligned with the ground-truth audio (two speech signals and one noise signal).
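The page does not say how the outputs were aligned with the ground truth. As an illustration only, one common way to do this is to try every ordering of the outputs and keep the one that scores best against the references. The sketch below assumes SI-SDR as the score and NumPy arrays of shape (sources, samples); the function names `si_sdr` and `align_outputs` are hypothetical and not taken from the authors' code.

```python
from itertools import permutations

import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals (assumed metric)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def align_outputs(estimates, references):
    """Reorder estimates so that row i best matches references[i].

    Exhaustive search over permutations; fine for three sources.
    Returns the reordered estimates and the chosen permutation.
    """
    n_src = len(references)
    best_perm = max(
        permutations(range(n_src)),
        key=lambda perm: sum(
            si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)
        ),
    )
    return estimates[list(best_perm)], best_perm


# Toy usage: three synthetic "sources", shuffled and slightly perturbed.
rng = np.random.default_rng(0)
references = rng.standard_normal((3, 1000))
estimates = references[[2, 0, 1]] + 0.01 * rng.standard_normal((3, 1000))
aligned, perm = align_outputs(estimates, references)
```

With three outputs there are only six orderings to check, so the exhaustive search is cheap; for many sources an assignment solver would be the usual substitute.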
Input mixture and clean speech signals
Mixture | Clean speech1 | Clean speech2
Conformer Self-Remixing
Epoch 0 Output 1 (speech1) | Epoch 0 Output 2 (speech2) | Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1) | Epoch 5 Output 2 (speech2) | Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1) | Epoch 10 Output 2 (speech2) | Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1) | Epoch 15 Output 2 (speech2) | Epoch 15 Output 3 (noise)
Conformer RemixIT
Epoch 0 Output 1 (speech1) | Epoch 0 Output 2 (speech2) | Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1) | Epoch 5 Output 2 (speech2) | Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1) | Epoch 10 Output 2 (speech2) | Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1) | Epoch 15 Output 2 (speech2) | Epoch 15 Output 3 (noise)
TF-GridNet Self-Remixing
Epoch 0 Output 1 (speech1) | Epoch 0 Output 2 (speech2) | Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1) | Epoch 5 Output 2 (speech2) | Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1) | Epoch 10 Output 2 (speech2) | Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1) | Epoch 15 Output 2 (speech2) | Epoch 15 Output 3 (noise)
TF-GridNet RemixIT
Epoch 0 Output 1 (speech1) | Epoch 0 Output 2 (speech2) | Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1) | Epoch 5 Output 2 (speech2) | Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1) | Epoch 10 Output 2 (speech2) | Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1) | Epoch 15 Output 2 (speech2) | Epoch 15 Output 3 (noise)