NOTE: The output order of the three separated signals is aligned with the ground-truth audio (two speech sources and a noise source).
1-source mixture
Mixture |
Clean source 1 (= Mixture)
|
MixIT
Output 1
|
MixIT + Sparsity loss
Output 1
|
Self-Remixing (+ CBS) from scratch
Output 1
|
Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model
Output 1
|
2-source mixture
Mixture |
Clean source 1
|
Clean source 2
|
MixIT
Output 1
|
Output 2
|
MixIT + Sparsity loss
Output 1
|
Output 2
|
Self-Remixing (+ CBS) from scratch
Output 1
|
Output 2
|
Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model
Output 1
|
Output 2
|
3-source mixture
Mixture |
Clean source 1
|
Clean source 2
|
Clean source 3
|
MixIT
Output 1
|
Output 2
|
Output 3
|
MixIT + Sparsity loss
Output 1
|
Output 2
|
Output 3
|
Self-Remixing (+ CBS) from scratch
Output 1
|
Output 2
|
Output 3
|
Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model
Output 1
|
Output 2
|
Output 3
|
4-source mixture
Mixture |
Clean source 1
|
Clean source 2
|
Clean source 3
|
Clean source 4
|
MixIT
Output 1
|
Output 2
|
Output 3
|
Output 4
|
MixIT + Sparsity loss
Output 1
|
Output 2
|
Output 3
|
Output 4
|
Self-Remixing (+ CBS) from scratch
Output 1
|
Output 2
|
Output 3
|
Output 4
|
Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model
Output 1
|
Output 2
|
Output 3
|
Output 4
|