Audio examples from FUSS dataset

NOTE: Output order of the three separated signals are aligned with the ground-truth audios (two speech and a noise).

1-source mixture

Mixture

Clean source1 (= Mixture)

MixIT

Output 1

MixIT + Sparsity loss

Output 1

Self-Remixing (+ CBS) from scratch

Output 1

Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model

Output 1

2-source mixture

Mixture

Clean source 1

Clean source 2

MixIT

Output 1

Output 2

MixIT + Sparsity loss

Output 1

Output 2

Self-Remixing (+ CBS) from scratch

Output 1

Output 2

Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model

Output 1

Output 2

3-source mixture

Mixture

Clean source 1

Clean source 2

Clean source 3

MixIT

Output 1

Output 2

Output 3

MixIT + Sparsity loss

Output 1

Output 2

Output 3

Self-Remixing (+ CBS) from scratch

Output 1

Output 2

Output 3

Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model

Output 1

Output 2

Output 3

4-source mixture

Mixture

Clean source 1

Clean source 2

Clean source 3

Clean source 4

MixIT

Output 1

Output 2

Output 3

Output 3

MixIT + Sparsity loss

Output 1

Output 2

Output 3

Output 4

Self-Remixing (+ CBS) from scratch

Output 1

Output 2

Output 3

Output 4

Self-Remixing (+ CBS), fine-tuning MixIT pre-trained model

Output 1

Output 2

Output 3

Output 4