Analysis of RemixIT and Self-Remixing when training from scratch

Data
Conformer model trained using Self-Remixing (TF-masking)
Conformer model trained using RemixIT (TF-masking)
TF-GridNet model trained using Self-Remixing (Complex Spectral Mapping)
TF-GridNet model trained using RemixIT (Complex Spectral Mapping)

NOTE: Output order of the three separated signals are aligned with the ground-truth audios (two speech and a noise).

Input mixture and clean speeches

Mixture

Clean speech1

Clean speech2

Conformer Self-Remixing

Epoch 0 Output 1 (speech1)	Epoch 0 Output 2 (speech2)	Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1)	Epoch 5 Output 2 (speech2)	Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1)	Epoch 10 Output 2 (speech2)	Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1)	Epoch 15 Output 2 (speech2)	Epoch 15 Output 3 (noise)

Conformer RemixIT

Epoch 0 Output 1 (speech1)	Epoch 0 Output 2 (speech2)	Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1)	Epoch 5 Output 2 (speech2)	Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1)	Epoch 10 Output 2 (speech2)	Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1)	Epoch 15 Output 2 (speech2)	Epoch 15 Output 3 (noise)

TF-GridNet Self-Remixing

Epoch 0 Output 1 (speech1)	Epoch 0 Output 2 (speech2)	Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1)	Epoch 5 Output 2 (speech2)	Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1)	Epoch 10 Output 2 (speech2)	Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1)	Epoch 15 Output 2 (speech2)	Epoch 15 Output 3 (noise)

TFGridNet RemixIT

Epoch 0 Output 1 (speech1)	Epoch 0 Output 2 (speech2)	Epoch 0 Output 3 (noise)
Epoch 5 Output 1 (speech1)	Epoch 5 Output 2 (speech2)	Epoch 5 Output 3 (noise)
Epoch 10 Output 1 (speech1)	Epoch 10 Output 2 (speech2)	Epoch 10 Output 3 (noise)
Epoch 15 Output 1 (speech1)	Epoch 15 Output 2 (speech2)	Epoch 15 Output 3 (noise)