About Me

Kohei Saijo / 西城耕平

Second-year Ph.D. student in the Media Intelligence Laboratory at Waseda University, Japan. Working on speech enhancement and source separation.

Research Interests

Unsupervised source separation
General sound source separation
Speech enhancement

Google Scholar | GitHub | X | CV

Experience

Oct. 2024 - Present
National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

Research assistant
Mentor: Yoshiaki Bando

Nov. 2023 - Aug. 2024
Mitsubishi Electric Research Laboratories, MA, USA

Research internship
Worked on
- unsupervised speech separation (Interspeech 2024),
- state-of-the-art speech separation model (IWAENC 2024),
- text-queried target sound extraction (ICASSP 2025), and
- unified source separation (ICASSP 2025)
Mentor: Jonathan Le Roux

Apr. 2023 - July 2023
Carnegie Mellon University, Pittsburgh, PA, USA

Visiting scholar
Worked on multi-task universal speech enhancement (ASRU 2023)
Mentor: Shinji Watanabe

Sep. 2021 - Apr. 2022
LINE Corporation, Tokyo, Japan

Part-time researcher
Worked on unsupervised multi-channel source separation (Interspeech 2022)
Mentor: Robin Scheibler

Aug. 2021 - Sep. 2021
LINE Corporation, Tokyo, Japan

Research internship
Worked on multi-channel joint source separation and dereverberation (Interspeech 2022)
Mentor: Robin Scheibler

Publications

Preprint

Kohei Saijo, Yoshiaki Bando, “Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation.” [arXiv]

International conference (peer-reviewed, first author)

Kohei Saijo, Tetsuji Ogawa, “A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models,” 2025 33rd European Signal Processing Conference (EUSIPCO), September 2025 (to appear) [arXiv].

Kohei Saijo, Wangyou Zhang, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Yihui Fu, Wei Wang, Tim Fingscheidt, and Shinji Watanabe, “Interspeech 2025 URGENT Speech Enhancement Challenge,” 2025 26th Annual Conference of International Speech Communication Association (INTERSPEECH), August 2025 (to appear) [arXiv] [Demo page] [Challenge webpage].

Kohei Saijo, Janek Ebbers, François G Germain, Gordon Wichern, and Jonathan Le Roux, “Task-Aware Unified Source Separation,” 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2025. [arXiv] [IEEE Xplore] [Code] [Demo page]

Kohei Saijo, Janek Ebbers, François G Germain, Sameer Khurana, Gordon Wichern, and Jonathan Le Roux, “Leveraging Audio-Only Data for Text-Queried Target Sound Extraction,” 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2025. [arXiv] [IEEE Xplore]

Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, and Jonathan Le Roux, “TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement,” International Workshop on Acoustic Signal Enhancement (IWAENC), Sept. 2024. [IEEE Xplore] [arXiv] [Code] Best Student Paper Award Finalist

Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, and Jonathan Le Roux, “Enhanced Reverberation as Supervision for Unsupervised Speech Separation,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024. [arXiv] [ISCA archive] [Code]

Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, and Tetsuji Ogawa, “A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction,” 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2023. [arXiv] [IEEE Xplore] [Code] [Demo page]

Kohei Saijo, Tetsuji Ogawa, “Remixing-based Unsupervised Source Separation from Scratch,” 2023 24th Annual Conference of International Speech Communication Association (INTERSPEECH), August 2023. [arXiv] [ISCA archive] [Code]

Kohei Saijo, Tetsuji Ogawa, “Self-Remixing: Unsupervised Speech Separation via Separation and Remixing,” 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2023. [arXiv] [IEEE Xplore] [Code]

Kohei Saijo, Tetsuji Ogawa, “Unsupervised Training of Sequential Neural Beamformer Using Coarsely-separated and Non-separated Signals,” 2022 23rd Annual Conference of International Speech Communication Association (INTERSPEECH), September 2022. [ISCA archive]

Kohei Saijo, Robin Scheibler, “Spatial Loss for Unsupervised Multi-channel Source Separation,” 2022 23rd Annual Conference of International Speech Communication Association (INTERSPEECH), September 2022. [arXiv] [ISCA archive]

Kohei Saijo, Robin Scheibler, “Independence-based Joint Dereverberation and Separation with Neural Source Model,” 2022 23rd Annual Conference of International Speech Communication Association (INTERSPEECH), September 2022. [arXiv] [ISCA archive]

Kohei Saijo, Tetsuji Ogawa, “Remix-Cycle-Consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation,” 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022. [arXiv] [IEEE Xplore]

Kohei Saijo, Kazuhiro Katagiri, Masaru Fujieda, Tetsunori Kobayashi, Tetsuji Ogawa, “Comparative Study on DNN-based Minimum Variance Beamforming Robust to Small Movements of Sound Sources,” 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2021. [IEEE Xplore]

International conference (peer-reviewed, co-author)

Wangyou Zhang, Kohei Saijo, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Wei Wang, Yihui Fu, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian, “Lessons Learned from the URGENT 2024 Speech Enhancement Challenge,” 2025 26th Annual Conference of International Speech Communication Association (INTERSPEECH), August 2025 (to appear).

Tomohiro Hayashi, Riku Ogino, Kohei Saijo, and Tetsuji Ogawa, “What to Refer and How? - Exploring Handling of Auxiliary Information in Target Speaker Extraction,” 2024 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2024.

Zexu Pan, Gordon Wichern, François G. Germain, Kohei Saijo and Jonathan Le Roux, “PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024.

Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian, “URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024.

Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, and Yanmin Qian, “Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024.

Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang, “Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study,” 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2024.

Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, and Yanmin Qian, “Toward Universal Speech Enhancement For Diverse Input Conditions,” 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2023.

Riku Ogino, Kohei Saijo, Tetsuji Ogawa, “Design of Discriminators in GAN-Based Unsupervised Learning of Neural Post-Processors for Suppressing Localized Spectral Distortion,” 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), November 2022.

Domestic conference / workshop (in Japanese)

西城耕平，小川哲司， ”音源の分離と再混合による事前学習を必要としないモノラル教師なし音源分離，” 日本音響学会研究発表会講演論文集(ASJ)，September 2023．

西城耕平，小川哲司， ”Self-Remixing: 音源の分離と再混合による教師なし音源分離，” 日本音響学会研究発表会講演論文集(ASJ)，March 2023．

西城耕平，小川哲司， ”ブラインド音源分離を教師としたTeacher-Student学習とUnmix-Remix無矛盾学習によるSequential Neural Beamformerの教師なし学習，” 日本音響学会研究発表会講演論文集(ASJ)，September 2022．

西城耕平，小川哲司， ”ブラインド音源分離の分離音と観測信号を教師信号として用いたSequential Neural Beamformerの教師なし学習，” 電子情報通信学会技術研究報告(SP)，June 2022．

西城耕平，小川哲司， ”敵対的学習と Unmix-Remix 無矛盾学習による教師なし音源分離，” 日本音響学会研究発表会講演論文集(ASJ)，March 2022．

西城耕平，藤枝大，片桐一浩，小林哲則，小川哲司， ”DNNを用いた最小分散ビームフォーマの音源の動きに対する頑健性：音源追跡とエリア収音に基づくアプローチの比較，” 日本音響学会研究発表会講演論文集(ASJ)，September 2021．

西城耕平，藤枝大，片桐一浩，小林哲則，小川哲司， ”空間フィルタ出力を補助情報として用いた音源の移動に頑健なニューラル音声強調，” 日本音響学会研究発表会講演論文集(ASJ)，March 2021．

Awards

December 2023
ISS Young Researcher’s Award in Speech Field
from the Institute of Electronics, Information and Communication Engineers (IEICE)

March 2022
Best Student Presentation Award
from the Acoustical Society of Japan (ASJ)

Grants

April 2024 - March 2026
Research Fellowship for Young Scientists (DC2) from Japan Society for the Promotion of Science (JSPS)

April 2023 - March 2024
Support for Pioneering Research Initiated by the Next Generation (SPRING) from Japan Science and Technology Agency (JST)

April 2023 - July 2023
Super Global University from ICT & Robotics, Waseda University

April 2021 - March 2023
Repayment Exemption for Graduate Students with Excellent Achievements (Type I; full-exemption) from Japan Student Services Organization (JASSO)

Contact

Address

Media Intelligence Lab.
Room 40-701, 27 Waseda-machi, Shinjuku-ku, Tokyo 162-0042, Japan

E-mail

saijo[at]pcl.cs.waseda.ac.jp