About Me
Kohei Saijo / 西城 耕平
Second-year Ph.D. student in the Media Intelligence Laboratory at Waseda University, Japan, working on speech enhancement and source separation.
Research Interests
- Unsupervised source separation
- Universal speech enhancement
- Multi-channel source separation
Google Scholar | GitHub | Twitter | CV
Experiences
Oct. 2024 - Present
National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Research assistant
- Mentor: Yoshiaki Bando
Nov. 2023 - Aug. 2024
Mitsubishi Electric Research Laboratories, MA, USA
- Research internship
- Worked on unsupervised speech separation, a state-of-the-art speech separation model, text-queried target sound extraction, and unified source separation
- Mentor: Jonathan Le Roux
Apr. 2023 - July 2023
Carnegie Mellon University, Pittsburgh, PA, USA
- Visiting scholar
- Worked on multi-task universal speech enhancement
- Mentor: Shinji Watanabe
Sep. 2021 - Apr. 2022
LINE Corporation, Tokyo, Japan
- Part-time researcher
- Worked on unsupervised multi-channel source separation
- Mentor: Robin Scheibler
Aug. 2021 - Sep. 2021
LINE Corporation, Tokyo, Japan
- Research internship
- Worked on multi-channel joint source separation and dereverberation
- Mentor: Robin Scheibler
Publications
Preprint
Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, and Jonathan Le Roux, “Task-Aware Unified Source Separation,” arXiv preprint, 2024. [arXiv]
Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, and Jonathan Le Roux, “Leveraging Audio-Only Data for Text-Queried Target Sound Extraction,” arXiv preprint, 2024. [arXiv]
International conference (peer-reviewed, first author)
Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, and Jonathan Le Roux, “TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement,” International Workshop on Acoustic Signal Enhancement (IWAENC), Sept. 2024. [IEEE Xplore] [arXiv] [Code]
Kohei Saijo, Gordon Wichern, François G. Germain, Zexu Pan, and Jonathan Le Roux, “Enhanced Reverberation as Supervision for Unsupervised Speech Separation,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024. [arXiv] [ISCA archive] [Code]
Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, and Tetsuji Ogawa, “A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction,” 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2023. [arXiv] [IEEE Xplore] [Code] [Demo page]
Kohei Saijo, Tetsuji Ogawa, “Remixing-based Unsupervised Source Separation from Scratch,” 2023 24th Annual Conference of International Speech Communication Association (INTERSPEECH), August 2023. [arXiv] [ISCA archive] [Code]
Kohei Saijo, Tetsuji Ogawa, “Self-Remixing: Unsupervised Speech Separation via Separation and Remixing,” 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2023. [arXiv] [IEEE Xplore] [Code]
Kohei Saijo, Tetsuji Ogawa, “Unsupervised Training of Sequential Neural Beamformer Using Coarsely-separated and Non-separated Signals,” 2022 23rd Annual Conference of International Speech Communication Association (INTERSPEECH), September 2022. [ISCA archive]
Kohei Saijo, Robin Scheibler, “Spatial Loss for Unsupervised Multi-channel Source Separation,” 2022 23rd Annual Conference of International Speech Communication Association (INTERSPEECH), September 2022. [arXiv] [ISCA archive]
Kohei Saijo, Robin Scheibler, “Independence-based Joint Dereverberation and Separation with Neural Source Model,” 2022 23rd Annual Conference of International Speech Communication Association (INTERSPEECH), September 2022. [arXiv] [ISCA archive]
Kohei Saijo, Tetsuji Ogawa, “Remix-Cycle-Consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation,” 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022. [arXiv] [IEEE Xplore]
Kohei Saijo, Kazuhiro Katagiri, Masaru Fujieda, Tetsunori Kobayashi, Tetsuji Ogawa, “Comparative Study on DNN-based Minimum Variance Beamforming Robust to Small Movements of Sound Sources,” 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2021. [IEEE Xplore]
International conference (peer-reviewed, co-author)
Zexu Pan, Gordon Wichern, François G. Germain, Kohei Saijo and Jonathan Le Roux, “PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024.
Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian, “URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024.
Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, and Yanmin Qian, “Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement,” 2024 25th Annual Conference of International Speech Communication Association (INTERSPEECH), Sept. 2024.
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang, “Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study,” 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2024.
Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, and Yanmin Qian, “Toward Universal Speech Enhancement For Diverse Input Conditions,” 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2023.
Riku Ogino, Kohei Saijo, Tetsuji Ogawa, “Design of Discriminators in GAN-Based Unsupervised Learning of Neural Post-Processors for Suppressing Localized Spectral Distortion,” 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), November 2022.
Domestic conference / workshop (in Japanese)
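Kohei Saijo, Tetsuji Ogawa, “Monaural Unsupervised Source Separation without Pre-training via Source Separation and Remixing,” Proceedings of the Acoustical Society of Japan Meeting (ASJ), September 2023.
Kohei Saijo, Tetsuji Ogawa, “Self-Remixing: Unsupervised Source Separation via Source Separation and Remixing,” Proceedings of the Acoustical Society of Japan Meeting (ASJ), March 2023.
Kohei Saijo, Tetsuji Ogawa, “Unsupervised Training of Sequential Neural Beamformer via Teacher-Student Learning with Blind Source Separation as the Teacher and Unmix-Remix Consistency Learning,” Proceedings of the Acoustical Society of Japan Meeting (ASJ), September 2022.
Kohei Saijo, Tetsuji Ogawa, “Unsupervised Training of Sequential Neural Beamformer Using Blind Source Separation Outputs and Observed Signals as Training Targets,” IEICE Technical Report (SP), June 2022.
Kohei Saijo, Tetsuji Ogawa, “Unsupervised Source Separation via Adversarial Learning and Unmix-Remix Consistency Learning,” Proceedings of the Acoustical Society of Japan Meeting (ASJ), March 2022.
Kohei Saijo, Masaru Fujieda, Kazuhiro Katagiri, Tetsunori Kobayashi, Tetsuji Ogawa, “Robustness of DNN-based Minimum Variance Beamformers to Sound Source Movement: A Comparison of Approaches Based on Source Tracking and Area Sound Pickup,” Proceedings of the Acoustical Society of Japan Meeting (ASJ), September 2021.
Kohei Saijo, Masaru Fujieda, Kazuhiro Katagiri, Tetsunori Kobayashi, Tetsuji Ogawa, “Neural Speech Enhancement Robust to Sound Source Movement Using Spatial Filter Outputs as Auxiliary Information,” Proceedings of the Acoustical Society of Japan Meeting (ASJ), March 2021.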
Awards
December 2023
ISS Young Researcher’s Award in Speech Field
from the Institute of Electronics, Information and Communication Engineers (IEICE)
March 2022
Best Student Presentation Award
from the Acoustical Society of Japan (ASJ)
Grants
April 2024 - March 2026
Research Fellowship for Young Scientists (DC2)
from Japan Society for the Promotion of Science (JSPS)
April 2023 - March 2024
Support for Pioneering Research Initiated by the Next Generation (SPRING)
from Japan Science and Technology Agency (JST)
April 2023 - July 2023
Super Global University
from ICT & Robotics, Waseda University
April 2021 - March 2023
Repayment Exemption for Graduate Students with Excellent Achievements (Type I; full-exemption)
from Japan Student Services Organization (JASSO)
Contact
Address
Media Intelligence Lab.
Room 40-701, 27 Waseda-machi
Shinjuku-ku, Tokyo 162-0042, Japan
saijo[at]pcl.cs.waseda.ac.jp