Tao Qin (秦涛)
Partner Research Manager at Microsoft Research
Leading the MSR AI for Science Asia team
📧 Microsoft Research Profile
🔬 Research Interests
- AI for Science: Science Foundation Models, Molecular Modeling/Drug Discovery, Biochemistry, Material Design
- Deep Learning: LLMs, Machine Translation, Healthcare, Speech Synthesis/Recognition, Music Understanding/Composition
- Reinforcement Learning: RL for Science, Games, and Real-world Applications
🚀 Recent Updates
- Current focus: AI for Science
- Action Editor for Transactions on Machine Learning Research (TMLR)
- BioGPT: a GPT model for the biomedical domain [Paper] [Code/Model]
- MPNet produces the most effective sentence embeddings among ~40 pretrained models
- TD3 with Reverse KL Regularizer selected as ICDM 2022 Best Student Paper Award runner-up
- Accelerating protein engineering with fitness landscape modeling and RL (bioRxiv 2023)
- Impact of Large Language Models on Scientific Discovery (arXiv 2023)
🏆 Featured Work
🀄 Suphx Mahjong AI
The first Mahjong AI to reach 10 dan, built on deep reinforcement learning with several novel techniques.
🗣️ FastSpeech series
Novel feed-forward networks for parallel mel-spectrogram generation:
- 270x faster mel-spectrogram generation than autoregressive Transformer TTS
- 38x faster end-to-end speech synthesis than autoregressive Transformer TTS
Integrated into Azure TTS with 70+ languages/locales and 200+ voices
[News 1] [News 2]
🌐 Machine Translation Achievements
- Achieved human parity in Chinese-English translation (2018)
- Won 1st place in 8 translation tasks at WMT 2019
- Dual learning integrated into Microsoft Translator for multiple languages
- MASS algorithm - first pre-training model for sequence-to-sequence generation [Code] [News]
📚 Books & Surveys
Books
- Dual Learning (Springer 2020)
Presents a learning framework that leverages the structural duality between AI tasks. Covers dual reconstruction, the joint-probability equation, and applications in machine translation, image-to-image translation, speech processing, and other areas.
Surveys
- A Survey on Neural Speech Synthesis (arXiv 2021)
- A Survey on Low-Resource Neural Machine Translation (IJCAI 2021)
- Generalizing to Unseen Domains: A Survey on Domain Generalization (IJCAI 2021)
🧪 Foundation Models
- BioGPT: Generative Pre-trained Transformer for biomedical text [Paper] [Code]
- SPRoBERTa: Protein Embedding Learning with Local Fragment Modeling [Paper]
- Transcormer: Transformer for Sentence Scoring (NeurIPS 2022)
- NAS-BERT: Task-Agnostic BERT Compression (KDD 2021)
- MusicBERT: Symbolic Music Understanding (ACL 2021)
- MPNet: Masked and Permuted Pre-training (NeurIPS 2020) [Code]
📊 Open Source & Datasets
- R-Drop: Regularized Dropout for Neural Networks [Code] (NeurIPS 2021)
- Fully Parameterized Quantile Function for RL [Code] (NeurIPS 2019)
- Incorporating BERT into NMT [Code] (ICLR 2020)
- Efficient Training of BERT [Code] (ICML 2019)
- LightRNN: Memory/Computation-Efficient RNNs [Code]
- Microsoft Learning to Rank Datasets [Dataset]
🎓 Education & Affiliations
- PhD and Bachelor's degrees: Tsinghua University
- Adjunct Professor (PhD advisor): University of Science and Technology of China
- Senior Member: ACM & IEEE
📝 Publications
2023
- FABind: Fast Accurate Protein-Ligand Binding (NeurIPS)
- SMT-DTA: Drug-Target Affinity Prediction (Briefings in Bioinformatics)
- Pre-training Antibody Language Models (KDD)
- Dual-view Molecular Pre-training (KDD)
- Retrosynthetic Planning (ICML)
- De Novo Molecular Generation (ICLR)
- O-GNN: Incorporating Ring Priors (ICLR)
2022
- BioGPT (Briefings in Bioinformatics) [Code]
- TD3 with Reverse KL (ICDM Best Student Paper Runner-up)
- SPRoBERTa (Briefings in Bioinformatics)
- Unified 2D/3D Molecular Pre-training (KDD)
- Direct Molecular Conformation Generation (TMLR)
Other Selected Papers
- Making Better Decisions in Continuous Control (ICLR 2023)
- Tiered RL (NeurIPS 2022)
- Museformer for Music Generation (NeurIPS 2022)
- Suphx Mahjong AI (arXiv 2020)
- Deliberation Networks (NIPS 2017)
- Dual Learning (NIPS 2016)
Full Publication List
💬 Talks & Tutorials
- Tutorial: Recent Advances in Neural Speech Synthesis (ICASSP 2022)
- Tutorial: Neural Speech Synthesis (IJCAI 2021)
- Tutorial: Dual Learning (IJCAI 2019, ACML 2018)
- Keynote: Neural Machine Translation (ACML 2018)
- Workshop: Multi-output Learning
- Efficient NMT (GTC China 2018)
🎵 Music Understanding & Generation
- MusicBERT: Symbolic Music Understanding
- PDAugment: Automatic Lyrics Transcription
- SongMASS: Automatic Song Writing
- DeepRapper: Rap Generation
- TeleMelody: Melody Generation
- PopMAG: Music Accompaniment
- HiFiSinger: High-Fidelity Singing Synthesis
MSR AI for Science Asia Team
Microsoft Research Asia
📧 taoqin@microsoft.com
Team Website
“We are developing AI to accelerate scientific discovery - from molecular design to protein engineering and drug discovery.”