    I have worked on Machine Learning for Audio and Multimodal Data

Hi, my name is Jùnchéng (Billy) Lì (励骏成). I came back from industry to finish my PhD at CMU in fall 2019. I have spent 6 years on topics related to deep learning, working on its applications to audio and multimodal data. Recently, I have worked on understanding deep learning's vulnerabilities and robustness. I have always been fascinated by the magical effects of deep neural networks; meanwhile, their unexplainable behaviors have kept haunting me. In my research career, I have been influenced by various beliefs:
"Deep learning is the dark side, and convex optimization is the true justice."
"We have been wasting so much time building symbolic knowledge-based systems, and history proved that AlphaZero and BERT are the only real things that worked."
"Human knowledge is the cornerstone of AI; the only valid path to building AI is by teaching machines to think like humans." ...
It is very tempting for young research Jedis to fall into believing any of these "dogmas", since each is very seductive to a certain group of people with a specific background.
However, I believe there is a fine balance, a bridge that connects all the communities: the ML community, the theory community, the NLP community, and the speech community... My goal during my PhD is to build part of that bridge across the gap between theory and application.
I am convinced that good research is not necessarily impactful, but impactful research usually depends on excellent taste in topics, significant effort, bulletproof writing, and the necessary PR.
As I grow more experienced, I also think research greatly resembles value investing. Not only do we need to diversify our portfolio, we also need to concentrate enough on topics with growth. We don't have unlimited time and resources to spend, but we need to be patient and confident about whatever we choose to invest in. This process requires tremendous tenacity and a stable mindset to stomach the ups and downs. And never be arrogant, or you will get smashed right away!

Robust Deep Learning

Multimodal Machine Learning

Audio/Speech Processing

Natural Language Processing

Adversarial Music

Adversarial Camera Sticker

A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling

This work was a collaboration with Yun Wang (Maigo), and it later became part of his thesis work. Check out the resources below:

Revisiting Disentanglement in VAE

Carnegie Mellon University
School of Computer Science
Language Technologies Institute


Tongji University



Work Experience

Research Intern 2014-2014

Pittsburgh Port Authority

Built a website and managed a database to visualize transportation data for the city of Pittsburgh (the most recent 2 years), and analyzed the data to propose optimizations that improve the current resource allocation.

Recent Blog

Sept 1, 2020 | ML Blog

All of VAE

Everything I know about disentanglement in VAE

April 14, 2018 | Speech

Theory Basic NoteBook

April 14, 2018 | Inspiration

Application Notebook

2020 was a rough year!

ISCSLP 2021 Tutorial

Tutorial on Robust Audio

Academic Paper Review


NeurIPS 2019 (Vancouver, BC)

This piece of music could stop Amazon Alexa from working --NewScientist

ICML 2019 (Long Beach, CA)

Video Presented at ICML 2019 about the adversarial camera sticker

ICASSP 2019 (Brighton, UK)

Slides presented for "A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling"

ICMR 2018 (Yokohama, Japan) Best Paper Award

Our paper won the Best Paper Award

ICASSP 2017 (New Orleans)

Presented two pieces of work: Environment Sound Classification and VGG for Sound

Office 6605, Gates Hillman Center, 5000 Forbes Ave, Pittsburgh, PA 15217