Jiaqi Ma | 马家祺

Assistant Professor
School of Information Sciences
Siebel School of Computing and Data Science (Affiliation)
University of Illinois Urbana-Champaign
Contact:
Outlook: jiaqima AT illinois DOT edu
Gmail: jiaqima.mle AT gmail DOT com
[Google Scholar | GitHub | Bluesky | Twitter | YouTube]
About Me
I am an Assistant Professor at the University of Illinois Urbana-Champaign (UIUC). Prior to UIUC, I was a Postdoctoral Researcher at Harvard University. I received my Ph.D. from the University of Michigan and my B.Eng. from Tsinghua University.
I’m broadly interested in machine learning and artificial intelligence (AI). My recent research focuses on the data foundations of AI, covering three complementary aspects: 1) understanding how training data impact AI models (data attribution); 2) developing data-centric algorithms that improve the quality and safety of training data (data curation and synthetic data generation); and 3) studying how data mediate the societal impact of AI (data compensation and machine unlearning).
If you are a student who wants to work with me, please see here for more details.
News
Please check out our new survey paper on data attribution!
Five papers accepted by NeurIPS 2025 (with one oral)!
I’m serving as an Area Chair for AISTATS 2026!
Giving a talk about our work on data attribution at the AI Seminar at UMich in May 2025!
Giving a talk about our work on data attribution at YouTube in Apr 2025!
Giving a talk at the New Faculty Highlight session at AAAI 2025!
I’m serving as an Area Chair for ICML 2025!
Giving a talk about our work on data attribution at RIKEN AIP in Tokyo in Jan 2025!
Giving a talk about our work on data attribution at the VASC Seminar at CMU in Dec 2024!
Giving a talk about our work on data attribution at the IDEAL Institute in Chicago in Nov 2024!
Two papers accepted by NeurIPS 2024 (with one spotlight at the D&B track)!
I’m serving as an Area Chair for AISTATS 2025!
Recent Papers
Please see the full list on my Google Scholar page.
Preprints
- A Survey of Data Attribution: Methods, Applications, and Evaluation in the Era of Generative AI.
Junwei Deng*, Yuzheng Hu*, Pingbang Hu*, Ting-Wei Li*, Shixuan Liu*, Jiachen T. Wang, Dan Ley, Qirun Dai, Benhao Huang, Jin Huang, Cathy Jiao, Hoang Anh Just, Yijun Pan, Jingyan Shen, Yiwen Tu, Weiyi Wang, Xinhe Wang, Shichang Zhang, Shiyuan Zhang, Ruoxi Jia, Himabindu Lakkaraju, Hao Peng, Weijing Tang, Chenyan Xiong, Jieyu Zhao, Hanghang Tong, Han Zhao, Jiaqi Ma.
[Preprint]
- Accountability Attribution: Tracing Model Behavior to Training Processes.
Shichang Zhang, Hongzhe Du, Jiaqi Ma, Himabindu Lakkaraju.
[ArXiv]
- Measuring Fine-Grained Relatedness in Multitask Learning via Data Attribution.
Yiwen Tu*, Ziqi Liu*, Jiaqi Ma, Weijing Tang.
[ArXiv]
- Efficient Ensembles Improve Training Data Attribution.
Junwei Deng*, Ting Wei Li*, Shichang Zhang, Jiaqi Ma.
[ArXiv]
- Computational Copyright: Towards A Royalty Model for AI Music Generation Platforms.
Junwei Deng, Jiaqi Ma.
[ArXiv]
Recent Publications
- A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning.
Yuzheng Hu*, Fan Wu*, Haotian Ye, David Forsyth, James Zou, Nan Jiang, Jiaqi Ma†, Han Zhao†.
NeurIPS 2025 (Oral).
[ArXiv]
- GraSS: Scalable Influence Function with Sparse Gradient Compression.
Pingbang Hu, Joseph Melkonian, Weijing Tang, Han Zhao, Jiaqi Ma.
NeurIPS 2025.
[ArXiv]
- A Reliable Cryptographic Framework for Empirical Machine Unlearning Evaluation.
Yiwen Tu*, Pingbang Hu*, Jiaqi Ma.
NeurIPS 2025.
[ArXiv]
- Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining.
Weiyi Wang, Junwei Deng, Yuzheng Hu, Shiyuan Zhang, Xirui Jiang, Runting Zhang, Han Zhao, Jiaqi Ma.
NeurIPS 2025.
[ArXiv]
- DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models.
Cathy Jiao*, Yijun Pan*, Emily Xiao*, Daisy Sheng, Niket Jain, Hanzhang Zhao, Ishita Dasgupta, Jiaqi Ma†, Chenyan Xiong†.
NeurIPS 2025 (Datasets and Benchmarks Track).
[ArXiv]
- DCA-Bench: A Benchmark for Dataset Curation Agents.
Benhao Huang, Yingzhuo Yu, Jin Huang, Xingjian Zhang, Jiaqi Ma.
KDD 2025 (Datasets and Benchmarks Track).
[ArXiv]
- A Versatile Influence Function for Data Attribution with Non-Decomposable Loss.
Junwei Deng, Weijing Tang, Jiaqi Ma.
ICML 2025.
[ArXiv]
- Adversarial Attacks on Data Attribution.
Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi Ma.
ICLR 2025.
[ArXiv]
- dattri: A Library for Efficient Data Attribution.
Junwei Deng*, Ting Wei Li*, Shiyuan Zhang, Shixuan Liu, Yijun Pan, Hao Huang, Xinhe Wang, Pingbang Hu, Xingjian Zhang, Jiaqi Ma.
NeurIPS 2024 (Datasets and Benchmarks Track, Spotlight).
[ArXiv][GitHub]
- Most Influential Subset Selection: Challenges, Promises, and Beyond.
Yuzheng Hu, Pingbang Hu, Han Zhao, Jiaqi Ma.
NeurIPS 2024.
[ArXiv]
(* Equal Contribution; † Equal Advising)
(Note: I publish under the name Jiaqi W. Ma, starting in Sep 2024.)
PhD Students
Teaching
- Instructor, IS 527, SP24/SP25, University of Illinois Urbana-Champaign.
Network Analysis.
- Instructor, IS 327, FA23/SP25, University of Illinois Urbana-Champaign.
Concepts of Machine Learning.
Misc
Pronunciation of my first name: Jia-Chi.