Homepage - Xu Ouyang

Xu Ouyang

University of Virginia

Greetings! I'm Xu Ouyang, a final-year Computer Science Ph.D. student at the University of Virginia, advised by Prof. Thomas Hartvigsen. My research centers on the science of large language models — scaling laws, training dynamics, data-centric pretraining, and emerging architectures such as continuous diffusion LLMs.

Recent first-author work includes ICML 2026 (the “Shannon Scaling Law”, now used internally at ByteDance Seed for large-scale training-dynamics analysis), ACL 2025 Main (low-bit quantization scaling laws, with 1500+ open quantized LLM checkpoints released on HuggingFace), and TMLR 2025 (ADMIRE-BayesOpt — multi-fidelity Bayesian optimization for LLM data mixture reweighting).

Earlier I spent two great years with Prof. Felix Xiaozhu Lin and Prof. Yangfeng Ji at UVA CS, and have done research and internships at Tencent Hunyuan, ByteDance Applied ML (MLSys), ByteDance Seed-LLM-Model, Tencent AI Lab Seattle, Rice University (with Prof. Yingyan (Celine) Lin), and UT Austin (with Prof. Atlas (Zhangyang) Wang).

My research interests include:

Continuous Diffusion Large Language Models (dLLMs)
Multimodal & Omni-Modal Learning
Scaling Laws, Architectures, and Training Dynamics of LLMs
Data-Centric Machine Learning (pretraining data processing & mixture reweighting)
Efficient and Scalable Machine Learning Systems

I am on the 2026–2027 job market for full-time industry research opportunities. Please feel free to reach out!

ftp8nr(at)virginia.edu Google Scholar GitHub

HuggingFace LinkedIn

News

2026

•

Started my internship at Tencent Hunyuan as a Multimodal Researcher Intern! Working on pretraining of omni-modal models.

Jun 01

•

A scaling law paper from my internship at ByteDance Seed-LLM-Model has been accepted to ICML 2026! LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

May 01

•

Started my internship at ByteDance Applied ML (MLSys) @San Jose! Working on Continuous Diffusion LLMs.

Mar 02

2025

•

A data-centric ML paper has been accepted to TMLR 2025! ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization. The project homepage and repo are available!

Sep 18

•

Started my internship at ByteDance Seed-LLM-Model @San Jose! Working on LLM pretraining.

Jun 23

•

A scaling law paper has been accepted to the ACL 2025 Main Conference! Low-Bit Quantization Favors Undertrained LLMs. The project homepage and repo are available!

Jun 10

•

A data selection paper has been accepted to ICLR 2025! Selectformer in Data Markets: Privacy-Preserving and Efficient Data Selection for Transformers with Multi-Party Computation

Jan 22

2024

•

Awarded the UVA iPRIME Student Fellowship for the advancement of Precision Immunomedicine!

Nov 27

•

Started my internship on AGI at Tencent AI Lab Seattle!

May 28

2023

•

A data filtering paper has been accepted to IJCAI 2023! Efficient NLP Model Finetuning via Multistage Data Filtering

Apr 19

Education

University of Virginia

Ph.D. Student in Computer Science

Aug. 2022 - present
Cornell University

M.Eng. in Electrical and Computer Engineering

Aug. 2019 - Dec. 2020

Experience

Tencent Hunyuan, Palo Alto

Multimodal Researcher Intern (Omni-Modal)

Jun. 2026 - Present
ByteDance AML-MLSys, San Jose

Research Scientist Intern, Continuous Diffusion LLMs

Mar. 2026 - May 2026
ByteDance Seed-LLM-Model, San Jose

Student Researcher Intern, LLM Pretraining

Jun. 2025 - Jan. 2026
Tencent AI Lab, Seattle

Artificial General Intelligence Research Intern

May 2024 - Nov. 2024
Rice University

Research Assistant, Prof. Yingyan (Celine) Lin

May 2021 - Apr. 2022
University of Texas at Austin

Research Assistant, Prof. Atlas (Zhangyang) Wang

Feb. 2021 - Mar. 2021

Teaching & Services

CS4710 Artificial Intelligence, Teaching Assistant

2026 Spring
CS4501 Natural Language Processing, Teaching Assistant

2024 Fall
CS6316 Machine Learning, Teaching Assistant

2024 Spring
CS6501 Natural Language Processing, Teaching Assistant

2023 Fall
Reviewer: ICLR 2025/2026, COLM 2025/2026, ICML 2026 (Gold Reviewer, Top 25%), NeurIPS 2026, TMLR 2026

Selected Publications (view all 11 papers)

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma

The 43rd International Conference on Machine Learning (ICML) 2026

A unified scaling law modeling LLM pretraining as information transmission over a noisy channel; reconciles monotonic pretraining scaling with U-shaped phenomena such as catastrophic overtraining and quantization-induced degradation. Adopted internally at ByteDance Seed for large-scale training-dynamics analysis.

[Paper]

ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization

Xu Ouyang*, Shengzhuang Chen*, Michael Arthur Leopold Pearce, Thomas Hartvigsen, Jonathan Richard Schwarz

Transactions on Machine Learning Research 2025

A multi-fidelity Bayesian-optimization framework for LLM data-mixture re-weighting in both pretraining and instruction fine-tuning; achieves 5×+ speedups in identifying optimal mixtures, validated from 1M to 7B parameters. Released a public dataset of 460 full training/evaluation runs (13,000+ GPU hours).

[Homepage]

Low-Bit Quantization Favors Undertrained LLMs

Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu

The 63rd Annual Meeting of the Association for Computational Linguistics (ACL Main Conference) 2025

Low-bit quantization favors undertrained LLMs but induces significant degradation on fully-trained models. Released 1500+ quantized LLM checkpoints on HuggingFace spanning multiple model sizes, training-token budgets, and bit widths; derived scaling laws relating quantization-induced degradation to model size, training tokens, and bit width.

[Paper]

Selectformer in Data Markets: Privacy-Preserving and Efficient Data Selection for Transformers with Multi-Party Computation

Xu Ouyang, Felix Xiaozhu Lin, Yangfeng Ji

The Thirteenth International Conference on Learning Representations (ICLR) 2025

Privacy-preserving, efficient data selection for transformers via multi-party computation, enabling fine-grained data valuation in data markets without exposing raw samples.

[Paper]

Efficient NLP Model Finetuning via Multistage Data Filtering

Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji

International Joint Conference on Artificial Intelligence (IJCAI) 2023

[Paper]
