Xu Ouyang
University of Virginia

Greetings! I'm Xu Ouyang, a final-year Computer Science Ph.D. student at the University of Virginia, advised by Prof. Thomas Hartvigsen. My research centers on the science of large language models — scaling laws, training dynamics, data-centric pretraining, and emerging architectures such as continuous diffusion LLMs.

Recent first-author work includes ICML 2026 (the “Shannon Scaling Law”, now used internally at ByteDance Seed for large-scale training-dynamics analysis), ACL 2025 Main (low-bit quantization scaling laws, with 1500+ open quantized LLM checkpoints released on HuggingFace), and TMLR 2025 (ADMIRE-BayesOpt — multi-fidelity Bayesian optimization for LLM data mixture reweighting).

Earlier, I spent two great years working with Prof. Felix Xiaozhu Lin and Prof. Yangfeng Ji at UVA CS. I have also done research and internships at ByteDance Applied ML (MLSys), ByteDance Seed-LLM-Model, Tencent AI Lab Seattle, Rice University (with Prof. Yingyan (Celine) Lin), and UT Austin (with Prof. Atlas (Zhangyang) Wang).

My research interests include:
  • Continuous Diffusion Large Language Models (dLLMs)
  • Scaling Laws, Architectures, and Training Dynamics of LLMs
  • Data-Centric Machine Learning (pretraining data processing & mixture reweighting)
  • Efficient and Scalable Machine Learning Systems

I am on the 2026–2027 job market for full-time industry research opportunities. Please feel free to reach out!


News
2026
• 
A scaling law paper from my internship at ByteDance Seed-LLM-Model has been accepted to ICML 2026! LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
May 01
• 
Started my internship at ByteDance Applied ML (MLSys) @San Jose! Working on Continuous Diffusion LLMs.
Mar 02
2025
• 
A data-centric ML paper has been accepted to TMLR 2025: ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization. The project homepage and repo are available!
Sep 18
• 
Started my internship at ByteDance Seed-LLM-Model @San Jose! Working on LLM pretraining.
Jun 23
• 
A scaling law paper has been accepted to the ACL 2025 Main Conference: Low-Bit Quantization Favors Undertrained LLMs. The project homepage and repo are available!
Jun 10
2024
• 
Awarded the UVA iPRIME Student Fellowship for the advancement of Precision Immunomedicine!
Nov 27
• 
Started my internship on AGI at Tencent AI Lab Seattle!
May 28
2023
• 
A data filtering paper has been accepted to IJCAI 2023! Efficient NLP Model Finetuning via Multistage Data Filtering
Apr 19
2022
• 
Awarded the UVA Computer Science Scholar Fellowship!
Aug 28
Education
  • University of Virginia
    Ph.D. Student in Computer Science
    Aug. 2022 - present
  • Cornell University
    M.Eng. in Electrical and Computer Engineering
    Aug. 2019 - Dec. 2020
Experience
  • ByteDance AML-MLSys @San Jose
    Research Scientist Intern, Continuous Diffusion LLMs
    Mar. 2026 - May 2026
  • ByteDance Seed-LLM-Model @San Jose
    Student Researcher Intern, LLM Pretraining
    Jun. 2025 - Jan. 2026
  • Tencent AI Lab, Seattle
    Artificial General Intelligence Research Intern
    May 2024 - Nov. 2024
  • Rice University
    Research Assistant, Prof. Yingyan (Celine) Lin
    May 2021 - Apr. 2022
  • University of Texas at Austin
    Research Assistant, Prof. Atlas (Zhangyang) Wang
    Feb. 2021 - Mar. 2021
Teaching & Services
  • CS4710 Artificial Intelligence, Teaching Assistant
    2026 Spring
  • CS4501 Natural Language Processing, Teaching Assistant
    2024 Fall
  • CS6316 Machine Learning, Teaching Assistant
    2024 Spring
  • CS6501 Natural Language Processing, Teaching Assistant
    2023 Fall
  • Reviewer: ICLR 2025/2026, COLM 2025/2026, ICML 2026 (Gold Reviewer, Top 25%), NeurIPS 2026
Selected Publications (view all 11 papers)
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma

The 43rd International Conference on Machine Learning (ICML) 2026

A unified scaling law modeling LLM pretraining as information transmission over a noisy channel; reconciles monotonic pretraining scaling with U-shaped phenomena such as catastrophic overtraining and quantization-induced degradation. Adopted internally at ByteDance Seed for large-scale training-dynamics analysis.

ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization

Xu Ouyang*, Shengzhuang Chen*, Michael Arthur Leopold Pearce, Thomas Hartvigsen, Jonathan Richard Schwarz

Transactions on Machine Learning Research 2025

A multi-fidelity Bayesian-optimization framework for LLM data-mixture re-weighting in both pretraining and instruction fine-tuning; achieves 5×+ speedups in identifying optimal mixtures, validated from 1M to 7B parameters. Released a public dataset of 460 full training/evaluation runs (13,000+ GPU hours).

Low-Bit Quantization Favors Undertrained LLMs

Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu

The 63rd Annual Meeting of the Association for Computational Linguistics (ACL Main Conference) 2025

Low-bit quantization favors undertrained LLMs but induces significant degradation on fully-trained models. Released 1500+ quantized LLM checkpoints on HuggingFace spanning multiple model sizes, training-token budgets, and bit widths; derived scaling laws relating quantization-induced degradation to model size, training tokens, and bit width.

Selectformer in Data Markets: Privacy-Preserving and Efficient Data Selection for Transformers with Multi-Party Computation

Xu Ouyang, Felix Xiaozhu Lin, Yangfeng Ji

The Thirteenth International Conference on Learning Representations (ICLR) 2025

Privacy-preserving, efficient data selection for transformers via multi-party computation, enabling fine-grained data valuation in data markets without exposing raw samples.

Efficient NLP Model Finetuning via Multistage Data Filtering

Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji

International Joint Conference on Artificial Intelligence (IJCAI) 2023
