2026

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Xu Ouyang, Deyi Liu, Yuhang Cai, Jing Liu, Yuan Yang, Chen Zheng, Thomas Hartvigsen, Yiyuan Ma

The 43rd International Conference on Machine Learning (ICML) 2026

A unified scaling law modeling LLM pretraining as information transmission over a noisy channel; reconciles monotonic pretraining scaling with U-shaped phenomena such as catastrophic overtraining and quantization-induced degradation. Adopted internally at ByteDance Seed for large-scale training-dynamics analysis.

2025

ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization

Xu Ouyang*, Shengzhuang Chen*, Michael Arthur Leopold Pearce, Thomas Hartvigsen, Jonathan Richard Schwarz

Transactions on Machine Learning Research 2025

A multi-fidelity Bayesian-optimization framework for LLM data-mixture re-weighting in both pretraining and instruction fine-tuning; achieves 5×+ speedups in identifying optimal mixtures, validated from 1M to 7B parameters. Released a public dataset of 460 full training/evaluation runs (13,000+ GPU hours).

Low-Bit Quantization Favors Undertrained LLMs

Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu

The 63rd Annual Meeting of the Association for Computational Linguistics (ACL Main Conference) 2025

Low-bit quantization favors undertrained LLMs but induces significant degradation on fully trained models. Released 1500+ quantized LLM checkpoints on HuggingFace spanning multiple model sizes, training-token budgets, and bit widths; derived scaling laws relating quantization-induced degradation to model size, training tokens, and bit width.

Selectformer in Data Markets: Privacy-Preserving and Efficient Data Selection for Transformers with Multi-Party Computation

Xu Ouyang, Felix Xiaozhu Lin, Yangfeng Ji

The Thirteenth International Conference on Learning Representations (ICLR) 2025

Privacy-preserving, efficient data selection for transformers via multi-party computation, enabling fine-grained data valuation in data markets without exposing raw samples.

2023

Efficient NLP Model Finetuning via Multistage Data Filtering

Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji

International Joint Conference on Artificial Intelligence (IJCAI) 2023

2022

Supertickets: Drawing Task-agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning

Haoran You, Baopu Li, Zhanyi Sun, Xu Ouyang, Yingyan Lin

The European Conference on Computer Vision (ECCV) 2022

Contrastive Quant: Quantization Makes Stronger Contrastive Learning

Yonggan Fu, Qixuan Yu, Meng Li, Xu Ouyang, Vikas Chandra, Yingyan Lin

ACM/IEEE Design Automation Conference (DAC) 2022

I-FlatCam: A 253 FPS, 91.49 µJ/frame Ultra-compact Intelligent Lensless Camera for Real-time and Efficient Eye Tracking in VR/AR

Yang Zhao, Ziyun Li, Yonggan Fu, Yongan Zhang, Chaojian Li, Cheng Wan, Haoran You, Shang Wu, Xu Ouyang, Vivek Boominathan, Ashok Veeraraghavan, Yingyan Lin

IEEE Symposium on VLSI Technology and Circuits (VLSI) 2022

e-G2C: A 0.14-to-8.31 µJ/Inference NN-based Processor with Continuous On-chip Adaptation for Anomaly Detection and ECG Conversion from EGM

Yang Zhao, Yongan Zhang, Yonggan Fu, Xu Ouyang, Cheng Wan, Shang Wu, Anton Banta, Mathews M John, Allison Post, Mehdi Razavi, Joseph Cavallaro, Behnaam Aazhang, Yingyan Lin

IEEE Symposium on VLSI Technology and Circuits (VLSI) 2022

2021

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness are Found within Randomly Initialized Networks

Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Lin

The Conference on Neural Information Processing Systems (NeurIPS) 2021

'BNN-BN=?': Training Binary Neural Networks Without Batch Normalization

Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
