Senior Director of ML Systems & AI Infrastructure
I work on CUDA kernel optimization, LLM inference and training systems, and large-scale distributed performance engineering, spanning execution planning, memory hierarchy tuning, benchmarking, and system-level scaling for modern generative AI workloads.
- CUDA kernel optimization and Tensor Core programming
- LLM inference optimization and speculative decoding
- Large-scale distributed training and performance analysis
- Benchmarking, profiling, and bottleneck isolation across kernel, runtime, and cluster layers (see the timing-harness sketch after this list)
- Bridging model frameworks, runtime systems, and hardware execution
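As a flavor of the kernel-level benchmarking listed above, here is a minimal CUDA-event timing harness of the kind used to separate kernel time from launch and host overhead. It is an illustrative sketch only, not code from any project named below; the SAXPY kernel, sizes, and iteration count are placeholders.

```cuda
// Minimal kernel timing harness (illustrative; kernel and sizes are placeholders).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));   // buffers left uninitialized:
    cudaMalloc(&y, n * sizeof(float));   // fine for a timing-only run

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dim3 block(256), grid((n + 255) / 256);
    saxpy<<<grid, block>>>(n, 2.0f, x, y);          // warm-up launch
    cudaDeviceSynchronize();

    const int iters = 100;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // SAXPY moves 2 reads + 1 write of n floats per launch.
    double gbytes = 3.0 * n * sizeof(float) * iters / 1e9;
    printf("avg %.3f ms/iter, ~%.1f GB/s\n", ms / iters, gbytes / (ms / 1e3));

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(x); cudaFree(y);
    return 0;
}
```

Compile with `nvcc -O3`; the same structure carries over to multi-stream runs and profiler-driven analysis.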
- Improved speculative decoding performance in SGLang through metadata replay and kernel fusion, delivering end-to-end throughput gains and major reductions in CPU and kernel overhead (a small fusion sketch follows this list)
- Contributed to FlashAttention benchmarking and validation work across modern GPU architectures
- Contributed examples and kernel patterns to CUTLASS and CuTe DSL for Hopper and Blackwell-class architectures
- Led performance validation and scaling analysis for large-scale training and inference workloads across multi-node GPU clusters
- Built and refined benchmarking methodology across kernels, collectives, and end-to-end training and inference paths
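The fusion mentioned above is SGLang-specific; the sketch below only illustrates the general idea on a deliberately simple elementwise case, where bias-add and ReLU share one kernel instead of two, saving a launch and a round trip through global memory. Names and shapes are placeholders, not SGLang code.

```cuda
// Illustrative elementwise fusion: bias-add + ReLU in a single pass.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void bias_relu_fused(int n, int cols, const float* in,
                                const float* bias, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i] + bias[i % cols];   // bias-add
        out[i] = v > 0.0f ? v : 0.0f;       // ReLU, fused into the same pass
    }
}

int main() {
    const int rows = 4096, cols = 4096, n = rows * cols;
    float *in, *bias, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&bias, cols * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    bias_relu_fused<<<(n + 255) / 256, 256>>>(n, cols, in, bias, out);
    cudaDeviceSynchronize();
    printf("fused bias+ReLU over %d elements: %s\n", n,
           cudaGetLastError() == cudaSuccess ? "ok" : "error");

    cudaFree(in); cudaFree(bias); cudaFree(out);
    return 0;
}
```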
- sglang: work on speculative decoding, metadata replay optimization, and inference-path performance improvements.
- flash-attention: benchmarking, validation, and optimization work for modern accelerator platforms.
- cutlass / CuTe DSL: examples and kernel patterns for modern GPU architectures, including Hopper and Blackwell-oriented programming models.
- dgxc-benchmarking: performance engineering across NCCL, training frameworks, cluster scaling, and production-facing benchmark workflows (see the all-reduce sketch after this list).
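For the collective side, a minimal single-node sketch, assuming NCCL is installed and one process drives all visible GPUs: an in-place float all-reduce issued per device inside one NCCL group, following the pattern of NCCL's single-process example. The buffer size is a placeholder; real collective benchmarking sweeps message sizes and reports bus bandwidth.

```cuda
// Single-process, multi-GPU all-reduce sketch (assumes NCCL is available).
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    const size_t count = 1 << 26;                  // elements per GPU (placeholder)

    std::vector<ncclComm_t> comms(ndev);
    std::vector<float*> bufs(ndev);
    std::vector<cudaStream_t> streams(ndev);

    ncclCommInitAll(comms.data(), ndev, nullptr);  // one rank per device 0..ndev-1
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&bufs[d], count * sizeof(float));
        cudaStreamCreate(&streams[d]);
    }

    // Issue the collective for every device inside a single NCCL group.
    ncclGroupStart();
    for (int d = 0; d < ndev; ++d)
        ncclAllReduce(bufs[d], bufs[d], count, ncclFloat, ncclSum,
                      comms[d], streams[d]);
    ncclGroupEnd();

    // Wait for completion on every device, then clean up.
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(streams[d]);
        cudaStreamDestroy(streams[d]);
        cudaFree(bufs[d]);
        ncclCommDestroy(comms[d]);
    }
    printf("all-reduce of %zu floats across %d GPUs completed\n", count, ndev);
    return 0;
}
```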
Pinned repositories that best represent this work, for example:
- sglang — inference-path optimization and speculative decoding work
- flash-attention — benchmarking and optimization work
- cutlass — CUDA kernel examples and DSL-based kernel work
- dgxc-benchmarking — benchmark methodology and platform validation
- a personal notes or benchmark repo — performance studies, repros, and writeups
- a CUDA or LLM systems repo — focused examples or performance experiments
- Performance-focused engineering work
- Benchmark harnesses and reproducible experiments
- Profiling-driven optimization
- Notes on GPU architecture, Tensor Core programming, and distributed systems (see the WMMA tile example after this list)
- Practical writeups that connect low-level optimization to end-to-end impact
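On the Tensor Core side, a minimal WMMA sketch: one warp computing a single 16x16x16 half-precision tile with a float accumulator (compile for sm_70 or newer). It is a teaching-sized example, not a tuned GEMM; the tuned kernel patterns live in CUTLASS and the CuTe DSL.

```cuda
// One-warp WMMA tile: C = A * B with half inputs and float accumulation.
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_tile(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);            // C = 0
    wmma::load_matrix_sync(fa, a, 16);        // load A tile, leading dim 16
    wmma::load_matrix_sync(fb, b, 16);        // load B tile, leading dim 16
    wmma::mma_sync(fc, fa, fb, fc);           // C += A * B on Tensor Cores
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float *c;
    cudaMalloc(&a, 16 * 16 * sizeof(half));
    cudaMalloc(&b, 16 * 16 * sizeof(half));
    cudaMalloc(&c, 16 * 16 * sizeof(float));

    wmma_tile<<<1, 32>>>(a, b, c);            // exactly one warp
    cudaDeviceSynchronize();
    printf("wmma 16x16x16 tile: %s\n",
           cudaGetLastError() == cudaSuccess ? "ok" : "error");

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```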
I lead and contribute across the stack, from kernels and runtime behavior to large-scale AI platform performance. My work has included training and inference optimization, cluster-scale benchmarking, and upstream contributions to widely used open-source AI systems.
Johnson Li
Senior Director of ML Systems & AI Infrastructure
CUDA kernels, LLM inference and training systems, distributed performance engineering, and benchmarking for modern AI workloads.
Focused on turning low-level optimization into measurable end-to-end gains across kernels, runtimes, and clusters.