DFLOP is a data-driven optimization framework designed to improve distributed training efficiency for Multimodal Large Language Models (MLLMs).
Unlike existing data-agnostic frameworks that parallelize computation blindly, DFLOP adapts parallelism and scheduling to the actual characteristics of the training data, mitigating computation imbalance and input-dependent performance variance.
DFLOP consists of three core components:
- Profiling Engine
  - Profiles both model and data workloads.
  - Builds predictive models for memory and throughput across input shapes (see the first sketch below).
  - Analyzes the empirical input-shape distribution from real datasets.
- Data-aware 3D Parallelism Optimizer
  - Uses profiling results to determine optimal 3D parallelism configurations (Tensor / Pipeline / Data Parallelism) for each module independently.
  - Minimizes expected makespan under memory and hardware constraints (see the second sketch below).
- Online Microbatch Scheduler
  - Dynamically partitions each training batch using Integer Linear Programming (ILP).
  - Balances computation load across pipeline stages in real time.
  - Reduces GPU idle time caused by pipeline bubbles.
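To make the Profiling Engine's predictive models concrete, here is a minimal sketch that fits affine cost models (step time and memory vs. token count) by least squares. The sample numbers and the linear-in-tokens assumption are illustrative only, not DFLOP's actual estimators.

```python
# Illustrative only: fit least-squares cost models from profiled samples.
# DFLOP's real predictive models may use different features and estimators.
import numpy as np

# Hypothetical profiling samples for one module:
# (sequence length in tokens, step time in ms, peak activation memory in MB)
samples = np.array([
    [256,   41.0,  900.0],
    [512,   79.5, 1750.0],
    [1024, 158.0, 3480.0],
    [2048, 321.0, 6900.0],
])

tokens = samples[:, 0]
# Design matrix [tokens, 1] -> affine model: cost = a * tokens + b
X = np.stack([tokens, np.ones_like(tokens)], axis=1)

# Two independent fits: step time and peak memory vs. token count.
time_coef, *_ = np.linalg.lstsq(X, samples[:, 1], rcond=None)
mem_coef, *_ = np.linalg.lstsq(X, samples[:, 2], rcond=None)

def predict_time_ms(num_tokens: float) -> float:
    """Predicted step time for this module at a given input length."""
    return time_coef[0] * num_tokens + time_coef[1]

def predict_memory_mb(num_tokens: float) -> float:
    """Predicted peak activation memory at a given input length."""
    return mem_coef[0] * num_tokens + mem_coef[1]

print(predict_time_ms(1536), predict_memory_mb(1536))
```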
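Similarly, the Data-aware 3D Parallelism Optimizer can be pictured as a constrained search over (TP, PP, DP) factorizations of the GPU count. The sketch below uses invented placeholder cost and memory functions; in DFLOP these would come from the profiling-based predictors above.

```python
# Toy search over (tp, pp, dp) factorizations of the GPU count.
# The cost and memory functions are placeholders standing in for the
# profiling-based predictors; they are NOT DFLOP's real models.
from itertools import product

NUM_GPUS = 16
MEMORY_BUDGET_GB = 80.0  # assumed per-GPU memory constraint

def expected_makespan(tp: int, pp: int, dp: int) -> float:
    # Placeholder: compute shrinks with tp*dp, pipeline bubbles grow with pp,
    # tensor parallelism adds all-reduce overhead.
    return 1000.0 / (tp * dp) + 15.0 * (pp - 1) + 5.0 * tp

def memory_per_gpu(tp: int, pp: int, dp: int) -> float:
    # Placeholder: model states shard across tp * pp.
    return 400.0 / (tp * pp)

best = None
for tp, pp in product(range(1, NUM_GPUS + 1), repeat=2):
    if NUM_GPUS % (tp * pp):
        continue  # tp * pp must divide the GPU count
    dp = NUM_GPUS // (tp * pp)
    if memory_per_gpu(tp, pp, dp) > MEMORY_BUDGET_GB:
        continue  # violates the per-GPU memory constraint
    cost = expected_makespan(tp, pp, dp)
    if best is None or cost < best[0]:
        best = (cost, tp, pp, dp)

print("best (makespan, tp, pp, dp):", best)
```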
- Install package

```bash
conda create -n dflop python=3.10 -y
conda activate dflop
pip install --upgrade pip  # enable PEP 660 support
pip install -e .[dev] --extra-index-url https://download.pytorch.org/whl/cu124
```

- Install an additional package

```bash
pip install flash-attn==2.7.3 --no-build-isolation
```
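Before moving on, a quick sanity check (our suggestion, not part of DFLOP's scripts) confirms that CUDA-enabled PyTorch and flash-attn import cleanly:

```python
# Optional post-install sanity check (not part of DFLOP itself).
import torch
import flash_attn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
```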
After downloading:

- Place the Single Image Dataset and the Multiple Image Dataset inside the image_folder (e.g., `data/image_folder/`)
- Place the Video Dataset inside the video_folder (e.g., `data/video_folder/`)

Set these dataset paths in `configs/dataset_config.yaml`.
- `mllm_model_name` can be selected from the following options:
  - `llavaov`
  - `internvl`
- `llm_model_name` can be selected from:
  - `qwen2.5`
  - `llama3`
DFLOP uses a separate configuration file, `configs/dflop_config.yaml`, to define model selection, dataset paths, hardware resources, and training parameters.
You must fill in the commented sections before running DFLOP.
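A short script like the following can catch unfilled fields before launch. This is our sketch, and the key names are assumptions; match them to the actual commented sections of the config file.

```python
# Minimal sketch: verify that required fields in configs/dflop_config.yaml
# have been filled in before launching DFLOP. The key names below are
# illustrative guesses, not DFLOP's guaranteed schema.
import yaml

REQUIRED_KEYS = ["mllm_model_name", "llm_model_name"]  # assumed key names

with open("configs/dflop_config.yaml") as f:
    cfg = yaml.safe_load(f)

missing = [k for k in REQUIRED_KEYS if cfg.get(k) in (None, "")]
if missing:
    raise SystemExit(f"Fill in these fields before running DFLOP: {missing}")
print("dflop_config.yaml looks complete.")
```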
Navigate to the `scripts` folder.
The `run_profiling_engine.sh` script launches the Profiling Engine of DFLOP across multiple nodes.
Each node must have a unique `rank_number`, assigned sequentially (e.g., 0, 1, 2, 3, ...), so that every node can correctly identify its role in the distributed profiling job.

```bash
bash run_profiling_engine.sh <num_nodes> <rank_number> <master_addr>
```

For example, on a four-node cluster, node 0 would run `bash run_profiling_engine.sh 4 0 <master_addr>`.

After completing the profiling stage, run the Data-aware 3D Parallelism Optimizer to automatically search for optimal parallel configurations based on the profiling results.
```bash
bash run_data_aware_optimization.sh
```

Once the optimized configuration is generated, you can start the training phase. During this stage, DFLOP’s Online Microbatch Scheduler runs asynchronously to dynamically balance workloads across GPU pipeline stages in real time (a toy sketch of its partitioning step follows the training command below).
As with profiling, each node must have a unique `rank_number`, assigned sequentially (e.g., 0, 1, 2, 3, ...), so that every node can correctly identify its role in the distributed training job.
```bash
bash run_training.sh <num_nodes> <rank_number> <master_addr>
```
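To make the scheduler's partitioning step concrete, here is a toy ILP formulation using PuLP. The per-sample costs, variable names, and objective are illustrative assumptions; DFLOP's actual ILP, and how it consumes the profiling predictors, may differ.

```python
# Toy ILP: assign the samples of one batch to microbatches so that the
# heaviest microbatch (the pipeline bottleneck) is minimized. The costs are
# hypothetical per-sample compute estimates, NOT DFLOP's real formulation.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, PULP_CBC_CMD

costs = [7.0, 3.0, 9.0, 4.0, 6.0, 2.0, 8.0, 5.0]  # est. cost per sample
num_microbatches = 4

prob = LpProblem("microbatch_partition", LpMinimize)
# x[i][j] = 1 iff sample i is assigned to microbatch j.
x = [[LpVariable(f"x_{i}_{j}", cat="Binary") for j in range(num_microbatches)]
     for i in range(len(costs))]
makespan = LpVariable("makespan", lowBound=0)

prob += makespan  # objective: minimize the heaviest microbatch load
for i in range(len(costs)):
    prob += lpSum(x[i]) == 1  # each sample placed exactly once
for j in range(num_microbatches):
    prob += lpSum(costs[i] * x[i][j] for i in range(len(costs))) <= makespan

prob.solve(PULP_CBC_CMD(msg=False))
for j in range(num_microbatches):
    members = [i for i in range(len(costs)) if x[i][j].value() == 1]
    print(f"microbatch {j}: samples {members}, "
          f"load {sum(costs[i] for i in members):.1f}")
```

Minimizing the load of the heaviest microbatch is one way to shrink pipeline bubbles, since the slowest microbatch gates every pipeline stage behind it.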