Task-Specific Comparisons against Baseline (FullDiT)
FullDiT2 runs 2.143x to 3.433x faster than the FullDiT baseline on every task below. It keeps generation quality comparable on all of them and improves it on some.
ID Insertion
Quantitative Highlights
- Speedup (Ours): 2.287x
- GFLOPS (Baseline vs Ours): 69.292 vs 33.141 (Ours is lower)
- CLIP-I (Baseline vs Ours): 0.568 vs 0.605 (Ours is higher)
- DINO-S (Baseline vs Ours): 0.254 vs 0.313 (Ours is higher)
- On ID insertion, FullDiT2 outperforms the baseline on both CLIP-I and DINO-S while using less than half of its GFLOPS.
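The ID insertion gains are easier to see as relative changes. The minimal Python sketch below computes them from the scores listed above. The helper `relative_change` is a hypothetical name used only for illustration and is not part of FullDiT2.

```python
def relative_change(baseline: float, ours: float) -> float:
    """Return the relative change of `ours` over `baseline` as a fraction."""
    return (ours - baseline) / baseline

# ID insertion scores reported above (higher is better for both metrics).
id_insertion = {
    "CLIP-I": (0.568, 0.605),
    "DINO-S": (0.254, 0.313),
}

for metric, (baseline, ours) in id_insertion.items():
    # Format as a signed percentage, e.g. "+6.5%".
    print(f"{metric}: {relative_change(baseline, ours):+.1%}")
```

The same helper applies to the other tasks. For error metrics such as RotErr and TransErr, a negative change is the improvement.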
ID Swap
Quantitative Highlights
- Speedup (Ours): 2.287x
- GFLOPS (Baseline vs Ours): 69.292 vs 33.141
- CLIP-I (Baseline vs Ours): 0.619 vs 0.621
ID Deletion
Quantitative Highlights
- Speedup (Ours): 2.287x
- GFLOPS (Baseline vs Ours): 69.292 vs 33.141
Video Re-Camera
Quantitative Highlights
- Speedup (Ours): 3.433x
- GFLOPS (Baseline vs Ours): 101.517 vs 33.407 (~33% of baseline)
- RotErr / TransErr: Comparable or improved (e.g. TransErr: Baseline 6.173 vs Ours 5.730; lower is better)
Pose-to-Video
Quantitative Highlights
- Speedup (Ours): 2.143x
- GFLOPS (Baseline vs Ours): 64.457 vs 33.111
- PCK (Pose Control): Largely maintained (Baseline 72.445 vs Ours 71.408; higher is better)
Trajectory-to-Video
Quantitative Highlights
- Speedup (Ours): 2.143x
- GFLOPS (Baseline vs Ours): 64.457 vs 33.111
- RotErr / TransErr: Comparable (Baseline 1.471 / 5.755 vs Ours 1.566 / 5.714; lower is better)
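The reported GFLOPS reduction and the reported speedup are different quantities. The sketch below divides baseline GFLOPS by FullDiT2 GFLOPS for each task configuration above and prints the result next to the reported speedup. It assumes that the speedup figures are end-to-end latency ratios, which this section does not state. The `tasks` dictionary and the `flops_reduction` helper are hypothetical, for illustration only.

```python
# (baseline GFLOPS, FullDiT2 GFLOPS, reported speedup) per task, copied from above.
tasks = {
    "ID Insertion / Swap / Deletion": (69.292, 33.141, 2.287),
    "Video Re-Camera": (101.517, 33.407, 3.433),
    "Pose-to-Video / Trajectory-to-Video": (64.457, 33.111, 2.143),
}

def flops_reduction(baseline: float, ours: float) -> float:
    """Factor by which FullDiT2 reduces GFLOPS relative to the baseline."""
    return baseline / ours

for name, (base, ours, speedup) in tasks.items():
    ratio = flops_reduction(base, ours)
    share = ours / base  # fraction of baseline GFLOPS that FullDiT2 still uses
    print(f"{name}: {ratio:.2f}x fewer GFLOPS ({share:.0%} of baseline), "
          f"reported speedup {speedup}x")
```

For every task, the reported speedup is somewhat larger than the GFLOPS ratio (for example 3.433x vs about 3.04x for Video Re-Camera). The two figures should therefore not be read as interchangeable.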