GPU Checkpoint/Restore Made Fast and Lightweight

This is the artifact repository for evaluating the FAST '26 paper "GPU Checkpoint/Restore Made Fast and Lightweight".

Evaluating the Artifact

Run the main evaluation script:

bash run.sh

A summary of the results is written to results.txt.

Detailed results are stored in eval/log/, with one subdirectory per experiment type.
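To get a quick overview of which experiments produced output, the log directory can be walked with a few lines of Python. This is a minimal sketch, not part of the artifact: the function name `summarize_logs` is hypothetical, and only the `eval/log/` path is taken from this README. The names and formats of the files inside each subdirectory are not assumed.

```python
from pathlib import Path


def summarize_logs(log_root="eval/log"):
    """Map each experiment subdirectory to the number of files beneath it."""
    root = Path(log_root)
    if not root.is_dir():
        # Nothing has been run yet, or the path is wrong.
        return {}
    summary = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count regular files at any depth below the experiment directory.
        summary[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())
    return summary


if __name__ == "__main__":
    for name, count in summarize_logs().items():
        print(f"{name}: {count} file(s)")
```

An experiment directory that reports zero files may mean that part of the evaluation did not run.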

Artifact Overview

This artifact is a comprehensive evaluation framework for GPU checkpoint/restore systems. It includes:

Source code:
- GCR/ - GCR implementation
Analysis scripts (in eval/analysis/):
- Scripts for generating the summary and analyzing results
Configuration and utilities:
- run.sh - Main evaluation script
- run_analysis.sh - Result analysis script

Environment Setup

Hardware:
- 2× NVIDIA A100-40GB GPUs with NVLink and PCIe 4.0 (tested configuration)
Software:
- CUDA toolkit 12.6
- PyTorch 2.7.1
- vLLM 0.9.1
- DeepSpeed 0.17.5

Note on framework versions: Since PhOS only supports Transformers 4.30.0, we use that version for the workloads PhOS supports, to ensure a fair comparison. For workloads PhOS does not support, we use a newer Transformers release (4.53.3) to cover a broader set of workloads.
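Before running the evaluation, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the artifact: the helper names `installed_version` and `check_versions` are hypothetical, and it assumes the packages are installed under their usual PyPI distribution names (`torch`, `vllm`, `deepspeed`, `transformers`). Either Transformers version is accepted, since the right one depends on the workload. The CUDA toolkit is not a Python package and is not checked here.

```python
from importlib import metadata

# Versions listed in this README; Transformers depends on the workload.
EXPECTED = {
    "torch": {"2.7.1"},
    "vllm": {"0.9.1"},
    "deepspeed": {"0.17.5"},
    "transformers": {"4.30.0", "4.53.3"},
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Return a list of (name, found, accepted) tuples, one per mismatch."""
    mismatches = []
    for name, accepted in expected.items():
        found = lookup(name)
        # Drop local build tags such as "+cu126" before comparing.
        base = found.split("+")[0] if found else None
        if base not in accepted:
            mismatches.append((name, found, sorted(accepted)))
    return mismatches


if __name__ == "__main__":
    for name, found, accepted in check_versions():
        print(f"{name}: found {found}, README lists {accepted}")
```

When run as a script, it prints one line per mismatch and nothing if every package matches.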

Dependencies

- ServerlessLLM is installed following the upstream ServerlessLLM repository (commit id: 76d472f).
- Kernel PTX is generated using Neutrino.
- Python library dependencies are listed in environment.yml.

Citation

We will release our paper after it is camera-ready.
