Fifth-place winner of the ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning.
Team Members: Haodi Jiang*, Yitian Yang*, Ruwen Fan, Shiwei Gao, Shaoxun Zeng, Junrong Huang, Huajun Bai, Hao Guo and Youyou Lu (Tsinghua University).
The ASPLOS/EuroSys 2025 IOPDDL contest presented a unique challenge: optimizing the placement of an ML model across multiple computing devices. The problem is modeled as a graph, and the task is to find the subgraph with minimal cost under certain constraints.
Our solution, LENS Optimizer, searches for the optimal solution by continuously adjusting the current solution at two scales:
- Large-Scale Restructuring: Modify one connected component.
- Small-Scale Fine-tuning: Modify one or two node(s).
Details can be found in the code and in our presentation slides.
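The two scales can be illustrated on a toy problem. The sketch below is not taken from solver.cc: the cost model (one strategy per node, per-node and per-edge costs), the names `Problem`, `Edge`, `total_cost`, and `two_scale_search`, and the random re-draw used for the large-scale move are all assumptions made for illustration, and the contest's constraints are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Toy model (an assumption, not the contest format): each node picks one
// strategy; the total cost is the sum of per-node and per-edge costs.
struct Edge {
    int u, v;
    std::vector<std::vector<int64_t>> cost;  // cost[strategy of u][strategy of v]
};

struct Problem {
    std::vector<std::vector<int64_t>> node_cost;  // node_cost[node][strategy]
    std::vector<Edge> edges;
    std::vector<std::vector<int>> components;     // node ids per connected component (non-empty)
};

int64_t total_cost(const Problem& p, const std::vector<int>& pick) {
    int64_t c = 0;
    for (std::size_t n = 0; n < pick.size(); ++n) c += p.node_cost[n][pick[n]];
    for (const Edge& e : p.edges) c += e.cost[pick[e.u]][pick[e.v]];
    return c;
}

// Keep `cand` only if it is strictly cheaper than the current best.
bool accept_if_better(const Problem& p, std::vector<int>& best,
                      int64_t& best_cost, const std::vector<int>& cand) {
    const int64_t c = total_cost(p, cand);
    if (c >= best_cost) return false;
    best = cand;
    best_cost = c;
    return true;
}

// Hypothetical two-scale hill climbing: the cost never increases.
std::vector<int> two_scale_search(const Problem& p, int rounds, uint32_t seed) {
    std::mt19937 rng(seed);
    std::vector<int> best(p.node_cost.size(), 0);  // start with strategy 0 everywhere
    int64_t best_cost = total_cost(p, best);
    for (int r = 0; r < rounds; ++r) {
        // Large scale: re-draw the strategy of every node in one random component.
        std::vector<int> cand = best;
        const std::vector<int>& comp = p.components[rng() % p.components.size()];
        for (int n : comp) cand[n] = static_cast<int>(rng() % p.node_cost[n].size());
        accept_if_better(p, best, best_cost, cand);
        // Small scale: try every alternative strategy for each single node.
        for (std::size_t n = 0; n < best.size(); ++n) {
            for (std::size_t s = 0; s < p.node_cost[n].size(); ++s) {
                cand = best;
                cand[n] = static_cast<int>(s);
                accept_if_better(p, best, best_cost, cand);
            }
        }
    }
    return best;
}
```

The reason for combining both moves is that single-node changes alone can stall: if two connected nodes are only cheap when both switch strategy together, every single-node move looks worse, while a component-level move can change both at once.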
This project is based on the framework from the contest organizers. We appreciate their efforts in organizing this excellent contest.
These files are added/modified in our solution:
- solver.cc contains our C++ solution.
- eval.cpp contains a simple evaluation program that reads the output of the evaluated program from stdin.
- CMakeLists.txt is modified to compile eval.cpp.
- eval.sh contains a simple script for evaluating the public benchmarks.
- solution.pdf contains the slides explaining our solution, which were presented at the ASPLOS & EuroSys conference.
$ git clone --recursive https://github.com/thustorage/iopddl.git
$ mkdir iopddl/build && cd iopddl/build && cmake .. && make
$ ./iopddl example.json 10 | ./eval example.json
Compared to the original framework, no extra dependencies are introduced.
To evaluate the public benchmark datasets, first decompress them in the benchmarks directory:
$ gzip -d benchmarks/*
Then run:
$ ./eval.sh
The result (score) will be stored in a text file named result-mmdd-HHMM.
You may add other datasets (e.g., hidden benchmarks) as needed.