A novel peer-to-peer federated learning framework that eliminates central coordination while using Git-inspired version control to manage model staleness effectively.
Federated Learning (FL) has emerged as a transformative distributed machine learning paradigm enabling privacy-preserving model training across multiple devices without sharing raw data. However, existing FL frameworks face critical challenges: device heterogeneity introduces "stragglers" that delay training, non-IID data distributions cause model divergence, and central server architectures create single points of failure.
This research addresses these limitations by developing Decentralized GitFL, a fully distributed federated learning system that applies Git-inspired version control principles to eliminate central coordination while effectively managing model staleness. Our implementation reaches 38.56% final accuracy (a gain of 5.09 percentage points) with 7.2× faster convergence compared to the centralized approach.
- Eliminates central server dependencies and single points of failure
- Enables direct peer-to-peer model exchange without coordination overhead
- Maintains system functionality even with partial network failures
- Implements Git-inspired operations (push, pull, merge) for model management
- Version-weighted aggregation to mitigate staleness effects
- Distributed model repositories with limited history tracking
- Multi-factor reward system considering version, curiosity, and recency
- Adaptive communication patterns that optimize knowledge dissemination
- Dynamic topology management for efficient network utilization
- Performance evaluation on CIFAR-10 with both IID and non-IID distributions
- Comparative analysis against centralized GitFL implementation
- Reproducible results with standardized evaluation metrics
┌────────────────────────────────────────────────────────────────────┐
│                    Decentralized GitFL Network                     │
├────────────────────────────────────────────────────────────────────┤
│     Node 0           Node 1           Node 2           Node 3      │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐  │
│  │ Branch v3 │◄──►│ Branch v4 │◄──►│ Branch v2 │◄──►│ Branch v3 │  │
│  └───────────┘    └───────────┘    └───────────┘    └───────────┘  │
│        ▲                ▲                ▲                ▲        │
│        │                │                │                │        │
│  ┌─────┴─────┐    ┌─────┴─────┐    ┌─────┴─────┐    ┌─────┴─────┐  │
│  │Repository │    │Repository │    │Repository │    │Repository │  │
│  │P2P Network│    │P2P Network│    │P2P Network│    │P2P Network│  │
│  │RL Selector│    │RL Selector│    │RL Selector│    │RL Selector│  │
│  │Controller │    │Controller │    │Controller │    │Controller │  │
│  └───────────┘    └───────────┘    └───────────┘    └───────────┘  │
└────────────────────────────────────────────────────────────────────┘
| Component | Description | Implementation |
|---|---|---|
| Node Controller | Manages concurrent threads for training, discovery, and sharing | DecentralizedGitFLNode.py |
| Repository | Implements Git-inspired version control with distributed tracking | decentralized_repository.py |
| P2P Network | Facilitates reliable TCP-based communication between peers | p2p_network.py |
| RL Selector | Optimizes peer selection using multi-factor reward system | distributed_rl_selector.py |
| Neural Network | CNN architecture for collaborative model training | models/Nets.py |
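To make the Node Controller's role concrete, here is a minimal sketch of the threading pattern described in the table; the class and method names are illustrative assumptions, not the exact API of DecentralizedGitFLNode.py.

```python
import threading

class NodeControllerSketch:
    """Runs training, peer discovery, and model sharing concurrently."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.stop_event = threading.Event()

    def _train_loop(self):
        while not self.stop_event.is_set():
            # ... run local training epochs on this node's data shard ...
            self.stop_event.wait(1.0)

    def _discovery_loop(self):
        while not self.stop_event.is_set():
            # ... probe known peers and learn about newly joined ones ...
            self.stop_event.wait(1.0)

    def _share_loop(self):
        while not self.stop_event.is_set():
            # ... push/pull branch models with an RL-selected peer ...
            self.stop_event.wait(1.0)

    def start(self):
        for loop in (self._train_loop, self._discovery_loop, self._share_loop):
            threading.Thread(target=loop, daemon=True).start()

    def stop(self):
        self.stop_event.set()
```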
Our distributed version control system implements three key Git-inspired operations:
# Version-weighted model merging: higher-version (fresher) branches
# contribute more, so stale models are down-weighted in the master model.
def compute_master_model(self):
    total_versions = sum(self.branch_versions.values())
    weights = {node_id: version / total_versions
               for node_id, version in self.branch_versions.items()}
    # Weighted averaging of branch models, parameter tensor by tensor
    reference_model = next(iter(self.branch_models.values()))
    master_dict = {}
    for key in reference_model:
        master_dict[key] = sum(weights[node_id] * self.branch_models[node_id][key]
                               for node_id in self.branch_models)
    return master_dict
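The push and pull counterparts are simpler; here is a hedged sketch of what they could look like (method and field names such as `history` and `max_history` are illustrative assumptions, not the exact decentralized_repository.py API):

```python
def push_branch(self, node_id, model_state, version):
    """Record a peer's (or our own) newly trained branch model."""
    self.branch_models[node_id] = model_state
    self.branch_versions[node_id] = version
    # Bounded history: keep only the most recent snapshots per branch
    self.history.setdefault(node_id, []).append((version, model_state))
    self.history[node_id] = self.history[node_id][-self.max_history:]

def pull_branch(self, node_id):
    """Return the latest known model and version for a branch."""
    return self.branch_models.get(node_id), self.branch_versions.get(node_id, 0)
```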
The adaptive peer selection mechanism uses a composite reward function:

R_peer = max(0.00001, R_version + R_curiosity + R_recency)
Where:
- R_version: Balances version disparities across the network
- R_curiosity: Encourages exploration of less-frequently selected peers
- R_recency: Promotes periodic interaction with all network participants
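A minimal sketch of how these three terms might combine, assuming illustrative state (a peer version table, selection counts, last-contact times) rather than the exact distributed_rl_selector.py internals:

```python
import time

def peer_reward(self, peer_id):
    # R_version: prefer peers whose branch version diverges from ours,
    # pulling stale and fresh branches back together
    r_version = abs(self.local_version - self.peer_versions.get(peer_id, 0))
    # R_curiosity: prefer peers we have selected rarely (exploration)
    r_curiosity = 1.0 / (1 + self.selection_counts.get(peer_id, 0))
    # R_recency: prefer peers we have not contacted for a while
    r_recency = min(1.0, (time.time() - self.last_contact.get(peer_id, 0.0)) / 60.0)
    # Small floor keeps every peer selectable with nonzero probability
    return max(0.00001, r_version + r_curiosity + r_recency)
```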
| Metric | Centralized GitFL | Decentralized GitFL | Improvement |
|---|---|---|---|
| Final Accuracy | 33.47% | 38.56% | +5.09% |
| Convergence Time | 6,438s | 900s | 7.2× faster |
| Network Resilience | Single point of failure | Fault tolerant | Eliminates bottleneck |
| Scalability | Server bottleneck | Distributed load | Enhanced |
| Time (s) | Centralized | Decentralized | Performance Gap |
|---|---|---|---|
| 0 | 10.09% | 10.15% | -0.06% |
| 900 | ~15%* | 38.56% | +23.56% |
| 6,438 | 33.47% | N/A | N/A |

* Extrapolated based on convergence rate
The decentralized approach demonstrates superior learning dynamics:
- Initial Phase (0-184s): Rapid local learning with 18.24% accuracy
- Collaboration Phase (184-563s): Effective knowledge sharing reaching 32%
- Convergence Phase (563-900s): Stable improvement to 38.56%
# Core Dependencies
Python >= 3.8
PyTorch >= 1.9
NumPy >= 1.21
torchvision >= 0.10

# Clone the repository
git clone https://github.com/your-username/decentralized-gitfl.git
cd decentralized-gitfl
# Install dependencies
pip install torch torchvision numpy
# Verify installation
python -c "import torch; print(f'PyTorch version: {torch.__version__}')"

# Run basic simulation with 5 nodes for 15 minutes
python main.py --nodes 5 --runtime 900 --epochs 2
# Advanced configuration with non-IID data
python main.py --nodes 10 --iid 0 --alpha 0.1 --runtime 1800
# Custom network topology
python main.py --nodes 7 --base_port 9000 --epochs 3

| Parameter | Description | Default | Range |
|---|---|---|---|
| `--nodes` | Number of participating nodes | 5 | 3-50 |
| `--runtime` | Simulation duration (seconds) | 300 | 60-3600 |
| `--epochs` | Local training epochs per round | 5 | 1-10 |
| `--iid` | Data distribution (1 = IID, 0 = non-IID) | 1 | 0, 1 |
| `--alpha` | Dirichlet alpha for non-IID splits | 0.5 | 0.1-2.0 |
| `--base_port` | Starting port for P2P communication | 8000 | 1024-65535 |
DECENTRALIZED_GITFL/
├── Core Implementation
│   ├── DecentralizedGitFLNode.py     # Main node integration
│   ├── decentralized_repository.py   # Distributed version control
│   ├── distributed_rl_selector.py    # RL-based peer selection
│   ├── p2p_network.py                # Peer-to-peer communication
│   └── simulation.py                 # Orchestration and evaluation
│
├── Neural Networks
│   └── models/
│       ├── Nets.py                   # CNN architectures
│       ├── resnetcifar.py            # ResNet implementations
│       └── test.py                   # Model evaluation utilities
│
├── Utilities
│   └── utils/
│       ├── get_dataset.py            # Dataset management
│       └── set_seed.py               # Reproducibility utilities
│
├── Execution
│   ├── main.py                       # Entry point
│   └── requirements.txt              # Dependencies
│
└── Documentation
    ├── README.md                     # This file
    ├── LICENSE                       # MIT License
    └── .gitignore                    # Git exclusions
Our evaluation follows rigorous research standards:
- Controlled Environment: Fixed hardware specifications (Intel Core Ultra 5 125H, 16GB RAM)
- Reproducible Setup: Standardized random seeds (42) across all experiments (see the seeding sketch after this list)
- Comparative Analysis: Direct comparison with centralized GitFL baseline
- Statistical Validation: Multiple runs with confidence intervals
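For reference, a seeding routine along these lines covers all RNGs that affect training; this is a common PyTorch pattern, and utils/set_seed.py may differ in detail:

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42):
    """Seed every RNG that affects training so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN kernels
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```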
- Primary Dataset: CIFAR-10 (50,000 training, 10,000 test images)
- Model Architecture: Convolutional Neural Network with 6-16 filter progression
- Data Distribution: Both IID and non-IID (Dirichlet α=0.5) scenarios (see the partitioning sketch after this list)
- Evaluation Metrics: Test accuracy, convergence time, communication overhead
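A minimal sketch of the Dirichlet-based non-IID split described above; this is the standard construction, and get_dataset.py may differ in details:

```python
import numpy as np

def dirichlet_partition(labels, num_nodes, alpha=0.5, seed=42):
    """Split sample indices across nodes with a Dirichlet(alpha) label skew."""
    rng = np.random.default_rng(seed)
    num_classes = labels.max() + 1
    node_indices = [[] for _ in range(num_nodes)]
    for c in range(num_classes):
        class_idx = np.where(labels == c)[0]
        rng.shuffle(class_idx)
        # Fraction of class c given to each node; small alpha => heavy skew
        proportions = rng.dirichlet(alpha * np.ones(num_nodes))
        cut_points = (np.cumsum(proportions)[:-1] * len(class_idx)).astype(int)
        for node_id, split in enumerate(np.split(class_idx, cut_points)):
            node_indices[node_id].extend(split.tolist())
    return node_indices
```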
- Initial Configuration: Ring topology for guaranteed connectivity (see the wiring sketch after this list)
- Dynamic Expansion: Peer discovery enables mesh-like connections
- Fault Tolerance: System maintains functionality with node failures
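The initial ring wiring is straightforward; a sketch, under the assumption that node i listens on base_port + i (the actual bootstrap logic lives in p2p_network.py):

```python
def ring_neighbors(node_id, num_nodes, base_port=8000):
    """Each node starts connected to its predecessor and successor on the ring."""
    left = (node_id - 1) % num_nodes
    right = (node_id + 1) % num_nodes
    return [("localhost", base_port + left), ("localhost", base_port + right)]
```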
The decentralized approach exhibits superior convergence properties:
# Typical accuracy progression (percent test accuracy)
time_points = [0, 184, 364, 563, 739, 900]
accuracies = [10.15, 18.24, 31.59, 32.03, 37.73, 38.56]

# Average convergence rate over the full run
convergence_rate = (38.56 - 10.15) / 900  # ~0.0316 percentage points per second

| Pattern | Centralized | Decentralized | Advantage |
|---|---|---|---|
| Message Flow | Hub-and-spoke | Mesh topology | Distributed load |
| Bottlenecks | Server capacity | Network bandwidth | Eliminated |
| Scalability | O(n) server load | O(1) per node | Linear improvement |
- Memory Efficiency: Limited model history (max 5 versions per node; see the sketch after this list)
- Computational Load: Distributed across all participants
- Network Bandwidth: Optimized through selective peer communication
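The bounded history mentioned above can be kept with a fixed-length deque; a sketch with the five-version cap (class and field names are assumed for illustration):

```python
from collections import deque

class BranchHistory:
    """Keeps at most the five most recent model versions per branch."""

    def __init__(self, max_versions: int = 5):
        self.snapshots = deque(maxlen=max_versions)

    def record(self, version, state_dict):
        # The oldest snapshot is dropped automatically once the deque is full
        self.snapshots.append((version, state_dict))
```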
- Energy-Aware Selection: Incorporate battery state and power consumption
- Compressed Communication: Implement model compression for bandwidth efficiency
- Hierarchical Organization: Multi-level peer structures for enhanced scalability
- Cross-Domain Applications: Natural language processing, reinforcement learning tasks
- Privacy Enhancement: Differential privacy integration with version control
- Theoretical Analysis: Convergence guarantees under various network conditions
- Real-World Deployment: Heterogeneous IoT device networks
- Byzantine Fault Tolerance: Robustness against malicious participants
- Dynamic Topology Optimization: Adaptive network structure based on performance
- Multi-Modal Learning: Support for heterogeneous model architectures
This work extends the original GitFL framework. If you use it, please cite:
@article{bhattacharya2025decentralized,
title={Exploring Git-Inspired Version Control in Federated Learning: Decentralized GitFL Implementation},
author={Bhattacharya, Tirthoraj},
journal={Master's Thesis, IIIT Allahabad},
year={2025},
institution={Indian Institute of Information Technology, Allahabad}
}
@inproceedings{hu2023gitfl,
title={GitFL: Uncertainty-aware real-time asynchronous federated learning using version control},
author={Hu, Ming and Xia, Zeke and Yan, Dengke and others},
booktitle={2023 IEEE Real-Time Systems Symposium (RTSS)},
pages={145--157},
year={2023}
}

We welcome contributions from the research community:
- Bug Reports: Submit detailed issue descriptions with reproduction steps
- Feature Requests: Propose enhancements with technical justification
- Code Contributions: Follow PEP 8 standards with comprehensive documentation
- Research Collaborations: Contact for joint research opportunities
# Code style
python -m flake8 --max-line-length=88 *.py
# Type checking
python -m mypy --ignore-missing-imports *.py
# Testing
python -m pytest tests/ -v --cov=.

Tirthoraj Bhattacharya
Master of Technology (Information Technology)
Indian Institute of Information Technology, Allahabad
- ๐ง Email: mse2024008@iiita.ac.in
- ๐๏ธ Institution: IIIT Allahabad
- ๐จโ๐ซ Supervisor: Dr. Anshu S. Anand
This project is licensed under the MIT License - see the LICENSE file for details.
- Dr. Anshu S. Anand for research guidance and supervision
- IIIT Allahabad for providing research infrastructure
- Original GitFL Authors (Hu et al.) for foundational framework inspiration
- PyTorch Community for robust deep learning framework
Advancing Privacy-Preserving Collaborative Intelligence through Distributed Systems Innovation
Developed at the Indian Institute of Information Technology, Allahabad