Documentation for MobileTransformers - a lightweight, modular framework based on ONNX Runtime for running and adapting large language models (LLMs) directly on mobile and edge devices. It supports on-device fine-tuning (PEFT), efficient inference, quantization, weight merging, and direct inference from merged models.

📱 MobileTransformers: An On-Device LLM PEFT Framework for Fine-Tuning and Inference

MobileTransformers (or ORTransformersMobile) is a modular framework designed for fully on-device execution of large and small language models (LLM / SLM) on mobile and edge devices.
Built on top of ONNX Runtime, it leverages hardware-accelerated execution providers such as XNNPACK, NNAPI, and QNN for efficient inference and training on Android and similar platforms.

  • OR: ONNX Runtime
  • Transformers: Core architecture of large language models
  • Mobile: Fully on-device mobile execution
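
On Android, provider selection happens inside the framework's Kotlin/C++ backend; the following minimal Python sketch shows the equivalent pattern in ONNX Runtime's Python API. The model path is a placeholder, and which providers actually exist depends on how your ONNX Runtime package was built (NNAPI and QNN ship only in Android/Qualcomm builds).

```python
import onnxruntime as ort

# Preferred execution providers, in priority order. Availability depends on
# the ONNX Runtime build, so keep only those present in this runtime.
preferred = [
    "QNNExecutionProvider",      # Qualcomm NPU/DSP (Snapdragon devices)
    "NnapiExecutionProvider",    # Android Neural Networks API
    "XnnpackExecutionProvider",  # optimized CPU kernels
    "CPUExecutionProvider",      # universal fallback, always available
]
providers = [p for p in preferred if p in ort.get_available_providers()]

# "model.onnx" is a placeholder for any exported inference graph.
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())  # the providers actually in effect
```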

Example of the MobileTransformers Android application running on a Google Pixel 6 (2021), supporting on-device LLM training and inference with retrieval-augmented generation.


📥 Main links

Code Repository

The main codebase and implementation, including:

  • MARS (Multi-Adapter Rank Sharing), a parameter-efficient fine-tuning method
  • The MobileTransformers framework for on-device LLM fine-tuning, inference, and RAG
  • Scripts for exporting custom SLMs/LLMs to mobile devices

MobileTransformers main codebase

Research

For a comprehensive understanding of the research behind MobileTransformers, including detailed explanations of Multi-Adapter Rank Sharing (MARS), on-device training methodologies, and experimental results:

Master's Thesis - Parameter-Efficient Tuning of Large Language Models on Mobile Devices


📥 Documentation

Installation instructions, training and inference examples, and API documentation.


🚀 What is MobileTransformers?

A comprehensive, privacy-first framework that lets researchers and developers export, fine-tune, merge, and deploy transformer-based language models directly on Android devices. It eliminates any dependency on cloud services while keeping full control of the model, and its data, on the device. This makes it well suited to privacy-preserving NLP applications, offline AI assistants, personalized chatbots, and edge-computing scenarios where data sovereignty and real-time responsiveness are crucial. Whether you're building the next generation of pocket AI or developing enterprise edge solutions, MobileTransformers provides a foundation for truly autonomous mobile intelligence.

Key Benefits:

  • 🔒 Complete Privacy: Your data never leaves your device
  • 📱 Pocket-Sized AI: Full LLM/SLM capabilities in your smartphone
  • 🔧 Hardware execution provider support: Hardware-accelerated inference for efficient on-device execution
  • 🌐 Offline-First: Works anywhere, anytime, without internet connectivity
  • 🤖 Universal Model Support: Compatible with most custom LLMs/SLMs from Huggingface

📦 Repository Contents

This comprehensive repository provides everything needed for on-device LLM deployment:

  • 🔄 Export Pipeline: Streamlined conversion system transforming Huggingface LLMs/SLMs into PEFT-enabled training models and ONNX inference graphs optimized for Android deployment (see the sketch after this list)
  • 📱 Complete Android Application: Full-featured Android folder containing the entire mobile application stack, ready for pocket deployment
  • 🧪 Custom PEFT support: Customizable PEFT solutions for on-device fine-tuning (e.g. LoRA (Low-Rank Adaptation), MARS (Multi-Adapter Rank Sharing), and more)
  • 🐍 Training & Inference Scripts: Python implementations supporting both PyTorch and ONNX Runtime, optimized for mobile hardware constraints
  • 🔬 Evaluation Scripts: Comprehensive benchmarking suite for trained models across diverse NLP tasks, including mobile-specific performance metrics and battery consumption analysis
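
To give a rough idea of what the export pipeline's training half produces, here is a hedged sketch using ONNX Runtime's generic `generate_artifacts` API rather than this repository's own export scripts. The model path and the "lora" parameter-name filter are assumptions for illustration only.

```python
import onnx
from onnxruntime.training import artifacts

# Load an ONNX graph previously exported from a Huggingface model
# (e.g. via optimum or torch.onnx.export); the path is a placeholder.
model = onnx.load("exported_slm.onnx")

# PEFT setup: train only adapter parameters and freeze everything else.
# Assumes adapter initializers contain "lora" in their names, which
# depends entirely on how the graph was exported.
all_params = [init.name for init in model.graph.initializer]
trainable = [name for name in all_params if "lora" in name]
frozen = [name for name in all_params if name not in trainable]

# Emits training_model.onnx, eval_model.onnx, optimizer_model.onnx and an
# initial checkpoint - the artifacts an on-device training loop consumes.
artifacts.generate_artifacts(
    model,
    requires_grad=trainable,
    frozen_params=frozen,
    loss=artifacts.LossType.CrossEntropyLoss,
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="mobile_artifacts",
)
```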

📱 Android Application: ORTransformer

The Android app is split into two main parts:

  • 📲 Kotlin UI Layer
    A lightweight interface that acts as a communication bridge, calling the backend APIs on the mobile device

  • ⚙️ Backend: MobileTransformers
    The core engine of the entire framework, implemented in Kotlin and C++. It can easily be re-used in another application; pick and choose the features you need.

🔧 Key features include:

  • Modular Android Project: Clean separation of concerns with isolated modules for training, inference, RAG and weight management
  • Hardware-Accelerated Loops: On-device training / fine-tuning and generation loops leveraging NNAPI, XNNPACK, and Qualcomm QNN for optimal mobile performance
  • Dynamic Configuration: Real-time customization of training parameters and inference settings tailored to your Android device's capabilities
  • ONNX Runtime Integration: Optimized model execution specifically tuned for mobile and edge hardware
  • Weight Management: On-device weight merging with automatic export to Android filesystem, enabling model personalization without cloud dependency
  • Seamless Model Loading: Direct import of merged weights into inference graphs for immediate pocket deployment
  • RAG support: Support for Retrieval-Augmented Generation (RAG) using ObjectBox as a fast on-device vector database
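
To make the RAG flow concrete, here is a minimal, self-contained retrieval sketch. It ranks stored text chunks against a query by cosine similarity over random placeholder embeddings; on device, MobileTransformers stores and searches real embeddings with ObjectBox rather than this brute-force scan.

```python
import numpy as np

# Toy document store: each chunk would normally be embedded by an encoder;
# random vectors stand in for real embeddings in this sketch.
dim = 384
chunks = ["Turn on do-not-disturb at 22:00.", "Enable battery saver below 20%."]
chunk_vecs = np.random.randn(len(chunks), dim).astype(np.float32)
query_vec = np.random.randn(dim).astype(np.float32)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Retrieve the best-matching chunk and prepend it to the generation prompt.
scores = [cosine(v, query_vec) for v in chunk_vecs]
context = chunks[int(np.argmax(scores))]
prompt = f"Context: {context}\nUser: suggest an automation\nAssistant:"
print(prompt)
```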

✅ Key Capabilities

| Feature | Description |
| --- | --- |
| ✅ Export custom PyTorch Huggingface SLM / LLM models | Convert Huggingface models with PEFT methods to training & ONNX inference models for on-device use |
| ✅ On-device fine-tuning/training loop | Perform parameter-efficient training (PEFT) directly on mobile devices |
| ✅ On-device generation loop with KV caching | Efficient text generation using cached key-value tensors for faster autoregressive inference |
| Customizable training and generation | Flexible configuration to adapt training and generation to specific tasks and hardware |
| ✅ On-device weight exporting | Save trained or merged weights directly on-device (mobile filesystem) |
| ✅ On-device weight merging | Merge base and PEFT weights on-device, with optional quantization for optimized size and speed |
| ✅ Direct inference from merged weights | Load merged weights into the inference graph for seamless on-device model execution |
| Retrieval-Augmented Generation (RAG) | Fully on-device vector database integration with ObjectBox for augmented generation |
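
The fine-tuning row above maps onto ONNX Runtime's on-device training API. The sketch below uses the Python flavor of that API (`Module`, `Optimizer`, `CheckpointState`) rather than the framework's Kotlin/C++ loop; the artifact paths, vocabulary size, and toy batch are assumptions.

```python
import numpy as np
from onnxruntime.training.api import CheckpointState, Module, Optimizer

# Load the artifacts produced at export time (paths are placeholders).
state = CheckpointState.load_checkpoint("mobile_artifacts/checkpoint")
module = Module("mobile_artifacts/training_model.onnx", state,
                "mobile_artifacts/eval_model.onnx")
optimizer = Optimizer("mobile_artifacts/optimizer_model.onnx", module)

module.train()  # switch the module to training mode

# One optimization step on a toy causal-LM batch; a real loop iterates
# over the user's on-device dataset. Shapes and vocab size are made up.
input_ids = np.random.randint(0, 32000, size=(1, 16)).astype(np.int64)
labels = input_ids.copy()
loss = module(input_ids, labels)  # forward + backward through the graph
optimizer.step()                  # apply the AdamW update
module.lazy_reset_grad()          # clear gradients before the next step

# Persist the trained (adapter) weights, e.g. to the device filesystem.
CheckpointState.save_checkpoint(state, "mobile_artifacts/trained_checkpoint")
```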

🔧 On-device example

An example of a model adapted to a personalized smartphone-automation dataset, where users express intents and the model recommends appropriate automatic actions to perform on the device. This task-oriented dataset is specifically designed for on-device intelligence scenarios.

Side-by-side comparison: 🧩 base on-device model vs. ⚙️ on-device fine-tuned model.

This example shows how a base model can be fine-tuned and personalized entirely on-device, meaning no data ever leaves the device. During the process, adapters are trained locally, then merged and integrated into the base model on the mobile phone to produce the final fine-tuned version.
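
The merge step itself lives in the framework's Kotlin/C++ backend; purely to illustrate the arithmetic, here is a numpy sketch of the standard LoRA merge rule W' = W + (alpha/r)·B·A, with all shapes, rank, and scaling chosen as placeholders. The optional quantization step mentioned above would follow the merge and is omitted here.

```python
import numpy as np

# LoRA-style merge: fold a trained low-rank adapter pair back into the
# frozen base weight, so inference needs no extra adapter operations.
d_out, d_in, rank, alpha = 256, 256, 8, 16   # illustrative sizes only
base_w = np.random.randn(d_out, d_in).astype(np.float32)  # frozen base weight
lora_b = np.random.randn(d_out, rank).astype(np.float32)  # trained adapter B
lora_a = np.random.randn(rank, d_in).astype(np.float32)   # trained adapter A

merged_w = base_w + (alpha / rank) * (lora_b @ lora_a)

# On device, merged tensors like this replace the corresponding initializers
# in the inference graph; here we only sanity-check that the shape matches.
assert merged_w.shape == base_w.shape
```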


🛠️ Built On

  • ONNX Runtime for training and inference, with support for mobile-optimized execution providers:
    • XNNPACK
    • NNAPI
    • Qualcomm QNN
  • Huggingface Transformers ecosystem compatibility for model export
  • ObjectBox for lightweight on-device vector databases in RAG workflows

🎯 Why MobileTransformers?

  • Fully on-device - no cloud dependency, maximizing privacy and minimizing latency
  • Enables parameter-efficient fine-tuning (PEFT) on mobile hardware
  • Modular and customizable for research and production use
  • Ready for Android and adaptable to other edge devices
  • Combines cutting-edge generation techniques with practical on-device deployment

🔧 Extensibility and Future Work

MobileTransformers is designed as a flexible platform, allowing easy extension for advanced on-device ML workflows, such as:

  • Beyond text generation - classification, sentiment analysis, named entity recognition, question answering, summarization, and custom NLP tasks tailored for mobile use cases
  • On-device reinforcement learning
  • Federated learning leveraging exported merged weights
  • Integration with additional hardware acceleration backends
  • Support for more PEFT methods and quantization techniques
  • Expansion to other mobile platforms and edge systems

Citation

MobileTransformers Framework

If you are using this framework for your own work, please cite:

@misc{mobiletransformers2025,
  author       = {Koreli\v{c}, Martin and Pejovi{\'c}, Veljko},
  title        = {MobileTransformers: An On-Device LLM PEFT Framework for Fine-Tuning and Inference},
  year         = {2025},
  howpublished = {\url{https://gitlab.fri.uni-lj.si/lrk/mobiletransformers}}
}

Master's Thesis

If you find the research behind MobileTransformers and MARS useful, please also cite the Master's Thesis:

@mastersthesis{korelic2025,
  title  = {Parameter-Efficient Tuning of Large Language Models on Mobile Devices},
  url    = {https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=175561},
  author = {Koreli\v{c}, Martin},
  school = {University of Ljubljana},
  year   = {2025}
}

Acknowledgements

This work was supported by the Slovenian Research Agency under grant no. N2-0393 (approXimation for adaptable diStributed artificial intelligence) and grant no. J2-3047 (Context-Aware On-Device Approximate Computing).
