An open-source, decentralized model collaboration stack.
We'd love to partner with early-stage companies and build together. Join us in the Tiles Discord server, and subscribe to our blog, Neurons, for updates on on-device AI and personalization research.
Consider supporting our research and open-source projects through GitHub Sponsors.
This work is currently supported by Boris Mann, Luke Hubbard, Curran Dwyer, Xi Zhang, Dietrich Ayala, and Hugo Duprez.
Below is a living index of resources that inform and inspire our work.
- ✨ Modelfile Reference - Ollama English Documentation
- ✨ Introducing Gemma 3n: The developer guide
- ✨ Foundation Models adapter training - Apple Intelligence - Apple Developer
- ✨ Use MergeKit to Extract LoRA Adapters from any Fine-Tuned Model
- ✨ Apple’s New Containerization Framework: A Deep Dive into macOS’s Future for Developers
- instavm/coderunner: A secure local sandbox to run LLM-generated code using Apple containers
- Introducing the unified multi-modal MLX engine architecture in LM Studio
- Announcing Spiral, Data 3.0, with backing from the best
- mem-agent: Persistent, Human Readable Memory Agent Trained with Online RL
- ✨ Optimizing AI Inference at Character.AI
- ✨ Optimizing AI Inference at Character.AI (Part Deux)
- ✨ PrimeIntellect-ai/prime-iroh: Asynchronous P2P communication backend for decentralized pipeline parallelism
- ✨ Introducing Gemma 3 270M: The compact model for hyper-efficient AI
- ✨ Unternet Kernel
- ✨ SwiftWasm, WebAssembly support for the Swift programming language
- DSPy Notebook, the pretty much "official" DSPy framework for TypeScript
- Structured outputs for LLMs
- Accelerated PyTorch training on Mac
- ✨ Unsloth AI - Open Source Fine-tuning & RL for LLMs
- ✨ Introducing LFM2: The Fastest On-Device Foundation Models on the Market
- ✨ Mistral.rs, a cross-platform, highly-multimodal inference engine
- Osmosis, Unlocking AI self-improvement at production scale
- Supermemory MCP
- ✨ Introducing the v0 composite model family, Vercel
- Agent Reinforcement Trainer, OpenPipe
- Universal Quantized File Format: UQFF
- GGUF Tool Suite
- uqff_maker
- Minions, Big & Small LLMs working together
- ✨ The Kaitchup Index: A Leaderboard for Quantized LLMs
- Pipecat Cloud: Enterprise Voice Agents Built On Open Source - Kwindla Hultman Kramer, Daily
- Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil Dwyer, Gabber
- 📏 RULER: Easy Mode for RL Rewards
- ART·E: How We Built an Email Research Agent That Beats o3
- OpenBench, Provider-agnostic, open-source evaluation infrastructure for language models
- ✨ LoRA's Limitations: Head-to-Head with Full RL
- ✨ A case for client-side machine learning, Christopher Fleetwood
- Democratizing AI: The Psyche Network Architecture, Nous Research
- Interoperability: Swift’s Super Power, Speaking in Swift by The Browser Company
- LoRA Learns Less and Forgets Less
- ✨ The Bitter Lesson is coming for Tokenization
- On the Way to LLM Personalization: Learning to Remember User Conversations, Apple Machine Learning Research
- ✨ Text-to-LoRA: Hypernetworks that adapt LLMs for specific benchmark tasks using only a textual task description as the input, Sakana AI
- Transformer²: Self-Adaptive LLMs
- How memory augmentation can improve large language models, IBM Research
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
- ✨ The Power of Efficiency: Edge AI’s Role in Sustainable Generative AI Adoption
- ✨ Small Language Models are the Future of Agentic AI, NVIDIA Research
- ✨ Defeating Prompt Injections by Design, Google DeepMind
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
- Introducing FlexOlmo: a new paradigm for language model training and data collaboration, Allen AI
- WhisperKit: On-device Real-time ASR with Billion-Scale Transformers, Argmax
- ✨ Towards Large-scale Training on Apple Silicon, Exo Labs
- Kinetics: Rethinking Test-Time Scaling Laws
- Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search
- LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
- AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air
- Comparative Analysis of Retrieval Systems in the Real World
- FedVLM: Scalable Personalized Vision-Language Models through Federated Learning
- A Preliminary Report On Edge-Verified Machine Learning, Exo Labs
- ✨ Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
- ✨ Intent-Based Architectures and Their Risks
- Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
- Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design
- Towards Feasible Private Distributed LLM Inference, Dria
- RFT, DPO, SFT: Fine-tuning with OpenAI — Ilan Bigio, OpenAI
- ✨ A hand-picked selection of articles on AI fundamentals and concepts, covering the entire process from building neural nets to training them to evaluating results.
- ✨ The State of On-Device LLMs
- ✨ Planetary-Scale Inference: Previewing our Peer-To-Peer Decentralized Inference Stack
- How to Scale Your Model
- ✨ r/LocalLLaMA
- ✨ An Analogy for Understanding Transformers
- ✨ Neural networks, 3Blue1Brown
- GGUF Quantization Docs (Unofficial)
- Reverse-engineering GGUF | Post-Training Quantization
- Reference implementation of the Transformer architecture optimized for Apple Neural Engine
- H100 PCIe vs SXM vs NVL: Which H100 GPU Is Fastest and Most Cost-Effective for Fine-Tuning LLMs?
- Apple Developer Technotes, in-depth technical articles on specific development topics
- The Apple Wiki
- LLMs on a Budget
This resource index was inspired by the GPU Glossary from Modal.
Copyright © 2025, Tiles. All rights reserved.