A curated list of resources dedicated to enhancing efficiency in AI systems. This repository covers a wide range of topics essential for optimizing AI models and processes, aiming to make AI faster, cheaper, smaller, and greener!
If you find this list helpful, give it a ⭐ on GitHub, share it, and contribute by submitting a pull request or issue!
- Facts
- Tools
- Articles
- Reports
- Research Articles
- Blogs
- Books
- Lectures
- People
- Organizations
- Contributing
- License

## Facts

- 3-40Wh: Amount of energy consumed by one short to long ChatGPT query (Source, 2025)
- 1L: Estimated amount of water required for 20-100 ChatGPT queries (Source, 2025)
- 2 nuclear plants: Number of nuclear plants that would need to run constantly to generate enough energy for 80M people generating 5 pages per day (Source, 2025)
- 1 smartphone charge: Amount of energy required to generate a couple of images with AI or to run a few thousand LLM inferences (Source, 2024)
- >10s: Time required to generate one HD image with Flux on an H100, or 100 tokens with Llama 3 on a T4 (Source and Source, 2024)
- 7-10 smartphone charges: Amount of energy required to generate one video with Wan 2.1 (Source)
- 61,848x: Ratio between the highest and lowest energy use among AI models in the energy leaderboard (Source, 2025)
- 1,300 MWh: Estimated electricity used to train GPT-3, about as much as 130 US homes consume annually (Source, 2024)
- 800M users/week: Number of weekly ChatGPT users in 2025 (Source)
- 1B messages/day: Number of ChatGPT queries per day in 2025 (Source); see the back-of-envelope sketch after this list
- +160%: Expected increase in data center power consumption by 2030 (Source)
- x3.8: Factor by which hardware acceleration (GPU/TPU) reduces energy consumption compared with CPUs for the same task, while also reducing response time by up to 39% (Source)
- x18: Factor by which the carbon footprint of a task can vary depending on the model, framework, and backend used (Source)
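
For scale, the figures above can be combined into a rough back-of-envelope estimate, as in the sketch below. This is illustrative only: it assumes the 3-40 Wh per-query range and the 1B queries/day figure cited in this list, both rough estimates, plus the ~10 MWh/year average US household consumption implied by the GPT-3 fact.

```python
# Back-of-envelope: daily ChatGPT energy from the per-query estimates above.
# Assumptions (all rough, taken from the facts in this list):
#   3-40 Wh per query, 1e9 queries/day, ~10 MWh/year per average US home.
queries_per_day = 1_000_000_000

for wh_per_query in (3, 40):
    mwh_per_day = queries_per_day * wh_per_query / 1e6  # Wh -> MWh
    homes_equivalent = mwh_per_day / 10  # ~10 MWh/year per US home
    print(f"{wh_per_query} Wh/query -> {mwh_per_day:,.0f} MWh/day "
          f"(one day's use equals ~{homes_equivalent:,.0f} US homes' annual consumption)")
```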

## Tools

- ❤️ Pruna ❤️: A package to make AI models faster, smaller, cheaper, and greener by combining compression methods (incl. quantization, pruning, caching, compilation, distillation...) on various hardware.
- TensorRT: High-performance deep learning inference library for NVIDIA GPUs.
- ONNX: Open Neural Network Exchange format for interoperability among deep learning frameworks.
- Code Carbon: A library to track the energy use and carbon footprint of code on various hardware (see the usage sketch after this list).
- LLM Perf: A framework for benchmarking the performance of Transformers models across different hardware, backends, and optimizations.
- ML.ENERGY Leaderboard: An initiative to benchmark energy efficiency of AI models.
- AI Energy Score: An initiative to establish comparable energy efficiency ratings for AI models, helping the industry make informed decisions about sustainability in AI development.
- Model Optimization Toolkit: TensorFlow toolkit for optimizing machine learning models for deployment and execution.
- Green Coding: An LLM service for prompting most open-source models and seeing their resource usage.
- EcoLogits: A Python library that tracks the energy consumption and environmental footprint of using generative AI models through APIs.
- Perplexity Kernels: GPU kernels by Perplexity.
- Fast Tokenizer: An efficient and optimized tokenizer engine for LLM inference serving.
- WeightWatcher: An open-source diagnostic tool for analyzing deep neural networks (DNNs) without needing access to training or even test data.
- Cockpit: A Practical Debugging Tool for Training Deep Neural Networks.
- Electricity Maps: A live map showing the origin of electricity in world regions and its CO2 intensity.
- MLCA: A tool for machine learning life cycle assessment.
- TritonParse: A visualization and analysis tool for Triton IR files, designed to help developers analyze, debug, and understand Triton kernel compilation processes.
- Routing on Random Forests: A framework for training and serving random-forest-based LLM routers, enabling cost optimization.
- ExLlamaV3: An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs.
- FlashDeBERTa: Flash implementation of the DeBERTa disentangled attention mechanism.
- QuACK: An assortment of kernels for GPUs.
- Pi-Quant: An assortment of kernels for CPUs.
- pplx-kernels: An assortment of kernels for GPUs.
- LMCache: An LLM serving engine extension to reduce time-to-first-token (TTFT) and increase throughput, especially under long-context scenarios, by optimizing the KV caches.
- FastWan: A family of video generation models trained via "sparse distillation".
- GEAK Agent: An LLM-based multi-agent framework that automatically generates functional and efficient GPU kernels.
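
As a concrete example of the tracking tools above, here is a minimal Code Carbon usage sketch (referenced in the Code Carbon entry). The workload and `project_name` are illustrative placeholders; `EmissionsTracker` with `start()`/`stop()` is CodeCarbon's documented API, and `stop()` returns the estimated emissions in kg CO2-eq.

```python
# Minimal sketch: estimating the carbon footprint of a code block with CodeCarbon.
# Install with: pip install codecarbon
from codecarbon import EmissionsTracker

def run_workload():
    # Placeholder for the code you want to measure (training loop, inference, ...).
    return sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="ai-efficiency-demo")  # name is illustrative
tracker.start()
try:
    run_workload()
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for the measured block

print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")
```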
- "Energy and AI Observatory" (2025) - IEA
- "AIβs Impacts, how to limit them, and why" (2025) - Better Tech
- "How much energy does ChatGPT use?" (2025) - Epoch AI
- "Data centers and artificial intelligence: the race to gigantism" (2025) - Le Monde
- "What's the environmental cost of AI?" (2024) - CO2 AI
- "Shrinking the giants: Paving the way for TinyAI" (2024) - Cell Press
- "DeepSeek might not be such good news for energy after all" (2024) - MIT Technology Review
- "AI already uses as much energy as a small country. Itβs only the beginning." (2024) - Vox
- "Quelle contribution du numΓ©rique Γ la dΓ©carbonation ?" (2024) - France StratΓ©gie
- "Les promesses de lβIA grevΓ©es par un lourd bilan carbone" (2024) - Le Monde
- "How much electricity does AI consume?" (2024) - The Verge
- "How do I track the direct environmental impact of my own inference and training when working with AI?" (2024) - Blog
- "Data center emissions probably 662% higher than big tech claims. Can it keep up the ruse?" (2024) - The Guardian
- "Light bulbs have energy ratings β so why canβt AI chatbots?" (2024) - Nature
- "The Environmental Impacts of AI -- Primer" (2024) - Hugging Face
- "The Climate and Sustainability Implications of Generative AI" (2024) - MIT
- "AI's "eye-watering" use of resources could be a hurdle to achieving climate goals, argue experts" (2023) - dezeen
- "How coders can help save the planet?" (2023) - Blog
- "Reducing the Carbon Footprint of Generative AI" (2023) - Blog
- "The MPG of LLMs: Exploring the Energy Efficiency of Generative AI" (2023) - Blog
- "Ecologie numΓ©rique: LβIA durable, entre vΕu pieux et opportunitΓ© de marchΓ©" (2025) - LibΓ©ration
- "The environmental impact of local text AI" (2025) - Green Spector
- "Misinformation by Omission: The Need for More Environmental Transparency in AI" (2025) - None
- "A General Framework for Frugal AI" (2025) - AFNOR
- "The 2025 AI Index Report" (2025) - Stanford Human-centered Artificial Intelligence
- "Energy and AI" (2025) - International Energy Agency
- "Key challenges for the environmental performance of AI" (2025) - French Ministry
- "Artificial Intelligence and electricity: A system dynamics approach" (2024) - Schneider
- "Notable AI Models" (2025) - Epoch AI
- "Powering Artificial Intelligence" (2024) - Deloitte
- "Google Sustainability Reports" (2024) - Google
- "How much water does AI consume? The public deserves to know" (2023) - OECD
- "Measuring the environmental impacts of artificial intelligence compute and applications" (2022) - OECD
- "Our contribution to a global environmental standard for AI (2025)" - Mistral AI
- "AI: It's All About Inference Now (2025)" - ACM Queue
- "ScalarLM vLLM Optimization with Virtual Channels" (2025) - ScalarLM
- "Review of Inference Optimization" (2025) - Aussie AI
- "The Limits of Large Fused Kernels on Nvidia GPUs: Why Real-Time AI Inference Needs More" (2025) - Smallest AI
- "How Much Power does a SOTA Open Video Model Use?" (2025) - Hugging Face
- "Improving Quantized FP4 Weight Quality via Logit Distillation" (2025) - Mobius Labs
- "Introducing NVFP4 for Efficient and Accurate Low-Precision Inference" (2025) - Nvidia
- "The LLM Engineer Almanac" (2025) - Modal
- "Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub" (2025) - Hugging Face
- "Reduce, Reuse, Recycle: Why Open Source is a Win for Sustainability" (2025) - Hugging Face
- "Mixture of Experts: When Does It Really Deliver Energy Efficiency?" (2025) - Neural Watt
- "Efficient and Portable Mixture-of-Experts Communication" (2025) - Perplexity
- "Optimizing Tokenization for Faster and Efficient LLM Processing" (2025) - Medium
- "Tensor Parallelism with CUDA - Multi-GPU Matrix Multiplication" (2025) - Substack
- "Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling" (2025) - Nvidia Developer
- "AI CUDA Engineer" (2025) - Sakana AI
- "The ML/AI Engineer's starter guide to GPU Programming" (2025) - Neural Bits
- "Understanding Quantization for LLMs" (2024) - Medium
- "Don't Merge Your LoRA Adapter Into a 4-bit LLM" (2023) - Substack
- "Matrix Multiplication Background User's Guide" (2023) - Nvidia Developer
- "GPU Performance Background User's Guide" (2023) - Nvidia Developer

## Books

- Programming Massively Parallel Processors: A Hands-on Approach (2022), Wen-mei W. Hwu, David B. Kirk, Izzat El Hajj
- Efficient Deep Learning (2022), Gaurav Menghani, Naresh Singh

## Lectures

- AI Efficiency Courses: Slides, Exercises (2025) - Lecture by Bertrand Charpentier
- Data Compression, Theory and Applications: YouTube, Slides (2024) - Stanford
- MIT HAN Lab (2024) - MIT lectures by Song Han's lab
- GPU Mode (2020) - Tutorials by the GPU Mode community

## Organizations

| Organization | Description | Website |
|---|---|---|
| Data4Good | A platform that connects data scientists with social impact projects to address global challenges using data. | data4good.org |
| Gen AI Impact | A platform dedicated to understanding the environmental footprint of generative AI. | genai-impact.org |
| Make.org | A global platform that empowers citizens to propose and take action on social and environmental issues through collective projects. | make.org |
| CodeCarbon | A tool that helps track the carbon emissions of machine learning models and optimizes them for sustainability. | codecarbon.io |
| Sustainable AI Coalition | An organization dedicated to advancing sustainability in AI technologies and promoting best practices for green AI. | sustainableaicoalition.org |
| FruitPunch AI | A community that builds AI solutions for impact organizations contributing to the SDGs. | fruitpunch.ai |

## Contributing

Contributions are welcome! Please follow our contribution guidelines to add new resources or suggest improvements that promote AI efficiency. You can contact @sharpenb if you have any questions.

## License

This project is licensed under the MIT License. Feel free to share and use the resources as needed.