Training large AI models – especially foundation models and generative architectures – can consume megawatt-hours of electricity, often with associated CO₂ emissions depending on the energy source. For AI developers, researchers and CTOs, quantifying and minimising this energy footprint is not just a matter of ethics – it has implications for cost optimisation, regulatory compliance and long-term sustainability.
We’ve been examining the leading frameworks and tools designed to estimate, track, and compare the energy consumption of AI systems. In this article we briefly explore their methodologies, assess their strengths and weaknesses, and compare them in a side-by-side format to help developer teams choose the most appropriate solution for their needs.
The importance of energy estimation in AI
The energy consumption of AI is not a monolithic concept. It varies significantly depending on:
- The type of model (e.g. transformer, CNN, RNN),
- The stage of the lifecycle (training vs inference),
- The hardware used (TPUs, GPUs, CPUs, memory),
- The deployment environment (on-premise vs cloud),
- The geographic location of computation (which affects carbon intensity).
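The last factor is easy to underestimate: the same workload can differ in emissions by an order of magnitude depending on the grid it runs on. A minimal sketch, using rough, illustrative carbon-intensity placeholders (not authoritative data):

```python
# Illustrative: identical energy use maps to very different emissions
# depending on grid carbon intensity. Values are rough placeholders.
GRID_INTENSITY_KG_PER_KWH = {
    "hydro-heavy": 0.02,   # kg CO2e per kWh (illustrative)
    "eu-average": 0.25,
    "coal-heavy": 0.80,
}

def emissions_kg(energy_kwh: float, region: str) -> float:
    """Convert energy consumed into CO2e for a given grid mix."""
    return energy_kwh * GRID_INTENSITY_KG_PER_KWH[region]

run_kwh = 500.0  # hypothetical training run
for region in GRID_INTENSITY_KG_PER_KWH:
    print(f"{region}: {emissions_kg(run_kwh, region):.1f} kg CO2e")
```

The 40x spread between the hydro-heavy and coal-heavy rows is why several of the tools below are region-aware.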
Accurate energy and emissions tracking helps in making informed design decisions, comparing model architectures, reducing cloud costs, and reporting on environmental impact for ESG (Environmental, Social and Governance) goals.
Comparison of AI energy estimation frameworks
Here’s our side-by-side comparison of the major tools and frameworks we’ve identified for estimating AI’s energy use and the carbon impact of AI models:
Tool | Scope | Use Case | Strength
--- | --- | --- | ---
CodeCarbon | Training | Real-time emissions tracking | Easy integration, region-aware
ML CO₂ Calculator | Estimation | Early-stage planning | Very fast, accessible
Microsoft Sustainability Calculator | Production (Azure) | Corporate ESG reporting | Detailed, scalable
AI Energy Score | Inference (NLP) | Model selection | Benchmarked leaderboard
Green Algorithms | Estimation | Research and academic use | Transparent, reproducible
Experiment Impact Tracker | Training | Comparative experiment analysis | Detailed logs, supports tracking frameworks
Carbontracker | Training + inference | Cloud usage tracking | Multi-cloud, lightweight
ML.ENERGY Leaderboard | Inference (LLMs) | LLM evaluation and comparison | Focused, public leaderboard
Analysis of tools
CodeCarbon
- Methodology: Tracks energy usage from hardware (CPU/GPU) and maps it to regional CO₂ emissions based on IP geolocation or manual override. Designed for Python environments.
- Key benefits: Real-time tracking during training runs. Simple integration with ML frameworks like PyTorch, TensorFlow, and Hugging Face. Outputs include CO₂ in kg, energy in kWh, and logs over time
- Limitations: Regional granularity varies depending on availability of grid carbon intensity data. Assumes typical power consumption based on hardware rather than directly measuring it
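In CodeCarbon itself, this measurement is wrapped by an `EmissionsTracker` that you `start()` and `stop()` around a training run. The standalone sketch below illustrates the underlying measure-and-integrate approach without depending on the library; the power samples and intensity value are illustrative assumptions, not measured data:

```python
# Sketch of the sampling approach: estimate power draw at intervals,
# integrate to kWh, then apply a regional carbon-intensity factor.
# All figures are illustrative assumptions.

def integrate_energy_kwh(power_samples_w, interval_s):
    """Each sample is held constant for one sampling interval."""
    joules = sum(p * interval_s for p in power_samples_w)
    return joules / 3_600_000  # J -> kWh

def emissions_kg(energy_kwh, intensity_kg_per_kwh):
    return energy_kwh * intensity_kg_per_kwh

samples = [250.0, 300.0, 280.0, 310.0]  # watts, e.g. one GPU, 15 s apart
energy = integrate_energy_kwh(samples, interval_s=15)
print(f"{energy:.5f} kWh, {emissions_kg(energy, 0.4):.5f} kg CO2e")
```

Because the power figures are looked up from hardware specifications rather than metered, the result inherits the "typical consumption" caveat noted above.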
ML CO₂ Impact Calculator
- Methodology: Takes manual inputs like hardware type, usage duration, cloud region, and memory to estimate total energy use and emissions using predefined coefficients.
- Key benefits: Instant results via web interface or CLI. Ideal for high-level estimation and quick comparisons
- Limitations: Dependent on user knowledge of system details. Unsuitable for continuous monitoring or dynamic workloads
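The one-shot, coefficient-based estimate behind this style of calculator can be sketched in a few lines. The numbers below (TDP, PUE, grid intensity) are invented placeholders, not the calculator's actual coefficients:

```python
# One-shot estimate in the spirit of the ML CO2 Impact Calculator:
# the user supplies hardware TDP, device count, runtime, data-centre
# PUE and a regional coefficient. All values are illustrative.

def estimate(tdp_watts, n_devices, hours, pue, intensity_kg_per_kwh):
    energy_kwh = tdp_watts * n_devices * hours * pue / 1000
    return energy_kwh, energy_kwh * intensity_kg_per_kwh

# e.g. 8 accelerators at 300 W TDP for 72 h, PUE 1.2, 0.3 kg/kWh grid
energy, co2 = estimate(300, 8, 72, 1.2, 0.3)
print(f"{energy:.0f} kWh, {co2:.0f} kg CO2e")
```

The limitation noted above is visible here: the result is only as good as the TDP and runtime the user supplies, and a constant TDP cannot reflect a dynamic workload.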
Microsoft Sustainability Calculator
- Methodology: Works by aggregating Azure resource usage (compute, storage, networking) and matching it with Microsoft’s internal carbon accounting and power usage effectiveness (PUE) data.
- Key benefits: Highly accurate for Azure workloads. Integrates directly into enterprise sustainability reporting tools
- Limitations: Exclusive to Microsoft Azure customers. Doesn’t cover on-premise or multi-cloud setups
AI Energy Score (Hugging Face)
- Methodology: Standardised benchmarking suite measuring AI’s energy use per inference for pre-selected NLP models across common tasks and datasets.
- Key benefits: Enables model selection based on performance-to-energy ratio. Publicly accessible leaderboard encourages transparency and accountability
- Limitations: Limited to inference phase and supported tasks. Cannot be easily customised for novel models or domains
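Selecting on a performance-to-energy ratio rather than raw accuracy can be sketched as below. The model names and figures are invented for illustration, not leaderboard data:

```python
# Hypothetical model selection by quality per unit of inference energy,
# in the spirit of an energy-aware leaderboard. All numbers invented.

models = [
    {"name": "model-a", "accuracy": 0.91, "wh_per_1k_queries": 18.0},
    {"name": "model-b", "accuracy": 0.89, "wh_per_1k_queries": 6.0},
]

def efficiency(m):
    """Accuracy points delivered per watt-hour of inference."""
    return m["accuracy"] / m["wh_per_1k_queries"]

best = max(models, key=efficiency)
print(best["name"])
```

Here the marginally less accurate model wins on efficiency, which is exactly the trade-off such a leaderboard is meant to surface.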
Green Algorithms
- Methodology: A scientific, formula-based calculator that estimates AI’s energy use and emissions based on hardware type, usage time, core count, and geography.
- Key benefits: Useful for post-hoc academic reporting. Highlights carbon intensity variation by country
- Limitations: Static, analytical model that doesn’t reflect runtime variability. Power profiles are based on generalised assumptions
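A formula of this shape (runtime times core power, core usage and memory power, scaled by PUE) can be sketched as follows. The per-core and per-GB wattages here are illustrative assumptions, not the calculator's published coefficients:

```python
# Green Algorithms-style analytical estimate. Wattage coefficients
# below are illustrative assumptions.

def energy_kwh(runtime_h, n_cores, watts_per_core, usage,
               memory_gb, watts_per_gb, pue):
    power_w = n_cores * watts_per_core * usage + memory_gb * watts_per_gb
    return runtime_h * power_w * pue / 1000

def emissions_kg(kwh, intensity_kg_per_kwh):
    return kwh * intensity_kg_per_kwh

kwh = energy_kwh(runtime_h=10, n_cores=16, watts_per_core=12,
                 usage=0.9, memory_gb=64, watts_per_gb=0.37, pue=1.5)
print(f"{kwh:.2f} kWh, {emissions_kg(kwh, 0.25):.2f} kg CO2e")
```

Because every term is a fixed coefficient, the model is transparent and reproducible, but, as noted above, it cannot capture runtime variability.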
Experiment Impact Tracker
- Methodology: Hooks into ML training scripts to track memory, CPU, and GPU usage, and estimates energy draw over time. Can correlate usage with CO₂ emissions based on location.
- Key benefits: Provides rich logs per experiment. Compatible with experiment management tools (e.g. Sacred, MLflow)
- Limitations: Setup is more complex than web-based tools. Power estimation relies on average power draw values.
Carbontracker
- Methodology: Targets cloud environments and uses metadata about cloud provider regions and hardware to estimate emissions.
- Key benefits: Light-touch integration for Python developers. Supports AWS, GCP, Azure with automatic detection.
- Limitations: Less suitable for local or hybrid cloud deployments. Depends on cloud provider’s carbon intensity data.
ML.ENERGY Leaderboard
- Methodology: Runs models (mainly LLMs) on a fixed testbed and publishes energy usage results for training and inference phases.
- Key benefits: High-impact, model-specific insights. Includes metrics like emissions per token generated
- Limitations: Focused exclusively on large language models. Users can’t currently test their own models on the platform.
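A per-token energy metric of the kind such a leaderboard reports is simple to compute once total energy and token counts are known. The figures below are invented for illustration:

```python
# Per-token energy for LLM inference. All figures are illustrative.

def joules_per_token(total_joules: float, tokens_generated: int) -> float:
    return total_joules / tokens_generated

def kwh_per_million_tokens(j_per_tok: float) -> float:
    return j_per_tok * 1_000_000 / 3_600_000  # J -> kWh

jpt = joules_per_token(total_joules=5_400.0, tokens_generated=2_000)
print(f"{jpt:.2f} J/token, "
      f"{kwh_per_million_tokens(jpt):.2f} kWh per 1M tokens")
```

Normalising per token (rather than per request) makes models with different output lengths directly comparable, which is what makes the leaderboard useful for model selection.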
Choosing the right framework
When selecting an energy estimation framework, the ideal choice will depend on several key factors:
- Integration requirements: Do you need real-time tracking in training scripts (e.g. CodeCarbon, Experiment Impact Tracker), or are you assessing energy post-deployment (e.g. Carbontracker)?
- Deployment environment: Are your models cloud-based (e.g. Azure, AWS, GCP) or on-premise? Tools like Microsoft Sustainability Calculator and Carbontracker are platform-specific.
- Scope and granularity: Are you evaluating individual experiments, system-wide workloads, or comparing models? Some tools are high-level calculators, while others track energy per epoch or per prediction.
- Model type and domain: Are you working primarily with NLP, vision, or multi-modal models? Tools like ML.ENERGY and AI Energy Score cater to specific domains.
- Ease of use vs depth: Simpler tools (like ML CO₂ Calculator) trade off precision for speed, while detailed frameworks (like Experiment Impact Tracker) offer more insight but require setup.
Quantifying and managing energy consumption is becoming a core competency for AI teams. With growing regulatory and environmental scrutiny – and the rising operational cost of energy-intensive AI – knowing your AI’s energy use and carbon footprint isn’t an optional extra; it’s essential.
By incorporating one or more of these energy estimation frameworks into your development pipeline, you gain visibility into the true cost of model training and inference. Whether you’re aiming to meet sustainability goals, reduce costs, or simply build responsible AI, the tools now exist to support energy-aware development practices.