Training large AI models – especially foundation models and generative architectures – can consume megawatt-hours of electricity, often with associated CO₂ emissions depending on the energy source. For AI developers, researchers and CTOs, quantifying and minimising this energy footprint is not just a matter of ethics – it has implications for cost optimisation, regulatory compliance and long-term sustainability.
We’ve been examining the leading frameworks and tools designed to estimate, track, and compare the energy consumption of AI systems. In this article we briefly explore their methodologies, assess their strengths and weaknesses, and compare them in a side-by-side format to help developer teams choose the most appropriate solution for their needs.
The importance of energy estimation in AI
The energy consumption of AI is not a monolithic concept. It varies significantly depending on:
- The type of model (e.g. transformer, CNN, RNN),
- The stage of the lifecycle (training vs inference),
- The hardware used (TPUs, GPUs, CPUs, memory),
- The deployment environment (on-premise vs cloud),
- The geographic location of computation (which affects carbon intensity).
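The last factor is easy to underestimate: the same workload can differ in emissions by an order of magnitude depending on the grid it runs on. A minimal sketch, using rough, illustrative carbon-intensity placeholders (not authoritative data):

```python
# Illustrative: identical energy use maps to very different emissions
# depending on grid carbon intensity. Values are rough placeholders.
GRID_INTENSITY_KG_PER_KWH = {
    "hydro-heavy": 0.02,   # kg CO2e per kWh (illustrative)
    "eu-average": 0.25,
    "coal-heavy": 0.80,
}

def emissions_kg(energy_kwh: float, region: str) -> float:
    """Convert energy consumed into CO2e for a given grid mix."""
    return energy_kwh * GRID_INTENSITY_KG_PER_KWH[region]

run_kwh = 500.0  # hypothetical training run
for region in GRID_INTENSITY_KG_PER_KWH:
    print(f"{region}: {emissions_kg(run_kwh, region):.1f} kg CO2e")
```

The 40x spread between the hydro-heavy and coal-heavy rows is why several of the tools below are region-aware.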
Accurate energy and emissions tracking helps in making informed design decisions, comparing model architectures, reducing cloud costs, and reporting on environmental impact for ESG (Environmental, Social and Governance) goals.
Comparison of AI energy estimation frameworks
Here’s our side-by-side comparison of the major tools and frameworks we’ve identified for estimating AI’s energy use and the carbon impact of AI models:
Tool | Scope | Use Case | Strength
--- | --- | --- | ---
CodeCarbon | Training | Real-time emissions tracking | Easy integration, region-aware
ML CO₂ Calculator | Estimation | Early-stage planning | Very fast, accessible
Microsoft Sustainability Calculator | Production (Azure) | Corporate ESG reporting | Detailed, scalable
AI Energy Score | Inference (NLP) | Model selection | Benchmarked leaderboard
Green Algorithms | Estimation | Research and academic use | Transparent, reproducible
Experiment Impact Tracker | Training | Comparative experiment analysis | Detailed logs, supports tracking frameworks
Carbontracker | Training + inference | Cloud usage tracking | Multi-cloud, lightweight
ML.ENERGY Leaderboard | Inference (LLMs) | LLM evaluation and comparison | Focused, public leaderboard
Analysis of tools
CodeCarbon
- Methodology: Tracks energy usage from hardware (CPU/GPU) and maps it to regional CO₂ emissions based on IP geolocation or manual override. Designed for Python environments.
- Key benefits: Real-time tracking during training runs. Simple integration with ML frameworks like PyTorch, TensorFlow, and Hugging Face. Outputs include CO₂ in kg, energy in kWh, and logs over time
- Limitations: Regional granularity varies depending on availability of grid carbon intensity data. Assumes typical power consumption based on hardware rather than directly measuring it
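In CodeCarbon itself, this measurement is wrapped by an `EmissionsTracker` that you `start()` and `stop()` around a training run. The standalone sketch below illustrates the underlying measure-and-integrate approach without depending on the library; the power samples and intensity value are illustrative assumptions, not measured data:

```python
# Sketch of the sampling approach: estimate power draw at intervals,
# integrate to kWh, then apply a regional carbon-intensity factor.
# All figures are illustrative assumptions.

def integrate_energy_kwh(power_samples_w, interval_s):
    """Each sample is held constant for one sampling interval."""
    joules = sum(p * interval_s for p in power_samples_w)
    return joules / 3_600_000  # J -> kWh

def emissions_kg(energy_kwh, intensity_kg_per_kwh):
    return energy_kwh * intensity_kg_per_kwh

samples = [250.0, 300.0, 280.0, 310.0]  # watts, e.g. one GPU, 15 s apart
energy = integrate_energy_kwh(samples, interval_s=15)
print(f"{energy:.5f} kWh, {emissions_kg(energy, 0.4):.5f} kg CO2e")
```

Because the power figures are looked up from hardware specifications rather than metered, the result inherits the "typical consumption" caveat noted above.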
ML CO₂ Impact Calculator
- Methodology: Takes manual inputs like hardware type, usage duration, cloud region, and memory to estimate total energy use and emissions using predefined coefficients.
- Key benefits: Instant results via web interface or CLI. Ideal for high-level estimation and quick comparisons
- Limitations: Dependent on user knowledge of system details. Unsuitable for continuous monitoring or dynamic workloads
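The one-shot, coefficient-based estimate behind this style of calculator can be sketched in a few lines. The numbers below (TDP, PUE, grid intensity) are invented placeholders, not the calculator's actual coefficients:

```python
# One-shot estimate in the spirit of the ML CO2 Impact Calculator:
# the user supplies hardware TDP, device count, runtime, data-centre
# PUE and a regional coefficient. All values are illustrative.

def estimate(tdp_watts, n_devices, hours, pue, intensity_kg_per_kwh):
    energy_kwh = tdp_watts * n_devices * hours * pue / 1000
    return energy_kwh, energy_kwh * intensity_kg_per_kwh

# e.g. 8 accelerators at 300 W TDP for 72 h, PUE 1.2, 0.3 kg/kWh grid
energy, co2 = estimate(300, 8, 72, 1.2, 0.3)
print(f"{energy:.0f} kWh, {co2:.0f} kg CO2e")
```

The limitation noted above is visible here: the result is only as good as the TDP and runtime the user supplies, and a constant TDP cannot reflect a dynamic workload.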
Microsoft Sustainability Calculator
- Methodology: Works by aggregating Azure resource usage (compute, storage, networking) and matching it with Microsoft’s internal carbon accounting and power usage effectiveness (PUE) data.
- Key benefits: Highly accurate for Azure workloads. Integrates directly into enterprise sustainability reporting tools
- Limitations: Exclusive to Microsoft Azure customers. Doesn’t cover on-premise or multi-cloud setups
AI Energy Score (Hugging Face)
- Methodology: Standardised benchmarking suite measuring AI’s energy use per inference for pre-selected NLP models across common tasks and datasets.
- Key benefits: Enables model selection based on performance-to-energy ratio. Publicly accessible leaderboard encourages transparency and accountability
- Limitations: Limited to inference phase and supported tasks. Cannot be easily customised for novel models or domains
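Selecting on a performance-to-energy ratio rather than raw accuracy can be sketched as below. The model names and figures are invented for illustration, not leaderboard data:

```python
# Hypothetical model selection by quality per unit of inference energy,
# in the spirit of an energy-aware leaderboard. All numbers invented.

models = [
    {"name": "model-a", "accuracy": 0.91, "wh_per_1k_queries": 18.0},
    {"name": "model-b", "accuracy": 0.89, "wh_per_1k_queries": 6.0},
]

def efficiency(m):
    """Accuracy points delivered per watt-hour of inference."""
    return m["accuracy"] / m["wh_per_1k_queries"]

best = max(models, key=efficiency)
print(best["name"])
```

Here the marginally less accurate model wins on efficiency, which is exactly the trade-off such a leaderboard is meant to surface.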
Green Algorithms
- Methodology: A scientific, formula-based calculator that estimates AI’s energy use and emissions based on hardware type, usage time, core count, and geography.
- Key benefits: Useful for post-hoc academic reporting. Highlights carbon intensity variation by country
- Limitations: Static, analytical model that doesn’t reflect runtime variability. Power profiles are based on generalised assumptions
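A formula of this shape (runtime times core power, core usage and memory power, scaled by PUE) can be sketched as follows. The per-core and per-GB wattages here are illustrative assumptions, not the calculator's published coefficients:

```python
# Green Algorithms-style analytical estimate. Wattage coefficients
# below are illustrative assumptions.

def energy_kwh(runtime_h, n_cores, watts_per_core, usage,
               memory_gb, watts_per_gb, pue):
    power_w = n_cores * watts_per_core * usage + memory_gb * watts_per_gb
    return runtime_h * power_w * pue / 1000

def emissions_kg(kwh, intensity_kg_per_kwh):
    return kwh * intensity_kg_per_kwh

kwh = energy_kwh(runtime_h=10, n_cores=16, watts_per_core=12,
                 usage=0.9, memory_gb=64, watts_per_gb=0.37, pue=1.5)
print(f"{kwh:.2f} kWh, {emissions_kg(kwh, 0.25):.2f} kg CO2e")
```

Because every term is a fixed coefficient, the model is transparent and reproducible, but, as noted above, it cannot capture runtime variability.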
Experiment Impact Tracker
- Methodology: Hooks into ML training scripts to track memory, CPU, and GPU usage, and estimates energy draw over time. Can correlate usage with CO₂ emissions based on location.
- Key benefits: Provides rich logs per experiment. Compatible with experiment management tools (e.g. Sacred, MLflow)
- Limitations: Setup is more complex than web-based tools. Power estimation relies on average power draw values.
Carbontracker
- Methodology: Targets cloud environments and uses metadata about cloud provider regions and hardware to estimate emissions.
- Key benefits: Light-touch integration for Python developers. Supports AWS, GCP, Azure with automatic detection.
- Limitations: Less suitable for local or hybrid cloud deployments. Depends on cloud provider’s carbon intensity data.
ML.ENERGY Leaderboard
- Methodology: Runs models (mainly LLMs) on a fixed testbed and publishes energy usage results for training and inference phases.
- Key benefits: High-impact, model-specific insights. Includes metrics like emissions per token generated
- Limitations: Focused exclusively on large language models. Users can’t currently test their own models on the platform.
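A per-token energy metric of the kind such a leaderboard reports is simple to compute once total energy and token counts are known. The figures below are invented for illustration:

```python
# Per-token energy for LLM inference. All figures are illustrative.

def joules_per_token(total_joules: float, tokens_generated: int) -> float:
    return total_joules / tokens_generated

def kwh_per_million_tokens(j_per_tok: float) -> float:
    return j_per_tok * 1_000_000 / 3_600_000  # J -> kWh

jpt = joules_per_token(total_joules=5_400.0, tokens_generated=2_000)
print(f"{jpt:.2f} J/token, "
      f"{kwh_per_million_tokens(jpt):.2f} kWh per 1M tokens")
```

Normalising per token (rather than per request) makes models with different output lengths directly comparable, which is what makes the leaderboard useful for model selection.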
Choosing the right framework
When selecting an energy estimation framework, the ideal choice will depend on several key factors:
- Integration requirements: Do you need real-time tracking in training scripts (e.g. CodeCarbon, Experiment Impact Tracker), or are you assessing energy post-deployment (e.g. Carbontracker)?
- Deployment environment: Are your models cloud-based (e.g. Azure, AWS, GCP) or on-premise? Tools like Microsoft Sustainability Calculator and Carbontracker are platform-specific.
- Scope and granularity: Are you evaluating individual experiments, system-wide workloads, or comparing models? Some tools are high-level calculators, while others track energy per epoch or per prediction.
- Model type and domain: Are you working primarily with NLP, vision, or multi-modal models? Tools like ML.ENERGY and AI Energy Score cater to specific domains.
- Ease of use vs depth: Simpler tools (like ML CO₂ Calculator) trade off precision for speed, while detailed frameworks (like Experiment Impact Tracker) offer more insight but require setup.
Quantifying and managing energy consumption is becoming a core competency for AI teams. With growing regulatory and environmental scrutiny – and the rising operational cost of energy-intensive AI – knowing your AI’s energy use and carbon footprint isn’t an optional extra; it’s essential.
By incorporating one or more of these energy estimation frameworks into your development pipeline, you gain visibility into the true cost of model training and inference. Whether you’re aiming to meet sustainability goals, reduce costs, or simply build responsible AI, the tools now exist to support energy-aware development practices.