AI & Emerging Tech · 14 min read

Case study: Generative AI environmental footprint — Amsterdam's municipal pilot and the results so far

A concrete implementation case from Amsterdam's municipal pilot in generative AI environmental footprint, covering design choices, measured outcomes, and transferable lessons for other jurisdictions.

Amsterdam's municipality became one of the first city governments in Europe to systematically measure, report, and constrain the environmental footprint of generative AI systems deployed across its public services. Between 2024 and early 2026, the city integrated AI carbon accounting into 14 municipal departments, tracked energy consumption from over 30 generative AI applications, and reduced per-query energy use by 38% through model optimization and workload scheduling, all while expanding AI-assisted citizen services to 1.2 million residents (City of Amsterdam, 2025). This case study examines how a mid-sized European city transformed its AI procurement and deployment practices to address the growing environmental costs of large language models and generative AI infrastructure.

Why It Matters

The energy demands of generative AI have escalated rapidly. The International Energy Agency estimated that global data center electricity consumption reached 460 TWh in 2025, with AI workloads accounting for approximately 15% of that total, up from less than 5% in 2022 (IEA, 2025). A single training run for a frontier large language model now consumes 50 to 100 GWh of electricity, equivalent to the annual consumption of 5,000 to 10,000 European households. Inference workloads, the operational queries that organizations run daily, collectively consume 3 to 10 times more energy than training over a model's lifecycle because they run continuously at scale.

For municipal governments adopting generative AI to improve citizen services, permitting processes, and internal operations, these energy costs translate directly into carbon emissions and budget impacts. A mid-sized city deploying AI-powered chatbots, document analysis tools, and planning assistants can add 500 to 2,000 MWh of annual electricity demand depending on usage patterns and model choices. Without active management, this consumption is invisible: it appears in cloud service invoices as compute hours rather than energy units, obscuring the environmental impact from decision-makers.

The regulatory environment is tightening. The EU AI Act, which entered force in August 2024, includes transparency requirements for high-risk AI systems that extend to energy and resource consumption disclosure. The European Commission's draft guidelines on sustainable AI, released in September 2025, recommend that public sector organizations track and report the carbon intensity of AI deployments. Several European cities, including Copenhagen, Barcelona, and Helsinki, have begun incorporating AI energy metrics into their climate action plans. For city officials, IT directors, and sustainability managers, Amsterdam's pilot offers a concrete blueprint for integrating AI environmental accountability into municipal operations.

Key Concepts

Understanding the Amsterdam pilot requires familiarity with several technical and governance concepts that shape AI environmental footprint management.

AI carbon accounting refers to the systematic measurement of greenhouse gas emissions associated with training, fine-tuning, and running inference on AI models. This includes direct energy consumption (Scope 2 emissions from electricity used by servers and cooling systems) and embodied carbon in hardware (Scope 3 emissions from manufacturing GPUs, servers, and networking equipment). Amsterdam adopted the methodology developed by the Green Software Foundation, which standardizes carbon intensity measurement per API call, per token generated, and per compute hour.
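The per-call bookkeeping can be sketched in the spirit of the Green Software Foundation's SCI formula (carbon = operational energy × grid intensity + embodied allocation). The coefficients below are illustrative assumptions, not Amsterdam's actual values:

```python
# Hedged sketch of per-call carbon accounting, SCI-style:
# C = (E * I) + M, i.e. operational emissions plus an embodied allocation.
# All coefficients here are invented for illustration.

ENERGY_PER_1K_TOKENS_KWH = 0.004   # assumed per-token energy benchmark
GRID_INTENSITY_G_PER_KWH = 310     # Dutch grid average cited in the text
EMBODIED_G_PER_CALL = 0.05         # placeholder Scope 3 hardware allocation

def carbon_per_call(tokens: int) -> float:
    """Return estimated grams CO2e for one inference call."""
    energy_kwh = (tokens / 1000) * ENERGY_PER_1K_TOKENS_KWH
    operational = energy_kwh * GRID_INTENSITY_G_PER_KWH
    return operational + EMBODIED_G_PER_CALL

print(round(carbon_per_call(1500), 2))  # grams CO2e for a 1,500-token call
```

Reporting per API call, per token, and per compute hour then reduces to summing this quantity over the relevant logs.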

Model right-sizing is the practice of selecting the smallest AI model capable of performing a specific task to acceptable quality standards, rather than defaulting to the largest available model. A 7-billion-parameter model can handle routine document classification at 1/40th the energy cost of a 280-billion-parameter model, with negligible accuracy loss for structured tasks. Amsterdam's pilot established performance benchmarks for each use case and matched them to appropriately sized models.
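The matching logic can be sketched as a task-to-tier lookup: pick the cheapest model whose benchmarked accuracy clears the task's bar. Tier names, energy figures, and accuracy thresholds below are assumptions for illustration, not the pilot's actual benchmarks:

```python
# Illustrative model right-sizing: route each task to the least
# energy-intensive tier that meets its accuracy requirement.

MODEL_TIERS = {
    "small-7b":  {"energy_kwh_per_query": 0.005, "benchmark_accuracy": 0.94},
    "large-llm": {"energy_kwh_per_query": 0.014, "benchmark_accuracy": 0.97},
}

TASK_REQUIREMENTS = {
    "document_classification": 0.90,  # structured: a small model suffices
    "open_citizen_inquiry":    0.96,  # nuanced: needs the large model
}

def pick_tier(task: str) -> str:
    """Choose the cheapest tier whose benchmarked accuracy meets the task bar."""
    required = TASK_REQUIREMENTS[task]
    candidates = [
        (cfg["energy_kwh_per_query"], name)
        for name, cfg in MODEL_TIERS.items()
        if cfg["benchmark_accuracy"] >= required
    ]
    return min(candidates)[1]

print(pick_tier("document_classification"))  # small model clears the bar
print(pick_tier("open_citizen_inquiry"))     # only the large model does
```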

Workload-aware scheduling shifts non-time-sensitive AI inference tasks to periods when the electricity grid has the highest share of renewable generation. In the Netherlands, wind generation typically peaks overnight and during winter storms, while solar peaks midday in summer. By queuing batch processing jobs for low-carbon grid windows, the municipality reduced the carbon intensity of its AI workloads without affecting service delivery timelines.
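A minimal version of this scheduling decision picks the lowest-carbon window from a day-ahead grid forecast, subject to the job's deadline. The forecast values below are invented for illustration:

```python
# Sketch of workload-aware scheduling: choose the lowest-carbon start hour
# from a day-ahead intensity forecast. Forecast values are illustrative.

forecast = {0: 150, 6: 280, 12: 210, 18: 330}  # start hour -> gCO2/kWh

def best_window(forecast: dict, deadline_hour: int) -> int:
    """Return the start hour with the lowest carbon intensity among
    windows that begin no later than the job's deadline."""
    eligible = {h: g for h, g in forecast.items() if h <= deadline_hour}
    return min(eligible, key=eligible.get)

print(best_window(forecast, deadline_hour=12))  # overnight wind wins here
```

A production system would add queue management and real-time corrections for forecast error, but the core decision is this one comparison.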

Power Usage Effectiveness (PUE) measures total data center energy consumption divided by IT equipment energy consumption. A PUE of 1.2 means that for every watt powering servers, 0.2 watts go to cooling, lighting, and other overhead. Amsterdam required cloud providers to disclose facility-level PUE and prioritized contracts with providers operating at PUE values below 1.15.
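The procurement check is a one-line ratio against the 1.15 ceiling; the sample facility readings below are made up:

```python
# PUE = total facility energy / IT equipment energy.
# Quick check against Amsterdam's 1.15 procurement ceiling.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    return total_facility_kwh / it_equipment_kwh

def meets_threshold(total_kwh: float, it_kwh: float, ceiling: float = 1.15) -> bool:
    return pue(total_kwh, it_kwh) <= ceiling

print(meets_threshold(1120.0, 1000.0))  # PUE 1.12 -> passes
print(meets_threshold(1200.0, 1000.0))  # PUE 1.20 -> fails
```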

What's Working

The Amsterdam pilot has generated quantifiable results across energy reduction, cost management, and governance integration.

Per-Query Energy Consumption Dropped 38%

The municipality's AI Center of Excellence benchmarked energy consumption across all 30+ generative AI applications at the start of the pilot in Q1 2024. The baseline average was 0.014 kWh per query for general-purpose LLM interactions and 0.032 kWh per query for complex document analysis tasks. By Q4 2025, model right-sizing reduced the general-purpose query cost to 0.0087 kWh, a 38% reduction. The document analysis workload dropped to 0.019 kWh per query, a 41% improvement. These reductions were achieved primarily by migrating 60% of routine queries from GPT-4-class models to fine-tuned versions of open-source 7B and 13B parameter models hosted on European cloud infrastructure. The fine-tuned smaller models achieved 94% task completion accuracy compared to 97% for the larger models, a trade-off that municipal users found acceptable for most administrative applications (City of Amsterdam, 2025).

Carbon Intensity Tracking Is Embedded in Procurement

Amsterdam's IT procurement office now requires all AI service contracts to include carbon intensity disclosure per 1,000 API calls, annualized energy consumption estimates based on projected usage volumes, and PUE certification for hosting facilities. The city developed a scoring matrix that weights environmental performance at 15% of total procurement evaluation criteria, alongside cost (35%), functionality (30%), and security/compliance (20%). In the first year of applying these criteria, the municipality selected providers whose carbon intensity per query was 25 to 45% lower than the cheapest alternatives, at a cost premium of only 8 to 12%. The procurement framework has been adopted by three other Dutch municipalities (Rotterdam, Utrecht, and The Hague) and is under review by the European Commission as a model for public sector AI procurement guidelines (Government of the Netherlands, 2025).
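The scoring matrix described above reduces to a weighted sum over the four criteria (15% environment, 35% cost, 30% functionality, 20% security/compliance). The vendor sub-scores below are invented to show the mechanics:

```python
# Sketch of the weighted procurement matrix: sub-scores on a 0-100 scale,
# weights from the text. Vendor figures are illustrative only.

WEIGHTS = {"environment": 0.15, "cost": 0.35, "functionality": 0.30, "security": 0.20}

def total_score(scores: dict) -> float:
    """Weighted sum of 0-100 sub-scores across the four criteria."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

vendor_green = {"environment": 90, "cost": 70, "functionality": 80, "security": 85}
vendor_cheap = {"environment": 40, "cost": 94, "functionality": 80, "security": 85}

print(round(total_score(vendor_green), 1))
print(round(total_score(vendor_cheap), 1))
```

Running both candidates through the matrix makes the cost-versus-carbon trade-off explicit before contract award.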

Workload Scheduling Reduced Grid Carbon Intensity by 22%

The municipality implemented a scheduling layer that routes batch AI processing tasks to time windows when the Dutch electricity grid carbon intensity falls below 200 gCO2/kWh. Approximately 40% of the city's AI workloads are non-time-sensitive: overnight document processing, weekly report generation, training data preparation, and analytics jobs. Shifting these workloads to low-carbon windows reduced the average grid carbon intensity of the municipality's AI compute from 310 gCO2/kWh to 242 gCO2/kWh, a 22% improvement. The scheduling system uses day-ahead grid forecasts from TenneT, the Dutch transmission system operator, to optimize job placement. Real-time adjustments handle forecast errors, which average plus or minus 15% (TenneT, 2025).
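A back-of-envelope check shows what the reported figures imply: if 60% of workloads stayed at the 310 gCO2/kWh average and the blended result is 242, the shifted 40% must have landed well below the 200 gCO2/kWh threshold. Shares and averages are taken from the text; the derived value is an inference, not a published figure:

```python
# Sanity check on the scheduling figures: solve for the implied intensity
# of the shifted workloads given the blended average.

baseline = 310.0      # gCO2/kWh, unshifted average
blended = 242.0       # gCO2/kWh, reported after scheduling
shifted_share = 0.40  # fraction of workloads moved to low-carbon windows

implied_low_carbon = (blended - (1 - shifted_share) * baseline) / shifted_share
print(round(implied_low_carbon))  # implied gCO2/kWh for the shifted windows
```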

Employee Awareness Improved Decision-Making

The pilot included a mandatory training module for all municipal employees who interact with AI tools. The 90-minute session covers the energy cost of different query types, best practices for prompt efficiency (reducing unnecessary token generation), and how to select the appropriate model tier for each task. Post-training surveys showed that 72% of employees modified their AI usage patterns, primarily by reducing verbose prompts and avoiding unnecessary regeneration of responses. The behavioral changes contributed an estimated 12% of the total per-query energy reduction (Amsterdam Digital Innovation Office, 2025).

What's Not Working

Several challenges have limited the pilot's impact and complicate replication in other jurisdictions.

Cloud Provider Transparency Remains Inconsistent

Despite contractual requirements, obtaining granular energy consumption data from cloud providers proved difficult. Microsoft Azure and Google Cloud Platform provide carbon reporting dashboards, but the data is aggregated at the subscription level and updated monthly, making it impossible to attribute energy use to individual AI applications or query types. The municipality resorted to building its own metering layer using API call logging and published per-token energy estimates from academic benchmarks, introducing measurement uncertainty of plus or minus 20%. Amazon Web Services declined to provide facility-level PUE data for specific availability zones, citing competitive confidentiality, which forced the city to rely on published corporate averages that may not reflect the actual infrastructure serving Amsterdam's workloads (City of Amsterdam, 2025).
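A metering layer of the kind described can be approximated by multiplying logged token counts by a published per-token benchmark and carrying the stated ±20% uncertainty as an interval. The benchmark value and log entries below are illustrative assumptions:

```python
# Sketch of a DIY metering layer: attribute energy to applications from
# API call logs and a published per-token energy benchmark, reporting
# low/high bounds for the +/-20% measurement uncertainty.

WH_PER_1K_TOKENS = 4.0   # assumed academic benchmark, Wh per 1,000 tokens
UNCERTAINTY = 0.20       # +/-20%, as noted in the text

call_log = [
    {"app": "chatbot", "tokens": 1200},
    {"app": "chatbot", "tokens": 800},
    {"app": "permits", "tokens": 3000},
]

def energy_by_app(log: list) -> dict:
    """Return {app: (low_kwh, high_kwh)} energy bounds per application."""
    totals = {}
    for entry in log:
        kwh = entry["tokens"] / 1000 * WH_PER_1K_TOKENS / 1000
        totals[entry["app"]] = totals.get(entry["app"], 0.0) + kwh
    return {app: (kwh * (1 - UNCERTAINTY), kwh * (1 + UNCERTAINTY))
            for app, kwh in totals.items()}

print(energy_by_app(call_log))
```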

Scope 3 Embodied Carbon Is Unmeasured

The pilot focused exclusively on operational energy consumption (Scope 2 emissions) and did not attempt to quantify the embodied carbon in GPU hardware, server manufacturing, or network equipment. Academic estimates suggest that embodied carbon represents 20 to 40% of total lifecycle emissions for AI infrastructure, depending on hardware utilization rates and refresh cycles. Without standardized embodied carbon data from hardware manufacturers like NVIDIA, AMD, and Intel, municipal carbon accounting captures only a partial picture of AI's environmental footprint. The Green Software Foundation is developing embodied carbon coefficients for common GPU configurations, but these are not expected before late 2026.

Small Language Models Lack Multilingual Capability

Amsterdam serves a linguistically diverse population, with significant demand for AI services in Dutch, English, Arabic, Turkish, and other languages. The 7B and 13B parameter models that perform well for English-language tasks show substantial accuracy degradation for Dutch (8 to 15% lower accuracy) and more significant drops for Arabic and Turkish (20 to 30% lower accuracy). This forces the municipality to route multilingual queries to larger, more energy-intensive models, limiting the overall efficiency gains from model right-sizing. Fine-tuning smaller models on multilingual datasets is underway but requires significant labeled data that the city is still collecting.

Rebound Effects Are Emerging

As AI services became more efficient and responsive, usage volumes increased. Total monthly AI queries across municipal departments grew from 180,000 in Q1 2024 to 520,000 in Q4 2025, a 189% increase. While per-query energy consumption fell 38%, total AI energy consumption rose 17% due to increased adoption. This rebound effect, analogous to Jevons paradox in energy efficiency, suggests that efficiency improvements alone are insufficient without usage governance frameworks or absolute energy caps.
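The monitoring rule implied here is simple: flag rebound whenever per-query efficiency improves while absolute consumption still grows. Using the pilot's own figures:

```python
# Rebound-effect check: efficiency improved, yet total energy rose.
# Tracking both metrics together is what surfaces Jevons-style rebound.

def rebound(baseline_total: float, current_total: float,
            baseline_per_query: float, current_per_query: float) -> bool:
    """True when per-query efficiency improved but absolute consumption grew."""
    return (current_per_query < baseline_per_query
            and current_total > baseline_total)

# Annual totals (MWh) and general per-query costs (kWh) from the pilot:
print(rebound(420, 490, 0.014, 0.0087))  # True -> rebound detected
```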

Key Players

Established Companies

  • Microsoft Azure: Primary cloud provider for 60% of Amsterdam's AI workloads, including Azure OpenAI Service instances used for citizen-facing chatbots and document processing.
  • Google Cloud Platform: Hosts the municipality's custom fine-tuned models on TPU infrastructure, selected for its reported PUE of 1.10 at its Eemshaven, Netherlands data center.
  • Atos (Eviden): Provides systems integration and AI deployment services to the municipality, including the metering layer that tracks per-application energy consumption.
  • TenneT: Dutch transmission system operator whose grid carbon intensity API enables the workload scheduling system.
  • Hugging Face: Supplies open-source model infrastructure and hosts the municipality's fine-tuned smaller models on its European inference endpoints.

Startups

  • Electricity Maps: Provides real-time and forecast grid carbon intensity data that powers the workload scheduling system, with API coverage across all European grid zones.
  • Climatiq: Offers carbon accounting APIs that the municipality uses to convert compute hours and API calls into standardized emissions estimates.
  • Mistral AI: Paris-based LLM developer whose Mixtral and Mistral-series models serve as the primary open-source alternatives to proprietary LLMs in the pilot, with strong European language performance.

Investors and Funders

  • European Commission: Funded 40% of the pilot's development costs through the Digital Europe Programme, which earmarks resources for sustainable AI adoption in public services.
  • City of Amsterdam Innovation Fund: Provided EUR 2.4 million in seed funding for the AI Center of Excellence and the procurement framework development.
  • Dutch Ministry of the Interior and Kingdom Relations: Co-funded the cross-municipality procurement standard development with EUR 800,000 and is coordinating national adoption.

KPI Summary

| KPI | Baseline (Q1 2024) | Current (Q4 2025) | Target (2027) |
| --- | --- | --- | --- |
| Per-query energy (kWh, general) | 0.014 | 0.0087 | 0.006 |
| Per-query energy (kWh, complex) | 0.032 | 0.019 | 0.012 |
| Average grid carbon intensity (gCO2/kWh) | 310 | 242 | 180 |
| AI applications with carbon tracking | 0 | 30 | 50 |
| Employees trained on AI sustainability | 0 | 2,800 | 5,000 |
| Cloud contracts with carbon disclosure | 0% | 85% | 100% |
| Monthly AI query volume | 180,000 | 520,000 | 800,000 |
| Total annual AI energy (MWh) | 420 | 490 | 450 |

Action Checklist

  • Establish baseline energy measurements for all deployed AI applications by logging API calls and mapping them to published per-token energy benchmarks
  • Integrate carbon intensity scoring into AI procurement evaluation criteria with a minimum 10% weighting alongside cost, functionality, and security
  • Evaluate open-source models in the 7B to 13B parameter range as replacements for proprietary LLMs on structured tasks where 90%+ accuracy is acceptable
  • Implement workload-aware scheduling for batch AI processing jobs using real-time grid carbon intensity data from providers like Electricity Maps or WattTime
  • Require cloud providers to disclose facility-level PUE and per-service energy consumption data as contractual obligations
  • Train all employees who interact with AI tools on energy-efficient usage practices, including prompt optimization and model tier selection
  • Monitor total AI energy consumption alongside per-query efficiency to detect and manage rebound effects from increased adoption

FAQ

Q: How much energy does a single generative AI query actually consume? A: Energy consumption varies significantly by model size, task complexity, and hosting infrastructure. Amsterdam's measurements found that a routine query to a GPT-4-class model consumes approximately 0.01 to 0.02 kWh, roughly equivalent to running a standard 10-watt LED bulb for one to two hours. Complex document analysis tasks that involve processing thousands of tokens can reach 0.03 to 0.05 kWh per query. By contrast, queries to fine-tuned 7B parameter models consume 0.003 to 0.008 kWh, a four- to fivefold reduction. At scale, these differences compound dramatically: a municipal deployment handling 500,000 queries per month could see annual energy consumption range from 180 MWh (using optimized small models) to over 1,000 MWh (using large proprietary models), a gap equivalent to the electricity consumption of 80 to 100 European households.

Q: Can smaller AI models really replace large language models for government services? A: For many structured tasks, yes. Amsterdam found that fine-tuned 7B and 13B parameter models achieved 94% task completion accuracy on permit application processing, FAQ responses, and internal document classification, compared to 97% for GPT-4-class models. The 3-percentage-point accuracy gap was acceptable for these applications because human review catches critical errors regardless of model choice. However, for open-ended citizen inquiries, multilingual interactions, and tasks requiring nuanced reasoning about complex policy questions, larger models still outperform smaller alternatives by 10 to 20 percentage points. The practical approach is a tiered architecture: route simple, high-volume tasks to small models and reserve large models for complex, low-volume interactions.

Q: Is this pilot model transferable to cities in emerging markets? A: Core elements are transferable, but several adaptations are necessary. The carbon accounting methodology and procurement scoring framework can be adopted directly. However, cities in emerging markets face different constraints. Grid carbon intensity in many developing countries is 2 to 4 times higher than in the Netherlands (which benefits from significant wind generation), amplifying the emissions impact of AI workloads and making workload scheduling even more valuable. Cloud infrastructure availability may be limited, with data potentially hosted in distant regions that increase latency and reduce scheduling flexibility. Budget constraints may make the 8 to 12% cost premium for lower-carbon providers prohibitive. Cities like Kigali (Rwanda) and Medellin (Colombia), which are exploring AI for municipal services, have expressed interest in Amsterdam's framework but will likely need to prioritize on-premise or edge computing approaches using renewable microgrids rather than relying on hyperscale cloud providers.

Q: How do you prevent the rebound effect from negating efficiency gains? A: Amsterdam is piloting two approaches for 2026. First, departmental AI energy budgets that cap total compute allocation per quarter, forcing managers to prioritize high-value AI applications. Second, a tiered pricing model that charges departments progressively higher internal rates as their AI consumption exceeds baseline thresholds. These demand-side measures complement the supply-side efficiency improvements. The goal is to keep total municipal AI energy consumption flat at approximately 500 MWh annually even as query volumes grow to 800,000 per month by 2027.
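The tiered internal pricing idea can be sketched as an escalating rate schedule over a department's baseline allocation. The rates and band widths below are hypothetical; the pilot's actual schedule is not published:

```python
# Hypothetical tiered internal pricing: base rate on the baseline
# allocation, then progressively higher rates on the excess.
# All rates and thresholds are invented for illustration.

TIERS = [            # (band width in MWh above baseline, EUR per MWh)
    (None, 80.0),    # base rate applied to the baseline allocation itself
    (5.0, 120.0),    # first 5 MWh over baseline
    (None, 200.0),   # everything beyond that
]

def quarterly_charge(usage_mwh: float, baseline_mwh: float) -> float:
    """Charge the base rate on the baseline, then escalating rates on excess."""
    charge = min(usage_mwh, baseline_mwh) * TIERS[0][1]
    excess = max(0.0, usage_mwh - baseline_mwh)
    billed = 0.0
    for width, rate in TIERS[1:]:
        band = excess - billed if width is None else min(width, excess - billed)
        if band <= 0:
            break
        charge += band * rate
        billed += band
    return charge

print(quarterly_charge(12.0, 10.0))  # 2 MWh over baseline
print(quarterly_charge(18.0, 10.0))  # 8 MWh over baseline, spans two bands
```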

Sources

  • City of Amsterdam. (2025). Sustainable AI Programme: Annual Performance Report 2024-2025. Amsterdam: Municipality of Amsterdam Digital Innovation Office.
  • International Energy Agency. (2025). Data Centres and Data Transmission Networks: Tracking Report 2025. Paris: IEA.
  • Government of the Netherlands. (2025). National Guidelines for Sustainable Public Sector AI Procurement. The Hague: Ministry of the Interior and Kingdom Relations.
  • TenneT TSO B.V. (2025). Grid Carbon Intensity Data and API Documentation: Annual Summary Report. Arnhem: TenneT.
  • Amsterdam Digital Innovation Office. (2025). Employee AI Usage Patterns and Behavioral Impact Assessment. Amsterdam: Municipality of Amsterdam.
  • Green Software Foundation. (2025). Software Carbon Intensity Specification v1.2: AI Workload Extension. Linux Foundation.
  • European Commission. (2025). Guidelines on Sustainable AI in the Public Sector: Draft Consultation Document. Brussels: EC Directorate-General for Communications Networks, Content and Technology.
