Govt Makes Official Data AI-Ready, Standardises 288 Datasets Across Ministries to Support LLMs

Government Creates AI-Friendly Data Ecosystem to Improve Public Services and Reduce Leakages

June 5, 2026

Indian Masterminds Stories

New Delhi: In a significant step towards building a robust artificial intelligence ecosystem, the Government of India has upgraded its official statistics platform to make government data directly accessible to Large Language Models (LLMs), while simultaneously undertaking a major data harmonisation exercise across ministries.

Speaking at an event organised by the National Council of Applied Economic Research (NCAER) in New Delhi on Friday, Ministry of Statistics and Programme Implementation (MoSPI) Secretary Saurabh Garg said the government is standardising 288 priority datasets of major economic and social significance. The initiative aims to ensure that AI systems rely on credible, authoritative government information rather than on potentially inaccurate or unverified sources.

Government Data Portal Upgraded for AI Systems

As part of India’s transition towards what Garg described as an “intelligence infrastructure,” MoSPI has wrapped its official data portal in a Model Context Protocol (MCP) layer. MCP is an open standard for connecting AI applications to external data sources and tools, and the enhancement enables Large Language Models to directly access, interpret and process official government statistics.

According to Garg, the move is designed to address a growing concern in the AI era: the risk of models generating outputs based on unreliable information when trusted data sources are not easily accessible.

“If the models don’t get easy access to credible data, there’ll be some other data filling up the gap,” Garg said while explaining the rationale behind the initiative.

The official noted that MoSPI is among the first government institutions globally to implement an MCP layer for official public data, a step expected to significantly improve the quality and reliability of AI-generated insights involving government statistics.

Tackling India’s Data Silo Challenge

While technological upgrades are important, Garg emphasised that the bigger challenge lies in ensuring semantic interoperability — the ability of different systems to understand and interpret data consistently.

He pointed out that data fragmentation across government departments often leads to inconsistencies in definitions and classifications, making it difficult for AI systems to connect information from different sources.

To illustrate the problem, Garg cited housing data: as many as five ministries currently use five different definitions of what constitutes a “pakka” (permanent) house.

“I think where we need to work more is on the semantic interoperability, so that AI systems can understand the context of the definitions and the classifications. And this is extremely important because if a definition of any concept in two systems is different, then those two systems cannot talk to each other,” he said.

288 Priority Datasets Identified for Harmonisation

To overcome these inconsistencies, the government has identified 288 priority datasets spread across multiple ministries and departments for standardisation.

The harmonisation effort focuses on creating common metadata standards that can be understood uniformly across government systems, enabling seamless data sharing and integration.

Officials involved in the project are drawing on 38 types of identifiers and 88 internationally recognised classifications to make the datasets consistent and compatible with one another.

The initiative aims to ensure that government data adheres to FAIR principles — Findable, Accessible, Interoperable and Reusable — which are considered global best practices for modern data governance.

Building Trustworthy Data for AI Development

The government’s push comes at a time when AI adoption is accelerating across sectors and Large Language Models increasingly depend on vast quantities of data to generate responses and insights.

Experts have often highlighted that the quality of AI outputs depends heavily on the quality and reliability of the data being used for training and retrieval. By making official statistics directly accessible and machine-readable, the government hopes to create a trusted information ecosystem that can support AI innovation while reducing misinformation risks.

The MCP-enabled platform is expected to make it easier for AI applications, researchers, policymakers and developers to access verified government data, improving both transparency and accuracy.

Better Public Service Delivery Through Integrated Data

Beyond AI development, the harmonisation project is also expected to transform public service delivery and welfare administration.

Garg noted that integrated and standardised datasets are already helping state governments identify beneficiaries more efficiently and implement welfare schemes at a much faster pace.

According to him, states are now able to roll out welfare programmes within weeks of policy announcements, compared to earlier timelines that often extended to a year or more.

The improved integration of government databases has also helped reduce leakages and enhance targeting efficiency, ensuring benefits reach intended recipients more effectively.

Towards an Intelligence-Driven Governance Framework

The initiative reflects the government’s broader vision of creating a data-driven governance ecosystem where interoperable datasets, trusted statistics and AI technologies work together to improve policymaking and citizen services.

As India accelerates its digital transformation journey, the standardisation of critical government datasets and the creation of AI-ready public data infrastructure are expected to play a crucial role in enabling next-generation governance, innovation and public welfare delivery.

With 288 key datasets already identified and harmonisation efforts underway, the government is positioning itself to create a more connected, intelligent and efficient data ecosystem capable of supporting both AI advancement and citizen-centric governance.