Improving Discoverability, Usability and Governance of Priority Agency Data

Overview

To improve the discoverability, usability, and governance of priority agency data assets, agencies shall apply the documentation techniques illustrated below.

Details

This page documents metadata priorities for federal agency data assets. It describes specific keyword and field requirements tied to federal policy directives and shows how those requirements map to DCAT-US v3.0 fields.

Note on DCAT-US version: The original version of this page was written for DCAT-US v1.1. Field guidance has been updated to reflect DCAT-US v3.0. Where fields have changed between versions, both the original v1.1 field and the v3.0 equivalent are noted.

COVID-19 policy background

In April 2020, OMB Memorandum M-20-16, Federal Agency Operational Alignment to Slow the Spread of Coronavirus COVID-19, directed agencies to prioritize COVID-19 response data as their highest priority data asset. The memo was issued during the public health emergency and required agencies participating in the Federal Data Strategy 2020 Action Plan to elevate COVID-19 datasets in their data inventories.

Current status: M-20-16 was a pandemic-era directive. The federal public health emergency for COVID-19 ended in May 2023, and M-20-16 is no longer actively enforced as a data prioritization requirement. Agencies that documented COVID-19 datasets under M-20-16 do not need to remove those keywords, which remain useful for discoverability, but no active policy mandate requires new COVID-19 prioritization. Agencies with COVID-19 datasets in their inventories are encouraged to keep their existing keyword documentation in place and to ensure those datasets remain accessible and well described for ongoing research use.

COVID-19 field guidance

The field below lists its requirement under M-20-16, followed by the v1.1 guidance and the v3.0 guidance.

Field: keyword
Requirement (under M-20-16): Required.
v1.1 guidance: Include COVID-19 and coronavirus as keywords. Additional keywords are encouraged.
v3.0 guidance: No change; keyword works the same way in v3.0. Continue including COVID-19 and coronavirus for discoverability.

Example: ["COVID-19", "coronavirus", "viral-testing", "CARES-Act", "SARS-CoV-2"]
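A keyword requirement like this one can be checked automatically. The Python sketch below assumes a dataset record is held as a plain dictionary with a keyword list, and reports which expected keywords are absent. The function name missing_keywords and the sample record are hypothetical, and the case-insensitive comparison is an assumption rather than a documented DCAT-US rule.

```python
def missing_keywords(dataset, required):
    """Return the required keywords that are absent from dataset["keyword"].

    Comparison ignores case and surrounding whitespace. That is an
    assumption of this sketch; the guidance does not say how catalogs
    compare keywords.
    """
    present = {str(k).strip().lower() for k in dataset.get("keyword", [])}
    return [k for k in required if k.strip().lower() not in present]


# Hypothetical record using example keywords from this page.
record = {
    "title": "Example testing dataset",
    "keyword": ["COVID-19", "coronavirus", "viral-testing"],
}

print(missing_keywords(record, ["COVID-19", "coronavirus"]))  # []
print(missing_keywords({"keyword": ["viral-testing"]}, ["COVID-19", "coronavirus"]))
# ['COVID-19', 'coronavirus']
```

The same helper works for any keyword convention, so it could also be called with the AI keywords described later on this page.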

Data Assets to Fuel AI Research and Development

AI policy background

In February 2019, President Trump signed Executive Order 13859, Maintaining American Leadership in Artificial Intelligence. This order directed agencies to improve data inventory documentation to enable discovery and usability of federal data assets for AI research, and to prioritize improvements to access and quality of data based on the AI research community’s feedback. Under EO 13859, agencies were directed to tag datasets suitable for AI research with standardized keywords to make them discoverable by the research community.

Current status: EO 13859 remains on the books and was never formally rescinded. However, the AI policy landscape has evolved significantly since 2019. President Biden issued EO 14110 (Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence) in October 2023, which substantially expanded federal AI governance requirements. President Trump revoked EO 14110 on January 20, 2025, and issued EO 14179 (Removing Barriers to American Leadership in Artificial Intelligence) on January 23, 2025, which is the current governing AI executive order. EO 14179 directs agencies to sustain and enhance American AI dominance and continues to support making federal data available for AI research.

The usg-artificial-intelligence keyword convention established under EO 13859 remains a useful and recognized practice for tagging AI-relevant datasets, and agencies should continue applying it. Consult current OMB guidance for any updated keyword requirements under EO 14179.

AI field guidance

Each field below lists its requirement under EO 13859, followed by the v1.1 guidance and the v3.0 guidance.

Field: keyword
Requirement (under EO 13859): Required.
v1.1 guidance: Include usg-artificial-intelligence. For training data, also include usg-ai-training-data. Additional descriptive keywords are encouraged.
v3.0 guidance: No change; keyword works the same way in v3.0. Continue using the established keyword conventions.

Example: ["usg-artificial-intelligence", "AI", "machine-learning", "natural-language-processing", "usg-ai-training-data"]

Field: contactPoint
Requirement (under EO 13859): Required.
v1.1 guidance: Include a contact who can discuss restrictions or controls on the dataset with AI researchers.
v3.0 guidance: In v3.0, contactPoint uses a Kind object instead of the v1.1 vCard format. The requirement to include a knowledgeable contact remains.

Minimum example:
{
  "@type": "Kind",
  "fn": "AI Data Contact",
  "hasEmail": "mailto:ai-data@agency.gov"
}

Fuller example including a domain expert:
{
  "@type": "Kind",
  "fn": "Dr. Jane Smith",
  "hasEmail": "mailto:jane.smith@agency.gov",
  "title": "Lead Data Scientist",
  "organization-name": "Office of Data Science"
}

Field: dataQuality (v1.1) → hasQualityMeasurement (v3.0)
Requirement (under EO 13859): Required for AI datasets.
v1.1 guidance: Set dataQuality to true to indicate the dataset meets the agency's Information Quality Guidelines.
v3.0 guidance: dataQuality is not in the v3.0 schema. Use hasQualityMeasurement to express quality information in a structured, machine-readable way.

Example:
{
  "@type": "QualityMeasurement",
  "isMeasurementOf": {
    "@type": "Metric",
    "expectedDataType": "xsd:boolean",
    "inDimension": "https://agency.gov/quality/iq-guidelines",
    "definition": "Meets agency Information Quality Guidelines"
  },
  "value": "true"
}

See QualityMeasurement for full field details.

Field: references (v1.1) → isReferencedBy / page (v3.0)
Requirement (under EO 13859): Required if references or model documentation exist.
v1.1 guidance: Include URLs to publications, model documentation, or other references using the references field.
v3.0 guidance: references is not in the v3.0 schema. Use one of two replacements depending on the nature of the link:

Use isReferencedBy for publications or papers that cite or use the dataset:
"isReferencedBy": ["https://doi.org/10.xxxx/example"]

Use page for documentation, model cards, or technical references about the dataset:
"page": [
  {
    "@type": "Document",
    "title": "Model Documentation",
    "accessURL": "https://github.com/GSA/AI-Assistant-Pilot",
    "description": "Technical documentation for the AI model trained on this dataset."
  }
]
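
The keyword and contactPoint requirements above lend themselves to a simple pre-publication check. The following Python sketch is illustrative only: check_ai_entry is a hypothetical helper, the dataset entry is assumed to be a plain dictionary, and the check covers only the fn and hasEmail members shown in the minimum contactPoint example. It is not a substitute for validating against the DCAT-US v3.0 schema.

```python
def check_ai_entry(dataset, is_training_data=False):
    """Return a list of problems found in an AI-tagged dataset entry."""
    problems = []

    # Keyword convention described in the AI field guidance.
    keywords = set(dataset.get("keyword", []))
    if "usg-artificial-intelligence" not in keywords:
        problems.append("keyword is missing usg-artificial-intelligence")
    if is_training_data and "usg-ai-training-data" not in keywords:
        problems.append("keyword is missing usg-ai-training-data")

    # contactPoint may be a single Kind object or a list of them.
    contacts = dataset.get("contactPoint")
    if isinstance(contacts, dict):
        contacts = [contacts]
    if not contacts:
        problems.append("contactPoint is missing")
    for contact in contacts or []:
        for field in ("fn", "hasEmail"):
            if not contact.get(field):
                problems.append(f"contactPoint is missing {field}")

    return problems


# Hypothetical entry built from the examples on this page.
entry = {
    "keyword": ["usg-artificial-intelligence", "machine-learning"],
    "contactPoint": {
        "@type": "Kind",
        "fn": "AI Data Contact",
        "hasEmail": "mailto:ai-data@agency.gov",
    },
}

print(check_ai_entry(entry))  # []
print(check_ai_entry(entry, is_training_data=True))
# ['keyword is missing usg-ai-training-data']
```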

Summary: field changes for priority datasets

The table below summarizes how the v1.1 fields used in priority dataset documentation map to v3.0.

v1.1 field | v3.0 equivalent | Notes
keyword | keyword | No change. Same field, same format.
contactPoint | contactPoint | Same field. Format updated: use a Kind object with fn and hasEmail. Can now be an array for multiple contacts.
dataQuality (boolean) | hasQualityMeasurement (array of QualityMeasurement objects) | Field replaced. Use a structured QualityMeasurement object instead of a boolean value.
references (array of URLs) | isReferencedBy (citations) or page (documentation) | Field replaced by two more specific fields. Use isReferencedBy for things that cite the dataset, page for documentation about the dataset.
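
The mapping in this summary can be sketched as a small migration helper. The Python below is a minimal illustration under stated assumptions, not an official conversion tool: migrate_priority_fields is a hypothetical name, the record is a plain dictionary, the caller supplies the agency-specific quality dimension URL, and the caller also says which reference URLs are documentation by passing a URL-to-title mapping. Treating every other reference as a citation is an assumption of this sketch; in practice each link needs a human decision.

```python
def migrate_priority_fields(v11, quality_dimension, documentation_titles=None):
    """Map the four v1.1 fields in the summary table to v3.0-style fields."""
    documentation_titles = documentation_titles or {}
    v3 = {}

    # keyword: no change.
    if "keyword" in v11:
        v3["keyword"] = list(v11["keyword"])

    # contactPoint: same field, now a Kind object with fn and hasEmail.
    contact = v11.get("contactPoint")
    if contact:
        v3["contactPoint"] = {
            "@type": "Kind",
            "fn": contact.get("fn"),
            "hasEmail": contact.get("hasEmail"),
        }

    # dataQuality (boolean) -> hasQualityMeasurement (array of objects),
    # following the structure of the example on this page.
    if "dataQuality" in v11:
        v3["hasQualityMeasurement"] = [{
            "@type": "QualityMeasurement",
            "isMeasurementOf": {
                "@type": "Metric",
                "expectedDataType": "xsd:boolean",
                "inDimension": quality_dimension,
                "definition": "Meets agency Information Quality Guidelines",
            },
            "value": "true" if v11["dataQuality"] else "false",
        }]

    # references -> page (documentation) or isReferencedBy (citations).
    # Assumption: URLs not listed in documentation_titles are citations.
    for url in v11.get("references", []):
        if url in documentation_titles:
            v3.setdefault("page", []).append({
                "@type": "Document",
                "title": documentation_titles[url],
                "accessURL": url,
            })
        else:
            v3.setdefault("isReferencedBy", []).append(url)

    return v3


# Hypothetical v1.1 record reusing values from the examples on this page.
old = {
    "keyword": ["usg-artificial-intelligence"],
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "AI Data Contact",
        "hasEmail": "mailto:ai-data@agency.gov",
    },
    "dataQuality": True,
    "references": [
        "https://doi.org/10.xxxx/example",
        "https://github.com/GSA/AI-Assistant-Pilot",
    ],
}

new = migrate_priority_fields(
    old,
    quality_dimension="https://agency.gov/quality/iq-guidelines",
    documentation_titles={
        "https://github.com/GSA/AI-Assistant-Pilot": "Model Documentation",
    },
)
print(sorted(new))
# ['contactPoint', 'hasQualityMeasurement', 'isReferencedBy', 'keyword', 'page']
```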