Harvester API
The Harvester API provides programmatic access to Data.gov’s harvest infrastructure — the system that collects dataset metadata from federal agencies and other publishers. Use this API to look up harvest sources, check job status, and investigate harvest errors.
Getting Started
Base URL: https://api.gsa.gov/technology/datagov_harvest/v2/
Documentation: Full endpoint documentation available via the OpenAPI specification.
Authentication
To begin using this API, you will need to register for an API Key. You can sign up for an API key at open.gsa.gov. After registration, you will need to provide this API key in the x-api-key HTTP header with every API request.
Quick Start
Get a list of all registered harvest sources:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_sources/
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /harvest_sources/ | List all harvest sources |
| GET | /harvest_source/{source_id} | Get a specific harvest source |
| GET | /harvest_source/{source_id}/jobs | List harvest jobs for a source |
| GET | /organization/ | List all organizations |
| GET | /organization/{org_id} | Get a specific organization |
| GET | /harvest_job/{job_id} | Get a specific harvest job |
| GET | /harvest_job/{job_id}/errors | Get errors for a harvest job |
| GET | /harvest_record/{record_id} | Get a specific harvest record |
| GET | /harvest_record/{record_id}/raw | Get raw source payload for a record |
| GET | /harvest_record/{record_id}/transformed | Get transformed payload for a record |
| POST | /validate | Validate a DCAT-US catalog |
Harvest Sources
List Harvest Sources
Returns all registered harvest sources.
Endpoint: GET /harvest_sources/
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_sources/
Example Response:
[
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"name": "Example Agency Data.json",
"url": "https://example.gov/data.json",
"organization_id": "7a85f64-5717-4562-b3fc-2c963f66afa6",
"frequency": "daily",
"schema_type": "dcatus1.1: federal",
"source_type": "document",
"notification_emails": ["data@example.gov"],
"notification_frequency": "on_error"
}
]
Response Fields:
| Field | Type | Description |
|---|---|---|
| id | string (UUID) | Unique identifier for the harvest source |
| name | string | Display name for the source |
| url | string | URL of the data.json or other harvestable resource |
| organization_id | string (UUID) | ID of the publishing organization |
| frequency | string | How often the source is harvested: daily, weekly, biweekly, or monthly. Note: manual exists in the system but by policy all sources must be harvested at least monthly. |
| schema_type | string | Metadata schema: dcatus1.1: federal, dcatus1.1: non-federal, iso19115_1, or iso19115_2 |
| source_type | string | Type of source: document, waf, or waf-collection |
| notification_emails | array | Email addresses notified of harvest results |
| notification_frequency | string | When to send notifications: on_error, always, or on_error_or_update |
Get Harvest Source
Retrieve details for a specific harvest source.
Endpoint: GET /harvest_source/{source_id}
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| source_id | string (UUID) | Yes | The harvest source ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_source/3fa85f64-5717-4562-b3fc-2c963f66afa6
Response: Same structure as individual items in List Harvest Sources.
List Jobs for Source
Retrieve all harvest jobs for a specific source.
Endpoint: GET /harvest_source/{source_id}/jobs
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| source_id | string (UUID) | Yes | The harvest source ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_source/3fa85f64-5717-4562-b3fc-2c963f66afa6/jobs
Example Response:
[
{
"id": "de2010f9-d9ec-4211-9690-5b3bbc9fe1f3",
"harvest_source_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"status": "complete",
"date_created": "2026-04-29T06:00:00.000Z",
"date_finished": "2026-04-29T06:05:23.000Z",
"records_added": 42,
"records_updated": 156,
"records_deleted": 3,
"records_errored": 2
}
]
Organizations
List Organizations
Returns all organizations with registered harvest sources.
Endpoint: GET /organizations/
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/organizations/
Example Response:
[
{
"id": "f4ca4614-8901-409b-8553-2e994ad10023",
"name": "National Aeronautics and Space Administration",
"slug": "nasa",
"organization_type": "Federal Government",
"aliases": ["NASA"],
"logo": "https://example.gov/nasa-logo.png",
"description": "NASA's open data portal.",
"source_count": 5
}
]
Response Fields:
| Field | Type | Description |
|---|---|---|
| id | string (UUID) | Unique organization identifier |
| name | string | Organization display name |
| slug | string | URL-friendly identifier |
| organization_type | string | Type: Federal Government, State Government, City Government, County Government, University, Tribal, or Non-Profit |
| aliases | array | Alternative names or abbreviations |
| logo | string | URL to organization logo |
| description | string | Organization description |
| source_count | integer | Number of harvest sources for this organization |
Get Organization
Retrieve details for a specific organization.
Endpoint: GET /organization/{org_id}
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| org_id | string (UUID) | Yes | The organization ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/organization/f4ca4614-8901-409b-8553-2e994ad10023
Harvest Jobs
Get Harvest Job
Retrieve details for a specific harvest job.
Endpoint: GET /harvest_job/{job_id}
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| job_id | string (UUID) | Yes | The harvest job ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_job/de2010f9-d9ec-4211-9690-5b3bbc9fe1f3
Response Fields:
| Field | Type | Description |
|---|---|---|
| id | string (UUID) | Unique job identifier |
| harvest_source_id | string (UUID) | The source this job harvested |
| status | string | Job status: new, in_progress, complete, or error |
| date_created | string (ISO 8601) | When the job was created |
| date_finished | string (ISO 8601) | When the job completed (null if still running) |
| records_added | integer | Number of new records created |
| records_updated | integer | Number of existing records updated |
| records_deleted | integer | Number of records removed |
| records_errored | integer | Number of records that failed processing |
Get Job Errors
Retrieve errors from a specific harvest job.
Endpoint: GET /harvest_job/{job_id}/errors
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| job_id | string (UUID) | Yes | The harvest job ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_job/de2010f9-d9ec-4211-9690-5b3bbc9fe1f3/errors
Example Response:
[
{
"id": "8b2e4f71-3c9a-4b5d-9e8f-1a2b3c4d5e6f",
"harvest_job_id": "de2010f9-d9ec-4211-9690-5b3bbc9fe1f3",
"harvest_record_id": "d0e03fb2-f885-4b1d-8feb-2d8acc93f4f8",
"date_created": "2026-04-29T06:03:12.000Z",
"type": "validation_error",
"message": "Missing required field: title"
}
]
Harvest Records
Get Harvest Record
Retrieve metadata about a specific harvest record. Harvest records track how individual datasets were ingested into the catalog.
Endpoint: GET /harvest_record/{record_id}
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| record_id | string (UUID) | Yes | The harvest record ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_record/d0e03fb2-f885-4b1d-8feb-2d8acc93f4f8
Example Response:
{
"id": "d0e03fb2-f885-4b1d-8feb-2d8acc93f4f8",
"identifier": "http://datainventory.doi.gov/id/dataset/bsee-0000000070",
"status": "success",
"action": "update",
"date_created": "2026-04-29T06:02:45.000Z",
"date_finished": "2026-04-29T06:02:47.000Z",
"harvest_job_id": "de2010f9-d9ec-4211-9690-5b3bbc9fe1f3",
"harvest_source_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"source_hash": "47ca2dd5471e659e4cd1c83d79adb0b0c2c8c013a1e03d629d56b0541e307267",
"ckan_id": "abc123",
"ckan_name": "example-dataset",
"parent_identifier": null
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| id | string (UUID) | Unique record identifier |
| identifier | string | The dataset’s identifier from the source metadata |
| status | string | Processing status: success or error |
| action | string | What happened: create, update, or delete |
| date_created | string (ISO 8601) | When processing started |
| date_finished | string (ISO 8601) | When processing completed |
| harvest_job_id | string (UUID) | The job that processed this record |
| harvest_source_id | string (UUID) | The source this record came from |
| source_hash | string | Hash of the source metadata (used for change detection) |
| ckan_id | string | ID in the catalog system |
| ckan_name | string | URL-friendly name in the catalog |
| parent_identifier | string | Parent dataset identifier (for collections) |
Get Harvest Record Raw
Retrieve the original, unmodified source payload from a harvest record exactly as it was received.
Endpoint: GET /harvest_record/{record_id}/raw
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| record_id | string (UUID) | Yes | The harvest record ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_record/d0e03fb2-f885-4b1d-8feb-2d8acc93f4f8/raw
Response: The original metadata payload. Content-Type is detected automatically:
application/jsonfor JSON payloadsapplication/xmlfor XML payloadstext/plainfor all other content
Returns 404 Not Found if the record does not exist or has no raw source data.
Get Harvest Record Transformed
Retrieve the transformed DCAT-US payload for a harvest record. This is the version of the metadata after any source-specific transformations have been applied.
Endpoint: GET /harvest_record/{record_id}/transformed
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| record_id | string (UUID) | Yes | The harvest record ID |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' https://api.gsa.gov/technology/datagov_harvest/v2/harvest_record/d0e03fb2-f885-4b1d-8feb-2d8acc93f4f8/transformed
Response: The transformed DCAT-US metadata payload as JSON.
Returns 404 Not Found if the record does not exist or has no transformed data.
Validation
Validate Catalog
Validate a DCAT-US catalog against the 1.1 schema.
Endpoint: POST /validate
Note: A DCAT-US 3.0 validator is planned but not yet available. Check back for updates.
Request Body:
{
"fetch_method": "url",
"url": "https://example.gov/data.json",
"schema": "dcatus1.1: federal dataset"
}
| Field | Type | Description |
|---|---|---|
| fetch_method | string | How to retrieve the catalog: url (fetch from URL) or json_text (inline JSON) |
| url | string | URL to fetch (if fetch_method is url) |
| json_text | string | Inline JSON catalog (if fetch_method is json_text) |
| schema | string | Schema to validate against: dcatus1.1: federal dataset |
Example Request:
curl -H 'X-Api-Key: DEMO_KEY' -X POST https://api.gsa.gov/technology/datagov_harvest/v2/validate \
-H "Content-Type: application/json" \
-d '{
"fetch_method": "url",
"url": "https://example.gov/data.json",
"schema": "dcatus1.1: federal dataset"
}'
For full details on validation responses, see the OpenAPI documentation.
Error Responses
All endpoints return standard HTTP status codes:
| Status Code | Meaning |
|---|---|
| 200 | OK — Request was successful |
| 404 | Not Found — The requested resource does not exist, or the ID provided is not valid |
| 422 | Unprocessable Entity — The request was understood but contains invalid parameter values |
| 500 | Internal Server Error — An unexpected error occurred on the server |
All error responses use this JSON format:
{
"error": "A description of what went wrong"
}
For validation errors (422), additional detail is provided:
{
"message": "Validation error",
"detail": {
"<location>": {
"<field_name>": ["error message"]
}
}
}
Questions or Issues?
If you encounter problems with the Harvester API or have questions about your agency’s harvest sources, contact the Data.gov team at datagovhelp@gsa.gov.