Data Sources & Access Matrix
GDELT provides multiple data sources accessible through different methods. Understanding which source to use for your specific needs is crucial for efficient data retrieval.
Complete Data Source Matrix
| Data Type | API | BigQuery | Raw Files | Time Constraint | Fallback Available |
|---|---|---|---|---|---|
| Articles (fulltext) | DOC 2.0 | - | - | Rolling 3 months | No equivalent |
| Article geo heatmaps | GEO 2.0 | - | - | Rolling 7 days | No equivalent |
| Sentence-level context | Context 2.0 | - | - | Rolling 72 hours | No equivalent |
| TV captions | TV 2.0 | - | - | July 2009+ (full) | No equivalent |
| TV visual/AI | TV AI 2.0 | - | - | 2010+, limited | No equivalent |
| Events v2 | - | Yes | Yes | Feb 2015+ | Files <-> BigQuery |
| Events v1 | - | Yes | Yes | 1979 - Feb 2015 | Files <-> BigQuery |
| Mentions | - | Yes | Yes | Feb 2015+ (v2 only) | Files <-> BigQuery |
| GKG v2 | - | Yes | Yes | Feb 2015+ | Files <-> BigQuery |
| GKG v1 | - | Yes | Yes | 2013 - Feb 2015 | Files <-> BigQuery |
| Web NGrams 3.0 | - | Yes | Yes | Jan 2020+ | Files <-> BigQuery |
Historical Fulltext Search Strategy
Choosing the right source based on time range:
| Need | <3 months | 3mo - 5yr | >5yr |
|---|---|---|---|
| Fulltext search | DOC API | NGrams 3.0 | Not available |
| Entity/theme search | GKG | GKG | GKG (v1 to 2013) |
| Event tracking | Events | Events | Events (v1 to 1979) |
API Endpoints Reference
DOC 2.0 API
- Endpoint:
https://api.gdeltproject.org/api/v2/doc/doc - Purpose: Full-text article search
- Time Window: Rolling 3 months
- Max Records: 250
- Output Modes: artlist, timelinevol, timelinevolraw, timelinetone, timelinelang, timelinesourcecountry, imagecollage, tonechart
GEO 2.0 API
- Endpoint:
https://api.gdeltproject.org/api/v2/geo/geo - Purpose: Geographic visualizations
- Time Window: Rolling 7 days
- Max Points: 1,000-25,000 (mode dependent)
Context 2.0 API
- Endpoint:
https://api.gdeltproject.org/api/v2/context/context - Purpose: Sentence-level search (all terms in same sentence)
- Time Window: Rolling 72 hours
- Max Records: 200
TV 2.0 API
- Endpoint:
https://api.gdeltproject.org/api/v2/tv/tv - Purpose: Television closed caption search
- Time Window: July 2009+ (full archive)
- Max Records: 3,000
TV AI 2.0 API
- Endpoint:
https://api.gdeltproject.org/api/v2/tvai/tvai - Purpose: Visual television search (AI-powered)
- Time Window: 2010+ (limited channels)
GKG GeoJSON API (v1.0 Legacy)
- Endpoint:
https://api.gdeltproject.org/api/v1/gkg_geojson - Purpose: GeoJSON by GKG theme/person/org
- Note: Uses uppercase parameters
BigQuery Tables
| Table | Size | Project |
|---|---|---|
gdelt-bq.gdeltv2.events |
~63GB | gdelt-bq |
gdelt-bq.gdeltv2.events_partitioned |
~63GB | gdelt-bq |
gdelt-bq.gdeltv2.eventmentions |
~104GB | gdelt-bq |
gdelt-bq.gdeltv2.gkg |
~3.6TB | gdelt-bq |
gdelt-bq.gdeltv2.gkg_partitioned |
~3.6TB | gdelt-bq |
gdelt-bq.gdeltv2.webngrams |
Variable | gdelt-bq |
BigQuery Cost
Always use partitioned tables with date filters to avoid full table scans:
Raw File Access
Master File Lists
- English:
http://data.gdeltproject.org/gdeltv2/masterfilelist.txt - Translated:
http://data.gdeltproject.org/gdeltv2/masterfilelist-translation.txt - Last update:
http://data.gdeltproject.org/gdeltv2/lastupdate.txt
URL Patterns (v2)
http://data.gdeltproject.org/gdeltv2/YYYYMMDDHHMMSS.export.CSV.zip
http://data.gdeltproject.org/gdeltv2/YYYYMMDDHHMMSS.mentions.CSV.zip
http://data.gdeltproject.org/gdeltv2/YYYYMMDDHHMMSS.gkg.csv.zip
NGrams 3.0
Update Frequency
Files are updated every 15 minutes at :00, :15, :30, and :45 past the hour.
How gdelt-py Handles Sources
The library provides automatic source selection:
# Auto mode (default) - tries files first, falls back to BigQuery
events = await client.events.query(filter)
# Explicit source selection
events = await client.events.query(filter, source="bigquery")
events = await client.events.query(filter, source="files")
For API-only endpoints (DOC, GEO, Context, TV), there are no fallback options - the library will return an error if the API is unavailable.