
gdelt-py

A comprehensive Python client library for the GDELT (Global Database of Events, Language, and Tone) project.

Features

  • Unified Interface: Single client covering all 6 REST APIs, 3 database tables, and NGrams dataset
  • Version Normalization: Transparent handling of GDELT v1/v2 differences with normalized output
  • Resilience: Automatic fallback to BigQuery when APIs fail or hit rate limits
  • Modern Python: Python 3.11+, async-first, Pydantic models, type hints throughout
  • Streaming: Memory-efficient, generator-based iteration over large datasets
  • Developer Experience: Clear errors, progress indicators, comprehensive lookups

Installation

# Basic installation
pip install gdelt-py

# With BigQuery support
pip install "gdelt-py[bigquery]"

# With all optional dependencies
pip install "gdelt-py[bigquery,pandas]"

Quick Start

import asyncio
from datetime import date, timedelta

from py_gdelt import GDELTClient
from py_gdelt.filters import DateRange, EventFilter


async def main() -> None:
    async with GDELTClient() as client:
        # Query yesterday's events where actor 1 is the USA
        yesterday = date.today() - timedelta(days=1)
        event_filter = EventFilter(
            date_range=DateRange(start=yesterday, end=yesterday),
            actor1_country="USA",
        )

        result = await client.events.query(event_filter)
        print(f"Found {len(result)} events")


# "async with" only works inside a coroutine, so run it through an event loop
asyncio.run(main())

Data Sources Covered

File-Based Endpoints

  • Events - Structured event data (who did what to whom, when, where)
  • Mentions - Article mentions of events over time
  • GKG - Global Knowledge Graph (themes, entities, tone, quotations)
  • NGrams - Word and phrase occurrences in articles (Jan 2020+)
  • VGKG 🏗️ - Visual GKG (image annotations via Cloud Vision API)
  • TV-GKG 🏗️ - Television GKG (closed caption analysis)
  • TV/Radio NGrams 🏗️ - Broadcast transcript word frequencies

REST APIs

  • DOC 2.0 - Full-text article search and discovery
  • GEO 2.0 - Geographic analysis and mapping
  • Context 2.0 - Sentence-level contextual search
  • TV 2.0 - Television news closed caption search
  • TV AI 2.0 - AI-enhanced visual TV search (labels, OCR, faces)
  • LowerThird 🏗️ - TV chyron/lower-third text search
  • TVV 🏗️ - TV Visual channel inventory
  • GKG GeoJSON v1 🏗️ - Legacy geographic GKG API

Graph Datasets 🏗️

  • GQG - Global Quotation Graph (extracted quotes with context)
  • GEG - Global Entity Graph (NER via Cloud NLP API)
  • GFG - Global Frontpage Graph (homepage link tracking)
  • GGG - Global Geographic Graph (location co-mentions)
  • GDG - Global Difference Graph (article change detection)
  • GEMG - Global Embedded Metadata Graph (meta tags, JSON-LD)
  • GRG - Global Relationship Graph (subject-verb-object triples)
  • GAL - Article List (lightweight article metadata)

Lookup Tables

  • CAMEO - Event classification codes and Goldstein scale
  • Themes - GKG theme taxonomy
  • Countries - Country code conversions (FIPS ↔ ISO)
  • Ethnic/Religious Groups - Group classification codes
  • GCAM 🏗️ - 2,300+ emotional/thematic dimensions
  • Image Tags 🏗️ - Cloud Vision labels for DOC API
  • Languages 🏗️ - Supported language codes

Data Source Matrix

Data Type | API | BigQuery | Raw Files | Time Range | Fallback
Articles (fulltext) | DOC 2.0 | - | - | Rolling 3 months | -
Article geography | GEO 2.0 | - | - | Rolling 7 days | -
Sentence context | Context 2.0 | - | - | Rolling 72 hours | -
TV captions | TV 2.0 | - | - | Jul 2009+ | -
TV visual/AI | TV AI 2.0 | - | - | Jul 2010+ | -
TV chyrons 🏗️ | LowerThird | - | - | Aug 2017+ | -
Events v2 | - | | | Feb 2015+ |
Events v1 | - | | | 1979 - Feb 2015 |
Mentions | - | | | Feb 2015+ |
GKG v2 | - | | | Feb 2015+ |
GKG v1 | - | | | Apr 2013 - Feb 2015 |
Web NGrams | - | | | Jan 2020+ |
VGKG 🏗️ | - | | | Dec 2015+ |
TV-GKG 🏗️ | - | | | Jul 2009+ |
GQG 🏗️ | - | | | Jan 2020+ |
GEG 🏗️ | - | | | Jul 2016+ |
GFG 🏗️ | - | | | Mar 2018+ |

🏗️ = Work in progress - coming in future releases
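
The Fallback column describes a primary source backed by a secondary one. Here is a minimal sketch of that general pattern, assuming nothing about how the library implements it. `SourceUnavailable`, `fetch_with_fallback`, and the two stand-in fetchers are hypothetical names invented for this example.

```python
import asyncio
from typing import Awaitable, Callable, List


class SourceUnavailable(Exception):
    """Hypothetical error for a source that failed or was rate limited."""


async def fetch_with_fallback(
    primary: Callable[[], Awaitable[List[str]]],
    fallback: Callable[[], Awaitable[List[str]]],
) -> List[str]:
    """Try the primary source; use the fallback only if the primary is unavailable."""
    try:
        return await primary()
    except SourceUnavailable:
        return await fallback()


async def failing_primary() -> List[str]:
    # Stand-in for a source that is currently rate limited.
    raise SourceUnavailable("rate limited")


async def bigquery_stand_in() -> List[str]:
    # Stand-in for the secondary source; returns canned rows.
    return ["event-1", "event-2"]


if __name__ == "__main__":
    print(asyncio.run(fetch_with_fallback(failing_primary, bigquery_stand_in)))
```

Catching only a dedicated exception type keeps programming errors (for example a `TypeError`) from being silently masked by the fallback path.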

Key Concepts

Async-First Design

All I/O operations are async by default for optimal performance:

async with GDELTClient() as client:
    articles = await client.doc.query(doc_filter)

Synchronous wrappers are available for compatibility:

with GDELTClient() as client:
    articles = client.doc.query_sync(doc_filter)
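
One common way to offer a synchronous wrapper over an async method is to drive the coroutine with `asyncio.run`. The sketch below illustrates that general pattern with a hypothetical `DemoEndpoint` class. It is not taken from the library's source, and the real `query_sync` may work differently.

```python
import asyncio
from typing import List


class DemoEndpoint:
    """Hypothetical endpoint with an async method and a blocking wrapper."""

    async def query(self, term: str) -> List[str]:
        await asyncio.sleep(0)  # stands in for network I/O
        return [f"article about {term}"]

    def query_sync(self, term: str) -> List[str]:
        # asyncio.run starts a fresh event loop, so this wrapper must not be
        # called from code that is already running inside an event loop.
        return asyncio.run(self.query(term))


if __name__ == "__main__":
    print(DemoEndpoint().query_sync("climate"))
```

A wrapper like this suits scripts and notebooks without a running loop; inside async code, awaiting the async method directly is the better fit.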

Streaming for Efficiency

Process large datasets without loading everything into memory:

async with GDELTClient() as client:
    async for event in client.events.stream(event_filter):
        process(event)  # Memory-efficient
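
To see why streaming keeps memory use flat, here is a self-contained sketch with a stand-in async generator. `fake_event_stream` is hypothetical and only mimics the shape of `client.events.stream`; the dictionary keys are assumptions as well. The consumer keeps a running aggregate instead of collecting events in a list.

```python
import asyncio
from typing import AsyncIterator, Dict


async def fake_event_stream(n: int) -> AsyncIterator[Dict[str, float]]:
    """Yield n synthetic events one at a time (stand-in for a real stream)."""
    for i in range(n):
        await asyncio.sleep(0)  # yield control, as real I/O would
        yield {"id": float(i), "goldstein_scale": float(i % 5) - 2.0}


async def mean_goldstein(n: int) -> float:
    """Average the score while holding only one event in memory at a time."""
    total = 0.0
    count = 0
    async for event in fake_event_stream(n):
        total += event["goldstein_scale"]
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    print(asyncio.run(mean_goldstein(3)))
```

Only `total` and `count` persist between iterations, so memory use does not grow with the number of events.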

Type Safety

Pydantic models throughout with full type hints:

event: Event = result[0]
assert event.goldstein_scale is not None  # Attribute access is type-checked

Configuration

Flexible configuration via environment variables, TOML files, or programmatic settings:

from pathlib import Path

settings = GDELTSettings(
    timeout=60,
    max_retries=5,
    cache_dir=Path("/custom/cache"),
)

async with GDELTClient(settings=settings) as client:
    ...
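
As an illustration of how environment variables can override programmatic defaults, the sketch below layers values from a mapping onto a dataclass. The `GDELT_` prefix, the variable names, the default values, and `DemoSettings` are all assumptions made for this example; check the full documentation for the names the library actually reads.

```python
import os
from dataclasses import dataclass, fields, replace
from typing import Mapping


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical settings object; the defaults are arbitrary."""

    timeout: int = 30
    max_retries: int = 3


def from_env(
    base: DemoSettings,
    env: Mapping[str, str] = os.environ,
    prefix: str = "GDELT_",  # assumed prefix, not confirmed by the library
) -> DemoSettings:
    """Return a copy of base with any PREFIX + FIELD_NAME variables applied."""
    overrides = {}
    for field in fields(base):
        raw = env.get(prefix + field.name.upper())
        if raw is not None:
            overrides[field.name] = int(raw)
    return replace(base, **overrides)


if __name__ == "__main__":
    print(from_env(DemoSettings(timeout=60), {"GDELT_MAX_RETRIES": "5"}))
```

Passing the environment as a mapping keeps the function easy to test, since a plain dictionary can stand in for `os.environ`.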

Documentation

Full documentation is available at: https://rbozydar.github.io/py-gdelt/

Contributing

Contributions are welcome! See the Contributing Guide for details.

License

MIT License - see the LICENSE file for details.