Architecture
This page describes the high-level architecture of gdelt-py and how it manages multiple data sources.
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ GDELTClient │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐ │
│ │ doc │ │ geo │ │ context │ │ tv │ │ tv_ai │ │
│ │ (API) │ │ (API) │ │ (API) │ │ (API) │ │ (API) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └───────────┘ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐ │
│ │ events │ │mentions │ │ gkg │ │ ngrams │ │ lookups │ │
│ │(Multi) │ │ (Multi) │ │ (Multi) │ │ (Multi) │ │ (Bundled) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ REST APIs │ │ BigQuery │ │ Raw Files │
│ (Real-time) │ │ (Fallback/ │ │ (HTTP) │
│ │ │ Historical) │ │ │
└───────────────┘ └───────────────┘ └───────────────┘
Endpoint Types
The library provides two categories of endpoints:
API-Only Endpoints (top row):
doc- Article fulltext search (DOC 2.0 API)geo- Geographic visualizations (GEO 2.0 API)context- Sentence-level search (Context 2.0 API)tv- Television caption search (TV 2.0 API)tv_ai- Visual TV search (TV AI 2.0 API)
These endpoints only access GDELT REST APIs and have no fallback options.
Multi-Source Endpoints (bottom row):
events- Event records (Files + BigQuery)mentions- Article mentions of events (Files + BigQuery)gkg- Global Knowledge Graph records (Files + BigQuery)ngrams- Web NGrams 3.0 (Files + BigQuery)lookups- Reference data (bundled with library)
These endpoints can switch between raw files and BigQuery for resilience.
Multi-Source Endpoints
For Events, Mentions, GKG, and NGrams, the library implements intelligent source selection:
┌─────────────────────────────────────────┐
│ EventsEndpoint │
├─────────────────────────────────────────┤
│ query(filter, source="auto") │
│ │
│ source="auto" logic: │
│ 1. Try raw files (free, fast) │
│ 2. On failure/429 → BigQuery fallback │
│ 3. Large historical → prefer BigQuery │
└─────────────────────────────────────────┘
Source Selection
# Auto mode (default) - intelligent source selection
events = await client.events.query(filter)
# Explicit file source
events = await client.events.query(filter, source="files")
# Explicit BigQuery source
events = await client.events.query(filter, source="bigquery")
Fallback Behavior
When source="auto" (default):
- Primary: Raw files are tried first (free, no authentication required)
- Fallback: On rate limiting (HTTP 429) or errors, automatically switches to BigQuery
- Large queries: For historical queries spanning many days, BigQuery may be preferred
Configuration
Control fallback behavior via settings:
from py_gdelt import GDELTClient, GDELTSettings
settings = GDELTSettings(
fallback_to_bigquery=True, # Enable automatic fallback
bigquery_project="my-gcp-project", # Required for BigQuery
)
async with GDELTClient(settings=settings) as client:
# Will fallback to BigQuery if files fail
events = await client.events.query(filter)
Data Flow
Query Execution
- User creates a filter object (e.g.,
EventFilter) - User calls endpoint method (e.g.,
client.events.query(filter)) - Endpoint determines optimal source based on:
- User preference (
sourceparameter) - Query time range
- Previous failures/rate limits
- Source adapter fetches and parses data
- Results are returned as Pydantic models
Streaming
For large result sets, use streaming to process records without loading everything into memory: