Events & Mentions
Query GDELT Events and Mentions data from files or BigQuery.
Overview
Events are the core of GDELT - structured records of "who did what to whom, when, where, and how" extracted from global news articles.
Basic Event Queries
from datetime import date, timedelta

from py_gdelt import GDELTClient
from py_gdelt.filters import DateRange, EventFilter

async with GDELTClient() as client:
    yesterday = date.today() - timedelta(days=1)
    event_filter = EventFilter(
        date_range=DateRange(start=yesterday, end=yesterday),
        actor1_country="USA",
    )
    events = await client.events.query(event_filter)
    print(f"Found {len(events)} events")
Event Model
Events contain the following fields:
- global_event_id: Unique identifier
- date: Event date
- actor1, actor2: Participants (country, name, codes)
- event_code: CAMEO event type code
- goldstein_scale: Conflict/cooperation score (-10 to +10)
- avg_tone: Sentiment (-100 to +100)
- action_geo: Location information
- source_url: Article URL
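For a quick look at these fields in practice, the sketch below prints a few of them from a query result. It uses only the flat attribute names listed above; treat anything beyond that (such as nested actor or geo objects) as an assumption to verify against the actual model.

events = await client.events.query(event_filter)
for event in events:
    # Attribute names mirror the field list above.
    print(event.global_event_id, event.date, event.event_code)
    print(event.goldstein_scale, event.avg_tone, event.source_url)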
Filtering Options
By Actors
event_filter = EventFilter(
    date_range=DateRange(start=date(2024, 1, 1)),
    actor1_country="USA",
    actor2_country="CHN",
)
By Event Type
event_filter = EventFilter(
    date_range=DateRange(start=date(2024, 1, 1)),
    event_code="14",  # Protest
)
By Tone
event_filter = EventFilter(
    date_range=DateRange(start=date(2024, 1, 1)),
    min_tone=-5.0,  # Negative events
    max_tone=0.0,
)
By Location
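Events can also be filtered by where the action occurred. The sketch below assumes EventFilter exposes a country field for the action geography; the parameter name action_geo_country is hypothetical, so check EventFilter's signature for the actual location options.

event_filter = EventFilter(
    date_range=DateRange(start=date(2024, 1, 1)),
    action_geo_country="FRA",  # hypothetical parameter name for the action location
)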
Streaming Events
For large result sets, stream events one at a time instead of loading everything into memory:
async with GDELTClient() as client:
    event_filter = EventFilter(
        date_range=DateRange(
            start=date(2024, 1, 1),
            end=date(2024, 1, 7),
        ),
    )
    async for event in client.events.stream(event_filter):
        process(event)  # Process one event at a time
Deduplication
GDELT often records the same underlying event multiple times. Enable deduplication to collapse these duplicates:
from py_gdelt.utils.dedup import DedupeStrategy

result = await client.events.query(
    event_filter,
    deduplicate=True,
    dedupe_strategy=DedupeStrategy.URL_DATE_LOCATION,
)
Available strategies:
- URL_ONLY - By source URL
- URL_DATE - By URL and date
- URL_DATE_LOCATION - By URL, date, and location
- ACTOR_PAIR - By actor pair
- FULL - By all fields
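For example, to drop only rows that share the same source URL, pass the coarsest strategy to the same call shown above:

result = await client.events.query(
    event_filter,
    deduplicate=True,
    dedupe_strategy=DedupeStrategy.URL_ONLY,
)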
Mentions
Mentions record the individual articles that reference an event:
async with GDELTClient() as client:
    mentions = await client.mentions.query("123456789", event_filter)
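Here "123456789" is presumably the global event ID whose mentions are requested. A minimal sketch of inspecting the results follows; the attribute names are assumptions modeled on the columns of GDELT's Mentions table, not confirmed fields of the returned objects.

for mention in mentions:
    # mention_source_name and mention_identifier are assumed attribute names.
    print(mention.mention_source_name, mention.mention_identifier)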
BigQuery Fallback
When file sources are unavailable, the client can automatically fall back to BigQuery:
from py_gdelt import GDELTSettings  # import path assumed; adjust if GDELTSettings lives elsewhere

settings = GDELTSettings(
    fallback_to_bigquery=True,
    bigquery_project="my-project",
)

async with GDELTClient(settings=settings) as client:
    # Automatically uses BigQuery if files are unavailable
    events = await client.events.query(event_filter)
Best Practices
- Use streaming for >1000 events
- Enable deduplication for cleaner data
- Use specific filters to reduce data volume
- Handle empty results gracefully (see the sketch after this list)
- Set appropriate date ranges (files available for 2015+)
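Putting several of these together, the sketch below streams a narrowly filtered query over a short date range and checks explicitly for an empty result. It reuses only the calls shown earlier on this page; deduplication is omitted because it was demonstrated for query() rather than stream().

async with GDELTClient() as client:
    event_filter = EventFilter(
        date_range=DateRange(start=date(2024, 1, 1), end=date(2024, 1, 2)),  # short, explicit range
        actor1_country="USA",
        event_code="14",  # Protest
    )
    count = 0
    async for event in client.events.stream(event_filter):
        count += 1  # replace with real per-event processing
    if count == 0:
        print("No events matched the filter")  # handle empty results gracefully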