Configuration
Configure py-gdelt using environment variables, TOML files, or programmatic settings.
Environment Variables
Set environment variables to configure default behavior:
# Timeouts and retries
export GDELT_TIMEOUT=60
export GDELT_MAX_RETRIES=5
# Caching
export GDELT_CACHE_DIR=/path/to/cache
export GDELT_CACHE_TTL=3600
# BigQuery
export GDELT_BIGQUERY_PROJECT=my-project
export GDELT_BIGQUERY_CREDENTIALS=/path/to/credentials.json
# Behavior
export GDELT_FALLBACK_TO_BIGQUERY=true
export GDELT_VALIDATE_CODES=true
export GDELT_MAX_CONCURRENT_DOWNLOADS=10
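These variables can also be set from Python before the client is created. A minimal sketch, assuming the client picks up the GDELT_* variables when constructed without explicit settings:
import asyncio
import os
from py_gdelt import GDELTClient

# Same keys as the shell exports above, set for the current process only.
os.environ["GDELT_TIMEOUT"] = "60"
os.environ["GDELT_MAX_RETRIES"] = "5"
os.environ["GDELT_CACHE_TTL"] = "3600"

async def main() -> None:
    # No explicit settings passed: defaults come from the environment.
    async with GDELTClient() as client:
        ...

asyncio.run(main())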
TOML Configuration
Create a gdelt.toml file:
[gdelt]
timeout = 60
max_retries = 5
cache_dir = "/path/to/cache"
cache_ttl = 3600
fallback_to_bigquery = true
validate_codes = true
max_concurrent_downloads = 10
[gdelt.bigquery]
project = "my-project"
credentials = "/path/to/credentials.json"
Load it with:
import asyncio
from pathlib import Path
from py_gdelt import GDELTClient

async def main() -> None:
    config_path = Path("gdelt.toml")
    async with GDELTClient(config_path=config_path) as client:
        ...

asyncio.run(main())
Programmatic Settings
Use GDELTSettings for full control:
import asyncio
from pathlib import Path
from py_gdelt import GDELTClient, GDELTSettings

settings = GDELTSettings(
    # Timeouts and retries
    timeout=60,
    max_retries=5,
    # Caching
    cache_dir=Path.home() / ".cache" / "gdelt",
    cache_ttl=3600,
    # BigQuery
    bigquery_project="my-project",
    bigquery_credentials=Path("/path/to/credentials.json"),
    # Behavior
    fallback_to_bigquery=True,
    validate_codes=True,
    max_concurrent_downloads=10,
)

async def main() -> None:
    async with GDELTClient(settings=settings) as client:
        ...

asyncio.run(main())
Configuration Options
Timeouts
- timeout (int): HTTP request timeout in seconds. Default: 30
- connect_timeout (int): Connection timeout in seconds. Default: 10
Retries
- max_retries (int): Maximum retry attempts. Default: 3
- retry_backoff (float): Backoff multiplier for retries. Default: 2.0
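To see how the multiplier compounds, here is an illustrative calculation; the 1-second base delay between attempts is an assumption for the sketch, not something documented above:
# Assumed delay model: base_delay * retry_backoff ** attempt (illustration only).
max_retries = 3
retry_backoff = 2.0
base_delay = 1.0  # assumed base delay in seconds
delays = [base_delay * retry_backoff**attempt for attempt in range(max_retries)]
print(delays)  # [1.0, 2.0, 4.0] -> about 7 seconds of waiting across 3 retries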
Caching
- cache_dir (Path): Directory for cached files. Default: ~/.cache/gdelt
- cache_ttl (int): Cache time-to-live in seconds. Default: 3600 (1 hour)
- use_cache (bool): Enable/disable caching. Default: True
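For example, to cache files in a project-local directory for a day, or to bypass caching entirely; a sketch assuming these option names are accepted as GDELTSettings keyword arguments:
from pathlib import Path
from py_gdelt import GDELTSettings

# Keep cached files under ./gdelt-cache for 24 hours.
cached = GDELTSettings(cache_dir=Path("gdelt-cache"), cache_ttl=86400)

# Always fetch fresh files, e.g. when stale data is unacceptable.
uncached = GDELTSettings(use_cache=False)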
BigQuery
- bigquery_project (str): Google Cloud project ID. Default: None
- bigquery_credentials (Path): Path to credentials JSON. Default: None
- fallback_to_bigquery (bool): Use BigQuery when file sources fail. Default: False
Performance
- max_concurrent_downloads (int): Max parallel file downloads. Default: 5
- chunk_size (int): Download chunk size in bytes. Default: 8192
Validation
- validate_codes (bool): Validate CAMEO/country codes. Default: False
- strict_mode (bool): Raise errors on invalid codes. Default: False
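The performance and validation options follow the same pattern as the earlier programmatic example. A sketch combining the options not shown there, assuming they are all GDELTSettings fields with these exact names:
from py_gdelt import GDELTSettings

settings = GDELTSettings(
    connect_timeout=10,   # give up quickly if a connection cannot be established
    retry_backoff=2.0,    # multiply the wait between retry attempts
    chunk_size=8192,      # download in 8 KiB chunks
    validate_codes=True,  # validate CAMEO/country codes
    strict_mode=True,     # raise errors on invalid codes
)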
Priority Order
Configuration is loaded in this order (later overrides earlier):
- Default values
- Environment variables
- TOML configuration file
- Programmatic GDELTSettings
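For example, a timeout passed to GDELTSettings wins over one set in the environment; a sketch, assuming settings read the GDELT_* variables described above:
import os
from py_gdelt import GDELTSettings

os.environ["GDELT_TIMEOUT"] = "30"    # environment variable
settings = GDELTSettings(timeout=60)  # programmatic value, highest priority
# Per the order above, the effective timeout is 60 seconds.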
BigQuery Setup
For BigQuery access:
- Install dependencies: pip install py-gdelt[bigquery]
- Create a Google Cloud project
- Enable the BigQuery API
- Create a service account and download its credentials JSON
- Set the credentials path: export GDELT_BIGQUERY_CREDENTIALS=/path/to/credentials.json

Or in code:
settings = GDELTSettings(
    bigquery_project="my-project",
    bigquery_credentials=Path("/path/to/credentials.json"),
    fallback_to_bigquery=True,
)
Best Practices
- Use environment variables for sensitive data (credentials)
- Use TOML for team-shared configuration
- Use programmatic settings for runtime customization
- Enable caching in development, configure carefully in production
- Set appropriate timeouts for your network conditions
- Enable BigQuery fallback for production reliability