catalog.data.gov
a.k.a. Catalog, this is a CKAN app containing an index (or catalog) of many federal, state, and municipal datasets. It is the main app of Data.gov and is generally what folks are thinking of when they refer to Data.gov.
Instance | URL |
---|---|
Production | catalog.data.gov |
Staging | catalog-datagov.dev-ocsit.bsp.gsa.gov |
ci | catalog.ci.datagov.us |
Sub-components:
- ckan
Services:
- apache2
- rds
- redis
- solr
Use `dsh` to view the logs.
Web instances:
- /var/log/ckan/ckan.access.log
- /var/log/ckan/ckan.error.log
Worker (harvest) instances:
- /var/log/fetch-consumer.log
- /var/log/gather-consumer.log
- /var/log/harvester_run.log
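The log paths above can be checked across hosts in one shot with `dsh`. A minimal dry-run sketch, assuming a dsh group named `catalog-harvester` (hypothetical; substitute the real machine group):

```shell
# Dry-run sketch: build the dsh invocation that would tail one of the
# harvest logs across the worker group. "catalog-harvester" is a
# hypothetical group name -- substitute the actual dsh group.
GROUP=catalog-harvester
LOG=/var/log/gather-consumer.log
CMD="dsh -g $GROUP -- tail -n 50 $LOG"
echo "$CMD"   # printed instead of executed; run the printed command from the jumpbox
```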
ckan-php-manager performs several tasks, including generating a report on harvest sources. See README for full instructions.
$ php cli/harvest_stats_csv.php
Columns include:
- title
- name
- url
- created
- source_type
- org title
- org name
- last_job_started
- last_job_finished
- total_datasets
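The CSV can be post-processed with standard tools; as a sketch, the snippet below flags sources that harvested nothing. The sample rows are invented for illustration (per the column list above, total_datasets is the 10th column):

```shell
# Hypothetical sample of the CSV produced by harvest_stats_csv.php;
# the data rows are made up for illustration.
CSV=/tmp/harvest_stats.csv
cat > "$CSV" <<'EOF'
title,name,url,created,source_type,org title,org name,last_job_started,last_job_finished,total_datasets
Agency A,agency-a,https://a.example.gov/data.json,2020-01-01,datajson,Agency A,agency-a,2024-05-01,2024-05-01,120
Agency B,agency-b,https://b.example.gov/data.json,2021-03-05,datajson,Agency B,agency-b,2024-05-01,2024-05-01,0
EOF

# total_datasets is column 10; print the name of any source with no datasets.
awk -F, 'NR > 1 && $10 == 0 { print $2 }' "$CSV"
```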
All harvester commands should be run from one of the harvesters, usually catalog-harvester1p.
The harvest run command runs every few minutes to manage pending and in-progress harvest jobs. It will (not necessarily in this order):
- Queue jobs that have been scheduled
- Start jobs that have been queued
- Clean up jobs that have completed or errored
- Email job results to points of contact
Run the job through supervisor.
$ sudo supervisorctl start harvest-run
The job is logged to /var/log/ckan/harvest-run.log.
Common alerts we see for catalog.data.gov.
This usually manifests as a New Relic Host Unavailable alarm: the apache2 (CKAN) services consume more and more memory in a short amount of time until they eventually lock up and become unresponsive. The condition often affects multiple hosts at the same time.
- From the jumpbox, reload apache2 across the web hosts using Ansible:
  $ ansible -m service -a 'name=apache2 state=reloaded' -f 1 catalog-web-v1
- For any individual failed hosts, use retry_ssh.sh to repeatedly retry the apache2 restart on the host. Run this in a tmux session to prevent disconnects.
  $ ./retry_ssh.sh $host sudo service apache2 restart
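retry_ssh.sh is not reproduced here, but the pattern it implements can be sketched as a small retry loop. This is a hypothetical reconstruction of the idea, not the script's actual contents:

```shell
# Hypothetical reconstruction of the retry_ssh.sh pattern: re-run a
# command until it exits 0, pausing between attempts. The real script
# wraps "ssh $host <command>".
retry() {
  until "$@"; do
    echo "command failed; retrying in ${RETRY_DELAY:-5}s..." >&2
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage, e.g.:
#   retry ssh "$host" sudo service apache2 restart
```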
- Because the OOM killer might have killed some services in order to recover, reboot hosts as necessary.
  $ ansible-playbook actions/reboot.yml --limit catalog-web-v1 -e '{"force_reboot": true}'