The DPLA Ingestion System

Please see the release notes regarding changes and upgrade steps.

Setting up the ingestion server:

Install Python 2.7, if not already installed;

Install pip;

Install the ingestion subsystem;

$ pip install --no-deps --ignore-installed -r requirements.txt

Configure an akara.ini file appropriately for your environment;

Port=<port for Akara to run on>
; Recommended LogLevel is one of DEBUG or INFO

ApiKey=<your Bing Maps API key>

Url=<URL to CouchDB instance, including trailing forward-slash>
Username=<CouchDB username>
Password=<CouchDB password>
SyncQAViews=<True or False; consider False on production>
; Recommended LogLevel is INFO for production; defaults to INFO if not set

Username=<Geonames username>
Token=<Geonames token>

Username=<Rackspace username>
ApiKey=<Rackspace API key>
DPLAContainer=<Rackspace container for bulk download data>
SitemapContainer=<Rackspace container for sitemap files>

NYPL=<Your NYPL API token>

SitemapURI=<Sitemap URI>
SitemapPath=<Path to local directory for sitemap files>

To=<Comma-separated email addresses to receive alert email>
From=<Email address to send alert email>
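The keys above belong to separate sections of akara.ini. A minimal sketch of the file's shape, using illustrative section names and placeholder values only (consult akara.conf.template for the section names your installation expects):

```ini
; Hypothetical layout; section names and values are illustrative only.
[Akara]
Port=8889
; Recommended LogLevel is one of DEBUG or INFO
LogLevel=DEBUG

[CouchDb]
Url=http://localhost:5984/
Username=admin
Password=secret
SyncQAViews=True
```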


Merge the akara.conf.template and akara.ini files to create the akara.conf file;

$ python install 

Set up and start the Akara server;

$ akara -f akara.conf setup
$ akara -f akara.conf start

Build the database views;

$ python scripts/ dpla
$ python scripts/ dashboard
$ python scripts/ bulk_download

Testing the ingestion server:

You can test it with this set description from Clemson;

$ curl "http://localhost:8889/oai.listrecords.json?endpoint=" 

If you have the endpoint URL but not a set id, there’s a separate service for listing the sets;

$ curl "http://localhost:8889/oai.listsets.json?endpoint="
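The query strings above can also be built programmatically. A minimal sketch, assuming a hypothetical OAI-PMH endpoint URL and set id (the import fallback keeps it runnable on both Python 2.7, as installed above, and Python 3):

```python
try:
    from urllib import urlencode        # Python 2.7
except ImportError:
    from urllib.parse import urlencode  # Python 3 fallback

# Hypothetical endpoint and set id; substitute your provider's values.
params = {"endpoint": "http://example.org/oai", "oaiset": "jmc"}
url = "http://localhost:8889/oai.listrecords.json?" + urlencode(params)
print(url)
```

The resulting URL can be passed straight to curl as in the examples above.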

To run the ingest process: run the install step if you have not done so already, initialize the database and database views, then feed the ingest script a source profile (found in the profiles directory);

$ python install
$ python scripts/ dpla
$ python scripts/ dashboard
$ python scripts/ profiles/clemson.pjs
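Source profiles such as profiles/clemson.pjs are JSON documents describing a provider. A minimal sketch of inspecting one, using a hypothetical inline fragment with illustrative field names (real profiles carry many more provider-specific fields):

```python
import json

# Hypothetical profile fragment; field names are illustrative only.
profile_text = """{
    "name": "clemson",
    "type": "oai"
}"""

profile = json.loads(profile_text)
print("Ingesting profile: %s (type %s)" % (profile["name"], profile["type"]))
```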


This application is released under the AGPLv3 license.

  • Copyright Digital Public Library of America, 2015
