worldfactbook-dataset 0,0

3 years after

World Factbook Corpus

The CIA World Factbook is a Public Domain data set comprising of geographical, economic and political data on every country in the world.

Data types include free text, currency, percentages, longitude & latitude, altitude, taxonomies, and as such it makes a viable test & demonstration corpus for search applications, on top of the intrinsic value of the data.

Since the Factbook is not available in an easily machine-readable format, we've created a crawler to extract the data in a way that should be easier to consume.

Implementation

The crawler was written using Node.js and outputs in both XML and JSON. Pre-generated output is provided.

Run the crawler

The command below will extract data from the dataset in ./factbook-crawler/data and export it to ./data

    node factbook-crawler/index.js

Use the data

var fs = require('fs'),
    path = require('path');

fs.readdirSync('./data/json').forEach(function(file){
    var country = JSON.parse(fs.readFileSync('./data/json/'+file));
    console.log( country.name )
});

Related Repositories

factbook

factbook

factbook gem - scripts for the world factbook (get open structured data e.g JSO ...


Top Contributors

richmarr bjarkih