clean-names

Deduplicate and parse a list of ‘dirty names’

Clean Names


The script takes a CSV file with a column ‘Name’ containing ‘dirty names’, that is, names in varying formats: lastname firstname, firstname lastname, middlename lastname firstname, etc. (see the sample input file). It produces a CSV file that has all the columns of the original file plus the following columns: ‘uniqid’, ‘FirstName’, ‘MiddleInitial/Name’, ‘LastName’, ‘RomanNumeral’, ‘Title’, ‘Suffix’. By default, the script removes duplicate names (see the sample output file).
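To make the input/output contract concrete, here is a minimal sketch of the kind of parsing and deduplication described above. It is not the script's actual implementation: the helper names `parse_name` and `clean_rows`, the small title/suffix/numeral lists, the rule that a comma signals "lastname, firstname" order, and the meaning of ‘uniqid’ (one integer per distinct parsed name) are all assumptions made for illustration.

```python
# Hypothetical lookup tables; the real script may recognise many more values.
TITLES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr"}
ROMAN_NUMERALS = {"ii", "iii", "iv"}


def parse_name(raw):
    """Split one raw name into the output columns (illustrative rules only)."""
    parts = {"FirstName": "", "MiddleInitial/Name": "", "LastName": "",
             "RomanNumeral": "", "Title": "", "Suffix": ""}
    # Assumption: "lastname, firstname ..." is signalled by a comma.
    if "," in raw:
        last, rest = raw.split(",", 1)
        raw = rest + " " + last
    tokens = []
    for tok in raw.split():
        key = tok.strip(".").lower()
        if key in TITLES:
            parts["Title"] = tok
        elif key in SUFFIXES:
            parts["Suffix"] = tok
        elif key in ROMAN_NUMERALS:
            parts["RomanNumeral"] = tok
        else:
            tokens.append(tok)
    if tokens:
        parts["FirstName"] = tokens[0]
    if len(tokens) > 1:
        parts["LastName"] = tokens[-1]
    if len(tokens) > 2:
        parts["MiddleInitial/Name"] = " ".join(tokens[1:-1])
    return parts


def clean_rows(rows, column="Name", keep_all=False):
    """Append parsed columns to each row; drop repeated names unless keep_all."""
    ids = {}  # normalised parsed name -> uniqid (assumed meaning of 'uniqid')
    out = []
    for row in rows:
        parsed = parse_name(row[column])
        key = tuple(value.strip(".").lower() for value in parsed.values())
        is_duplicate = key in ids
        uniqid = ids.setdefault(key, len(ids) + 1)
        if is_duplicate and not keep_all:
            continue
        out.append({**row, "uniqid": uniqid, **parsed})
    return out


rows = [{"Name": "Smith, John A."},
        {"Name": "John A Smith"},
        {"Name": "Dr. Jane Doe III"}]
for cleaned in clean_rows(rows):
    print(cleaned)
```

In this sketch the first two rows normalise to the same name, so only one of them is kept unless `keep_all=True` is passed. Formats without a comma, such as a bare "lastname firstname", are ambiguous and are not handled here.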

Application

The script was used to fix names in the CF-Scores from the Database on Ideology, Money in Politics, and Elections. The processed database with clean names is posted on the Harvard DVN.

Installation

  1. Clone this repository

git clone https://github.com/soodoku/clean-names.git

  2. Navigate to clean-names

  3. Run python setup.py install

Using Clean Names

Usage: process_names.py [options]

Command Line Options

  -h, --help            show this help message and exit
  -o OUTFILE, --out=OUTFILE
                        Output file in CSV (default: sample_output.csv)
  -c COLUMN, --column=COLUMN
                        Column name in CSV that contains Names (default: Name)
  -a, --all             Export all names (do not take duplicate names out)
                        (default: False)
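The options listed above have the layout produced by Python's standard `optparse` module. As a rough sketch of how such options could be declared (this is not taken from `process_names.py`; the `build_parser` function and the `dest` names are assumptions), consider:

```python
from optparse import OptionParser


def build_parser():
    """Declare the three documented options with their documented defaults."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-o", "--out", dest="outfile",
                      default="sample_output.csv",
                      help="Output file in CSV (default: %default)")
    parser.add_option("-c", "--column", dest="column", default="Name",
                      help="Column name in CSV that contains Names "
                           "(default: %default)")
    parser.add_option("-a", "--all", dest="all", action="store_true",
                      default=False,
                      help="Export all names (do not take duplicate names "
                           "out) (default: %default)")
    return parser


# Parse the same arguments as the documented example invocation.
options, args = build_parser().parse_args(["-a", "sample_input.csv"])
print(options.outfile, options.column, options.all, args)
```

With `optparse`, anything that is not an option (here the input CSV path) is returned in the positional `args` list, which matches how the input file is passed in the example invocation.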

Example

 python process_names.py -a sample_input.csv 

License

Scripts are released under the MIT License
