The script takes a CSV file with a column ‘Name’ containing ‘dirty names’: names in a mix of formats, such as lastname firstname, firstname lastname, or middlename lastname firstname (see sample input file). It produces a CSV file that has all the columns of the original file plus the following columns: ‘uniqid’, ‘FirstName’, ‘MiddleInitial/Name’, ‘LastName’, ‘RomanNumeral’, ‘Title’, ‘Suffix’. By default, the script removes duplicate names (see sample output file).
The script was used to clean names in the CF-Scores from the Database on Ideology, Money in Politics, and Elections. The processed database with clean names is posted on Harvard DVN.
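The input and output described above can be sketched in a few lines of Python. This is only an illustration of the file shape, not the script's actual code: the helper `extend_rows`, the sample rows, and the `id` column are made up, a "duplicate" is assumed to mean a repeated value in the name column, and the new columns are left empty because the name-parsing logic is not shown here.

```python
import csv
import io

# Output columns named in the description; how they get filled is not shown here.
NEW_COLUMNS = ["uniqid", "FirstName", "MiddleInitial/Name", "LastName",
               "RomanNumeral", "Title", "Suffix"]


def extend_rows(rows, column="Name", keep_all=False):
    """Yield each input row with the new columns appended (left empty).

    Assumption: a "duplicate" is a row whose `column` value was seen before.
    """
    seen = set()
    for row in rows:
        name = row[column]
        if not keep_all and name in seen:
            continue  # default behaviour: drop repeated names
        seen.add(name)
        out = dict(row)
        out.update({col: "" for col in NEW_COLUMNS})  # placeholder values only
        yield out


# Hypothetical input, made up for illustration.
sample = io.StringIO("id,Name\n1,SMITH JOHN\n2,Jane Q Doe\n3,SMITH JOHN\n")
reader = csv.DictReader(sample)
result = list(extend_rows(reader))

# Write the original columns followed by the new ones.
writer_buf = io.StringIO()
writer = csv.DictWriter(writer_buf, fieldnames=reader.fieldnames + NEW_COLUMNS)
writer.writeheader()
writer.writerows(result)
print(writer_buf.getvalue())
```

Passing `keep_all=True` keeps every row, which corresponds to the script's `-a` option.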
- Clone this repository
- Navigate to clean-names
- Run python setup.py install
Using Clean Names
Command Line Options
  -h, --help                  show this help message and exit
  -o OUTFILE, --out=OUTFILE   Output file in CSV (default: sample_output.csv)
  -c COLUMN, --column=COLUMN  Column name in CSV that contains Names (default: Name)
  -a, --all                   Export all names (do not take duplicate names out) (default: False)
python process_names.py -a sample_input.csv
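The help layout above resembles what Python's standard optparse module prints, so one plausible way to declare these options is sketched below. The option flags, help strings, and defaults are copied from the help text; the `dest` names, the usage string, and the `build_parser` helper are assumptions for illustration, and the script's real code may differ.

```python
from optparse import OptionParser


def build_parser():
    # Assumption: flags and defaults come from the help text above;
    # the dest names and usage string are illustrative only.
    parser = OptionParser(usage="usage: %prog [options] input.csv")
    parser.add_option("-o", "--out", dest="outfile",
                      default="sample_output.csv",
                      help="Output file in CSV (default: %default)")
    parser.add_option("-c", "--column", dest="column", default="Name",
                      help="Column name in CSV that contains Names "
                           "(default: %default)")
    parser.add_option("-a", "--all", dest="all", action="store_true",
                      default=False,
                      help="Export all names (do not take duplicate names out) "
                           "(default: %default)")
    return parser


# Equivalent of: python process_names.py -a sample_input.csv
options, args = build_parser().parse_args(["-a", "sample_input.csv"])
print(options.outfile, options.column, options.all, args)
```

With `-a` given, `options.all` is true and the input file name is left in `args` as a positional argument; the output file and column fall back to their defaults.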
Scripts are released under the MIT License