What is this?
This project contains scripts that automatically extract HTML financial statements with Selenium and parse them with PyQuery. The parsed transactions are stored in a database and can be browsed and searched through a Django application.
Currently, there are parsers that understand HTML financial statements from these institutions:
- American Express www.americanexpress.se
- Swedbank www.swedbank.se
- SEB www.seb.se
- Minpension www.minpension.se
You need to have a valid username and password for each bank that you want to use with this application.
In order to get started with this project, you need
Run the application locally
With that installed, you should be able to run the following in a terminal
git clone [email protected]:captnswing/banking.git
cd banking
docker-compose up -d
After all is done, you should be able to see the started containers
$ docker-compose ps
Name            Command                          State   Ports
---------------------------------------------------------------------------------------------------------
banking_db_1    /docker-entrypoint.sh postgres   Up      0.0.0.0:49171->5432/tcp
banking_es_1    /elasticsearch/bin/elastic ...   Up      0.0.0.0:49172->9200/tcp, 0.0.0.0:49173->9300/tcp
banking_web_1   python manage.py runserver ...   Up      0.0.0.0:49174->8000/tcp
Now, initialize the database
docker-compose run web python ./banking/manage.py migrate --noinput --syncdb
You should now be able to open the views at http://localhost:49174/statements (or
open http://$(docker-machine ip docker 2>/dev/null):49174/statements/).
They will not show any data yet. For that, you need to run the scripts below.
Note: the port will be different, check the output of
Collect and ingest the data
To collect statements from the banking sites, invoke
docker-compose run web python ./bin/collect_statements.py --bankname=<bankname> --username=<login> --password=<pwd>
for each combination of
Run
docker-compose run web python ./bin/collect_statements.py --help
to view the supported banknames. The collected .html files will be stored in the folder specified in
BANKING_OFFLINE_DATADIR (the default is
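For orientation, a command line with these three flags could be wired up roughly as follows. This is a minimal sketch and not the actual contents of bin/collect_statements.py: the `build_parser` helper and the identifiers in `SUPPORTED_BANKS` are assumptions made for illustration, and only the flag names `--bankname`, `--username` and `--password` come from the command shown above.

```python
import argparse

# Hypothetical bank identifiers; run the real script with --help for the actual list.
SUPPORTED_BANKS = ["amex", "swedbank", "seb", "minpension"]


def build_parser():
    """Build a parser that accepts the three flags used by the collect command."""
    parser = argparse.ArgumentParser(
        description="Collect HTML statements from a bank site."
    )
    parser.add_argument("--bankname", required=True, choices=SUPPORTED_BANKS)
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--bankname=seb", "--username=alice", "--password=secret"]
)
print(args.bankname, args.username)
```

Restricting `--bankname` with `choices` makes argparse list the accepted values in its `--help` output and reject unknown banks before any browser session is started.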
To parse the collected html files and save all transactions into the database, run
docker-compose run web python ./bin/process_statements.py
webport=`docker-compose port web 8000 | sed -E 's|.*:(.*)|\1|g'`
dockerhost=`docker-machine ip docker 2>/dev/null`
open http://$dockerhost:$webport
Now the views at http://localhost:49174/statements (or
open http://$(docker-machine ip docker 2>/dev/null):49174/statements/) should show you the parsed transactions.
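Because the published port changes between runs, it can help to derive the URL from the output of `docker-compose port web 8000` instead of hard-coding 49174. The sketch below mirrors the sed expression used above by keeping whatever follows the last colon. The function names `parse_host_port` and `statements_url` are hypothetical helpers, not part of this project.

```python
def parse_host_port(port_output):
    """Return the host port from output such as '0.0.0.0:49174'."""
    # Same idea as the sed expression: keep what follows the last colon.
    return int(port_output.strip().rsplit(":", 1)[1])


def statements_url(host, port_output):
    """Build the URL of the statements view for a given Docker host."""
    return "http://{}:{}/statements/".format(host, parse_host_port(port_output))


# Example with the port shown in the docker-compose ps listing.
print(statements_url("localhost", "0.0.0.0:49174\n"))
```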
Create or update the index
In order to be able to search, you need to create the search index
docker-compose run web python banking/manage.py rebuild_index --noinput
docker-compose run web python banking/manage.py update_index --age=2
lets you update the index with transactions that have been modified or added in the last (in this example) 2 hours.
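To make the time window concrete: with an age of 2, only transactions touched within the two hours before the command runs are picked up. The sketch below illustrates that selection rule only. It is not how the index command is implemented, and the dictionary records with a `modified` field are an assumption for the example.

```python
from datetime import datetime, timedelta


def modified_within(transactions, age_hours, now):
    """Return the transactions whose 'modified' timestamp lies within the last age_hours."""
    cutoff = now - timedelta(hours=age_hours)
    return [t for t in transactions if t["modified"] >= cutoff]


now = datetime(2015, 1, 1, 12, 0)
transactions = [
    {"id": 1, "modified": datetime(2015, 1, 1, 11, 0)},  # 1 hour old
    {"id": 2, "modified": datetime(2015, 1, 1, 8, 0)},   # 4 hours old
]
recent = modified_within(transactions, age_hours=2, now=now)
print([t["id"] for t in recent])
```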
Settings are configured in
banking/settings.py. Decide e.g. where to save the extracted .html files by setting the name of the folder in
BANKING_OFFLINE_DATADIR. The default is set to
Use the Django admin
To use the Django admin interface (e.g. to give your bank accounts a name), you need a superuser. Load the superuser from a fixture:
docker-compose run web python ./banking/manage.py loaddata admin_user
Then, access the admin interface http://localhost:49174/statements (or
open http://$(docker-machine ip docker 2>/dev/null):49174/statements/) using
Tips for debugging the parsers
Sometimes, the HTML of the statement pages changes. I found SelectorGadget useful for finding the right CSS selector expression.
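When the markup changes, a quick offline check against a saved statement file can show whether the expected rows still come out. The sketch below is one way to do that under stated assumptions: it uses the standard library's `html.parser` rather than PyQuery so that it runs without extra dependencies, and the `RowCollector` class and the sample markup are invented for illustration. Real statement pages will look different.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, such as header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Invented sample markup standing in for a saved statement page.
SAMPLE = """
<table class="transactions">
  <tr><th>Date</th><th>Description</th><th>Amount</th></tr>
  <tr><td>2015-01-02</td><td>Grocery store</td><td>-123.45</td></tr>
  <tr><td>2015-01-03</td><td>Salary</td><td>20000.00</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(SAMPLE)
for row in collector.rows:
    print(row)
```

If a check like this suddenly yields no rows for a freshly collected file, the page structure has probably changed and the selectors in the corresponding parser need another look.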