FMA: A Dataset For Music Analysis
Note that this is a beta release and that this repository as well as the paper and data are subject to change. Stay tuned!
The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads directed by WFMU, the longest-running freeform radio station in the United States [Wikipedia]. Please see the paper for a description of how the data was collected and cleaned as well as an analysis and some baselines.
The audio is provided as MP3-encoded data in several sizes:
- fma_small.zip: 4,000 tracks of 30 seconds, 10 balanced genres (GTZAN-like) (~3.4 GiB)
- fma_medium.zip: 14,511 tracks of 30 seconds, 20 unbalanced genres (~12.2 GiB)
- fma_large.zip: 77,643 tracks of 30 seconds, 68 unbalanced genres (~90 GiB) (available soon)
- fma_full.zip: 77,643 untrimmed tracks, 164 unbalanced genres (~900 GiB) (subject to distribution constraints)
The following meta-data is provided in this repository:
- tracks.json: a table (to be imported as a pandas dataframe) containing meta-data about each track, such as the ID, the title, the artist and the genres. See the usage notebook for an exhaustive list.
- genres.json: all 164 available genres, used to infer the genre hierarchy and top-level genres.
- features.json: common features extracted with librosa.
- spotify.json: audio features provided by Spotify, formerly Echonest. Covers all tracks distributed in fma_medium.zip as well as some others.
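The tables can also be inspected without pandas. The following is a minimal sketch, assuming the files use the column-oriented layout that pandas writes by default (`{column: {index: value}}`); this README does not confirm the layout, so check the usage notebook first. The helper names, the column names and the values in the sample are hypothetical.

```python
import json
from collections import defaultdict


def columns_to_records(table):
    """Turn {column: {index: value}} into {index: {column: value}}."""
    records = defaultdict(dict)
    for column, values in table.items():
        for index, value in values.items():
            records[index][column] = value
    return dict(records)


def load_table(path):
    """Load a JSON table (assumed column-oriented) as one record per row."""
    with open(path, encoding="utf-8") as f:
        return columns_to_records(json.load(f))


# Tiny in-memory stand-in for a metadata file (made-up columns and values).
sample = {
    "title": {"2": "Track A", "5": "Track B"},
    "artist": {"2": "Artist X", "5": "Artist Y"},
}
records = columns_to_records(sample)
print(records["2"])
```

With a real file, `load_table("tracks.json")` would return a dictionary keyed by the row index, provided the layout assumption holds.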
As a user of the dataset, you’re probably most interested in these notebooks:
- usage: how to load the datasets and develop, train and test your own models with them.
- webapi: query the web API of the FMA to update the dataset or gather further information about tracks, albums or artists.
If you’re curious, you may check these notebooks, most of whose results appear in the paper:
For the most curious, these were used to create the dataset:
- creation: creation of the dataset, i.e.
- features: features extraction from the raw audio, i.e.
Download some data and verify its integrity.
echo "e731a5d56a5625f7b7f770923ee32922374e2cbf  fma_small.zip" | sha1sum -c -
echo "fe23d6f2a400821ed1271ded6bcd530b7a8ea551  fma_medium.zip" | sha1sum -c -
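Where `sha1sum` is not available, the same check can be done in Python. This is a minimal sketch using only the standard library; the function names are illustrative and not part of this repository, and the checksums are the ones from the commands above.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha1):
    """Return True when the file's SHA-1 matches the expected hex digest."""
    return sha1_of_file(path) == expected_sha1.lower()


# Checksums copied from the sha1sum commands in this README.
CHECKSUMS = {
    "fma_small.zip": "e731a5d56a5625f7b7f770923ee32922374e2cbf",
    "fma_medium.zip": "fe23d6f2a400821ed1271ded6bcd530b7a8ea551",
}
```

For example, `verify("fma_small.zip", CHECKSUMS["fma_small.zip"])` should return `True` for an intact download in the current directory.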
Create a Python 3.6 virtual environment, for example with pyenv.
pyenv install 3.6.0
pyenv virtualenv 3.6.0 fma
pyenv activate fma
Clone the repository.
git clone https://github.com/mdeff/fma.git
cd fma
Fill in the configuration.
cat .env
DATA_DIR=/path/to/fma_small
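To use the same setting from your own scripts, the `.env` file can be read with a few lines of Python. This is a hand-rolled sketch for simple `KEY=VALUE` files; how the repository's notebooks actually read `.env` is not shown in this README, and the function names are hypothetical.

```python
import os


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def data_dir(path=".env"):
    """Return DATA_DIR from the process environment if set, else from the .env file.

    Raises FileNotFoundError if the variable is unset and the file is missing,
    and KeyError if the file exists but does not define DATA_DIR.
    """
    value = os.environ.get("DATA_DIR") or read_env_file(path).get("DATA_DIR")
    if not value:
        raise KeyError("DATA_DIR is not set")
    return value
```

Letting the environment variable take precedence is a design choice of this sketch: it allows a one-off override without editing the file.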
Open Jupyter or run a notebook.
jupyter-notebook
make fma_baselines.ipynb
- 2016-12-06 beta release
License & co
- Please cite our paper if you use our code or data.
- The code in this repository is released under the terms of the MIT license.
- The meta-data, i.e. all the .json files, is released under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
- We do not hold the copyright on the audio data, i.e. all .zip archives, and distribute it under the terms of the license chosen by the artist.
- The dataset is meant for research purposes.
- We are grateful to SWITCH and EPFL for hosting the dataset within the context of the SCALE-UP project, funded in part by the swissuniversities SUC P-2 program.