FMA: A Dataset For Music Analysis

Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.

Note that this is a beta release and that this repository as well as the paper and data are subject to change. Stay tuned!


The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads directed by WFMU, the longest-running freeform radio station in the United States [Wikipedia]. Please see the paper for a description of how the data was collected and cleaned as well as an analysis and some baselines.

The MP3-encoded audio data is available in several sizes:

  1. 4,000 tracks of 30 seconds, 10 balanced genres (GTZAN-like) (~3.4 GiB)
  2. 14,511 tracks of 30 seconds, 20 unbalanced genres (~12.2 GiB)
  3. 77,643 tracks of 30 seconds, 68 unbalanced genres (~90 GiB) (available soon)
  4. 77,643 untrimmed tracks, 164 unbalanced genres (~900 GiB) (subject to distribution constraints)

The following meta-data is provided in this repository:

  • tracks.json: a table (to be imported as a pandas dataframe) which contains meta-data about each track such as the ID, the title, the artist or the genres. See the usage notebook for an exhaustive list.
  • genres.json: all the 164 available genres, used to infer the genre hierarchy and top-level genres.
  • features.json: common features extracted with librosa.
  • spotify.json: audio features provided by Spotify (formerly Echonest). Covers all tracks distributed in and as well as some others.
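
As an illustration of how a genre hierarchy and top-level genres could be inferred from a flat genre table, here is a minimal Python sketch. The field names `genre_id`, `parent` and `title`, the convention that a parent of 0 marks a root genre, and the sample records are all assumptions made for illustration; the actual layout of genres.json is documented in the usage notebook.

```python
# Hypothetical records; the real genres.json may use different field names.
SAMPLE_GENRES = [
    {"genre_id": 1, "parent": 0, "title": "Rock"},
    {"genre_id": 2, "parent": 1, "title": "Punk"},
    {"genre_id": 3, "parent": 2, "title": "Hardcore"},
    {"genre_id": 4, "parent": 0, "title": "Jazz"},
]


def top_level_genre(genre_id, parents):
    """Follow parent links until a root genre (parent == 0) is reached."""
    seen = set()
    while parents[genre_id] != 0:
        if genre_id in seen:  # guard against cycles in malformed data
            raise ValueError(f"cycle detected at genre {genre_id}")
        seen.add(genre_id)
        genre_id = parents[genre_id]
    return genre_id


def build_top_level_map(genres):
    """Map every genre ID to the ID of its top-level ancestor."""
    parents = {g["genre_id"]: g["parent"] for g in genres}
    return {gid: top_level_genre(gid, parents) for gid in parents}


TOP_LEVEL = build_top_level_map(SAMPLE_GENRES)
```

With the sample records, the sub-genres 2 and 3 both resolve to genre 1, while genres 1 and 4 are their own top-level genre.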


As a user of the dataset, you’re probably most interested in these notebooks:

  1. usage: how to load the datasets and develop, train and test your own models with it.
  2. webapi: query the web API of the FMA to update the dataset or gather further information about tracks, albums or artists.

If you’re curious, you may check these notebooks, most of whose results appear in the paper:

  1. analysis: some exploration of the data.
  2. baselines: baseline models for genre recognition.

For the most curious, these were used to create the dataset:

  1. creation: creation of the dataset, i.e. tracks.json and genres.json.
  2. features: feature extraction from the raw audio, i.e. features.json.


  1. Download some data and verify its integrity.

    echo "e731a5d56a5625f7b7f770923ee32922374e2cbf" | sha1sum -c -
    echo "fe23d6f2a400821ed1271ded6bcd530b7a8ea551" | sha1sum -c -
  2. Optionally, use pyenv to install Python 3.6 and create a virtual environment.

    pyenv install 3.6.0
    pyenv virtualenv 3.6.0 fma
    pyenv activate fma
  3. Clone the repository.

    git clone
    cd fma
  4. Install the Python dependencies from requirements.txt. Depending on your usage, you may need to install ffmpeg or graphviz.

    make install
  5. Optionally, install CUDA to train neural networks on GPUs. See TensorFlow’s instructions.

  6. Fill in the configuration.

    cat .env
  7. Open Jupyter or run a notebook.

    make fma_baselines.ipynb
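
For the configuration step, the following is a minimal sketch of how a .env file could be read from Python using only the standard library, assuming it follows the usual KEY=VALUE convention. The key names `AUDIO_DIR` and `API_KEY` are placeholders chosen for illustration and are not necessarily the ones this repository expects.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


# Placeholder content; the keys below are hypothetical.
EXAMPLE_ENV = """
# Where the extracted audio lives
AUDIO_DIR=/path/to/audio
API_KEY="changeme"
"""

CONFIG = parse_env(EXAMPLE_ENV)
```

In practice the text would come from reading the .env file at the root of the cloned repository instead of the inline example.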


  • 2016-12-06 beta release
    • paper: arXiv:1612.01840v1
    • code: git tag beta
    • sha1: e731a5d56a5625f7b7f770923ee32922374e2cbf
    • sha1: fe23d6f2a400821ed1271ded6bcd530b7a8ea551
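
On systems without `sha1sum`, the checksums listed above can also be checked from Python. This is a minimal sketch using only the standard library; the helper names `sha1_of_file` and `verify` are introduced here for illustration, and the path passed to them is whichever archive you downloaded.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha1):
    """Return True if the file's SHA-1 matches the published checksum."""
    return sha1_of_file(path) == expected_sha1.strip().lower()
```

A call such as `verify(path_to_archive, "e731a5d56a5625f7b7f770923ee32922374e2cbf")` returns `True` only when the download is intact.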

License & co

  • Please cite our paper if you use our code or data.
  • The code in this repository is released under the terms of the MIT license.
  • The meta-data, i.e. all the .json files, is released under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
  • We do not hold the copyright on the audio data, i.e. all .mp3 in the .zip archives, and distribute it under the terms of the license chosen by the artist.
  • The dataset is meant for research purposes.
  • We are grateful to SWITCH and EPFL for hosting the dataset within the context of the SCALE-UP project, funded in part by the swissuniversities SUC P-2 program.
