tldextract

Accurately separate the TLD from the registered domain and subdomains of a URL, using the Public Suffix List.



tldextract accurately separates the gTLD or ccTLD (generic or country code top-level domain) from the registered domain and subdomains of a URL. For example, say you want just the ‘google’ part of ‘http://www.google.com’.

Everybody gets this wrong. Splitting on the ‘.’ and taking the last 2 elements only goes a long way if you’re thinking of simple domains, e.g. .com ones. Think of parsing http://forums.bbc.co.uk, for example: the naive splitting method above will give you ‘co’ as the domain and ‘uk’ as the TLD, instead of ‘bbc’ and ‘co.uk’ respectively.
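For illustration, the naive method looks something like this. naive_split is a hypothetical helper, not part of tldextract, and it takes a bare hostname rather than a full URL.

```python
def naive_split(host: str) -> tuple:
    """Naive approach: treat the last two dot-separated labels as (domain, tld)."""
    labels = host.split(".")
    return labels[-2], labels[-1]


print(naive_split("www.google.com"))    # ('google', 'com'): fine for a simple .com host
print(naive_split("forums.bbc.co.uk"))  # ('co', 'uk'): wrong, the registered domain is bbc
```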

tldextract on the other hand knows what all gTLDs and ccTLDs look like by looking up the currently living ones according to the Public Suffix List. So, given a URL, it knows its subdomain from its domain, and its domain from its country code.

>>> import tldextract

>>> tldextract.extract('http://forums.news.cnn.com/')
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')

>>> tldextract.extract('http://forums.bbc.co.uk/') # United Kingdom
ExtractResult(subdomain='forums', domain='bbc', suffix='co.uk')

>>> tldextract.extract('http://www.worldbank.org.kg/') # Kyrgyzstan
ExtractResult(subdomain='www', domain='worldbank', suffix='org.kg')
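To make the lookup concrete, here is a minimal sketch of Public Suffix List matching: the longest matching rule wins, "*." wildcard rules match any single label, and "!" exception rules override them. It assumes a tiny hypothetical rule set (RULES) instead of the real list, and Split, RULES, suffix_label_count and split_url are illustrative names, not tldextract's implementation. It also ignores IP addresses and internationalized domain names.

```python
from typing import List, NamedTuple
from urllib.parse import urlsplit


class Split(NamedTuple):
    subdomain: str
    domain: str
    suffix: str


# Hypothetical miniature rule set in Public Suffix List syntax; NOT the real list.
# "*.ck" is a wildcard rule and "!www.ck" an exception to it.
RULES = {"com", "uk", "co.uk", "kg", "org.kg", "*.ck", "!www.ck"}


def suffix_label_count(labels: List[str]) -> int:
    """Return how many trailing labels form the public suffix (0 if no rule matches)."""
    best = 0
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        count = len(labels) - i
        if "!" + candidate in RULES:
            # Exception rules win outright; the suffix is the rule minus its leftmost label.
            return count - 1
        wildcard = ".".join(["*"] + labels[i + 1:])
        if candidate in RULES or wildcard in RULES:
            best = max(best, count)  # keep the longest matching rule
    return best


def split_url(url: str) -> Split:
    # urlsplit only fills in the host when a scheme or a leading "//" is present.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""  # hostname is lowercased, port stripped
    labels = host.split(".")
    cut = len(labels) - suffix_label_count(labels)
    rest = labels[:cut]
    return Split(
        subdomain=".".join(rest[:-1]),
        domain=rest[-1] if rest else "",
        suffix=".".join(labels[cut:]),
    )


if __name__ == "__main__":
    print(split_url("http://forums.bbc.co.uk/"))
    # Split(subdomain='forums', domain='bbc', suffix='co.uk')
```

Like tldextract's own result for google.notavalidsuffix, a host with no matching rule gets an empty suffix here; the official PSL algorithm would instead fall back to a default "*" rule.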

ExtractResult is a namedtuple, so it’s simple to access the parts you want.

>>> ext = tldextract.extract('http://forums.bbc.co.uk')
>>> (ext.subdomain, ext.domain, ext.suffix)
('forums', 'bbc', 'co.uk')
>>> # rejoin subdomain and domain
>>> '.'.join(ext[:2])
'forums.bbc'
>>> # a common alias
>>> ext.registered_domain
'bbc.co.uk'
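Because the result is an ordinary namedtuple, the usual tuple operations apply. The sketch below uses a stand-in namedtuple with the same three field names (an assumption; the real class carries extra properties) and a hypothetical registered_domain helper that approximates the alias above, assuming it joins domain and suffix and returns an empty string when either is missing.

```python
from collections import namedtuple

# Stand-in with the same field names as tldextract's ExtractResult (illustration only).
ExtractResult = namedtuple("ExtractResult", ["subdomain", "domain", "suffix"])


def registered_domain(ext):
    """Hypothetical helper approximating ext.registered_domain."""
    # Only meaningful when both a domain and a recognized suffix were found.
    if ext.domain and ext.suffix:
        return ext.domain + "." + ext.suffix
    return ""


ext = ExtractResult(subdomain="forums", domain="bbc", suffix="co.uk")
subdomain, domain, suffix = ext   # plain tuple unpacking works too
print(".".join(ext[:2]))          # forums.bbc
print(registered_domain(ext))     # bbc.co.uk
print(registered_domain(ExtractResult("google", "notavalidsuffix", "")))  # prints an empty line
```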

Note that subdomain and suffix are optional. Not all URL-like inputs have a subdomain or a valid suffix.

>>> tldextract.extract('google.com')
ExtractResult(subdomain='', domain='google', suffix='com')

>>> tldextract.extract('google.notavalidsuffix')
ExtractResult(subdomain='google', domain='notavalidsuffix', suffix='')

>>> tldextract.extract('')
ExtractResult(subdomain='', domain='', suffix='')

If you want to rejoin the whole namedtuple, regardless of whether a subdomain or suffix was found:

>>> ext = tldextract.extract('')
>>> # this has unwanted dots
>>> '.'.join(ext)
>>> # join each part only if it's truthy
>>> '.'.join(part for part in ext if part)
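The difference between the two joins shows up whenever a part is empty. This sketch uses a stand-in namedtuple with the same field order as ExtractResult (an assumption for illustration) and a hypothetical rejoin helper wrapping the truthy-only join.

```python
from collections import namedtuple

# Stand-in for tldextract's ExtractResult (same field order; illustration only).
ExtractResult = namedtuple("ExtractResult", ["subdomain", "domain", "suffix"])


def rejoin(ext):
    """Join only the non-empty parts, so missing parts don't leave stray dots."""
    return ".".join(part for part in ext if part)


no_subdomain = ExtractResult(subdomain="", domain="google", suffix="com")
print(".".join(no_subdomain))  # .google.com  (leading dot from the empty subdomain)
print(rejoin(no_subdomain))    # google.com

no_suffix = ExtractResult(subdomain="google", domain="notavalidsuffix", suffix="")
print(".".join(no_suffix))     # google.notavalidsuffix.  (trailing dot)
print(rejoin(no_suffix))       # google.notavalidsuffix
```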

This module started by implementing the chosen answer from this StackOverflow question on getting the “domain name” from a URL. However, the proposed regex solution doesn’t address many country code suffixes, or the exceptions to country code rules for certain registered domains. The Public Suffix List does, and so does this module.


Latest release on PyPI:

pip install tldextract

Or the latest dev version:

pip install -e 'git://'

Command-line usage splits the URL components by space:

tldextract http://forums.bbc.co.uk
# forums bbc co.uk

Note About Caching

Beware when first running the module: it updates its TLD list with a live HTTP request. This updated TLD set is cached indefinitely in /path/to/tldextract/.tld_set.

(Arguably, runtime bootstrapping like that shouldn’t be the default behavior, e.g. for production systems. But I want you to have the latest TLDs, especially when I haven’t kept this code up to date.)

To avoid this fetch or control the cache’s location, use your own extract callable, configured either via the TLDEXTRACT_CACHE environment variable or via the cache_file path passed at TLDExtract initialization.

import tldextract

# extract callable that falls back to the included TLD snapshot, no live HTTP fetching
no_fetch_extract = tldextract.TLDExtract(suffix_list_urls=None)

# extract callable that reads/writes the updated TLD set to a different path
custom_cache_extract = tldextract.TLDExtract(cache_file='/path/to/your/cache/file')

# extract callable that doesn't use caching
no_cache_extract = tldextract.TLDExtract(cache_file=False)

If you want to stay fresh with the TLD definitions (though they don’t change often), delete the cache file occasionally, or run

tldextract --update

or, for a cache file at a custom location:

env TLDEXTRACT_CACHE="~/tldextract.cache" tldextract --update

It is also recommended to delete the cache file after upgrading this library.
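The fetch-once, cache-forever behavior described in this section can be sketched as follows. This is not tldextract's code: load_suffixes and fake_fetch are hypothetical, fake_fetch stands in for the live HTTP request, and the one-suffix-per-line file format is an assumption.

```python
import os
import tempfile
from typing import Callable, FrozenSet


def load_suffixes(cache_path: str, fetch: Callable[[], FrozenSet[str]]) -> FrozenSet[str]:
    """Return the cached suffix set, calling fetch() only when no cache file exists."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return frozenset(line.strip() for line in f if line.strip())
    suffixes = frozenset(fetch())
    # Cached indefinitely: nothing here ever expires the file.
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(suffixes)))
    return suffixes


calls = []


def fake_fetch() -> FrozenSet[str]:
    calls.append(1)  # count how often the "network" is hit
    return frozenset({"com", "co.uk"})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "suffixes.cache")
    first = load_suffixes(path, fake_fetch)   # no cache yet: fetches and writes the file
    second = load_suffixes(path, fake_fetch)  # served from the cache file
    os.remove(path)                           # deleting the cache forces a refresh
    third = load_suffixes(path, fake_fetch)

print(len(calls))  # 2
```

Deleting the file is the only way this sketch refreshes its data, which mirrors the advice above to delete the cache occasionally or after upgrading.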

Advanced Usage

Specifying your own URL or file for the Suffix List data

You can specify your own input data in place of the default Mozilla Public Suffix List:

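extract = tldextract.TLDExtract(
    suffix_list_urls=["<URL of your suffix list>"],  # placeholder: substitute your own URL
    # Recommended: Specify your own cache file, to minimize ambiguities about where
    # tldextract is getting its data, or cached data, from.
    cache_file='/path/to/your/cache/file')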
The above snippet fetches from the URL you specified the first time it needs to download the suffix list (i.e. if the cache_file doesn’t exist yet).

If you want to use input data from your local filesystem, just use the file:// protocol:

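extract = tldextract.TLDExtract(
    suffix_list_urls=["file:///absolute/path/to/your/suffix/list/file"],  # placeholder path
    cache_file='/path/to/your/cache/file')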

Use an absolute path when specifying the suffix_list_urls keyword argument. os.path is your friend.
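A minimal way to turn a possibly relative path into an absolute file:// URL with the standard library is sketched below; local_suffix_list_url is a hypothetical helper, and the expected output shown assumes a POSIX system.

```python
import os
from pathlib import Path


def local_suffix_list_url(path: str) -> str:
    """Build a file:// URL from a (possibly relative) path to a local suffix list file."""
    # Make the path absolute first; Path.as_uri() raises ValueError on relative paths.
    return Path(os.path.abspath(path)).as_uri()


print(local_suffix_list_url("/tmp/public_suffix_list.dat"))
# file:///tmp/public_suffix_list.dat  (on POSIX)
```

The result can then be passed as suffix_list_urls=[local_suffix_list_url("data/public_suffix_list.dat")], where the file name is only an example. as_uri() also percent-encodes characters such as spaces.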


If I pass an invalid URL, I still get a result, no error. What gives?

To keep tldextract light in LoC & overhead, and because there are plenty of URL validators out there, this library is very lenient with input. If valid URLs are important to you, validate them before calling tldextract.

This lenient stance lowers the learning curve of using the library, at the cost of desensitizing users to the nuances of URLs. Who knows how much. But in the future, I would consider an overhaul. For example, users could opt into validation, either receiving exceptions or error metadata on results.
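If you do want a pre-check before calling tldextract, a deliberately simple one might look like this. looks_like_valid_url is a hypothetical helper, not part of tldextract: it only requires an http(s) scheme and a syntactically plausible hostname. It does not check the suffix against any list, and IP literals such as 127.0.0.1 pass.

```python
import re
from urllib.parse import urlsplit

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_valid_url(url: str) -> bool:
    """Loose pre-check: http(s) scheme plus a hostname made of valid DNS labels."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # e.g. unbalanced IPv6 brackets
        return False
    if parts.scheme not in ("http", "https") or not host or len(host) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in host.rstrip(".").split("."))


print(looks_like_valid_url("http://forums.bbc.co.uk/"))  # True
print(looks_like_valid_url("google.notavalidsuffix"))    # False (no scheme)
```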

Public API

I know it’s just one method, but I’ve needed this functionality in a few projects and programming languages, so I’ve uploaded tldextract to App Engine. It’s there on GAE’s free pricing plan until Google cuts it off. Just hit it with your favorite HTTP client with the URL you want parsed like so:

curl ""
# {"domain": "bbc", "subdomain": "www", "suffix": ""}


Setting up

  1. git clone this repository.
  2. Change into the new directory.
  3. pip install tox

Alternatively you can install detox instead of tox to run tests in parallel.

Running the Test Suite

Run all tests against all supported Python versions:

tox

Run all tests against a specific Python environment configuration:

tox -l
tox -e py35-requests-2.9.1


Top Contributors

john-kurkowski medecau llonchj msabramo evanv dfeinzeig jnozsc maxmzkr mauricioabreu kristjanr xdanx TylerLubeck hangtwenty catalanojuan EdwardBetts arski shirk3y otakucode pmlandwehr oberstet TomAnthony tstriker mvasilkov rbaier