html2text python

Convert HTML to Markdown-formatted text.


html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
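
To make the "HTML in, Markdown out" idea concrete, here is a minimal sketch built only on Python 3's standard-library `html.parser`. It is a toy illustration and not html2text's actual implementation: the names `TinyMarkdown` and `tiny_html_to_markdown` are hypothetical, and it only understands `<p>`, `<a>`, `<em>` and `<strong>`.

```python
from html.parser import HTMLParser


class TinyMarkdown(HTMLParser):
    """Toy converter (hypothetical, not html2text's code)."""

    def __init__(self):
        super().__init__()
        self.out = []    # output fragments, joined at the end
        self.hrefs = []  # stack of link targets awaiting their closing tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Remember the target so it can be emitted after the link text.
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")
        elif tag == "em":
            self.out.append("_")
        elif tag == "strong":
            self.out.append("**")

    def handle_endtag(self, tag):
        if tag == "a" and self.hrefs:
            self.out.append("](%s)" % self.hrefs.pop())
        elif tag == "em":
            self.out.append("_")
        elif tag == "strong":
            self.out.append("**")
        elif tag == "p":
            # Paragraphs are separated by a blank line in Markdown.
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(data)


def tiny_html_to_markdown(html):
    parser = TinyMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip() + "\n"


print(tiny_html_to_markdown(
    "<p>Hello, <a href='http://example.com/'>world</a>!</p>"
    "<p>It <em>works</em>.</p>"
))
```

The real library covers far more of HTML (lists, headings, images, line wrapping and so on); the sketch is only meant to show the general shape of the conversion.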

Why does this fork exist?

  • better build process
  • (less) disgusting code
  • maintainable

If you use this software

Please take a moment to pay your respects to Aaron.


From within Python:

import html2text
# print() with parentheses works on both Python 2 and Python 3
print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

import html2text
h = html2text.HTML2Text()
# Keep the link text but drop the link target from the output
h.ignore_links = True
print(h.handle("<p>Hello, <a href=''>world</a>!"))

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

Getting started (developers)

This project uses PyBuilder.

sudo pip install pyb_init
pyb-init github mriehl : html2text

Further building (including coverage, PEP 8 linting, and building a release) can be done with:

source venv/bin/activate


Top Contributors

aaronsw nushoin mriehl dreikanter stefanor stephenmcd brondsem adhiraj ap eevee abgoyal wking IanLewis blueyed dvj fmarier inklesspen laurentb maketolearn nene chitsaou

