html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Why does this fork exist?
- better build process
- (less) disgusting code
If you use this software
Please take a moment to pay your respects to Aaron.
From within Python:
    import html2text
    print(html2text.html2text("<p>Hello, world.</p>"))
Or with some configuration options:
    import html2text
    h = html2text.HTML2Text()
    h.ignore_links = True  # drop link targets, keep only the link text
    print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Originally written by Aaron Swartz. This code is distributed under the GPLv3.
Getting started (developers)
This project uses PyBuilder.
    sudo pip install pyb_init
    pyb-init github mriehl : html2text
Further build steps (coverage, pep8 linting, and building a release) can be run with:
    source venv/bin/activate
    pyb