html2text python

Convert HTML to Markdown-formatted text.

html2text

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Why does this fork exist?

  • better build process
  • (less) disgusting code
  • maintainable

If you use this software

Please take a moment to pay your respects to Aaron.

Usage

From within Python:

import html2text
# html2text() takes an HTML string and returns the converted Markdown text.
print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

import html2text
h = html2text.HTML2Text()
# Drop link targets from the output; only the link text is kept.
h.ignore_links = True
print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
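To illustrate what an option like ignore_links changes, here is a minimal sketch of an HTML-to-Markdown converter built on the standard library's html.parser. It is a toy under stated assumptions, not html2text's actual implementation: the names TinyMarkdownConverter and tiny_html2text are hypothetical, and only <p> and <a> tags are handled.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Toy converter (hypothetical, not html2text's code): handles <p> and <a> only."""

    def __init__(self, ignore_links=False):
        super().__init__()
        self.ignore_links = ignore_links
        self.parts = []    # output fragments, joined at the end
        self.href = None   # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Separate consecutive paragraphs with a blank line.
            if self.parts:
                self.parts.append("\n\n")
        elif tag == "a" and not self.ignore_links:
            self.href = dict(attrs).get("href")
            if self.href is not None:
                self.parts.append("[")

    def handle_endtag(self, tag):
        # Close the Markdown link only if we opened one.
        if tag == "a" and self.href is not None:
            self.parts.append("](%s)" % self.href)
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def tiny_html2text(html, ignore_links=False):
    converter = TinyMarkdownConverter(ignore_links=ignore_links)
    converter.feed(html)
    converter.close()  # flush any buffered trailing text
    return "".join(converter.parts).strip()


html = "<p>Hello, <a href='http://earth.google.com/'>world</a>!"
print(tiny_html2text(html))
print(tiny_html2text(html, ignore_links=True))
```

With links enabled, the sketch wraps the link text in Markdown's [text](url) form; with ignore_links set, the link text passes through as plain text. The real library handles far more of HTML and may format its output differently.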

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

Getting started (developers)

This project uses PyBuilder.

sudo pip install pyb_init
pyb-init github mriehl : html2text

Further build steps (coverage, pep8 linting, building a release) can be run with:

source venv/bin/activate
pyb

Related Repositories

html2text-service

A RESTful service to convert HTML into Markdown-like text ...

actionmailer-html2text

Automatically add plain text parts into HTML emails sent by ActionMailer. ...


Top Contributors

aaronsw nushoin mriehl dreikanter stefanor stephenmcd brondsem adhiraj ap eevee abgoyal wking IanLewis blueyed dvj fmarier inklesspen laurentb maketolearn nene chitsaou

Releases

-   3.02
-   3.01