= Anemone

Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

See http://anemone.rubyforge.org for more information.

== Features

* Multi-threaded design for high performance
* Tracks 301 HTTP redirects
* Built-in BFS algorithm for determining page depth
* Allows exclusion of URLs based on regular expressions
* Choose the links to follow on each page with focus_crawl()
* HTTPS support
* Records response time for each page
* CLI program can list all pages in a domain, calculate page depths, and more
* Obey robots.txt
* In-memory or persistent storage of pages during crawl, using TokyoCabinet, MongoDB, or Redis
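To illustrate what "page depth" means in the feature list, here is a minimal sketch of a breadth-first search over a link graph. It is not Anemone's own code: the +LINKS+ hash and the +page_depths+ method are hypothetical names invented for this example, and the graph is hard-coded instead of being crawled.

```ruby
# Hypothetical link graph: each path maps to the paths it links to.
LINKS = {
  '/'       => ['/about', '/blog'],
  '/about'  => ['/'],
  '/blog'   => ['/blog/1', '/about'],
  '/blog/1' => ['/']
}.freeze

# Breadth-first search from +root+. The depth of a page is the smallest
# number of link hops needed to reach it from the root.
def page_depths(links, root)
  depths = { root => 0 }
  queue = [root]
  until queue.empty?
    url = queue.shift
    links.fetch(url, []).each do |target|
      next if depths.key?(target) # already reached by a path at least as short
      depths[target] = depths[url] + 1
      queue << target
    end
  end
  depths
end

page_depths(LINKS, '/').each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is processed in first-in, first-out order, a page is always assigned its depth the first time it is seen, which is also its shortest distance from the root.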

== Examples

See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.
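As a rough sketch of what a small custom task might look like, the following combines several of the features listed above. It assumes the anemone gem is installed and that the target site is reachable. The +crawl_site+ method, the +SKIP_PATTERN+ constant, the depth limit of 2, and the cap of 10 links per page are illustrative choices for this example, not part of the library.

```ruby
# File extensions assumed not worth crawling (a hypothetical choice).
SKIP_PATTERN = /\.(?:pdf|zip|jpe?g|png|gif)\z/i

# Crawl +root+ and return an array of [url, depth, response_time] rows.
def crawl_site(root, depth_limit: 2)
  require 'anemone' # loaded lazily so this file parses without the gem

  rows = []
  Anemone.crawl(root, depth_limit: depth_limit, obey_robots_txt: true) do |anemone|
    # Exclude links whose path matches the pattern.
    anemone.skip_links_like(SKIP_PATTERN)
    # Follow at most the first 10 links found on each page.
    anemone.focus_crawl { |page| page.links.first(10) }
    # Record one row per page visited.
    anemone.on_every_page do |page|
      rows << [page.url.to_s, page.depth, page.response_time]
    end
  end
  rows
end
```

Calling <tt>crawl_site("http://www.example.com/")</tt> would then return one row per page visited. The CLI scripts mentioned above remain the better reference for complete, tested tasks.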

== Requirements

* nokogiri
* robots

== Development

To test and develop this gem, additional requirements are:

* rspec
* fakeweb
* tokyocabinet
* mongo
* redis

You will need to have {Tokyo Cabinet}[http://fallabs.com/tokyocabinet/], {MongoDB}[http://www.mongodb.org/], and {Redis}[http://code.google.com/p/redis/] installed on your system and running.

Top Contributors

chriskite tilsammans mislav rb2k nehhen jasonkim spk tansengming

Releases

* v0.6.0
* v0.5.0
* v0.4.0