ipmt

Indexed Pattern Matching Tool (ipmt) is a tool for indexing and storing a text file and searching it for patterns.

HOW TO USE

Open your favorite console application and navigate to the project folder. Run make to compile. By default, the GCC toolchain is used. If you prefer the LLVM compiler, run make CXX=clang++. Once compilation finishes, the executable is available in the bin folder. Run ./bin/ipmt --help for help.

SYNOPSIS

    $ ./bin/ipmt [OPTIONS] index FILE
    $ ./bin/ipmt [OPTIONS] search PATTERN INDEX_FILE

DESCRIPTION

The ipmt utility was built for offline search of text files. To use it, you first create a compressed index file from the input file. The tool can then search that index for one or more patterns in logarithmic time. To search for more than one pattern, use a pattern file. Each line containing a match for any of the patterns is printed to standard output.
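
The logarithmic-time lookup can be illustrated with binary search over a sorted array of suffix start positions. The following is a minimal Python sketch for intuition only, not the project's C++ code: the function names are hypothetical, the construction is a naive sort, and nothing here reflects ipmt's actual index file format.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # This is slow for large inputs; dedicated builders are much faster.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern, using two binary searches."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def matching_lines(text, positions):
    """Return each line that contains a match, once, in file order."""
    lines = []
    last_start = -1
    for pos in positions:
        line_start = text.rfind("\n", 0, pos) + 1
        if line_start != last_start:
            line_end = text.find("\n", pos)
            if line_end == -1:
                line_end = len(text)
            lines.append(text[line_start:line_end])
            last_start = line_start
    return lines


text = "banana\nbandana\ncherry\n"
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "ana")
print(matching_lines(text, hits))
```

Each binary search performs a number of prefix comparisons logarithmic in the text length, and every comparison inspects at most as many characters as the pattern has. All suffixes that start with the pattern are contiguous in the sorted array, which is why two bounds are enough to collect every occurrence.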

Indexing uses suffix arrays, introduced by Udi Manber and Gene Myers
(1989). Three algorithms are available for compression: LZW, by Terry
Welch (1984); LZ77, by Abraham Lempel and Jacob Ziv (1977); and HUFFMAN,
by David Huffman (1952).
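
For a feel of what dictionary-based compression does, here is a minimal Python sketch of textbook LZW with a round trip through compression and decompression. It is an illustration under stated assumptions, not ipmt's implementation: the function names are hypothetical, codes are kept as a plain list of integers, and bit packing and dictionary size limits are left out.

```python
def lzw_compress(data):
    # The dictionary starts with every single-byte string (codes 0..255).
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the longest known phrase.
            current = candidate
        else:
            # Emit the known phrase and register the new, longer one.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        # Mirror the compressor: previous phrase plus first byte of this one.
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
packed = lzw_compress(sample)
print(len(sample), "bytes ->", len(packed), "codes")
print(lzw_decompress(packed) == sample)
```

The decompressor rebuilds the same dictionary the compressor used, so no dictionary has to be stored alongside the codes.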

OPTIONS

Generic Program Information
    -h, --help
        Display a help menu with all options.

    -v, --version
        Display version information and exit.

Matcher Selection
    -p, --pattern PATTERN_FILE
        Extract search patterns from PATTERN_FILE, separated by newlines.

Output Control
    -c, --count
        Only a count of selected lines is written to standard output.

Compression Algorithm Selection
    --compression=ALGORITHM
        Specify algorithm to be used for compression. The possible values
        are HUFFMAN (default), LZW and LZ77.

EXAMPLES

To index a file for later search:
    $ ./bin/ipmt index [path to text file]

To print all occurrences of a pattern in a file:
    $ ./bin/ipmt search [pattern] [path to index file]

To count the occurrences of a pattern in a file:
    $ ./bin/ipmt -c search [pattern] [path to index file]

To index and search a file using the LZ77 algorithm:
    $ ./bin/ipmt --compression=LZ77 index [path to text file]
    $ ./bin/ipmt --compression=LZ77 search [pattern] [path to index file]

DOCUMENTATION

For more details about the specification, please read specification.pdf inside the doc folder. The report of this project can be found in the doc folder as well.

NOTES

ipmt was built by CIn/UFPE students Miguel Araújo and Paulo Lieuthier as an assignment for the 2015.2 String Processing course (Processamento de Cadeias de Caracteres in Portuguese), taught by Professor Paulo Gustavo.

LICENSE

See the LICENSE file.
