.. pbh5tools documentation master file, created by
   sphinx-quickstart on Thu Nov 10 17:09:22 2011.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

=========
pbh5tools
=========

``pbh5tools`` is a collection of tools for manipulating the content of,
or extracting data from, two types of h5 files:

* ``cmp.h5``: files that contain alignment information.
* ``bas.h5`` and ``pls.h5``: files that contain base-call information.

``pbh5tools`` consists of two executables: ``cmph5tools.py`` and
``bash5tools.py``. At the moment, ``cmph5tools.py`` provides a rich
set of tools to manipulate and analyze the data in a ``cmp.h5`` file,
while ``bash5tools.py`` provides mechanisms to extract base-call
information from ``bas.h5`` files.


############
Installation
############

To install ``pbh5tools``, run the following command from the ``pbh5tools`` root directory: ::

   python setup.py install

####################
Tool: bash5tools.py
####################

``bash5tools.py`` extracts read sequences and quality values for
both raw and circular consensus sequencing (CCS) read types and
writes them to ``fastq`` or ``fasta`` files.


-----
Usage
-----
::

    usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug]
                         [--outFilePrefix OUTFILEPREFIX]
                         [--readType {ccs,subreads,unrolled}] [--outType OUTTYPE]
                         [--minLength MINLENGTH] [--minReadScore MINREADSCORE]
                         [--minPasses MINPASSES]
                         input.bas.h5

    Tool for extracting data from .bas.h5 files

    positional arguments:
      input.bas.h5          input .bas.h5 filename

    optional arguments:
      -h, --help            show this help message and exit
      --verbose, -v         Set the verbosity level (default: None)
      --version             show program's version number and exit
      --profile             Print runtime profile at exit (default: False)
      --debug               Run within a debugger session (default: False)
      --outFilePrefix OUTFILEPREFIX
                            output filename prefix [None]
      --readType {ccs,subreads,unrolled}
                            read type (ccs, subreads, or unrolled) []
      --outType OUTTYPE     output file type (fasta, fastq) [fasta]

    Read filtering arguments:
      --minLength MINLENGTH
                            min read length [0]
      --minReadScore MINREADSCORE
                            min read score, valid only with
                            --readType={unrolled,subreads} [0]
      --minPasses MINPASSES
                            min number of CCS passes, valid only with
                            --readType=ccs [0]

--------
Examples
--------

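Extracting all raw reads from ``input.bas.h5`` without any filtering
and exporting to FASTA (``myreads.fasta``). The usage text above lists
``subreads`` and ``unrolled`` as the non-CCS read types; this example
uses ``subreads``: ::

    python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fasta --readType subreads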

Extracting all CCS reads from ``input.bas.h5`` with a minimum read
length of 100 and exporting to FASTQ (``myreads.fastq``): ::

    python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fastq --readType ccs --minLength 100

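The invocations above can also be assembled from a script, for example
when processing many files. The following is a minimal sketch and not
part of ``pbh5tools``: the helper ``bash5tools_cmd`` and its parameter
names are hypothetical, and it assumes only the flags and choices shown
in the usage text (a positional input file and lowercase ``--readType``
values).

.. code-block:: python

    # Hypothetical helper (not shipped with pbh5tools): build the argument
    # list for one bash5tools.py run from the options in the usage text.
    READ_TYPES = ("ccs", "subreads", "unrolled")  # choices listed by --help
    OUT_TYPES = ("fasta", "fastq")                # file types listed by --help


    def bash5tools_cmd(in_file, prefix, read_type, out_type="fasta",
                       min_length=0):
        """Return the command line as a list of strings."""
        if read_type not in READ_TYPES:
            raise ValueError("readType must be one of %s" % (READ_TYPES,))
        if out_type not in OUT_TYPES:
            raise ValueError("outType must be one of %s" % (OUT_TYPES,))
        return [
            "bash5tools.py", in_file,          # input file is positional
            "--outFilePrefix", prefix,
            "--readType", read_type,
            "--outType", out_type,
            "--minLength", str(min_length),
        ]


    # Mirror the CCS example: FASTQ output, minimum read length of 100.
    cmd = bash5tools_cmd("input.bas.h5", "myreads", "ccs", "fastq", 100)
    print(" ".join(cmd))

The returned list can be handed to ``subprocess.check_call`` to run the
tool. Validating ``read_type`` up front reports a mistyped value such
as ``Raw`` or ``CCS`` before any process is started.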

####################
Tool: cmph5tools.py
####################

``cmph5tools.py`` is a multi-command tool that provides access to
the following subtools:

1. **merge**: Merge multiple ``cmp.h5`` files into a single file.

2. **sort**: Sort a ``cmp.h5`` file.

3. **select**: Create a new file from a ``cmp.h5`` file by specifying
   which reads to include.

4. **equal**: Compare the contents of two ``cmp.h5`` files for
   equivalence.

5. **summarize**: Summarize the contents of a ``cmp.h5`` file in a
   verbose, human-readable format.

6. **stats**: Extract summary metrics from a ``cmp.h5`` file into a
   ``csv`` file.

7. **valid**: Determine whether a ``cmp.h5`` file is valid.

8. **listMetrics**: Emit the available metrics and statistics for use
   in the ``select`` and ``stats`` subcommands.

To list all available subtools provided by ``cmph5tools.py`` simply
run: ::

    cmph5tools.py --help

Each subtool has its own usage information which can be generated by
running: ::

    cmph5tools.py <toolname> --help

When running any subtool, consider passing the ``--info`` command-line
argument, which prints progress information to stdout while the script
is running: ::

    cmph5tools.py <toolname> --info <other arguments>

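The calling pattern described above (subtool name, then ``--info``,
then the remaining arguments) can be sketched in a few lines of Python.
This is an illustration only: ``cmph5tools_cmd`` is a hypothetical
helper, and the subtool names are taken from the list in this section.

.. code-block:: python

    # Hypothetical helper (not shipped with pbh5tools): build a
    # cmph5tools.py command line for one of the documented subtools.
    SUBTOOLS = ("merge", "sort", "select", "equal",
                "summarize", "stats", "valid", "listMetrics")


    def cmph5tools_cmd(subtool, args=(), info=True):
        """Return ['cmph5tools.py', subtool, ('--info',) other arguments]."""
        if subtool not in SUBTOOLS:
            raise ValueError("unknown subtool: %r" % (subtool,))
        cmd = ["cmph5tools.py", subtool]
        if info:
            cmd.append("--info")  # progress information on stdout
        return cmd + list(args)


    # Show the usage information of a single subtool.
    help_cmd = cmph5tools_cmd("summarize", ["--help"], info=False)
    print(" ".join(help_cmd))

As with any argument list, the result can be passed to
``subprocess.check_call``; the arguments each subtool accepts are
reported by its own ``--help`` output.
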
More examples are available in the ``examples.t`` file.

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

