

Vhybridize Manual
=================

Be sure to check `README` and `formats`, this document only covers
what isn't documented in those files.


Advanced Virtual Hybridization
------------------------------

The typical usage is to first compute the order of the probes on a
few genomes:

  vhybridze -p probes.fas some-genome.fas other-genome.fas

This package include some programs to convert from one format to the
other and to extract interesting subsets (eg. permutation) from a
complete dataset.  Those start with "vhy" are all (almost) fully
documented with `--help`.

Example:

  vhybridize --help
  vhy-filter --help
  vhy-to-vhx --help

If you are running bash you can get the complete list of supplied
commands with

  vhy[TAB][TAB]

where `[TAB]` means that you hit the TAB key.

When your target genome is large you can speedup the search by using
blastall instead of bl2seq:

  formatdb -i some-genome.fas -p F
  formatdb -i other-genome.fas -p F
  vhybridze -a -p probes.fas some-genome.fas other-genome.fas

Note that using blastall decrease sensitivity and you will typically
miss many probe locations.  

When a target genome is really large, it is possible to split the job
and run it on multiple computers.  Of course when we say that is is
possible, we mean that you do most of the work, we only provide
minimal support for this.  Typically you split the probe set, launch
the computation with a subset per computer then merge the result:

  vhy-split-multi-fasta all-probes.fas probes
  cat probes/[a-d]*.fas > probes-1.fas
  cat probes/[e-l]*.fas > probes-2.fas
  cat probes/[m-z]*.fas > probes-3.fas

  vhybridze -p probes-1.fas -o o1 spc-1.fas spc-2.fas # on computer 1
  vhybridze -p probes-2.fas -o o2 spc-1.fas spc-2.fas # on computer 2
  vhybridze -p probes-3.fas -o o3 spc-1.fas spc-2.fas # on computer 3

  vhy-merge o1/spc-1.vhy o2/spc-1.vhy o3/spc-1.vhy > spc-1.vhy
  vhy-merge o1/spc-2.vhy o2/spc-2.vhy o3/spc-2.vhy > spc-2.vhy



Manipulations of the output files
---------------------------------

Most manipulation is performed on the `.vhy` files since the `.vhx`
format lacks useful meta information.  It is possible to filter `.vhy`
files to keep only hits displaying a certain "quality".  As an
example, if you have computed the probe locations with 

  %id = %lenght = 70

on a large data set.  You don't need perform the computation again to
have the probe locations with

  %id = %lenght = 80

You only need to filter out the low quality hits:

  vhy-filter -i 80 -l 80 some-genome.vhy > hi-qual.vhy
