cgatools version 1.0.0
usage: cgatools COMMAND [ options ] [ positionalArgs ]
For help on a particular command CMD, try "cgatools help CMD".
Available commands:
help Prints help information.
man Prints the cgatools reference manual.
fasta2crr Converts fasta reference files to the crr format.
crr2fasta Converts a crr reference file to the fasta format.
listcrr Lists chromosomes, contigs, or ambiguous sequences of a crr
file.
decodecrr Prints the reference sequence for a given reference range.
snpdiff Compares snp calls to a Complete Genomics variant file.
calldiff Compares two Complete Genomics variant files.
map2sam Converts CGI initial reference mappings into SAM format.
evidence2sam Converts CGI variant evidence data into SAM format.
-------------------------------------------------------------------------------
COMMAND NAME
help - Prints help information.
OPTIONS
-h [ --help ]
Print this help message.
--command arg
The command to describe.
--format arg (=text)
The format of the output stream (text or html).
--output arg (=STDOUT)
The output file (may be omitted for stdout).
-------------------------------------------------------------------------------
COMMAND NAME
man - Prints the cgatools reference manual.
OPTIONS
-h [ --help ]
Print this help message.
--output arg (=STDOUT)
The output file (may be omitted for stdout).
--format arg (=text)
The format of the output stream (text or html).
-------------------------------------------------------------------------------
COMMAND NAME
fasta2crr - Converts fasta reference files to the crr format.
OPTIONS
-h [ --help ]
Print this help message.
--input arg
The input fasta files (may be positional args, or omitted for stdin).
Take care to specify the fasta files in chromosome order; ordering is
important. To work with human Complete Genomics data, the chromosome
order should be chr1...chr22, chrX, chrY, chrM.
--output arg
The output crr file.
--circular arg
A comma-separated list of circular chromosome names. If omitted,
defaults to chrM.
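Because the order of the input fasta files matters, it can help to sort them
explicitly before calling fasta2crr. The following is a minimal sketch, not
part of cgatools: the helper names and the file names are hypothetical, and it
assumes chromosomes are named chr1...chr22, chrX, chrY, chrM as described
above. The fasta files are passed as positional arguments.

```python
def human_chromosome_key(name):
    """Sort key giving the order chr1...chr22, chrX, chrY, chrM."""
    label = name[3:] if name.startswith("chr") else name
    if label.isdigit():
        return (0, int(label))          # autosomes, in numeric order
    special = {"X": 1, "Y": 2, "M": 3}  # then X, Y, M
    if label in special:
        return (1, special[label])
    raise ValueError("unexpected chromosome name: %r" % name)


def fasta2crr_args(fasta_by_chromosome, output_crr):
    """Build an argument list with the fasta files in chromosome order.

    fasta_by_chromosome maps a chromosome name to a fasta path
    (hypothetical layout: one fasta file per chromosome).
    """
    ordered = sorted(fasta_by_chromosome, key=human_chromosome_key)
    args = ["cgatools", "fasta2crr", "--output", output_crr]
    args += [fasta_by_chromosome[chrom] for chrom in ordered]
    return args
```

The returned list can be handed to subprocess.run() once the paths point at
real files.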
-------------------------------------------------------------------------------
COMMAND NAME
crr2fasta - Converts a crr reference file to the fasta format.
OPTIONS
-h [ --help ]
Print this help message.
--input arg
The input crr file (may be positional arg).
--output arg (=STDOUT)
The output fasta file (may be omitted for stdout).
--line-width arg (=50)
The maximum width of a line of sequence.
-------------------------------------------------------------------------------
COMMAND NAME
listcrr - Lists chromosomes, contigs, or ambiguous sequences of a crr file.
DESCRIPTION
For mode=chromosome, prints a space-separated table describing each
chromosome within the reference. The columns are defined as follows:
ChromosomeId A numeric identifier for the chromosome.
Chromosome The name of the chromosome.
Length The length in bases of the chromosome.
Circular Boolean indicating if the chromosome is circular.
Md5 Md5 of the string containing the upper case IUPAC code for
each base in the chromosome (spaces and dashes are
omitted).
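As an illustration of the Md5 column, the sketch below hashes the upper-case
sequence after dropping spaces and dashes. The function name is hypothetical,
and whether the result matches the listcrr output byte for byte is an
assumption that should be checked against a real crr file.

```python
import hashlib


def chromosome_md5(sequence):
    """Md5 of the upper-case IUPAC codes, with spaces and dashes omitted."""
    cleaned = sequence.upper().replace(" ", "").replace("-", "")
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()
```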
For mode=contig, prints a space-separated table describing each gap and
each contig within the reference. Here, a gap between contigs is defined as
any stretch of min-contig-gap-length or more no-called reference bases (N
character). The columns are defined as follows:
ChromosomeId A numeric identifier for the chromosome.
Chromosome The name of the chromosome.
Type Either CONTIG or GAP.
Offset The 0-based offset of the start of the contig or gap
within the chromosome.
Length The length in bases of the contig or gap.
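The contig and gap definition above can be sketched as follows. This is an
illustration only, not the cgatools implementation: the function name is
hypothetical, lower-case "n" is treated as a no-call by assumption, and runs
of N shorter than min-contig-gap-length are kept inside the surrounding
contig.

```python
def contigs_and_gaps(sequence, min_contig_gap_length=50):
    """Return (type, offset, length) rows with 0-based offsets."""
    rows = []
    n = len(sequence)
    contig_start = 0
    i = 0
    while i < n:
        if sequence[i] in "Nn":
            # Find the end of this run of no-called bases.
            j = i
            while j < n and sequence[j] in "Nn":
                j += 1
            if j - i >= min_contig_gap_length:
                # The run is long enough to split the contig.
                if i > contig_start:
                    rows.append(("CONTIG", contig_start, i - contig_start))
                rows.append(("GAP", i, j - i))
                contig_start = j
            i = j
        else:
            i += 1
    if n > contig_start:
        rows.append(("CONTIG", contig_start, n - contig_start))
    return rows
```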
For mode=ambiguity, prints a space-separated table describing each run of
ambiguity codes within the reference. The columns are defined as follows:
ChromosomeId A numeric identifier for the chromosome.
Chromosome The name of the chromosome.
Code The IUPAC code for the region.
Offset The 0-based offset of the run of ambiguity codes in the
chromosome.
Length The length in bases of the run of ambiguity codes.
OPTIONS
-h [ --help ]
Print this help message.
--reference arg
The reference crr file (may be positional arg).
--output arg (=STDOUT)
The output file (may be omitted for stdout).
--mode arg (=chromosome)
One of chromosome, contig, or ambiguity.
--min-contig-gap-length arg (=50)
Minimum length of gap between reference contigs, for mode=contig.
-------------------------------------------------------------------------------
COMMAND NAME
decodecrr - Prints the reference sequence for a given reference range.
OPTIONS
-h [ --help ]
Print this help message.
--reference arg
The reference crr file (may be positional arg).
--output arg (=STDOUT)
The output file (may be omitted for stdout).
--range arg
The range of bases to print (chr,begin,end or chr:begin-end).
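The two range spellings accepted by --range (chr,begin,end and chr:begin-end)
can be parsed with a small helper such as the one below. This is a sketch with
a hypothetical function name; it only splits the string into its parts and
makes no assumption about whether the coordinates are 0-based or half-open.

```python
import re

# One pattern per documented spelling of the range argument.
_RANGE_PATTERNS = (
    re.compile(r"^(?P<chr>[^,]+),(?P<begin>\d+),(?P<end>\d+)$"),
    re.compile(r"^(?P<chr>[^:]+):(?P<begin>\d+)-(?P<end>\d+)$"),
)


def parse_range(text):
    """Split 'chr,begin,end' or 'chr:begin-end' into (chr, begin, end)."""
    for pattern in _RANGE_PATTERNS:
        match = pattern.match(text)
        if match:
            begin, end = int(match["begin"]), int(match["end"])
            if begin > end:
                raise ValueError("begin is past end: %r" % text)
            return match["chr"], begin, end
    raise ValueError("unrecognized range: %r" % text)
```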
-------------------------------------------------------------------------------
COMMAND NAME
snpdiff - Compares snp calls to a Complete Genomics variant file.
DESCRIPTION
Compares the snp calls in the "genotypes" file to the calls in a Complete
Genomics variant file. The genotypes file is a tab-delimited file with at
least the following columns (additional columns may be given):
Chromosome (Required) The name of the chromosome.
Offset0Based (Required) The 0-based offset in the chromosome.
GenotypesStrand (Optional) The strand of the calls in the Genotypes
column (+ or -, defaults to +).
Genotypes (Optional) The calls, one per allele. The following
calls are recognized:
A,C,G,T A called base.
N A no-call.
- A deleted base.
. A non-snp variation.
The output is a tab-delimited file consisting of the columns of the
original genotypes file, plus the following additional columns:
Reference The reference base at the given position.
VariantFile The calls made by the variant file, one per allele.
The character codes are the same as those described for
the Genotypes column.
DiscordantAlleles (Only if Genotypes is present) The number of
Genotypes alleles that are discordant with calls in
the VariantFile. If the VariantFile is described as
haploid at the given position but the Genotypes is
diploid, then each genotype allele is compared
against the haploid call of the VariantFile.
NoCallAlleles (Only if Genotypes is present) The number of
Genotypes alleles that were no-called by the
VariantFile. If the VariantFile is described as
haploid at the given position but the Genotypes is
diploid, then a VariantFile no-call is counted twice.
The verbose output is a tab-delimited file consisting of the columns of the
original genotypes file, plus the following additional columns:
Reference The reference base at the given position.
VariantFile The call made by the variant file for one allele (there is
a line in this file for each allele). The character codes
are the same as those described for the Genotypes column.
[CALLS] The rest of the columns are pasted in from the VariantFile,
describing the variant file line used to make the call.
The stats output is a comma-separated file with several tables describing
the results of the snp comparison, for each diploid genotype. The tables
all describe the comparison result (column headers) versus the genotype
classification (row labels) in different ways. The "Locus classification"
tables have the most detailed match classifications, while the "Locus
concordance" tables roll these match classifications up into "discordance"
and "no-call". A locus is considered discordant if it is discordant for
either allele. A locus is considered no-call if it is concordant for both
alleles but has a no-call on either allele. The "Allele concordance"
describes the comparison result on a per-allele basis.
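The roll-up from per-allele results to a per-locus result, as described above,
can be sketched like this. The function name and the three label strings are
hypothetical; only the precedence rule (discordant on either allele first,
then no-call on either allele) is taken from the description.

```python
def locus_concordance(allele_results):
    """Roll per-allele results up into one result for the locus.

    allele_results is a sequence of per-allele labels, each one of
    "concordant", "discordant", or "no-call" (hypothetical labels).
    """
    if "discordant" in allele_results:
        return "discordant"   # discordant for either allele
    if "no-call" in allele_results:
        return "no-call"      # no discordance, but a no-call on an allele
    return "concordant"
```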
OPTIONS
-h [ --help ]
Print this help message.
--reference arg
The input crr file.
--variants arg
The input variant file.
--genotypes arg
The input genotypes file.
--output arg
The output genotypes file.
--verbose arg
The verbose output file.
--stats arg
The stats output file.
SUPPORTED FORMAT_VERSION
0.3 or later
-------------------------------------------------------------------------------
COMMAND NAME
calldiff - Compares two Complete Genomics variant files.
DESCRIPTION
Compares two Complete Genomics variant files. Divides the genome up into
superloci of nearby variants, then compares the superloci. Also refines the
comparison to determine per-call or per-locus comparison results.
Comparison results are usually described by a semicolon-separated string,
one per allele. Each allele's comparison result is one of the following
classifications:
ref-identical The alleles of the two variant files are identical, and
they are consistent with the reference.
alt-identical The alleles of the two variant files are identical, and
they are inconsistent with the reference.
ref-consistent The alleles of the two variant files are consistent,
and they are consistent with the reference.
alt-consistent The alleles of the two variant files are consistent,
and they are inconsistent with the reference.
onlyA The alleles of the two variant files are inconsistent,
and only file A is inconsistent with the reference.
onlyB The alleles of the two variant files are inconsistent,
and only file B is inconsistent with the reference.
mismatch The alleles of the two variant files are inconsistent,
and they are both inconsistent with the reference.
phase-mismatch The two variant files would be consistent if the
hapLink field had been empty, but they are
inconsistent.
ploidy-mismatch The superlocus did not have uniform ploidy.
In some contexts, this classification is rolled up into a simplified
classification, which is one of "identical", "consistent", "onlyA",
"onlyB", or "mismatch".
A good place to start looking at the results is the superlocus-output file.
It has columns defined as follows:
SuperlocusId An identifier given to the superlocus.
Chromosome The name of the chromosome.
Begin The 0-based offset of the start of the superlocus.
End The 0-based offset of the base one past the end of the
superlocus.
Classification The match classification of the superlocus.
Reference The reference sequence.
AllelesA A semicolon-separated list of the alleles (one per
haplotype) for variant file A, for the phasing with the
best comparison result.
AllelesB A semicolon-separated list of the alleles (one per
haplotype) for variant file B, for the phasing with the
best comparison result.
The locus-output file contains, for each locus in file A and file B that is
not consistent with the reference, an annotated set of calls for the locus.
The calls are annotated with the following columns:
SuperlocusId The id of the superlocus containing the locus.
File The variant file (A or B).
LocusClassification The locus classification is determined by the
varType column of the call that is inconsistent
with the reference, concatenated with a
modifier that describes whether the locus is
heterozygous, homozygous, or contains no-calls.
If there is no one variant in the locus (i.e.,
it is heterozygous alt-alt), the locus
classification begins with "other".
LocusDiffClassification The match classification for the locus. This is
defined to be the best of the comparison of the
locus to the same region in the other file, or
the comparison of the superlocus.
Superlocus comparison statistics can be found in the superlocus-stats file.
Locus comparison statistics can be found in the locus-stats file. Beware
any output files whose parameter name begins with "debug".
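The roll-up of per-allele classifications into the simplified classification
can be sketched as follows. The mapping is inferred from the classification
names and is an assumption, as are the helper names. The description does not
say how phase-mismatch and ploidy-mismatch are rolled up, so this sketch
returns None for them rather than guessing.

```python
# Assumed mapping from detailed to simplified classification.
SIMPLIFIED = {
    "ref-identical": "identical",
    "alt-identical": "identical",
    "ref-consistent": "consistent",
    "alt-consistent": "consistent",
    "onlyA": "onlyA",
    "onlyB": "onlyB",
    "mismatch": "mismatch",
}


def simplify(comparison_result):
    """Simplify a semicolon-separated result string, one entry per allele."""
    return [SIMPLIFIED.get(part) for part in comparison_result.split(";")]
```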
OPTIONS
-h [ --help ]
Print this help message.
--reference arg
The input crr file.
--variantsA arg
The "A" input variant file.
--variantsB arg
The "B" input variant file.
--superlocus-output arg
The output file for superlocus classification.
--superlocus-stats arg
The output file for superlocus classification stats.
--locus-output arg
The output file for locus classification.
--locus-stats arg
The output file for locus classification stats.
--locus-stats-column-count arg (=15)
The number of columns for locus compare classification in the locus
stats file.
--debug-call-output arg
The output file for call classification.
--debug-superlocus-output arg
The output file for superlocus information.
--max-hypothesis-count arg (=32)
The maximum number of possible phasings to consider for a superlocus.
--no-reference-cover-validation
Turns off validation that all bases of a chromosome are covered by
calls of the variant file.
SUPPORTED FORMAT_VERSION
0.3 or later
-------------------------------------------------------------------------------
COMMAND NAME
map2sam - Converts CGI initial reference mappings into SAM format.
DESCRIPTION
The Map2Sam converter takes as input Reads and Mappings files, a library
structure file, and a crr reference file, and generates one SAM file as
output. The output is sent to stdout by default. Each mapping record in the
input is converted into one corresponding SAM record. In addition, unmapped
DNB records are reported as SAM records marked accordingly. The Map2Sam
converter tries to identify primary mappings and highlights them using the
appropriate flag. Negative gaps in CGI mappings are represented using the
GS/GQ/GC tags.
OPTIONS
-h [ --help ]
Print this help message.
-r [ --reads ] arg
Input reads file.
-m [ --mappings ] arg
Input mappings file.
-l [ --library ] arg
Input library file.
-s [ --reference ] arg
Reference file.
-o [ --output ] arg (=STDOUT)
The output SAM file (may be omitted for stdout).
-f [ --from ] arg (=0)
Defines start read record of the export range.
-t [ --to ] arg (=18446744073709551615)
Defines end read record of the export range (the end record is not
exported).
-e [ --export-region ] arg
Defines an export region as a half-open interval 'chr,from,to'.
--skip-not-mapped
Skip records that are not mapped.
--add-mate-sequence
Generate mate sequence and score tags.
--mate-sv-candidates
Inconsistent mappings are normally converted as single-arm mappings
with no mate information provided. If this option is used, map2sam
mates unique single-arm mappings in SAM, including those on different
strands and chromosomes. To distinguish these "artificially" mated
records, the tag "XS:i:1" is used. The MAPQ provided for these records
is the single-arm mapping weight.
SUPPORTED FORMAT_VERSION
0.3 or later
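The --from and --to options of map2sam select a half-open range of read
records (the end record is not exported), and the default --to value is the
largest unsigned 64-bit integer. One use is to split a conversion into chunks.
The sketch below only builds argument lists; the helper names, chunk size, and
file names are hypothetical, and the total record count is assumed to be known
to the caller.

```python
def record_ranges(total_records, chunk_size):
    """Yield half-open (start, stop) pairs covering 0..total_records."""
    for start in range(0, total_records, chunk_size):
        yield start, min(start + chunk_size, total_records)


def map2sam_args(reads, mappings, library, reference, output, start, stop):
    """Build a map2sam argument list for one range of read records."""
    return [
        "cgatools", "map2sam",
        "--reads", reads,
        "--mappings", mappings,
        "--library", library,
        "--reference", reference,
        "--output", output,
        "--from", str(start),
        "--to", str(stop),   # the record at index `stop` is not exported
    ]
```

Because each range is half-open, consecutive chunks share a boundary value
without exporting any record twice.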
-------------------------------------------------------------------------------
COMMAND NAME
evidence2sam - Converts CGI variant evidence data into SAM format.
DESCRIPTION
The evidence2sam converter takes as input evidence mapping files
(evidenceDnbs-*) and generates one SAM file as output. The output is sent
to stdout by default. Each evidence mapping record in the input is
converted into a pair of corresponding SAM records, one record for each
HalfDNB. The Evidence2Sam converter reports all mappings as not primary.
Negative gaps in CGI mappings are represented using the GS/GQ/GC tags.
OPTIONS
-h [ --help ]
Print this help message.
--beta
This is a beta command. To run this command, you must pass the --beta
flag.
-e [ --evidence-dnbs ] arg
Input evidence dnbs file.
-s [ --reference ] arg
Reference file.
-o [ --output ] arg (=STDOUT)
The output SAM file (may be omitted for stdout).
-r [ --export-region ] arg
Defines an export region as a half-open interval 'chr,from,to'.
--keep-duplicates
Keep local duplicates of DNB mappings. All the output SAM records will
be marked as not primary if this option is used.
--add-mate-sequence
Generate mate sequence and score tags.
--add-allele-id
Generate interval id and allele id tags.
-v [ --debug-output ]
Generate verbose debug output. Please don't rely on this option in
production.
SUPPORTED FORMAT_VERSION
0.3 or later