Software released from the CRD group at the Broad Institute is built and tested on a modern version of Linux for the x86_64 architecture. Our software does not run on 32-bit machines: you must have a 64-bit Linux system. Our users have successfully built and executed our software using a variety of Linux distributions including Ubuntu, RedHat, and SUSE. We expect that any flavor of x86_64 Linux will work fine, as long as it provides the necessary software prerequisites, as described below.
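The 64-bit requirement can be checked before building by looking at the machine type reported by uname -m. The following is a minimal sketch; the helper name is_x86_64 is hypothetical and is not part of the released software.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the machine string is x86_64.
# With no argument it inspects the current host via `uname -m`.
is_x86_64() {
    machine="${1:-$(uname -m)}"
    [ "$machine" = "x86_64" ]
}

if is_x86_64; then
    echo "64-bit x86 Linux machine type detected"
else
    echo "this machine reports $(uname -m), not x86_64" >&2
fi
```

This only confirms the architecture; it does not check the software prerequisites.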
We rely on reasonably up-to-date versions of these software packages:
If this goes well, you're ready to go. Consult the manual for the package to learn how to set up your data and what programs to execute.
Sequencing data requirements summary
● Illumina MiSeq or HiSeq 2500 genome sequencers
● PCR-free library preparation
● 250 base paired end reads (or longer)
● ~450 base pair fragment size
● ~60x coverage
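To relate the coverage figure to a sequencing yield, a rough estimate is coverage = (read pairs x 2 x read length) / genome size. The sketch below solves this for the number of read pairs. It assumes every sequenced base is usable and ignores duplicates and uneven coverage; the helper name pairs_for_coverage is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of read pairs needed to reach a target coverage.
# Usage: pairs_for_coverage GENOME_SIZE_BASES READ_LENGTH COVERAGE
pairs_for_coverage() {
    genome=$1
    read_len=$2
    coverage=$3
    bases_per_pair=$((2 * read_len))
    # Round up so the target coverage is met rather than narrowly missed.
    echo $(( (coverage * genome + bases_per_pair - 1) / bases_per_pair ))
}

# Example: a 50 Mb genome, 250 base paired-end reads, 60x coverage.
pairs_for_coverage 50000000 250 60
```

For the example values this prints 6000000, that is, about six million read pairs.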
Input files
DISCOVAR requires a BAM file containing the raw reads from the sequencer. For variant calling it also
requires a matching reference FASTA file.
Variant calling command:
DISCOVAR can currently call variants only for small regions, not for the entire genome at once. For
example, to generate variants for a 100 kb region, use:
Discovar \
READS=reads.bam \
OUT_HEAD=assembly \
The complete set of variant calls for this region is given in the text file:
BAM files
The reads to assemble must be in a BAM file or files. The name of the BAM file is specified with the
required argument READS:
READS=filename
Multiple BAM files are specified using a comma-separated list:
READS=filename1,filename2,...
Alternatively, the BAM files can be listed in a separate file containing BAM filenames, one per line:
READS=@listfilename
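The READS forms described above can be assembled from a set of filenames with a small script. This is a sketch only; the helper names reads_arg and reads_list_arg and the example filenames are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join BAM filenames into the comma-separated READS form.
reads_arg() {
    # Inside the subshell, "$*" joins the arguments with the first IFS character.
    (IFS=,; printf 'READS=%s\n' "$*")
}

# Hypothetical helper: write one filename per line to a list file, then
# print the READS=@listfilename form that points at it.
# Usage: reads_list_arg LISTFILE BAM...
reads_list_arg() {
    listfile="$1"
    shift
    printf '%s\n' "$@" > "$listfile"
    printf 'READS=@%s\n' "$listfile"
}

# Example with hypothetical filenames.
reads_arg lane1.bam lane2.bam
```

The example prints READS=lane1.bam,lane2.bam. Filenames that themselves contain commas cannot be expressed in the comma-separated form.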
DISCOVAR calls SAMtools internally to extract reads from the BAM.
Reference file (optional)
This is only required if you are using DISCOVAR as a variant caller. The reference information is used
only for variant calling and not in the assembly process. Specifying a valid FASTA reference file is all
that is required to cause DISCOVAR to generate variants.
To specify a reference FASTA file use the optional argument REFERENCE:
REFERENCE=filename
It should be the same file that was used to generate the alignments in the input BAM file(s), or at least
should share the same coordinate system. The FASTA record names should match those in the BAM
file. Ns are allowed.
In addition to the reference FASTA file, DISCOVAR also requires the associated index file (.fai).
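A quick sanity check of the reference before running can catch a missing or stale index. The sketch below assumes the index sits next to the FASTA file as filename.fai, which is where samtools faidx writes it, and that the first tab-separated column of the index holds the record name. The helper name check_reference is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that a reference FASTA and its .fai index both
# exist and list the same record names in the same order.
# The index can be created with: samtools faidx REFERENCE_FASTA
check_reference() {
    ref="$1"
    [ -r "$ref" ] || { echo "missing FASTA: $ref" >&2; return 1; }
    [ -r "$ref.fai" ] || { echo "missing index: $ref.fai" >&2; return 1; }
    # Record name = text after '>' up to the first whitespace character.
    fasta_names=$(sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$ref")
    fai_names=$(cut -f1 "$ref.fai")
    [ "$fasta_names" = "$fai_names" ] || {
        echo "record names in $ref and $ref.fai differ" >&2
        return 1
    }
}

# Usage (hypothetical filename):
#   check_reference genome.fasta && echo "reference looks consistent"
```

This does not verify that the record names also match those in the BAM file.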
DISCOVAR can currently de novo assemble small genomes (up to 50 Mb), with larger genome support
to come soon.
The syntax for DISCOVAR de novo assembly is:
Discovar READS=bamfilenames \
OUT_HEAD=outputfilename \
This will take as input all the reads in the BAM file reads.bam, generate an assembly, then write the
output to a set of files prefixed with assembly.
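Because the output is a set of files sharing the OUT_HEAD prefix, the files produced by a run can be listed by prefix. This is a minimal sketch; the helper name list_outputs is hypothetical, and the individual output file names are not assumed here.

```shell
#!/bin/sh
# Hypothetical helper: print every existing path that starts with the prefix.
# Usage: list_outputs OUT_HEAD_PREFIX
list_outputs() {
    prefix="$1"
    for f in "$prefix"*; do
        # An unmatched glob stays literal, so test that the path exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example: list the files written with OUT_HEAD=assembly.
list_outputs assembly
```

If nothing has been written yet, the example prints nothing.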