Question

Inferring most probable integer copy number (repeat count) of contigs of a draft genome

9.9 years ago

misaghb ▴ 20

I have a set of assembled contigs (genomic sequences) produced by a de novo short-read assembler (e.g. Velvet, ABySS, etc.). I map a new short-read library (e.g. an Illumina paired-end or mate-pair library) against these contigs using a short-read mapper such as Bowtie2, so I now have a SAM/BAM alignment file at hand.
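
To make the coverage information concrete, here is a minimal sketch of how a mean read depth per contig could be pulled out of the SAM text. It is only an illustration: the function name `mean_depth_per_contig` is hypothetical, the file is assumed to carry `@SQ` header lines with contig lengths, and depth is approximated as aligned bases (CIGAR `M`, `=`, `X`) divided by contig length.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mean_depth_per_contig(sam_lines):
    """Return {contig: mean read depth} from an iterable of SAM text lines.

    Hypothetical helper: depth is approximated as the number of aligned
    bases on a contig divided by the contig length from the @SQ header.
    """
    lengths = {}
    aligned_bases = defaultdict(int)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header section: only @SQ lines carry contig name and length.
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        flag, contig, cigar = int(fields[1]), fields[2], fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904 or contig == "*" or cigar == "*":
            continue
        for n, op in CIGAR_OP.findall(cigar):
            if op in "M=X":  # operations that place read bases on the contig
                aligned_bases[contig] += int(n)
    # Contigs without any aligned read get a depth of 0.0.
    return {c: aligned_bases[c] / lengths[c] for c in lengths}
```

For a file on disk this could be called as `mean_depth_per_contig(open("alignment.sam"))`, where `alignment.sam` is a placeholder name. A BAM file would first need to be converted to SAM text, header included.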

Is there any software (or a pipeline combining different tools) that takes this SAM/BAM alignment file as input, reports which contigs are repeated in the genome, and estimates their integer repeat counts (copy numbers) from the coverage information in the alignment?

It was suggested that I use your BamStats04 program.

But I am interested in an integer copy number for each contig.

Input: SAM file (the result of aligning paired reads against a set of contigs; no reference is available)

Output: Most probable integer copy numbers (repeat counts) for each contig.

For example:

contig_1        1
contig_2        4
contig_3        1
...             ...
contig_n-1      2
contig_n        1
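
As a rough illustration of the kind of output I mean, here is a minimal sketch that turns per-contig mean read depths into integer copy numbers. It is not taken from any existing tool. The function name `integer_copy_numbers` is hypothetical, and the code assumes that most contigs are single-copy, so that the median depth approximates the single-copy depth. Simple rounding stands in for a proper "most probable" estimate from a statistical model.

```python
from statistics import median


def integer_copy_numbers(depths, single_copy_depth=None):
    """Map {contig: mean depth} to {contig: integer copy number}.

    Hypothetical helper. Assumption: if no single-copy depth is given,
    the median depth over all contigs is used as the baseline, which
    only makes sense when most contigs occur once in the genome.
    """
    if single_copy_depth is None:
        single_copy_depth = median(depths.values())
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    copies = {}
    for contig, depth in depths.items():
        # Round the depth ratio half up to the nearest integer.
        estimate = int(depth / single_copy_depth + 0.5)
        # Assumption: an assembled contig is present at least once.
        copies[contig] = max(1, estimate)
    return copies


if __name__ == "__main__":
    # Made-up depths, for illustration only.
    example = {"contig_1": 30.0, "contig_2": 120.0, "contig_3": 31.0}
    for name, count in integer_copy_numbers(example).items():
        print(f"{name}\t{count}")
```

The depths could come from any per-contig coverage summary of the alignment. The median baseline would be biased if a large fraction of contigs were repeats, in which case a known single-copy depth can be passed as `single_copy_depth`.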

Notes:

No reference genome available.
Magnolya can be used for this purpose, although it was not developed specifically for this problem. However, it is limited to the ACE format and to some deprecated scripts from tools such as ABySS and Newbler.

copy-numbers draft-genome repeats contigs • 2.0k views

updated 2.8 years ago by Ram 45k • written 9.9 years ago by misaghb ▴ 20