Question

How to calculate the coverage of each contig after assembly

4.9 years ago

singh.jyotika ▴ 10

I have the read file and assembled contigs (both in .fasta format). I need to calculate the coverage of each contig across the read file. I tried (1) indexing with Bowtie and (2) aligning with both bowtie-align and TopHat2, but it gives me the error "Splice sequence indexing failed with err =1". Please help me with how to proceed with calculating the coverage of each contig.

alignment • 3.5k views

updated 4.9 years ago by Shyam ▴ 150 • written 4.9 years ago by singh.jyotika ▴ 10

Hi Jyotika,

[1] I have the read file and assembled contigs (both in .fasta format)

Reads in FASTA? Are you sure they are not in FASTQ format?
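A quick way to check is to look at the first character of the file: FASTA records start with '>' and FASTQ records start with '@'. Below is a minimal Python sketch of that check. The function name sniff_read_format is hypothetical; it only looks at the first non-empty line and does not handle gzipped files.

```python
def sniff_read_format(path: str) -> str:
    """Guess whether a sequence file is FASTA or FASTQ from its first record.

    FASTA records start with '>' and FASTQ records start with '@'.
    Only the first non-empty line is inspected, so this is a quick sanity
    check, not a validator (gzipped input is not handled).
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "empty"
```

For example, sniff_read_format("454.10species.fasta") should return "fasta" if the file really holds FASTA records.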

[2] I need to calculate the coverage for each single contigs across the read file

This does not make sense. What do you mean by "across the read file"? Please elaborate.

[3] Splice sequence indexing failed with err =1

You need to tell us the exact commands you ran.

Thanks

Vijay

4.9 years ago by lakhujanivijay 5.9k

The command I ran was: tophat2 -r 20 454.10species.fasta MC55.MG10.AS1.C1.fasta

Mine is metagenome data. I need to get the coverage/depth of every single contig in my .fasta files. The file MC55.MG10.AS1.C1.fasta contains only one sequence; likewise, I have 10000 contigs for which I need the coverage/depth. 454.10species.fasta is my metagenome read file from sequencing.

4.9 years ago by singh.jyotika ▴ 10

I am not sure why you are using TopHat for this analysis; it is a spliced aligner intended for RNA-seq reads. It may be simpler to use bwa or bbmap (from the BBMap suite) to align your reads against the assembly. You would probably want to place multi-mapping reads at a single random location. Finally, follow that up with mosdepth (to get single-base-level coverage) or samtools idxstats (to get read counts per contig).
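For the samtools idxstats route, the output has one tab-separated line per contig (name, length, mapped reads, unmapped reads) plus a final '*' line for reads without a contig. Below is a minimal Python sketch that turns it into a rough per-contig depth. It assumes you supply a mean read length yourself (the 400 bp in the example is only a placeholder, not a property of this dataset), and the function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_idxstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Parse `samtools idxstats` output into {contig: (length, mapped_reads)}."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # '*' row counts unmapped reads with no assigned contig
        stats[name] = (int(length), int(mapped))
    return stats


def approx_depth(stats: Dict[str, Tuple[int, int]], mean_read_len: float) -> Dict[str, float]:
    """Rough depth per contig: mapped reads * mean read length / contig length."""
    return {
        name: (mapped * mean_read_len / length) if length else 0.0
        for name, (length, mapped) in stats.items()
    }


# Made-up idxstats text, e.g. as saved from `samtools idxstats aln.sorted.bam`.
example = "contig_1\t1000\t50\t0\ncontig_2\t500\t0\t0\n*\t0\t0\t12\n"
depth = approx_depth(parse_idxstats(example), mean_read_len=400)
print(depth)  # {'contig_1': 20.0, 'contig_2': 0.0}
```

This is only an estimate, because read lengths vary (especially with 454 data); mosdepth or samtools depth gives actual per-base depth.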

Note: Having a single FASTA sequence per file is going to make this ridiculously clumsy. Consider concatenating the contigs into a single multi-FASTA file. If you have the original FASTQ reads available, I would rather use those.
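If the 10000 contigs really are in separate files, cat *.fasta > all.fasta does the concatenation, but a small Python sketch can also check that contig IDs stay unique, which the aligner and samtools need. The glob pattern and output name in the usage line (contigs/*.fasta, all_contigs.fasta) are hypothetical.

```python
import glob
import os


def concat_fasta(pattern: str, out_path: str) -> int:
    """Concatenate every FASTA file matching `pattern` into one multi-FASTA file.

    Returns the number of records written and raises ValueError if two
    records share an ID (the first word of the header line).
    """
    seen = set()
    n_records = 0
    with open(out_path, "w") as out:
        for path in sorted(glob.glob(pattern)):
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue  # skip the output file if the pattern also matches it
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # drop blank lines
                    if line.startswith(">"):
                        parts = line[1:].split()
                        seq_id = parts[0] if parts else ""
                        if seq_id in seen:
                            raise ValueError(f"duplicate contig ID {seq_id!r} in {path}")
                        seen.add(seq_id)
                        n_records += 1
                    # Re-add a newline to every line so a file without a trailing
                    # newline does not run into the next file's header.
                    out.write(line + "\n")
    return n_records
```

Usage would be something like concat_fasta("contigs/*.fasta", "all_contigs.fasta"), after which all_contigs.fasta is the single reference to index.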

4.9 years ago by GenoMax 147k

Answer · 2019-12-27

Bowtie and TopHat are short-read aligners. Although TopHat supports longer reads, such as those from 454, it has a limit of 1024 bases. BWA-MEM is a better option for the alignment, and you can get the coverage statistics using samtools. I assume 454.10species.fasta is the read file. You need to use a multi-FASTA file with all the contigs to make your life easier: concatenate all the contig FASTA files into one and run the alignment against it.
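One way to get mean depth per contig with samtools is samtools depth on the sorted, indexed BAM. It prints one "contig, position, depth" line per base, and without -a it skips zero-depth positions. The minimal Python sketch below summarises that output, dividing by the contig length taken from the reference FASTA rather than by the number of rows so the mean stays correct either way. It also reports breadth (the fraction of bases with depth > 0). The function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of its header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def depth_summary(depth_lines: Iterable[str], lengths: Dict[str, int]) -> Dict[str, Tuple[float, float]]:
    """Return {contig: (mean_depth, breadth)} from `samtools depth` lines.

    Each input line is "contig<TAB>position<TAB>depth"; contigs missing from
    the depth output simply get (0.0, 0.0).
    """
    total = {name: 0 for name in lengths}
    covered = {name: 0 for name in lengths}
    for line in depth_lines:
        if not line.strip():
            continue
        name, _pos, depth = line.rstrip("\n").split("\t")
        d = int(depth)
        total[name] += d
        if d > 0:
            covered[name] += 1
    return {
        name: ((total[name] / length, covered[name] / length) if length else (0.0, 0.0))
        for name, length in lengths.items()
    }
```

In practice you would pass open("all_contigs.fasta") and the lines of a saved samtools depth output file (both file names hypothetical).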

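What is your Bowtie index name? I think you gave the read file as the index and the contig file as the reads; tophat2 expects the index first and the reads second. I also do not understand why you are using the -r option for your data, since it sets the expected inner distance between mate pairs of paired-end reads.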
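bwa mem uses the same order as tophat2: reference first, then reads. The minimal Python sketch below assembles the BWA-MEM + samtools steps suggested in this thread with the concatenated contigs as the reference. The file names (all_contigs.fasta, output prefix aln) and the thread count are hypothetical, and it assumes bwa and samtools are on your PATH.

```python
import subprocess
from typing import List


def coverage_commands(contigs: str, reads: str, prefix: str, threads: int = 4) -> List[List[str]]:
    """Build the bwa/samtools argument lists for per-contig read counts.

    The contigs are the reference (indexed and passed first to bwa mem);
    the reads come second.
    """
    sam = f"{prefix}.sam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        ["bwa", "index", contigs],
        ["bwa", "mem", "-t", str(threads), contigs, reads],  # SAM goes to stdout
        ["samtools", "sort", "-o", sorted_bam, sam],
        ["samtools", "index", sorted_bam],
        ["samtools", "idxstats", sorted_bam],
    ]


def run_pipeline(contigs: str, reads: str, prefix: str, threads: int = 4) -> None:
    """Run the commands above, capturing bwa mem and idxstats output in files."""
    index_cmd, mem_cmd, sort_cmd, bai_cmd, stats_cmd = coverage_commands(
        contigs, reads, prefix, threads
    )
    subprocess.run(index_cmd, check=True)
    with open(f"{prefix}.sam", "w") as sam:
        subprocess.run(mem_cmd, stdout=sam, check=True)
    subprocess.run(sort_cmd, check=True)
    subprocess.run(bai_cmd, check=True)
    with open(f"{prefix}.idxstats.txt", "w") as out:
        subprocess.run(stats_cmd, stdout=out, check=True)
```

Calling run_pipeline("all_contigs.fasta", "454.10species.fasta", "aln") would write aln.sam, aln.sorted.bam with its index, and aln.idxstats.txt with per-contig read counts.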