Entering edit mode
6.3 years ago
netsam
▴
10
Hi, I assembled a paired end Myseq run with Wolf D-LOOP sequence as reference. I obtained a 494 bp contig. Then I align these sequence trought Emboss Water with a different D-LOOP and I saw many gap in my assembled sequence. In another samples I obtained a more compact alignment, with no gap and only with some SNP between the two sequence.
Is it a coverage problem ??
Well, what is your coverage?
Yes the coverage is 10.106236X, it's written in Spades output file. Can I check in another way the coverage?
Are you going by the number in the contig header? (
NODE_1_length........cov1.23456
? If so, that's kmer coverage, not contig coverage. If that's from the log file then it's probably not kmer coverage.To be sure, you can recalculate it. In order to get the coverage depth, remap your reads against that section of sequence (or just extract that region after re-mapping against the whole genome), and use a tool like Qualimap to get the coverage.
Ok, I extract the reads mapping against a D-LOOP of almost 450 bp and I obtain 100 sequence paired end (50R + 50L ). If I multiple 50 * 100 bp (average length of a reads) I get 5.000 bp.
5.000 bp / 450 = almost 10 x
This is the coverage of reads ?
And what is the coverage of contigs ?
It sounds like it may equate to ~10X, though I'm not 100% convinced by that calculation, since I think it assumes perfect spanning of reads across the 450bp.
10X is on the low side, but I don't think it specifically accounts for your strange alignment pattern. I would follow h.mon's suggestion of assessing the coverage by eye in IGV or another genome browser.
Its possible that region hasn't sequenced well, so maybe check your reads for N's or low qualities.
Otherwise it could well be real, and your sample has several indels - it is difficult to say.
Did you try blasting online the "Node_1_length" sequence? Did you map the reads to the assembled sequence and inspected the mapping on IGV or other genome browser?
With blast there's a match with Canis lupus isolate Mongolia2 mitochondrion, partial genome, and other similar sequence; NOT d-loop. I'll try to visualize with IGV. I mapped reads with Bowtie 2, maybe it's a method, algorithm or parameters problem?
Well, either that or this bit of sequence isn’t what you think it it. Did you have a look at the alignment blast produced?
yes, it matches perfectly 100% with a mitochondrial region between 11201 and 11593. I think that that isn't D-LOOP region. In a sequence that I extracted from reads of another sample, there was a perfect match with D-LOOP sequence in a region of mitochondrion starting from around 16000 bp. Because D-LOOP usually it's a conserved region and all of this sequence come from wolf I expect to have a little variance and not large indels
Your 494 bp contig is not a D-loop sequence then by the looks of it!
I changed my approach, - Mapped my reads with Bowtie against complete Mitochondrial DNA of Wolf (~16.000 bp) - Extract mapped reads (~2.000 reads) - Assemble contig of mtDNA of sample with Spades and trusted contig (same mtDNA for mapping) - Search DLOOP sequence inside my assembled contig with Emboss Water
I extracted DLOOP and align perfectly with other DLOOP of wolf (with 3 or 4 differences) !!
I'm not sure of --trusted-contig use in Spades, does it substitute bases not present in reads with bases of trusted-contig in final contig ?
No, the trusted contigs are used as a sort of scaffold for guiding how the de Bruijn graph is built, but what you’ve done should be fine.
Thanks, I appreciate your help.