8.6 years ago
lamlam
After assembling a Tuberculosis sequence (de novo assembler), I found that the number of base pairs is greater than the number of base pairs in the reference strain. Is this logical?
Yes.
And computational biology is a quantitative science. Please tell us exactly how much greater your assembly is and which reference genome you have used.
I used Mycobacterium tuberculosis H37Rv. This reference has 4.4 Mb and my assembly has 7.3 Mb.
Have you tried to compare the two to see how the assemblies are different? Use Mauve to compare.
Did you have an excess of sequence (> 100x gross coverage) that went into this assembly?
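If it helps, gross coverage is just the total number of sequenced bases divided by the genome size. A minimal sketch, where the read count and read length are made-up numbers (not from this thread) and only the 4.4 Mb reference size comes from the post above:

```python
def gross_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Gross (raw) coverage = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


# Hypothetical run: 5 million reads of 150 bp (assumed values for illustration)
# against the 4.4 Mb H37Rv reference mentioned above.
cov = gross_coverage(num_reads=5_000_000, read_length=150, genome_size=4_400_000)
print(f"Gross coverage: {cov:.0f}x")
```

Plug in your own read count and read length to see whether you are above 100x.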
Yes, you should map your contigs to the refseq and then identify the contigs which do NOT map. BLAST these contigs to identify their origin. Furthermore, you may plot the GC content of your contigs versus coverage. Do you see more than one cluster in the scatter plot?
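For the GC part, here is a rough sketch of how one could compute GC content per contig in plain Python and flag contigs that look unlike M. tuberculosis (which has a GC content of roughly 65.6%). The toy contigs, the names `contig_1`/`contig_2`, and the 10 percentage point tolerance are all assumptions for illustration; per-contig coverage would have to come from your own read mapping and is not computed here.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


# Toy contigs standing in for a real assembly file (made-up sequences).
toy_assembly = StringIO(
    ">contig_1\nGCGATGCCGA\nCCGATGCACG\n"
    ">contig_2\nATATTAGCAT\nTTAAATATGA\n"
)

EXPECTED_GC = 0.656  # approximate GC content of M. tuberculosis H37Rv
TOLERANCE = 0.10     # arbitrary cutoff, chosen only for this example

gc_by_contig = {name: gc_fraction(seq) for name, seq in read_fasta(toy_assembly)}
suspects = [
    name for name, gc in gc_by_contig.items() if abs(gc - EXPECTED_GC) > TOLERANCE
]
print(suspects)
```

With a real assembly you would replace the `StringIO` object with `open("contigs.fasta")` (file name assumed) and plot the GC values against coverage; contigs far from the main cluster are the ones worth BLASTing first.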