4.3 years ago
A_heath
Hi all,
I have paired-end reads of a lactococcal strain and its assembly.
I want to determine the origin of a specific contig of interest, i.e. to know whether it comes from a plasmid or from the chromosome.
I tried mapping the reads using bbmap, extending the contigs to study the termini, etc., but so far I haven't been able to determine with confidence the origin of the contig I'm studying.
Do you have any other recommendations on what I could try?
Thank you in advance for your help!
Unless you can unambiguously show that the contig boundaries match known chromosomal sequences, I don't think you can resolve this for certain. You may need some long read data, or to compare your contig to whatever the best matching region in a reference genome is and determine if that region is in a chromosome.
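To make the reference comparison a bit more concrete, here is a rough sketch, not a definitive method. It assumes you have already run BLAST+ `blastn` with your contig as the query and `-outfmt "6 std stitle"`, so each hit row carries the 12 standard columns plus the subject title. The function name `summarize_hits`, the contig ID `contig_7` and the example rows are all made up for illustration. It simply sums the aligned bases per replicon type according to whether the subject title mentions "plasmid".

```python
from collections import defaultdict


def summarize_hits(rows, query_id):
    """Sum aligned bases per replicon type for one query contig.

    `rows` is an iterable of lists, one per BLAST tabular hit, assumed to be
    in '6 std stitle' layout: 12 standard columns followed by the subject title.
    """
    totals = defaultdict(int)
    for row in rows:
        if len(row) < 13 or row[0] != query_id:
            continue  # skip malformed rows and hits for other contigs
        aln_len = int(row[3])    # 4th standard column: alignment length
        title = row[12].lower()  # extra column: subject title
        kind = "plasmid" if "plasmid" in title else "chromosome/other"
        totals[kind] += aln_len
    return dict(totals)


# Made-up example rows (hypothetical accessions and titles, not real results).
example_rows = [
    ["contig_7", "refA", "99.2", "4200", "30", "2", "1", "4200", "100", "4299",
     "0.0", "7500", "Lactococcus lactis strain X plasmid pX1, complete sequence"],
    ["contig_7", "refB", "97.5", "1800", "40", "3", "1", "1800", "500", "2299",
     "0.0", "3000", "Lactococcus lactis strain Y chromosome, complete genome"],
    ["contig_9", "refB", "98.0", "900", "10", "1", "1", "900", "10", "909",
     "0.0", "1500", "Lactococcus lactis strain Y chromosome, complete genome"],
]

summary = summarize_hits(example_rows, "contig_7")
```

Bear in mind that overlapping hits get counted more than once here, and a title without the word "plasmid" is not proof of chromosomal origin, so treat the totals as a hint rather than an answer.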
Alternatively, you may be able to look at the gene content within that contig by annotation. If the gene content includes lots of 'typical' plasmid genes, e.g. virulence factors, phage elements, toxins, antibiotic resistance markers etc., it would be a fair bet that it originates from a plasmid. Given how mobile these things can be (jumping in and out of the chromosome etc.), it won't be 100% certain, though.
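If you want to screen the annotation quickly, something along these lines might do. It is only a sketch: it assumes you can pull the product descriptions for the genes on the contig out of your annotation as plain strings. The keyword list just mirrors the categories mentioned above and is an assumption, not a validated marker set, and `plasmid_hint_fraction` and the example products are hypothetical.

```python
# Keywords mirroring the categories above; an illustrative assumption only.
PLASMID_HINTS = ("virulence", "phage", "toxin", "resistance")


def plasmid_hint_fraction(products):
    """Return (fraction flagged, flagged products) for a list of product names."""
    if not products:
        return 0.0, []
    flagged = [
        p for p in products
        if any(keyword in p.lower() for keyword in PLASMID_HINTS)
    ]
    return len(flagged) / len(products), flagged


# Made-up product names for illustration.
example_products = [
    "Toxin HigB",
    "hypothetical protein",
    "DNA polymerase III subunit alpha",
    "Tetracycline resistance protein TetM",
]

fraction, flagged = plasmid_hint_fraction(example_products)
```

A high fraction only makes a plasmid origin more plausible. With very few genes on the contig the number means little, and the substring matching is crude (for example, "toxin" also matches "antitoxin").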
Thank you for your help, Joe. I also tried what you suggested and studied the gene content within my contig of interest. Apart from the genes of interest (which could have either origin), no other genes are located on this contig. I'm kind of stuck with this problem now. Anyway, thanks again!
In that case, your best bet would be to compare against any close reference sequences, or to try to obtain some long read sequencing data for your strain.
Unfortunately, it's just not one of those things you can infer with complete confidence from short read data.