4.9 years ago
Can Holyavkin
We have just sequenced the whole genome of a bacterium. We aim to detect whether any recombinant gene has been inserted into the genome. The inserted gene is unknown.
So far:
- Mapped the paired reads to our organism's reference genome (via bwa).
- Extracted the unmapped read pairs, i.e. pairs in which neither mate mapped (via samtools).
- Performed de novo assembly of those unmapped reads (via velvet).
Now I have ~400 contigs and plan to BLAST each of them. Is this a valid approach?
Or should I use another method? For example, should I look at integration breakpoints instead of unmapped reads?
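To make the two options concrete, here is a minimal Python sketch of how read pairs could be sorted by their SAM flag. The flag bits (0x1, 0x4, 0x8) come from the SAM specification; the function names and category labels are hypothetical and only for illustration. Records where both mates are unmapped correspond to the reads used for the de novo assembly above, while pairs with only one mate unmapped are a plausible place to look for integration breakpoints, since the mapped mate anchors the pair to the reference.

```python
# SAM flag bits as defined in the SAM specification.
READ_PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify_pair(flag: int) -> str:
    """Classify one SAM record by the mapping status of the read and its mate.

    The category names are illustrative, not part of any tool's output.
    """
    if not flag & READ_PAIRED:
        return "unpaired"
    read_unmapped = bool(flag & READ_UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both_unmapped"      # input for de novo assembly
    if read_unmapped or mate_unmapped:
        return "one_mate_unmapped"  # candidate breakpoint-spanning pair
    return "both_mapped"


def count_categories(sam_lines):
    """Count SAM records per category, skipping header lines.

    Each pair contributes two records, so counts are per record, not per pair.
    """
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        category = classify_pair(int(fields[1]))
        counts[category] = counts.get(category, 0) + 1
    return counts
```

In this sketch, `count_categories(open("aln.sam"))` would tally the records of an uncompressed SAM file; `aln.sam` is a placeholder name. If the "one_mate_unmapped" count is substantial, the positions of the mapped mates may cluster around the insertion site.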
Do you have a reference for the bacterium in question? Do you have no idea of the sequence/function (e.g. antibiotic resistance) of the gene inserted or are you looking to see if there are extraneous sequences that appear to disrupt an ORF?
Past thread that may be useful: Identification of the sequence insertion site in the genome
Thank you for your comment. Yes, I have the reference sequence of the bacterium and have already mapped the reads to it. But I have no idea of the sequence or function of the gene inserted into the genome. I am not interested in whether the inserted gene disrupts an ORF. I want to detect any gene or large sequence that has integrated into the genome.
I'll check the link you mentioned and update the question if it helps.
Are the 400 contigs the total from the assembly? If so, you should map them to the reference that you have. Using blat (as long as your reference is reasonably homologous) may be the fastest option. You could also use minimap2, since its SAM output can be converted to BAM files that you can view in IGV. These analyses will give you an idea of redundancy and of the parts/contigs that don't map to the genome. You will need to do some additional work (PCR etc.) to prove that the insertion is indeed where you think it is.