In "Whole-genome sequencing of the Akata and Mutu Epstein-Barr virus strains", it is said:
First, reads were aligned using the genome aligner Novoalign (with a reference genome containing a human sequence plus abundant additional sequences (including adapters and mitochondrial, ribosomal, and enterobacterial phage phiX174 DNA sequences)). Junction-spanning reads were then identified using the junction mapper TopHat.
For this last point: is TopHat able to find the junction between two distinct 'trans' sequences from the reference (virus + host), or do I misunderstand the article?
Do you have an example of a workflow doing this kind of analysis?
I think that by using TopHat with the --fusion-search parameter you might find some "fusion" reads between virus and host, if any exist (provided you add the virus sequence as an additional chromosome). But I'm not aware of any paper about this.
Nice! Please add this as an answer and I'll validate it.
Fusion events can be found with split-read aligners such as bwa-sw, bwa-mem, bowtie2 (to some extent), mosaik and probably a few more. I am wondering what the advantage of a fusion-aware tool like TopHat is.
@lh3 I want to be sure I understand the word "fusion": are those tools able to detect a split read between two distinct "chromosomes"? In my case, I would like to find the insertion site of the virus, so a read would contain some bases from the host cell and some bases from the viral gene.
They do. The mappers simply find local hits. Think of how you would have found fusion events in the old days: run BLAST and then look for hits to different chromosomes. These mappers are essentially a faster BLAST tailored for NGS data.
I see; thank you, Heng.
@Pierre Can you explain how you accomplished this using bwasw? Did you search for all those reads where read1 aligned to, let's say, a human chromosome and its mate to some viral gene?
I just played with bwa-mem and scanned the SAMRecords matching both organisms.
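For anyone wanting to try the same kind of scan, here is a minimal sketch in Python, not the exact code I used. It assumes the bwa-mem output is plain SAM text, that the virus was added to the reference as an extra contig (the name EBV below is hypothetical, use whatever your FASTA header says), and that split alignments carry the standard SA:Z tag. It reports reads whose split parts hit both host and virus, and paired reads whose mate maps to the other organism.

```python
# Hypothetical contig name(s) for the virus; adjust to your reference FASTA.
VIRAL_CONTIGS = {"EBV"}


def is_viral(contig):
    return contig in VIRAL_CONTIGS


def supplementary_contigs(optional_fields):
    """Return the contigs listed in the SA:Z tag of one SAM record."""
    contigs = []
    for field in optional_fields:
        if field.startswith("SA:Z:"):
            # Each entry is rname,pos,strand,CIGAR,mapQ,NM and ends with ';'
            for entry in field[5:].split(";"):
                if entry:
                    contigs.append(entry.split(",")[0])
    return contigs


def junction_candidates(sam_lines):
    """Return a set of (read_name, kind) linking a host and a viral contig.

    kind is "split" when parts of one read map to both organisms,
    and "pair" when the read and its mate map to different organisms.
    """
    found = set()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:  # not a complete SAM record
            continue
        qname, flag, rname, rnext = cols[0], int(cols[1]), cols[2], cols[6]
        if (flag & 0x4) or rname == "*":  # read itself is unmapped
            continue
        others = supplementary_contigs(cols[11:])
        if any(is_viral(c) != is_viral(rname) for c in others):
            found.add((qname, "split"))
        elif (
            (flag & 0x1)  # read is paired
            and not (flag & 0x8)  # mate is mapped
            and rnext not in ("=", "*")  # mate is on another contig
            and is_viral(rnext) != is_viral(rname)
        ):
            found.add((qname, "pair"))
    return found
```

To use it, pass any iterable of SAM lines, for example an open file handle: junction_candidates(open("aln.sam")), where aln.sam is a placeholder file name. The candidates still need checking by eye (mapping quality, clipped bases, repeats) before calling anything an insertion site.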