I have paired-end data for normal as well as tumour samples from cervical cancer. I am trying to determine whether viral genomes (one or multiple) are present and, if so, their sites of integration.
Determining the presence is trivial, since you can just separate out the reads unmapped to the human genome and then re-align them to a custom viral genome FASTA.
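For concreteness, here is a minimal sketch of that extraction step, assuming the human alignment is available as SAM text. The function name `unmapped_records` and the toy records are hypothetical; the flag bits 0x4 (read unmapped) and 0x8 (mate unmapped) come from the SAM specification. It also keeps reads whose mate is unmapped, which goes slightly beyond "unmapped reads only", because the mapped mate is what anchors the pair to a human position.

```python
# SAM flag bits, as defined in the SAM specification
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def unmapped_records(sam_lines):
    """Yield split SAM records where the read or its mate is unmapped."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (READ_UNMAPPED | MATE_UNMAPPED):
            yield fields


# Hypothetical toy records (not real data): name, flag, rname, pos, ...
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\tACGT\tIIII",  # properly paired
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",              # read and mate unmapped
    "r3\t73\tchr2\t500\t60\t50M\t=\t500\t0\tACGT\tIIII",    # mapped, mate unmapped
]
kept = [rec[0] for rec in unmapped_records(example)]
print(kept)
```

The kept reads would then be written back to FASTQ and aligned against the viral FASTA with whichever aligner you prefer.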
I went through the question and the answers here: Method to identify viral integration site in human genome from NGS data? However, they do not solve my problem.
I came across another paper that addresses the same problem: http://jvi.asm.org/content/early/2013/05/30/JVI.00340-13.full.pdf. On page 10 they mention using a clustering method to determine the site of integration, but I didn't quite follow the approach.
Also, bwa-sw (as mentioned in the other Biostar question) would not help me determine the integration sites, or am I mistaken? Can someone guide me to a better approach or explain the paper's algorithm?
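To make the clustering idea concrete, this is how I currently picture it, as a rough sketch only and not necessarily what the paper does. It assumes I already have the human-side positions of chimeric pairs (one mate on human, the other on the virus). The function name `cluster_anchors`, the 500 bp gap, and the minimum support of 2 pairs are all my own arbitrary assumptions.

```python
from collections import defaultdict


def cluster_anchors(anchors, max_gap=500):
    """Group human-side anchor positions into clusters per chromosome.

    A position joins the current cluster when it lies within max_gap bp
    of the previous position. Returns (chrom, start, end, n_pairs) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in anchors:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                prev = pos
                count += 1
            else:
                clusters.append((chrom, start, prev, count))
                start = prev = pos
                count = 1
        clusters.append((chrom, start, prev, count))
    return clusters


# Hypothetical anchor positions (not real data)
anchors = [
    ("chr8", 128750100),
    ("chr8", 128750220),
    ("chr8", 128750400),
    ("chr3", 5000),
    ("chr8", 130000000),
]
clusters = cluster_anchors(anchors)
# Keep clusters supported by at least 2 pairs (arbitrary threshold)
candidates = [c for c in clusters if c[3] >= 2]
print(candidates)
```

Clusters supported by several independent pairs would be candidate integration regions, and reads spanning the junction itself (part human, part viral) would be needed to pin down the exact breakpoint.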
duplicate of
Method to identify viral integration site in human genome from NGS data?
see also
Can Tophat be used to find the virus-host junctions ?