Question

Find HBV integration sites in human genome.

5.2 years ago

Chirag Nepal ★ 2.4k

Hi there,

I want to identify HBV (hepatitis B virus) integration sites in the human genome.

I have single-end CAGE-seq data (reads ~30 nucleotides long) from HBV patients. I mapped the reads to the human genome (using Bowtie2 version 2.29), and then mapped the unmapped reads to the HBV genome.

From the remaining unmapped reads, I want to find reads that align partially to human and partially to HBV. If the data were paired-end, it would have been slightly easier. Can you please suggest how to get this information systematically to find the integration sites? I only need the logic; I can implement it myself.

I am thinking of fragmenting the unmapped reads (keeping the same FASTQ ID for each fragment), mapping the fragments to both human and HBV, and identifying which IDs map to both.
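The fragmenting idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the `|L|cut` / `|R|cut` suffix scheme, the `min_len` value, and the `hits` dictionary (fragment ID to genome label, which you would build yourself from the aligner output) are all hypothetical and not part of any tool.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:].split()[0], seq, qual


def split_read(read_id, seq, qual, min_len=12):
    """Yield FASTQ text for each left/right split where both parts are >= min_len.

    The original read ID is kept and a hypothetical '|side|cut' suffix is added.
    """
    for cut in range(min_len, len(seq) - min_len + 1):
        yield f"@{read_id}|L|{cut}\n{seq[:cut]}\n+\n{qual[:cut]}\n"
        yield f"@{read_id}|R|{cut}\n{seq[cut:]}\n+\n{qual[cut:]}\n"


def chimeric_ids(hits):
    """Return read IDs whose L and R fragments (same cut) hit different genomes.

    hits: dict mapping fragment ID -> 'human' or 'HBV' (assumed labels).
    """
    by_split = {}
    for frag_id, genome in hits.items():
        read_id, side, cut = frag_id.rsplit("|", 2)
        by_split.setdefault((read_id, cut), {})[side] = genome
    return sorted({
        rid for (rid, _), sides in by_split.items()
        if {sides.get("L"), sides.get("R")} == {"human", "HBV"}
    })
```

With 30-nt reads the fragments are very short, so many of them will probably map to multiple places in the human genome; filtering on mapping quality before building `hits` seems advisable.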

Any suggestions on how I could do this efficiently? Would using BWA help in this case?

Thank you !!

ChimericReads HumanVirusGenome CAGE-seq • 1.4k views

ADD COMMENT • link updated 5.2 years ago by Pierre Lindenbaum 166k • written 5.2 years ago by Chirag Nepal ★ 2.4k

score 0 · Answer 1 · 2020-04-03

5.2 years ago

Pierre Lindenbaum 166k

Map your reads with bwa mem against a reference containing both the host and virus sequences, and then detect the discordant reads: Extracting chimeric reads from mapping
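A minimal sketch of the detection step, assuming the combined reference was built by concatenating the two FASTA files and that the HBV contig is literally named `HBV` (both the file names in the comment and the contig name are hypothetical). bwa mem reports a split read as one primary record plus supplementary records, linked through the `SA:Z:` tag, so the sketch looks for primary records whose segments fall on both host and virus contigs. Note that with ~30-nt reads, bwa mem's default minimum seed length (`-k 19`) and minimum output score (`-T 30`) may suppress short split segments, so lowering them is probably needed; treat that as something to verify on your data.

```python
# Assumed upstream commands (file names are placeholders):
#   cat human.fa hbv.fa > combined.fa
#   bwa index combined.fa
#   bwa mem combined.fa unmapped.fq > aln.sam

def chimeric_from_sam(sam_lines, virus_contigs):
    """Yield (read_id, host_segment, virus_segment) for host-virus split reads.

    Each segment is a tuple (contig, 1-based position, strand, CIGAR).
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        # records; the primary record already lists the others in SA.
        if flag & 0x4 or flag & 0x900:
            continue
        segments = [(f[2], int(f[3]), "-" if flag & 0x10 else "+", f[5])]
        for tag in f[11:]:
            if tag.startswith("SA:Z:"):
                # SA entries: rname,pos,strand,CIGAR,mapQ,NM; separated by ';'
                for entry in tag[5:].rstrip(";").split(";"):
                    rname, pos, strand, cigar, _mapq, _nm = entry.split(",")
                    segments.append((rname, int(pos), strand, cigar))
        virus = [s for s in segments if s[0] in virus_contigs]
        host = [s for s in segments if s[0] not in virus_contigs]
        if virus and host:
            yield f[0], host[0], virus[0]
```

Usage would be something like `for rid, host, virus in chimeric_from_sam(open("aln.sam"), {"HBV"}): ...`; the soft-clip position in each CIGAR then gives an approximate breakpoint.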


Thank you, Pierre! So you suggest merging the human and virus genome sequences, then mapping the unmapped reads with BWA against the combined genome assembly. Reads that map partially to human and partially to virus will be flagged as discordant. I think it is a neat idea. What is the flag for discordant reads?

ADD REPLY • link 5.2 years ago by Chirag Nepal ★ 2.4k

What is the flag for discordant reads?

There is no specific SAM flag; look at the link: Extracting chimeric reads from mapping
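To illustrate why there is no single flag to filter on, here is a small sketch that decodes the SAM FLAG bits defined in the SAM specification. None of them means "chimeric between two genomes"; the closest is the supplementary bit (0x800), which marks the extra records of a split alignment, and the contig names still have to be compared separately. The short labels in the dictionary are my own naming, not official identifiers.

```python
# FLAG bits from the SAM specification; the labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_supplementary(flag):
    """True if the record is a supplementary (split) alignment."""
    return bool(flag & 0x800)
```

For single-end data such as these CAGE-seq reads, the pair-related bits are never set, so split reads can only show up through supplementary records and the SA tag.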

ADD REPLY • link 5.2 years ago by Pierre Lindenbaum 166k