Question

Counting Repeat And Unique Reads Of Tophat Output

13.2 years ago

Stevelor ▴ 310

Hey,

I used TopHat for paired-end RNA-Seq mapping and converted "accepted_hits.bam" to a *.bed file with 82859900 entries/lines, i.e. hits on the reference genome. I wanted to know how many unique and repeat reads I have, and also at how many locations on the reference genome the repeat reads hit.

So I wrote a few lines of code that compare and count the read IDs, with the following result:

hits: 82859900
unique hits: 75600252
repeat hits: 3217634 hit on 7259648 locations
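This is not Steve's code; it is only a minimal sketch of this kind of counting, for readers who want to reproduce it. It assumes a BED file with the read name in column 4 and that the two mates of a pair carry distinct names (for example a /1 or /2 suffix); `count_hits` is a hypothetical helper name.

```python
from collections import Counter


def count_hits(bed_lines):
    """Tally alignments per read ID from BED lines.

    Assumption: the read name is in column 4 and each mate of a pair
    has a distinct name (e.g. a /1 or /2 suffix), so mates are counted
    separately.
    """
    per_read = Counter()
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        per_read[fields[3]] += 1

    total_hits = sum(per_read.values())
    # Reads that occur exactly once in the file.
    unique_reads = sum(1 for n in per_read.values() if n == 1)
    # Reads that occur more than once.
    repeat_reads = sum(1 for n in per_read.values() if n > 1)
    # Total number of hits (locations) contributed by the repeat reads.
    repeat_locations = sum(n for n in per_read.values() if n > 1)
    return total_hits, unique_reads, repeat_reads, repeat_locations
```

Usage would be something like `with open("accepted_hits.bed") as bed: print(count_hits(bed))`. By construction `total_hits == unique_reads + repeat_locations`, which matches the numbers above (75600252 + 7259648 = 82859900). Holding every read ID in memory is expensive for ~80M lines; sorting the BED file by column 4 first and counting runs of identical names would be a lighter alternative.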

Looks good! But is there another way to get these counts, e.g. from the TopHat log files? What are those files for? They give me strange counts.
Or is this the only way to get this information?
How do you count these reads?
I am not happy with samtools flagstat and Picard tools :(

Cheers, Steve

Tags: tophat, rna, parsing, read

Written 13.2 years ago by Stevelor ▴ 310; updated 13.2 years ago by Gww ★ 2.7k

It would be nice if you could share your code. I would really like to know how to do something like that.

Reply by Assa Yeroslaviz ★ 1.9k, 13.0 years ago

score 7 · Answer 1 · 2011-09-06

13.2 years ago

Gww ★ 2.7k

In the BAM file created by TopHat there is an auxiliary tag (NH) that specifies the number of hits each read has. For example, NH:i:2 says that there are two hits for that read, so reads with NH:i:1 are unique and reads with a larger NH value are repeats.
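A minimal sketch of counting reads by their NH tag, assuming SAM text (for example the output of `samtools view accepted_hits.bam`) fed in line by line. Each read is counted once, keyed by read name plus mate (flag bits 0x40/0x80), so paired mates are tallied separately; unmapped records (flag 0x4) are skipped. The helper names are hypothetical.

```python
from collections import Counter


def parse_tags(fields):
    """Turn SAM optional fields (TAG:TYPE:VALUE) into a dict; 'i' values become int."""
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def classify_by_nh(sam_lines):
    """Count reads as 'unique' (NH == 1) or 'multi' (NH > 1), once per read/mate."""
    counts = Counter()
    seen = set()
    for line in sam_lines:
        # Skip header lines and blanks.
        if line.startswith("@") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        qname, flag = cols[0], int(cols[1])
        if flag & 0x4:  # unmapped
            continue
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (qname, mate)
        if key in seen:  # further alignments of an already counted read
            continue
        seen.add(key)
        nh = parse_tags(cols[11:]).get("NH")
        if nh is None:
            counts["no_NH"] += 1
        elif nh == 1:
            counts["unique"] += 1
        else:
            counts["multi"] += 1
    return counts
```

Keying on read name avoids relying on the secondary-alignment flag (0x100), at the cost of keeping every read name in memory.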

Do you know what the NM and XS tags specify?

Reply by Holly ▴ 30, 13.2 years ago

NM is the edit distance between the read and the reference (per the SAM specification, mismatches plus inserted and deleted bases). XS is the eXpected Strand of the transcript, based on transcript annotations and/or splice-site motifs, i.e. GT-AG or AT-AC.

Reply by Gww ★ 2.7k, 13.2 years ago
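A minimal sketch of pulling NM and XS out of a single SAM record, assuming SAM text as input; `read_nm_xs` is a hypothetical helper. XS has type A (a single character, `+` or `-`), and records without it (for example many unspliced reads) simply return `None` for that field.

```python
def read_nm_xs(sam_line):
    """Return (NM, XS) from one SAM line; a missing tag yields None."""
    cols = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    for field in cols[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags.get("NM"), tags.get("XS")
```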
