Hey,
I used TopHat for paired-end RNA-Seq mapping and converted "accepted_hits.bam" to a *.bed file with 82859900 entries (one line per hit on the reference genome). I wanted to know how many unique and repeat reads I've got, and also how many locations on the reference genome the repeat reads hit.
So I wrote a few lines of code that compare and count the read IDs, with the following result:
hits: 82859900
unique hits: 75600252
repeat reads: 3217634, hitting 7259648 locations
Looks good!
But is there another way to get these counts, for example from the TopHat log files? What are those files for? The counts in them look strange to me.
Or is this the only way to get this information?
How do you count these reads?
I am not happy with samtools flagstat and Picard tools :(
Cheers, Steve
It would be nice if you could share your code. I would really like to know how to do something like that.
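
This is not Steve's code, just a minimal sketch of how such a count could be done in Python. It assumes the BED file is tab-separated, has one line per alignment, and carries the read ID in the 4th column (with a /1 or /2 mate suffix for paired-end data, so each mate counts as its own read). The function name count_hits and the example lines are made up for illustration. Keeping every read ID in memory may need several GB for 80 million lines.

```python
from collections import Counter


def count_hits(bed_lines, name_col=3):
    """Count unique and repeat reads from BED lines (hypothetical helper).

    name_col is the 0-based index of the read ID column; 3 is an
    assumption that matches the usual BED6 layout.
    """
    per_read = Counter()
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip empty lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        per_read[fields[name_col]] += 1

    counts = list(per_read.values())
    return {
        # Total number of alignment lines seen.
        "hits": sum(counts),
        # Read IDs that appear on exactly one line.
        "unique_hits": sum(1 for n in counts if n == 1),
        # Read IDs that appear on more than one line.
        "repeat_reads": sum(1 for n in counts if n > 1),
        # Total number of lines belonging to those repeat reads.
        "repeat_locations": sum(n for n in counts if n > 1),
    }


if __name__ == "__main__":
    # Tiny made-up example: read2/1 maps to three places.
    example = [
        "chr1\t100\t150\tread1/1\t50\t+",
        "chr1\t300\t350\tread1/2\t50\t-",
        "chr2\t10\t60\tread2/1\t3\t+",
        "chr3\t10\t60\tread2/1\t3\t+",
        "chr5\t10\t60\tread2/1\t3\t-",
    ]
    print(count_hits(example))
```

For a real file you could pass the open file handle directly, e.g. count_hits(open("accepted_hits.bed")). With counts defined this way, unique_hits plus repeat_locations always equals hits, which matches the numbers in the question (75600252 + 7259648 = 82859900).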