Is there a better way to count how many reads are mapped to each sequence/transcript, and get a count table with names and counts, than this:
perl -ne ' if (/^\@SQ/) { @F = split(/\t|:/, $_); print $F[2]."\n" } ' SAMFILE > ID.txt
perl -ne ' chomp($_); print $_."\t".`grep -c "\t$_" SAMFILE ` ' ID.txt > COUNTTABLE
The SAM file is 8 GB, even though the transcript Bowtie index contained only the 70 transcripts I need, and it takes forever (a couple of hours) to build this count table.
What about samtools idxstats?
I think this should work for him.
AFAIK the numbers reported by samtools idxstats are the number of alignments mapped to each sequence, not the (non-redundant) number of reads, which is what I need?
EDIT: that worked fine :)
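For anyone who does need both numbers, here is a minimal sketch of a single-pass count in plain Python. It is not what any of the tools mentioned in this thread do internally; the function name `count_reads_per_reference` is made up for illustration. It assumes an uncompressed, tab-delimited SAM file, treats alignments sharing a QNAME as one read (so both mates of a pair count once), and keeps read names in memory, which may be heavy for an 8 GB file.

```python
from collections import defaultdict


def count_reads_per_reference(sam_lines):
    """Return (alignments, distinct_reads), two dicts keyed by reference name.

    Hypothetical helper for illustration: reads the SAM text once instead of
    running one grep per transcript.
    """
    alignments = defaultdict(int)
    read_names = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            # Header line: register @SQ names so zero-count references appear.
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        alignments[field[3:]] += 0
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        alignments[rname] += 1        # every alignment line counts here
        read_names[rname].add(qname)  # each read name counts once per reference
    distinct = {ref: len(read_names[ref]) for ref in alignments}
    return dict(alignments), distinct


# Tiny in-memory example; with a real file, pass the open file handle instead.
sample = [
    "@SQ\tSN:tx1\tLN:1000\n",
    "@SQ\tSN:tx2\tLN:500\n",
    "r1\t0\ttx1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r1\t16\ttx1\t50\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\ttx1\t70\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
aln, reads = count_reads_per_reference(sample)
for ref in aln:
    print(f"{ref}\t{aln[ref]}\t{reads[ref]}")
```

With a real file, something like `with open("SAMFILE") as fh: aln, reads = count_reads_per_reference(fh)` should work. The first count column corresponds to alignments, the second to distinct read names.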
I wrote an implementation in C++ for my own purposes using the SeqAn library to do just that. It works fairly fast, though I haven't tried it on files this big. You could try doing something similar; with SeqAn it should be relatively painless.
Why don't you use standard tools such as HTSeq, summarizeOverlaps in R, or RSeQC?