Count number of mapped raw reads for each transcript in SAM file
2
0
Entering edit mode
9.4 years ago
manekineko ▴ 150

Is there a better war to count how many reads are mapped to each sequence/transcript and get a count table with names and counts than this:

perl -ne ' if (/^\@SQ/) { @F = split(/\t|:/, $_); print $F[2]."\n" } ' SAMFILE > ID.txt
perl -ne ' chomp($_); print $_."\t".`grep -c "\t$_" SAMFILE ` ' ID.txt > COUNTTABLE

The sam file is 8GB nevertheless the transcripts bowtie index was containing only 70 transcripts I need, and it takes me forever (couple of hours) to get this count-table?

RNA-Seq • 3.9k views
ADD COMMENT
2
Entering edit mode

What about samtools idxstats?

ADD REPLY
0
Entering edit mode

I think this should work for him.

ADD REPLY
0
Entering edit mode

AFAIK the numbers reported by samtools idxstats represent the number of alignments of reads that are mapped to the sequence, not the (non-redundant) number of reads what I need?

EDIT:that worked fine :)

ADD REPLY
0
Entering edit mode

I wrote an implementation in C++ for my purposes using seqan library to do just that. Works fairly fast. Didn't try it on such big files though. You can try doing something similar. With seqan, it should be relatively painless.

ADD REPLY
0
Entering edit mode

Why you don't use standard tools such as HTSeq, summarizeOverlaps in R or RSeQC?

ADD REPLY
3
Entering edit mode
9.4 years ago

With the BBMap package:

pileup.sh in=mapped.sam covstats=stats.txt rpkm=rpkm.txt

It's fast.

ADD COMMENT
2
Entering edit mode
9.4 years ago

How about using a gff or gft file of your reference genome and a dedicated program such as:

  • summarizeOverlaps from the R package GenomicAlignments
  • htseq-count which is written in phyton HTSeq (Python)
  • featureCounts from the R package Rsubread
  • simpleRNASeq from the R package easyRNASeq
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

The problem is that I do not want to make it complicated involving whole genome or transcriptome analysys, I just have 70 sequences which I index for bowtie and map my reads to a SAM file....and want to count the reads in those 70 sequences.

ADD REPLY
0
Entering edit mode

Check swbarnes2's comment above.

ADD REPLY

Login before adding your answer.

Traffic: 1553 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6