Question

Counting Mapped reads from Bowtie2 Sam file

1

Entering edit mode

10.4 years ago

2kg2523 ▴ 10

I have illumina paired-end reads from three samples. All reads were assembled by trinity to get a TranscriptME which I used as a reference for mapping.

I mapped the three sample paired-end reads separately on TranscriptME using Bowtie2. As a result I have three outputs of SAM file. I am trying to get the number of reads mapped to each contigs (transcripts) of the TranscriptME. I found one script called countxpression.py and ran on the SAM file. The script is downloaded from here. I did not manipulate those SAM files. They are directly used as input for the script. (./countxpression.py 20 92 output.txt xxxx.sam)

It gave me the following error

Traceback (most recent call last):
  File "countxpression.py", line 154, in <module>
    countxpression(file, sys.argv[1], sys.argv[2], 0, 'text', file[:-4]+'_counts.txt', sys.argv[3])
  File "countxpression.py", line 61, in countxpression
    if float(cols[columntocount].split(':')[2]) > 1: # for reads with >1 optimal hit
IndexError: list index out of range

Is there another way to get the counts from sam file? I tried eXpress and it worked well. But I am not sure if I can use it for further expression analysis as it is.

Thank you very!

RNA-Seq alignment countxpression • 3.9k views

ADD COMMENT • link updated 2.1 years ago by Ram 45k • written 10.4 years ago by 2kg2523 ▴ 10

0

Entering edit mode

Devon makes some good points. But if you DO want to directly count read hits to transcripts, I recommend my pileup tool, as it's super fast and easy to use:

pileup.sh in=mapped.sam stats=stats.txt

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 10.4 years ago by Brian Bushnell 20k

score 1 · Answer 1 · 2015-02-19

I'd strongly advise you to use eXpress/rsem/etc. rather than directly counting reads for this sort of analysis. Getting expected counts from the aforementioned tools will give you better results than actually counting alignments per transcript (where you either double count alignments that multimap or simply ignore them and thereby throw away a lot of your data). Downstream, limma/voom is a good option for differential expression.