Question

Create Count Data Out Of Sam File

13.3 years ago

lsvijfhuizen

Dear All,

At the moment I am working on a mouse SAGE data project. The first analysis step was to align the reads against a reference genome and count the number of reads per gene. This generates a count data table for each mouse. I compared the counts per gene between our wild-type mice (5) and our mutant mice (6) to see whether a gene is differentially expressed between these mouse models.

Now I want to run the same analysis with a transcriptome as the reference instead. I have generated a SAM file, and I am wondering whether there is an easy way to count the reads per unique transcript in the SAM file and report this as a count data file.

I hope this is clear. Thank you!

Greetz

sam transcript reference

updated 13.3 years ago by swbarnes2 • written 13.3 years ago by lsvijfhuizen

OFF TOPIC: Just out of curiosity, if your transcriptome has all available isoforms (or transcripts) of a gene, then how do you distinguish reads that fall in a region shared by two isoforms? Wouldn't those reads map to multiple positions? How do you resolve this?

13.3 years ago by Arun
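One way to see how often the situation raised in this comment actually occurs is to list the reads that appear on more than one mapped line of the SAM file. This is only a diagnostic sketch, not a way of resolving the ambiguity. It assumes single-end reads (with paired-end data both mates share a read name and would be flagged falsely) and an aligner configured to report more than one alignment per read; if the aligner keeps only one alignment per read, the duplicates never reach the SAM file. The file toy.sam and the names r1 to r3, tx1a and tx1b are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical check: which single-end reads have more than one alignment?
# toy.sam and all read/transcript names below are invented for illustration.
set -euo pipefail
cd "$(mktemp -d)"   # work in a scratch directory

# Toy SAM: r1 aligns to both isoforms tx1a and tx1b (shared region; the second
# line carries FLAG 256 = secondary alignment), r2 aligns only to tx1a, and
# r3 is unmapped (FLAG 4, RNAME '*').
{
  printf '@SQ\tSN:tx1a\tLN:1000\n'
  printf '@SQ\tSN:tx1b\tLN:900\n'
  printf 'r1\t0\ttx1a\t100\t1\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r1\t256\ttx1b\t50\t1\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t0\ttx1a\t300\t60\t4M\t*\t0\t0\tTTGA\tIIII\n'
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCA\tIIII\n'
} > toy.sam

# Skip '@' header lines, keep only lines with a reference name (column 3 not '*'),
# print the read name (column 1), then report names that occur more than once.
grep -v '^@' toy.sam \
  | awk -F '\t' '$3 != "*" { print $1 }' \
  | sort | uniq -d > multimapped_reads.txt

echo "reads with more than one alignment: $(wc -l < multimapped_reads.txt)"
```

On this toy input only r1 is listed. On real data, a large fraction of such reads would suggest that counting per gene, or distributing shared reads across isoforms with a dedicated quantification tool, is worth considering.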

score 2 · Answer 1 · 2012-05-14

You can do something like:

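grep -v '^@' transcripts.sam | cut -f 3 | sort | uniq -c > transcripts_counts.txt

(Field 3 of a SAM record is RNAME, the name of the reference sequence the read aligned to, which here is the transcript.)

grep -v '^@' drops the header lines (they start with @ and their third column would otherwise be counted too), cut -f 3 selects only column 3, sort sorts that list so identical names end up next to each other, and uniq -c outputs every unique entry together with how many times it occurs. Unmapped reads have * in column 3, so they show up as a single * entry.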
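Here is a runnable sketch of that counting pipeline on a tiny made-up SAM file (toy.sam, with hypothetical transcripts tx1 and tx2). It removes the header lines (which start with @) and, as an extra filter added only for this illustration, the * entry produced by unmapped reads, so the output contains transcript names only.

```shell
#!/usr/bin/env bash
# Sketch: per-transcript read counts from a SAM file, on invented example data.
# toy.sam, tx1 and tx2 are hypothetical names used only for illustration.
set -euo pipefail
cd "$(mktemp -d)"   # work in a scratch directory

# Toy SAM: header lines start with '@'; column 3 (RNAME) names the transcript.
# Reads r1, r3, r4 hit tx1, r2 hits tx2, r5 is unmapped (RNAME '*').
{
  printf '@HD\tVN:1.4\tSO:unsorted\n'
  printf '@SQ\tSN:tx1\tLN:1000\n'
  printf '@SQ\tSN:tx2\tLN:800\n'
  printf 'r1\t0\ttx1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t0\ttx2\t20\t60\t4M\t*\t0\t0\tCCGT\tIIII\n'
  printf 'r3\t16\ttx1\t30\t60\t4M\t*\t0\t0\tACGA\tIIII\n'
  printf 'r4\t0\ttx1\t40\t60\t4M\t*\t0\t0\tTCGT\tIIII\n'
  printf 'r5\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII\n'
} > toy.sam

# 1. drop header lines   2. keep column 3 (RNAME)   3. drop '*' (no reference)
# 4. sort so identical names are adjacent   5. count each run of identical names
grep -v '^@' toy.sam | cut -f 3 | grep -v '^\*$' | sort | uniq -c > toy_counts.txt

cat toy_counts.txt
```

Run with bash, toy_counts.txt should hold a count of 3 for tx1 and 1 for tx2; the header lines and the unmapped read r5 do not appear. The padding in front of each count depends on the uniq implementation. For single-end data like this, the * filter is a reasonable stand-in for "unmapped"; with paired-end data an unmapped read can carry its mate's RNAME, so checking FLAG bit 4 would be more reliable.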

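Better would be to stay out of .sam format altogether and work directly from the BAM file:

samtools view transcripts.bam | cut -f 3 | sort | uniq -c > transcripts_counts.txt

(samtools view prints only the alignment records unless you add -h, so no header filtering is needed here.)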