Counting abundance of each gene in SAM/BAM file

0

Entering edit mode

5.1 years ago

Hansen_869 ▴ 80

I have just aligned my raw FASTQ-files (from a metagenomic sample) to my predicted genes (I used Prokka) with BWA, in order to get an idea of the abundance of each of the genes in the sample. I now have a SAM and a BAM file. My question is: How do I count the abundance (how many reads are mapped to each gene), of all my genes. I basically want an output in the style of (of course doesn't have to be this exact formatting, but you get the idea):

Gene 1 - Mapped reads: 34

Gene 2 - Mapped reads: 85

and so forth.

I could make a script that would extract and count the mapped reads, but I wonder if there is a tool out there that could do it better?

Samtools Abundance gene bam sam • 2.9k views

ADD COMMENT • link 5.1 years ago by Hansen_869 ▴ 80

1

Entering edit mode

Prokka also outputs gff / gtf / bed files, correct? Use featureCounts with the bam and the gtf, read the docs for more details.

ADD REPLY • link 5.1 years ago by h.mon 35k

0

Entering edit mode

Thanks for your response! Porkka only outputs gff files. Is that enough?

ADD REPLY • link 5.1 years ago by Hansen_869 ▴ 80

0

Entering edit mode

featureCounts should work with gff. Out of curiosity, are the raw fastq files you mapped RNAseq reads?