I have just aligned my raw FASTQ-files (from a metagenomic sample) to my predicted genes (I used Prokka) with BWA, in order to get an idea of the abundance of each of the genes in the sample. I now have a SAM and a BAM file. My question is: How do I count the abundance (how many reads are mapped to each gene), of all my genes. I basically want an output in the style of (of course doesn't have to be this exact formatting, but you get the idea):
Gene 1 - Mapped reads: 34
Gene 2 - Mapped reads: 85
and so forth.
I could make a script that would extract and count the mapped reads, but I wonder if there is a tool out there that could do it better?
Prokka also outputs gff / gtf / bed files, correct? Use featureCounts with the bam and the gtf, read the docs for more details.
Thanks for your response! Porkka only outputs gff files. Is that enough?
featureCounts should work with gff. Out of curiosity, are the raw fastq files you mapped RNAseq reads?
Cool, I'll check it out! They are DNAseq, Illumina :)