Hi!
A little background: I have a metagenomic sample (DNA-seq). I want to figure out the gene depth/abundance of all the genes in the sample. Since there a lot of different (and same) bacteria in the sample, I expect some of the genes to have a high abundance. Keep in mind that it is a DNA-seq, so it's not an expression analysis, but rather what genes (and their depth) the sample inhabit. Lastly, I want to group the genes in COGs while still preserving the quantitative information.
I started with an assembled contigs-file and 2 FASTQ-files (Paired-end). I then predicted genes with Prokka (bacteria only) from the contigs-file.
Now, I'm not sure whether I should map my reads to the predicted genes, and then just count how many reads mapped to each gene, or if I should map my reads to my contigs, and then use FeatureCounts, to figure out how the predicted genes relate to the alignment. I guess the ladder would give me more freedom (for instance, counting fragments instead of reads, as I'm dealing with paired-end). However, I'm not too confident on the topic.
What are your suggestions?
I re-opened this one and deleted the previous thread. Please be sure to keep things now focused in this thread to avoid information being spread across multiple identical / similar threads.
That is noted, thanks!