Hi there,
The main goal of my research is to find the abundance of genes in a metagenomic sample. That is, I want to figure out not only which genes are in the sample, but also how many copies of each gene are present (since it is a big metagenomic sample with lots of bacteria, the abundance of some genes is expected to be quite high).
To keep everything manageable, I want to put genes with very similar functions into the same group. I know that EggNOG can do this. My question is: how do I group my genes while still taking the abundance/depth of each gene into account? EggNOG takes a FAA file as input, which contains only the protein sequences of the genes, not their abundance/depth. I have used BWA to align my raw metagenomic reads to my annotated genes. Is there a way for me to use this information from BWA? If there is a completely different, better approach, feel free to speak up!
I have the following data files:
Prokka/Prodigal output files (faa, fsa, fna etc.) from gene annotation of prokaryotic bins (from metagenomic data).
BWA output files (bam, sam) from aligning the raw reads to my annotated genes (I did this to get the depth of each gene, i.e. how many reads cover each gene).
EggNOG output files (tsv, txt) for the annotated genes (used to categorize the genes into functional groups, to make everything more manageable and easier to visualize), but without gene depth information.
Thanks very much in advance!
Please feel free to ask questions, if I have been too unclear.
My first question is: if it's a metagenomic sample, what reference are you using? This is an important point to consider, especially if the gene of interest is present in multiple genomes/contigs that you are aligning to. BWA will most likely throw out those reads because they are multimapping. With your current approach, after grouping with EggNOG, why not go back and associate the groupings with the depth? Or do you want to somehow combine depth and functional grouping into one technique (which I don't know of)?
Have you checked out more specialised tools such as Karp, for pseudoalignment and quantification? https://github.com/mreppell/Karp
Also check out this paper, it sounds like it might be of interest to you: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0722-6
I assembled my reads into contigs with a de novo assembly approach. My metagenomic sample is basically a big pool of various bacteria with different abundances.
How would you suggest I associate the groupings with the depth? My output from EggNOG is basically a file with the gene groupings, but I have no way of knowing the depth. I'm not sure how to link the two.
It doesn't necessarily have to be one technique. Actually the approach doesn't really matter, as long as I get a satisfactory result.
I'll have a look at your references! Thanks
On a side note, I should probably add that these are metagenomic samples from pig microbiomes.
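One possible way to link the EggNOG groupings to the depth is to join the two tables on the gene ID, since the reads were aligned directly to the annotated genes. The sketch below is only an illustration under several assumptions: that per-gene read counts come from `samtools idxstats` run on the sorted, indexed BAM (tab-separated: reference name, length, mapped reads, unmapped reads), that the gene IDs in the BAM match the first column of the EggNOG annotation table, and that the caller knows which column of that table holds the group label (this differs between EggNOG versions, so `group_column` is a hypothetical parameter). The function names are made up for this example, and reads per kilobase is just one crude normalization, not necessarily the best one for this data.

```python
from collections import defaultdict


def read_gene_depth(idxstats_lines):
    """Parse `samtools idxstats` lines into {gene_id: reads per kilobase}."""
    depth = {}
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the "*" row that counts unplaced reads.
        if len(fields) < 3 or fields[0] == "*":
            continue
        name, length, mapped = fields[0], int(fields[1]), int(fields[2])
        if length > 0:
            # Normalize by gene length so long genes are not over-counted.
            depth[name] = mapped / (length / 1000)
    return depth


def read_gene_groups(annotation_lines, group_column):
    """Map gene ID (first column) to the label found in `group_column`.

    Assumes a tab-separated table in which header and comment lines start
    with "#" and missing values are written as "-" or left empty.
    """
    groups = {}
    for line in annotation_lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0] or fields[0].startswith("#"):
            continue
        if len(fields) > group_column and fields[group_column] not in ("", "-"):
            groups[fields[0]] = fields[group_column]
    return groups


def abundance_per_group(depth, groups):
    """Sum the normalized depth of all genes that share a group label."""
    totals = defaultdict(float)
    for gene, value in depth.items():
        # Genes without a group are kept under one explicit label.
        totals[groups.get(gene, "unannotated")] += value
    return dict(totals)


if __name__ == "__main__":
    # Toy data standing in for real idxstats and annotation files.
    idxstats = ["geneA\t1000\t50\t0", "geneB\t500\t100\t0", "*\t0\t0\t5"]
    annotations = ["#query\tgroup", "geneA\tCOG0001", "geneB\tCOG0001"]
    depth = read_gene_depth(idxstats)
    groups = read_gene_groups(annotations, group_column=1)
    for group, total in sorted(abundance_per_group(depth, groups).items()):
        print(group, total)
```

With real files, the lists of lines would be replaced by open file handles. Summing per group keeps the abundance information that the FAA file alone cannot carry, but multimapping reads would still need separate handling.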