gene abundance profiling from metagenomes
2.5 years ago

Hello, this is the first time I have had to do this, so I don't know what tools to use.

I have 19 metagenomic samples, each represented by a set of contigs, and a database of proteins of interest in FASTA format. I need to produce a gene abundance profile for each metagenomic sample.

So far I have performed a gene copy number analysis with MMseqs2, running mmseqs search with the contigs as the query and the gene database as the target. The output is a BLAST-style tabular file (output format 6): https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
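To make the question concrete, here is a minimal sketch of what parsing such a table yields. It assumes the standard 12-column format 6 layout (query, target, identity, alignment length, mismatches, gap opens, query start, query end, target start, target end, e-value, bit score); the function name, the e-value cutoff, and the example rows are hypothetical. Note that it counts hits per gene across contigs, which is closer to a copy count in the assembly than to an abundance.

```python
import csv
import io
from collections import Counter


def count_hits_per_gene(handle, max_evalue=1e-5):
    """Count distinct contig hits per target gene in a format 6 table.

    Assumes 12 tab-separated columns with the contig in column 1,
    the gene in column 2 and the e-value in column 11.
    """
    counts = Counter()
    seen = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        contig, gene = row[0], row[1]
        if float(row[10]) > max_evalue:
            continue  # drop weak hits (cutoff is an arbitrary assumption)
        qstart, qend = int(row[6]), int(row[7])
        # Treat the same gene hitting the same contig interval as one hit.
        key = (contig, gene, min(qstart, qend), max(qstart, qend))
        if key in seen:
            continue
        seen.add(key)
        counts[gene] += 1
    return counts


# Made-up rows, for illustration only.
example_table = (
    "contig_1\tgeneA\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "contig_2\tgeneA\t90.0\t100\t10\t0\t10\t309\t1\t100\t1e-40\t180\n"
    "contig_2\tgeneB\t40.0\t50\t30\t0\t400\t549\t1\t50\t1e-3\t30\n"
    "contig_3\tgeneB\t85.0\t80\t12\t0\t5\t244\t1\t80\t1e-20\t120\n"
)
hit_counts = count_hits_per_gene(io.StringIO(example_table))
print(dict(hit_counts))
```

Nothing in these counts reflects how many reads support each contig, which is why the table alone does not give abundance.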

At this point I don't know whether parsing this table will give me abundance information. I think that to estimate gene abundance I must use the read information instead of the contig information, is that right?

The contigs represent the consensus of the reads, so the proper way to calculate gene abundance may be to map the reads against the contigs to which the genes mapped, is that correct? Are there any tools for that?
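As a rough sketch of the read-based idea: once reads have been mapped back to the contigs and counted per gene region (with whichever mapper and counting tool is used), the raw counts are usually normalized by gene length and by sequencing depth before samples are compared. The TPM-style calculation below is one common normalization, not a method prescribed in this thread; the function name, gene names, counts, and lengths are all hypothetical.

```python
def tpm_from_counts(read_counts, gene_lengths_bp):
    """Convert per-gene read counts to TPM-style values.

    read_counts: dict of gene -> number of mapped reads
    gene_lengths_bp: dict of gene -> gene length in base pairs
    """
    # Reads per kilobase corrects for longer genes attracting more reads.
    rpk = {
        gene: read_counts[gene] / (gene_lengths_bp[gene] / 1000.0)
        for gene in read_counts
    }
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        return {gene: 0.0 for gene in rpk}
    # Scaling to one million makes samples of different depth comparable.
    return {gene: value / total_rpk * 1_000_000 for gene, value in rpk.items()}


# Made-up numbers, for illustration only.
sample_counts = {"geneA": 200, "geneB": 100}
sample_lengths = {"geneA": 1000, "geneB": 500}
sample_tpm = tpm_from_counts(sample_counts, sample_lengths)
print(sample_tpm)
```

In this toy sample geneA has twice the reads of geneB but is also twice as long, so both end up with the same normalized value.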

Can you recommend some tools or papers to read on gene profiling of metagenomic samples?

I also have the metagenomic reads from which the contigs were assembled.

Thanks for your time :)

gene-abundance mapping metagenomics • 1.1k views
2.5 years ago
Mensur Dlakic ★ 28k

Not sure at all what you are trying to do, despite the lengthy explanation. What you seem to suggest is not something that can be done, or something that would yield useful information.

Let's say that organisms XX and YY both have a gene called geneA. You can assess the presence of that gene in both genomes, but raw read abundance will only tell you about the relative abundance of the organisms and nothing else. That is to say, if XX is 10x more abundant than YY, its geneA will in theory have 10x the reads mapping to it.

If you are trying to ascertain the abundance of geneA in the whole community, that again will likely be a function of organismal abundance rather than anything else. Let's say that you have one community where geneA is present in a single organism which comprises 50% of the total community. You will get more reads mapping to geneA in that community than in a different community where geneA is present in 3 organisms, each of which comprises only 10% of the total community. I don't think that knowing gene copy numbers will help you much without knowing the organism(s) they belong to, and their relative community abundance as well.
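The arithmetic behind that example can be sketched in a few lines. This assumes, purely for illustration, that read share is proportional to organism abundance, that genome sizes are comparable, and that each carrier has a single copy of geneA; the function name is hypothetical.

```python
def relative_gene_signal(carriers):
    """Sum abundance * copies over the organisms carrying the gene.

    carriers: list of (relative_abundance, copies_per_genome) tuples.
    """
    return sum(abundance * copies for abundance, copies in carriers)


# Community 1: geneA in one organism making up 50% of the community.
community_1 = relative_gene_signal([(0.5, 1)])
# Community 2: geneA in three organisms, each 10% of the community.
community_2 = relative_gene_signal([(0.1, 1), (0.1, 1), (0.1, 1)])

print(f"{community_1:.2f}")  # 0.50
print(f"{community_2:.2f}")  # 0.30
```

The first community gives the stronger geneA signal even though fewer organisms carry the gene, which is why the gene-level number is hard to interpret without the organism-level abundances.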


That is the main thing I have been asking myself these past weeks. It seems a bit confusing to calculate the abundance of a gene across an entire metagenomic community, since there could be more than one copy of that gene. Would I have to calculate the abundance of each copy, then take the mean abundance of those copies, and repeat that for the rest of the genes? That seems cumbersome to me. I will discuss this further with my team. Thank you so much!
