MetaPhlAn Visualization using stacked barplot
2.8 years ago

Hey everyone,

I've been analysing 8 datasets with MetaPhlAn and have merged the results into a single file using the merge_metaphlan_tables.py script.

I want to make a stacked barplot with ggplot2. Could anyone here please advise me on how to proceed? I'm fairly new to this field.

I tried the first few code cells from the notebook linked below, but I got confused about halfway through.

https://github.com/flannsmith/metaphlan-plot-by-taxa/blob/master/Converting%20Metaphlan%20profile%20to%20Phyloseq%20objects.ipynb

I got as far as the 'Importing the data' step.

Many thanks in advance!

stacked_barplot ggplot2 MetaPhlAn3 R_programming • 1.8k views
2.8 years ago
acvill ▴ 350

The ID column of the merged table includes multiple annotation levels. For example, the row where ID is k__Bacteria gives the proportion of reads for each sample that can be assigned to the kingdom Bacteria, and the row where ID is k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales gives the proportion of reads for each sample that can be assigned to the order Bifidobacteriales. Stacked barplots only make sense if you are considering a single taxonomic level, because rows at different levels count the same reads more than once. This is why the first step after data import is filtering for rows whose ID ends in a g__ entry (that is, the last |-separated field starts with g__), which gives all genus-level annotations in your dataset. You can repeat this process to subset tables for any taxonomic level.
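To make the filtering step concrete, here is a minimal Python sketch of selecting rows at a single taxonomic level. The lineage strings and the helper name `at_rank` are illustrative assumptions, not taken from your table or from MetaPhlAn itself; the sketch only relies on the `|`-separated, rank-prefixed ID format described above.

```python
def at_rank(clade: str, rank: str = "g") -> bool:
    """Return True if the deepest level of a '|'-separated lineage has the given rank prefix."""
    deepest = clade.split("|")[-1]          # e.g. 'g__Bifidobacterium'
    return deepest.startswith(f"{rank}__")  # 'g__' for genus, 'o__' for order, ...


# Hypothetical ID values, one per annotation level (made up for illustration).
ids = [
    "k__Bacteria",
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales",
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales"
    "|f__Bifidobacteriaceae|g__Bifidobacterium",
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales"
    "|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_longum",
]

# Keep only rows annotated exactly at genus level; species rows are excluded
# because their deepest level starts with 's__', not 'g__'.
genus_ids = [i for i in ids if at_rank(i, "g")]
order_ids = [i for i in ids if at_rank(i, "o")]
```

Changing the `rank` argument to `"p"`, `"f"` or `"s"` should give the phylum, family or species subset in the same way.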

You've used MetaPhlAn to do the taxonomic profiling of your metagenomic data. If you ever decide to try Kraken instead, I made a Shiny app to go directly from merged abundance tables to stacked barplots.


Hi, yes, thank you for the clarification!

I tried the pre-processing part and received an error. Could you please tell me what's wrong here? I'm pasting the output below:

Traceback (most recent call last):
  File "/home/axv183/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'NCBI_tax_id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "python2.py", line 9, in <module>
    df_k_genus = df_k[df_k['NCBI_tax_id'].str.contains(r'\g__[^|]*$')]
  File "/home/axv183/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 3506, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/axv183/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'NCBI_tax_id'

The code I ran was:

df_k = pd.read_table("merged_output.txt", sep='\t')
print(df_k.head())

df_k_genus = df_k[df_k['NCBI_tax_id'].str.contains(r'\g__[^|]*$')]
df_k_genus.reset.index(drop=True,inplace=True)

print(df_k_genus.head())

Thanks
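A possible explanation, offered as a guess since the header of merged_output.txt isn't shown: a KeyError from df_k['NCBI_tax_id'] means pandas did not find a column with that exact name. That can happen if the file starts with a '#' comment line that gets read as the header, or if the lineage column simply has a different name (the answer above calls it ID). Two other things in the posted code would likely fail next: reset.index should be reset_index, and \g is not a valid escape in a Python regex, so the pattern was probably meant to be \|g__[^|]*$. The sketch below runs on a made-up two-sample table, so the column names clade_name, sample1 and sample2 and the comment line are assumptions. It finds the lineage column by content instead of by name, keeps the genus rows, and reshapes them into the long format that ggplot2 generally works with.

```python
import io

import pandas as pd

# Toy stand-in for merged_output.txt. The real header and column names may differ;
# check them with print(df.columns) on your own file.
toy_file = io.StringIO(
    "#example_database_version\n"
    "clade_name\tsample1\tsample2\n"
    "k__Bacteria\t100.0\t100.0\n"
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales"
    "|f__Bifidobacteriaceae|g__Bifidobacterium\t60.0\t30.0\n"
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales"
    "|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_longum\t60.0\t30.0\n"
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales"
    "|f__Lactobacillaceae|g__Lactobacillus\t40.0\t70.0\n"
)

# comment="#" skips fully commented lines, so the real header row is used.
df = pd.read_csv(toy_file, sep="\t", comment="#")

# Locate the lineage column by its content (values starting with 'k__')
# instead of hard-coding a name such as 'NCBI_tax_id'.
lineage_col = next(
    c for c in df.columns if df[c].astype(str).str.startswith("k__").any()
)

# Genus rows: the last '|'-separated field starts with 'g__'.
is_genus = df[lineage_col].str.contains(r"\|g__[^|]*$", regex=True)
genus = df[is_genus].reset_index(drop=True)  # reset_index, not reset.index

# Short genus label, then long format: one row per (genus, sample) pair.
genus["genus"] = genus[lineage_col].str.rsplit("|", n=1).str[-1]
sample_cols = df.select_dtypes(include="number").columns.tolist()
long_df = genus.melt(
    id_vars="genus", value_vars=sample_cols,
    var_name="sample", value_name="abundance",
)
```

If this matches your file, long_df could be written out with long_df.to_csv(...) and read into R, where sample would go on the x axis, abundance on the y axis, and genus as the fill.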
