Hi, I have an abundance table in which each row corresponds to a taxon. I would like to get a tree for this table in the default Newick format. The table contains bacteria, archaea and fungi.
How can I get a phylogenetic tree from this table (the original table has around 600 rows)? The idea is to use the tree for further analysis in R. Note that I could also use NCBI taxonomy IDs, since I have them (not shown in this table).
SampleA  Phylum          Order                Class           Family                Genus            Species
12       Actinobacteria  Actinomycetales      Actinobacteria  Actinomycetaceae      Actinomyces      Actinomyces neuii
3        Actinobacteria  Actinomycetales      Actinobacteria  Actinomycetaceae      Actinomyces      Unknown
34       Actinobacteria  Corynebacteriales    Actinobacteria  Corynebacteriaceae    Corynebacterium  Corynebacterium sp. HMSC064E10
59       Actinobacteria  Corynebacteriales    Actinobacteria  Corynebacteriaceae    Corynebacterium  Corynebacterium aquilae
965      Actinobacteria  Propionibacteriales  Actinobacteria  Propionibacteriaceae  Tessaracoccus    Tessaracoccus sp. NSG39
44       Proteobacteria  Unknown              Unknown         Unknown               Unknown          Unknown
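If a taxonomy-based tree (a cladogram without branch lengths, not a sequence-based phylogeny) is enough, one possible approach is to nest each row under its lineage and write the result as Newick. The sketch below is only an illustration under assumptions: the OTU IDs, the function names `clean`, `build_tree` and `to_newick`, and the rule of stopping at the first "Unknown" rank are all made up for this example, and each lineage is listed from broadest to narrowest rank (phylum, class, order, family, genus, species).

```python
import re

def clean(label):
    """Make a label Newick-safe by replacing spaces and punctuation with '_'."""
    return re.sub(r"[^A-Za-z0-9.\-]+", "_", label.strip())

def build_tree(rows):
    """Nest each OTU under its lineage, stopping at the first 'Unknown' rank."""
    root = {}
    for otu_id, lineage in rows:
        node = root
        for name in lineage:
            if name == "Unknown":
                break
            node = node.setdefault(clean(name), {})
        node[otu_id] = {}  # the OTU itself becomes the tip
    return root

def _fmt(children):
    """Serialize sibling nodes; a node with children is written as '(children)label'."""
    parts = []
    for label, sub in children.items():
        parts.append(f"({_fmt(sub)}){label}" if sub else label)
    return ",".join(parts)

def to_newick(tree):
    """Wrap the top-level nodes in a single root and terminate with ';'."""
    return f"({_fmt(tree)});"

# Hypothetical OTU IDs paired with three rows of the table above.
rows = [
    ("OTU_1", ["Actinobacteria", "Actinobacteria", "Actinomycetales",
               "Actinomycetaceae", "Actinomyces", "Actinomyces neuii"]),
    ("OTU_2", ["Actinobacteria", "Actinobacteria", "Actinomycetales",
               "Actinomycetaceae", "Actinomyces", "Unknown"]),
    ("OTU_6", ["Proteobacteria", "Unknown", "Unknown",
               "Unknown", "Unknown", "Unknown"]),
]
print(to_newick(build_tree(rows)))
```

A tree built this way contains internal nodes with a single child (for example a species node holding one OTU), which some R functions reject; in ape these can be removed with `collapse.singles` after reading the tree.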
Thanks Philipps. Do you know if it is possible to change the NCBI IDs used as names in the tree? Although I will use the NCBI taxids to retrieve the tree, I would like to replace each taxid in the output tree with an OTU ID.
For instance, each row of my sample corresponds to a specific OTU: OTU_1, OTU_2 and so on.
Is it possible to change that easily?
Finally, I would like to import this tree into phyloseq, and the tree needs to carry those identifiers.
Thanks,
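For what it is worth, the renaming can be sketched as a plain text substitution on the Newick string. This is a minimal sketch, not a general Newick parser: the function name `relabel_newick`, the example tree and the taxids in `id_map` are placeholders, labels are assumed to be unquoted, and each taxid is assumed to map to exactly one OTU (rows that share a taxid would need separate handling).

```python
import re

def relabel_newick(newick, id_map):
    """Replace node labels found in id_map; labels not in the map are kept."""
    # A label starts right after '(', ',' or ')' and runs until ':', ',', ')' or ';'.
    # Branch lengths follow ':' and are therefore never touched.
    pattern = re.compile(r"(?<=[(,)])([^():,;\s]+)")
    return pattern.sub(lambda m: id_map.get(m.group(1), m.group(1)), newick)

# Placeholder taxids, not real NCBI identifiers.
id_map = {"111": "OTU_1", "222": "OTU_2", "333": "OTU_3"}
tree = "((111:0.1,222:0.2)999:0.3,333);"
print(relabel_newick(tree, id_map))
```

The relabelled string can then be saved to a file and read in R, provided the tip labels match the OTU names used in the abundance table.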
Please use "Add comment" to respond to answers.
Sorry, I added the comment at the bottom.
Thanks guys for your comments, and sorry for the confusion. Let me try to explain better.
My data is WGS data (not 16S). What I call an OTU (I know this is confusing) corresponds to one species, or to a higher rank (e.g. order, class or phylum) if the resolution is lower. Each OTU comes from a subset of my marker genes.
For instance, from my contigs I extract marker genes (single-copy genes) and assign taxonomy to them. The reason for this is that the resolution at species level is much better. One single-copy gene does not correspond to one species; rather, a subset of marker genes corresponds to one species (e.g. GeneA + GeneB + GeneC = Staphylococcus aureus). So each line of my data frame corresponds to a subset of marker genes but only one species (or class or phylum, depending on whether the resolution is sufficient).
I need to generate a phylogenetic tree to be used with phyloseq (note that phyloseq can work with any type of WGS data, not just 16S data). Normally, if this were 16S data, I would align the sequences from each line (OTU); however, here each line is a combination of several marker genes.
I hope this is clearer. How would you generate a phylogenetic tree in such a case?
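One option that might apply here is a concatenated alignment (supermatrix): align each marker gene separately across OTUs, then join the per-gene alignments end to end, padding with gaps where an OTU lacks a gene. The sketch below only illustrates the joining step and rests on assumptions: the same marker gene families are shared across OTUs, each gene has already been aligned by some external tool, and the function name `concatenate_alignments` and the toy sequences are made up.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one sequence per OTU.

    alignments: {gene_name: {otu_id: aligned_sequence}}
    OTUs missing from a gene are padded with '-' for that gene's width.
    """
    otus = sorted({otu for aln in alignments.values() for otu in aln})
    supermatrix = {otu: "" for otu in otus}
    for gene in sorted(alignments):  # fixed gene order keeps columns consistent
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for otu in otus:
            supermatrix[otu] += aln.get(otu, "-" * width)
    return supermatrix

# Toy data: OTU_2 has no sequence for geneB, so it receives gaps there.
alignments = {
    "geneA": {"OTU_1": "ATG-", "OTU_2": "ATGC"},
    "geneB": {"OTU_1": "CC"},
}
for otu, seq in concatenate_alignments(alignments).items():
    print(f">{otu}\n{seq}")
```

The resulting FASTA could be passed to a tree-building program, and the Newick output imported into phyloseq as long as the tip labels match the OTU names.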