Question

PAN and CORE genome analysis

1

9.7 years ago

bioinformaticssrm2011 ▴ 90

Hi,

I have an OTU biom file (from closed-reference OTU picking in QIIME 1.8.0) containing 65 samples, and I am trying to do a pan/core genome analysis.

I have filtered the taxonomy out of the abundance file using a threshold (let's say 60%), so I now have a file containing only the taxonomy column for all 65 samples. Is there a way to do functional annotation for it?

Is there any server or software that can do that, or that can do pan (complete) / core (shared) analysis?

Any suggestions?

Best!
Shashank

genome sequencing next-gen Assembly • 5.4k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by bioinformaticssrm2011 ▴ 90

0

By PAN and CORE genome analysis, do you mean that you want to find the complete set of genes/proteins and the genes/proteins shared among your OTUs? I'm not familiar with the biom format, but does the file contain the genome sequence(s) of the organisms that you're analyzing? You will need the gene sequences of the whole genomes (or the protein sequences of the whole proteomes) to get the pan-genome and core-genome.

ADD REPLY • link 9.7 years ago by sentausa ▴ 650

0

Yes, the complete gene set and the shared gene set.

The biom file doesn't contain genomic sequences. It looks like this:

339039 Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;unclassified_Rhodospirillales
199390 Bacteria;Chloroflexi;Anaerolineae;Caldilineae;Caldilineales;Caldilineacea;unclassified_Caldilineacea
370251 Bacteria;Proteobacteria;Gammaproteobacteria;unclassified_Gammaproteobacteria

Here the number is the OTU ID, followed by the taxonomy. OTU ID represents the particular sequence associated with the particular taxonomy.
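
For readers who want to work with lines in this layout, here is a minimal Python sketch that splits each line into the OTU ID and its list of taxonomic ranks. It assumes the ID and the taxonomy are separated by whitespace and the ranks by semicolons, as in the three lines above; the function name `parse_otu_line` is only illustrative and is not part of QIIME or the biom tools.

```python
def parse_otu_line(line):
    """Split 'OTU_ID taxonomy' into (otu_id, list_of_ranks).

    Assumes whitespace between the ID and the taxonomy string,
    and ';' between ranks (the layout shown in the post).
    """
    otu_id, taxonomy = line.strip().split(None, 1)
    ranks = [rank.strip() for rank in taxonomy.split(";") if rank.strip()]
    return otu_id, ranks


lines = [
    "339039 Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;unclassified_Rhodospirillales",
    "199390 Bacteria;Chloroflexi;Anaerolineae;Caldilineae;Caldilineales;Caldilineacea;unclassified_Caldilineacea",
    "370251 Bacteria;Proteobacteria;Gammaproteobacteria;unclassified_Gammaproteobacteria",
]

# Map each OTU ID to its ranks, from domain down to the deepest label.
otu_taxonomy = dict(parse_otu_line(line) for line in lines)

for otu_id, ranks in otu_taxonomy.items():
    print(otu_id, len(ranks), ranks[-1])
```

This only reorganizes the taxonomy labels; it does not add any sequence or gene information to the OTUs.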

If I retrieve the gene sequence for each taxonomy using its OTU ID, I will have a gene sequence file. How can I then use it for further analysis?

Cheers!

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 9.7 years ago by bioinformaticssrm2011 ▴ 90

0

"OTU ID represents the particular sequence associated with the particular taxonomy." What particular sequence is it? From one gene only? You can't do pan- and core-genome analysis using only one gene from each species/OTU. You need the genome (or better, the proteome) of each OTU. Then find the orthologous genes/proteins among the OTUs (I used OrthoMCL for my bacteria); the orthologs shared by all OTUs are the core-proteome. The pan-proteome would be the core plus any other proteins of each OTU.
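
To make the core/pan definition concrete, here is a minimal Python sketch. It assumes the ortholog clustering has already been done (for example with OrthoMCL) and parsed into a dictionary that maps each ortholog group to the set of genomes containing a member; the group IDs, genome names, and the function `core_and_pan` are hypothetical and not part of any tool's API. Proteins without orthologs are assumed to appear as single-genome groups.

```python
def core_and_pan(groups, genomes):
    """Return (core, pan) as sets of ortholog-group IDs.

    groups:  dict mapping group ID -> set of genomes with a member
    genomes: iterable of all genomes in the comparison
    """
    genomes = set(genomes)
    # Pan: every group found in at least one genome.
    pan = {gid for gid, members in groups.items() if members & genomes}
    # Core: groups with a member in every genome.
    core = {gid for gid, members in groups.items() if genomes <= members}
    return core, pan


# Hypothetical toy input: three genomes, four ortholog groups.
genomes = ["genomeA", "genomeB", "genomeC"]
groups = {
    "OG1": {"genomeA", "genomeB", "genomeC"},  # shared by all
    "OG2": {"genomeA", "genomeB"},             # shared by two
    "OG3": {"genomeC"},                        # unique to one genome
    "OG4": {"genomeA", "genomeB", "genomeC"},  # shared by all
}

core, pan = core_and_pan(groups, genomes)
print("core:", sorted(core))
print("pan: ", sorted(pan))
```

The set logic is the easy part. The result is only meaningful if each genome's full gene or protein set went into the clustering, which a single marker gene per OTU cannot provide.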

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by sentausa ▴ 650

score 0 · Answer 1 · 2016-04-15

0

8.7 years ago

archana.bioinfo87 ▴ 210

You can also try the DAVID functional annotation database. For more detail, please see https://david.ncifcrf.gov/tools.jsp

Hopefully this helps.

ADD COMMENT • link 8.7 years ago by archana.bioinfo87 ▴ 210

score 0 · Answer 2 · 2016-04-15

If you used Greengenes 13_5 as the reference database, you can associate the OTUs with the protein content of the nearest sequenced reference genomes using PICRUSt. However, in my opinion this approach is pretty much worthless. The 16S sequence of your OTU representative being relatively similar to the 16S sequence of some reference genome (60% is not even remotely similar; the threshold should be more like 99.9% for this) does not mean that the protein contents of these two genomes are even remotely similar.