12.1 years ago
Naren
I have identified the core genome of 30 bacterial species from the same genus, and I also have the unique genes of each species.
I have yet to determine the pan genome.
I am going to compare codon and amino acid usage, protein function annotations, etc. across different sets of genes.
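One way to tabulate codon and amino acid usage for a gene set is sketched below. It is a minimal sketch, assuming each gene set is a plain list of in-frame CDS nucleotide strings (no FASTA parsing) and that the standard codon-to-amino-acid assignments apply. NCBI translation table 11, used for bacteria, shares those assignments and differs only in its alternative start codons. The function names are hypothetical.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order (as in the NCBI tables).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(cds_list):
    """Count codons over all in-frame CDS strings in a gene set."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        # Ignore a trailing partial codon if the length is not a multiple of 3.
        for pos in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases such as N
                counts[codon] += 1
    return counts


def amino_acid_usage(codon_counts):
    """Collapse codon counts to amino acid counts, excluding stop codons."""
    aa = Counter()
    for codon, n in codon_counts.items():
        residue = CODON_TABLE[codon]
        if residue != "*":
            aa[residue] += n
    return aa


def relative_frequencies(counts):
    """Convert raw counts to frequencies that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Toy sequences standing in for a core gene set and a unique gene set.
core_cds = ["ATGAAACCCGGGTAA", "ATGAAATTTTAA"]
unique_cds = ["ATGGGGGGGTGA"]
core_codons = relative_frequencies(codon_usage(core_cds))
unique_codons = relative_frequencies(codon_usage(unique_cds))
```

Working with relative frequencies rather than raw counts keeps sets of very different sizes (a large core set versus a small unique set) comparable.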
Will comparing these parameters between the 'core genome set' and the 'set of combined unique genes' provide any useful information?
Should I also compare the core genome with the pan genome and the accessory genome?
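The sets mentioned in the question can be derived from a single table of ortholog-group membership. The sketch below assumes each ortholog group (or singleton) is mapped to the set of genomes that contain it. It also assumes one particular convention: accessory means "present in more than one genome but not in all", kept separate from unique genes. Some studies fold unique genes into the accessory set instead. The function name and group IDs are hypothetical.

```python
def partition_orthologs(membership, n_genomes):
    """Split ortholog groups into pan, core, accessory and unique sets.

    membership: dict mapping ortholog group ID -> set of genome IDs containing it.
    """
    pan = set(membership)  # union: every group seen in at least one genome
    core = {g for g, genomes in membership.items() if len(genomes) == n_genomes}
    unique = {g for g, genomes in membership.items() if len(genomes) == 1}
    # Assumed convention: accessory excludes both core and unique groups.
    accessory = pan - core - unique
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Toy example with 3 genomes instead of 30.
membership = {
    "OG1": {"A", "B", "C"},
    "OG2": {"A", "B"},
    "OG3": {"C"},
    "OG4": {"A", "B", "C"},
}
sets = partition_orthologs(membership, n_genomes=3)
```

With this convention, core, accessory and unique groups do not overlap and together make up the pan set. Comparisons between them therefore do not count the same protein twice.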
Any suggestions are welcome. Thanks in advance.
The phrase pan/core genome is ambiguous. I think you are talking about the set of protein sequences, which I think is better called the pan/core proteome. A true pan/core genome analysis will incorporate all chromosomal/plasmid DNA, not just protein coding (CDS) regions.
Thanks for the reply, @Torst. You are right; I should use the term proteome. I have determined the orthologs common to all 30 bacterial proteomes using NCBI BLAST, and I will be comparing their functional annotations. I want to determine what percentage of proteins falls into each functional category in the core proteome and compare that with the same breakdown for the pan proteome. My question is: what might this comparison reveal, or what else should I do with such data to make it publishable?
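The per-category percentages described above could be computed as in the sketch below. It assumes a dict mapping each ortholog group ID to one functional category label, for example a single COG letter. Proteins with no annotation are binned as "unannotated" rather than dropped. The function names, group IDs and category letters are hypothetical.

```python
from collections import Counter


def category_percentages(group_ids, annotation):
    """Percentage of groups in each functional category for one protein set."""
    counts = Counter(annotation.get(g, "unannotated") for g in group_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


def compare_sets(set_a, set_b, annotation):
    """Side-by-side (percent in set_a, percent in set_b) for every category."""
    pct_a = category_percentages(set_a, annotation)
    pct_b = category_percentages(set_b, annotation)
    categories = sorted(set(pct_a) | set(pct_b))
    return {cat: (pct_a.get(cat, 0.0), pct_b.get(cat, 0.0)) for cat in categories}


# Toy annotation: one hypothetical category label per ortholog group.
annotation = {"OG1": "J", "OG2": "L", "OG3": "X", "OG4": "J"}
core = {"OG1", "OG4"}
pan = {"OG1", "OG2", "OG3", "OG4"}
table = compare_sets(core, pan, annotation)
```

The core proteome is a subset of the pan proteome, so core-versus-pan percentages are not independent. Comparing the core set against the non-core remainder (pan minus core) avoids counting the same proteins on both sides of the comparison.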
If you have 30 bacteria, there are 2^30 (about 1.07 billion) possible protein-membership patterns, as each protein (ortholog group or unique singleton) can be either present or absent in each of the 30 genomes. The core proteome is the set of proteins present in all 30 genomes.
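The membership patterns above can be made concrete by encoding each protein's presence/absence across the genomes as an n-bit integer and tallying how many ortholog groups share each exact pattern. This is a minimal sketch, assuming genomes are numbered 0 to n-1 and that membership maps each ortholog group to the indices of the genomes containing it. The helper names are hypothetical.

```python
from collections import Counter


def membership_mask(genome_indices):
    """Encode presence in genomes as bits: bit i set means present in genome i."""
    mask = 0
    for i in genome_indices:
        mask |= 1 << i
    return mask


def pattern_counts(membership):
    """Number of ortholog groups observed with each exact presence/absence pattern."""
    return Counter(membership_mask(idx) for idx in membership.values())


n_genomes = 3  # toy example; the thread has 30
membership = {"OG1": [0, 1, 2], "OG2": [0, 1], "OG3": [2], "OG4": [0, 1, 2]}
patterns = pattern_counts(membership)

core_mask = (1 << n_genomes) - 1          # all bits set: present in every genome
n_core = patterns[core_mask]              # groups in the core proteome
n_singleton = sum(n for m, n in patterns.items() if bin(m).count("1") == 1)
n_possible = 2 ** n_genomes               # includes the all-absent pattern 0
```

Only a tiny fraction of the 2^30 possible patterns will actually be observed, so a Counter keyed by the observed masks stays small even for 30 genomes.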