How To Screen Genomes For Compositional Studies?


11.6 years ago

SRKR ▴ 180

I am working with more than 2600 genomes and wish to compare genome, gene and intergenic features among various taxonomic groups. For taxonomic groups with very few representatives there is no issue. For groups with multiple genomes, on what basis should I remove similar genomes so that only a few representatives remain per group? Should I use length, GC% or some other feature? For example, if two genomes differ in GC% by less than 1%, I would remove one of them. Please suggest accepted approaches and kindly explain the reasoning as well.

Example:

I have around 60 genomes of Mycobacterium spp., of which more than 20 are M. tuberculosis alone; these have a GC% range of 65.48 to 65.7 and a length range of 4.27 to 4.41 Mb.
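
To make the kind of rule I have in mind concrete, here is a minimal sketch of a greedy within-group filter. It is only an illustration, not an accepted method: the function name `pick_representatives`, the record fields, the length tolerance of 0.1 Mb and the individual genome values are all hypothetical (the M. tuberculosis values are merely chosen to fall inside the ranges quoted above). A genome is dropped when it is within both tolerances of a genome already kept in the same group.

```python
from collections import defaultdict

def pick_representatives(genomes, gc_tol=1.0, len_tol_mb=0.1):
    """Greedy filter: within each group, keep a genome only if it differs
    from every already-kept genome by at least gc_tol (GC%, absolute)
    or by at least len_tol_mb (length in Mb)."""
    kept_by_group = defaultdict(list)
    for g in genomes:
        kept = kept_by_group[g["group"]]
        # Redundant = close to some kept genome on BOTH features.
        redundant = any(
            abs(g["gc"] - k["gc"]) < gc_tol
            and abs(g["length_mb"] - k["length_mb"]) < len_tol_mb
            for k in kept
        )
        if not redundant:
            kept.append(g)
    return dict(kept_by_group)

# Hypothetical records, for illustration only (not real measurements).
genomes = [
    {"name": "Mtb_A", "group": "M. tuberculosis", "gc": 65.48, "length_mb": 4.27},
    {"name": "Mtb_B", "group": "M. tuberculosis", "gc": 65.70, "length_mb": 4.41},
    {"name": "Mtb_C", "group": "M. tuberculosis", "gc": 65.60, "length_mb": 4.30},
    {"name": "Other_X", "group": "other group", "gc": 57.80, "length_mb": 3.27},
]

reps = pick_representatives(genomes)
for group, members in reps.items():
    print(group, [m["name"] for m in members])
```

Note that the result of a greedy filter like this depends on the order of the input, and that with a GC%-only cutoff of 1% all of my M. tuberculosis genomes (range 65.48 to 65.7) would collapse into a single representative.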

genomics • 1.4k views

updated 2.2 years ago by Ram 45k • written 11.6 years ago by SRKR ▴ 180
