I want to compare two genomes that I obtained from my shotgun metagenomic sequencing data. The "cd-hit-est-2d" program looks like a good fit, since I need to find out which sequences are similar or dissimilar between these two genomes. However, I read in the CD-HIT manual that it is meant for sequences that do not contain introns, e.g. ESTs. Does that mean I should not use CD-HIT for my metagenome comparison?
If so, please suggest another tool that I can use to compare the genomes.
Many thanks in advance!
Actually, I ran CD-HIT and it worked. It took only about a minute, even though I had read reports that it can take much longer or run into resource issues.
So my question is: should I not trust these results?
You can trust the result. CD-HIT works for non-intron containing sequences, and prokaryotic genomes (that is presumably what you have) are intronless. It would likely work even for intron-containing genomes as long as they are closely related.
How does one assess (meta)genome similarity using a clustering tool like CD-HIT? Do you just look at the proportion of clustered contigs w.r.t. the total number of contigs?
CD-HIT has a program called "cd-hit-est-2d" that compares two genomes or nucleotide datasets and outputs the sequences that are similar/dissimilar between the two.
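If you want to put a number on the "proportion of clustered contigs" idea, one option is to parse the `.clstr` file that CD-HIT writes next to its output. Below is a minimal Python sketch, not an official CD-HIT utility. It assumes the usual `.clstr` layout (a `>Cluster N` header, then one line per sequence, with the representative marked by a trailing `*`) and it assumes that in a two-dataset run the representatives come from the first dataset. The function name `summarize_clstr` and the contig names in the example are made up for illustration.

```python
import re


def summarize_clstr(lines):
    """Count clusters and matched members in CD-HIT .clstr text.

    Assumes each cluster starts with a '>Cluster N' header and that the
    representative sequence line ends with '*'.
    """
    clusters = []  # one list of (name, is_representative) per cluster
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])
            continue
        # Member lines look like: "1  2214nt, >contigB_7... at +/99.50%"
        match = re.search(r">(.+?)\.\.\.", line)
        if match and clusters:
            clusters[-1].append((match.group(1), line.endswith("*")))

    # Clusters with more than one entry have at least one sequence that
    # matched the representative at the chosen identity threshold.
    shared = [c for c in clusters if len(c) > 1]
    matched_members = sum(1 for c in shared for _, is_rep in c if not is_rep)
    return {
        "clusters": len(clusters),
        "clusters_with_match": len(shared),
        "matched_members": matched_members,
    }


# Hypothetical .clstr content, for illustration only.
example = """\
>Cluster 0
0\t2799nt, >contigA_1... *
1\t2214nt, >contigB_7... at +/99.50%
>Cluster 1
0\t1500nt, >contigA_2... *
>Cluster 2
0\t1800nt, >contigA_3... *
1\t1750nt, >contigB_2... at -/97.10%
2\t900nt, >contigB_9... at +/98.00%
""".splitlines()

stats = summarize_clstr(example)
fraction = stats["clusters_with_match"] / stats["clusters"]
print(stats, round(fraction, 2))
```

On a real run you would read the lines from your own `.clstr` file instead of the example string. Keep in mind that contig counts ignore contig length, so weighting by the number of bases may give a fairer picture of how much of each genome is shared.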