Hi!
We've been looking at Greengenes as a potential 16S database. Initially we downloaded the file current_GREENGENES_gg16S_unaligned.fasta.gz from here (2011), but then we realized there was a second, more recent Greengenes site and downloaded gg_13_5.fasta.gz (2013).
What are the differences in criteria between the two databases?
For instance, we ran a women's health sample against the first db and found a vast amount of Gardnerella vaginalis, but when we ran it against the second, no G. vaginalis was found! After looking into this further, we realized that the second db wasn't classifying any of the Gardnerella it found down to species. So why was Greengenes comfortable classifying G. vaginalis down to the species level in the original database but not in the second?
Why is the second significantly larger? What is going on? We can't find any documentation on this.
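
In case it helps anyone reproduce the comparison, here is a minimal sketch of one way to check both files. It counts the records in a gzipped FASTA and how many header lines mention a given name. The function name summarize_fasta is just made up for illustration, and the sketch assumes the taxonomy text sits in the FASTA header lines. That may not be true for gg_13_5.fasta.gz, whose headers may hold only sequence IDs with the taxonomy kept elsewhere, so a zero count there would not by itself mean the organism is missing.

```python
import gzip
import os


def summarize_fasta(path, term):
    """Return (total_records, headers_containing_term) for a gzipped FASTA.

    Assumption: any taxonomy annotation is part of the '>' header line.
    The match is a plain case-insensitive substring search.
    """
    total = 0
    hits = 0
    needle = term.lower()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                total += 1
                if needle in line.lower():
                    hits += 1
    return total, hits


if __name__ == "__main__":
    # The two filenames mentioned in this post; files that are not
    # present in the working directory are skipped.
    for name in ("current_GREENGENES_gg16S_unaligned.fasta.gz", "gg_13_5.fasta.gz"):
        if os.path.exists(name):
            total, hits = summarize_fasta(name, "Gardnerella vaginalis")
            print(f"{name}: {total} records, {hits} headers mention the species")
```

Running the same search with just "Gardnerella" would show whether the genus is annotated in the headers at all.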