Hi,
I would like to identify the species in my data. Since running BLAST on the entire data set takes too much time, I first detected 16S sequences with HMMER3 and extracted the reads that HMMER identified as hits. Finally, I ran BLAST on this reduced data set. I'm not sure whether assembly is necessary. Is my workflow adequate, or would you suggest a different approach?
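For reference, the extraction step can be sketched roughly as below. This is a minimal sketch, not the exact script used: it assumes the HMMER3 hits were saved with `--tblout` (target sequence name in the first column, comment lines starting with `#`) and that the reads are in plain FASTA. The function names `hit_ids_from_tblout` and `filter_fasta` are made up for illustration.

```python
def hit_ids_from_tblout(lines):
    """Collect target sequence names from HMMER3 --tblout style lines.

    Assumption: the first whitespace-separated column is the target
    (read) name and lines starting with '#' are comments.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(line.split()[0])
    return ids


def _record_id(header):
    """Return the first word of a FASTA header, or '' if it is empty."""
    fields = header.split()
    return fields[0] if fields else ""


def filter_fasta(lines, wanted_ids):
    """Yield (header, sequence) for FASTA records whose ID is wanted."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if header is not None and _record_id(header) in wanted_ids:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    # Emit the last record in the input.
    if header is not None and _record_id(header) in wanted_ids:
        yield header, "".join(seq)
```

In practice both functions would be fed open file handles (for example `open("hits.tbl")` and `open("reads.fasta")`, hypothetical file names), and the yielded records written to a new FASTA file that is then passed to BLAST.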
Thanks
If you are using genomic NGS data, you lose sensitivity by restricting the analysis to 16S hits. Instead, you could map all of the NGS reads, keeping all possible hits, and then classify them by genus and then by species. After a couple of rounds of reduction and summarization, you will have your answer as to what is in there species-wise. At that point you can choose a species, or a group of species, as a reference and map all your reads against it, allowing 1 to 2 mismatches, to work out exactly which bacterial strains are present, if you are lucky enough to get that far.
I strongly recommend against the assembly approach, unless you are trying to identify vertebrates, plants, or other higher-level organisms.
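As a rough illustration of the "classify by genus, then species" summarization, here is a minimal sketch. It assumes the mapping step has already been reduced to `(read_id, taxon_name)` pairs where the name looks like `"Genus species"`; that input format and the function name `summarize_by_taxon` are assumptions for the example, not the output of any particular mapper.

```python
from collections import Counter, defaultdict


def summarize_by_taxon(read_hits):
    """Count distinct reads per genus, and per species within each genus.

    read_hits: iterable of (read_id, taxon_name) pairs, where taxon_name
    is assumed to start with "Genus species". A read that hits the same
    taxon several times is counted once for that taxon.
    """
    seen_genus = set()
    seen_species = set()
    genus_counts = Counter()
    species_counts = defaultdict(Counter)

    for read_id, name in read_hits:
        parts = name.split()
        if len(parts) < 2:
            continue  # skip names without a genus and species part
        genus = parts[0]
        species = f"{parts[0]} {parts[1]}"

        if (read_id, genus) not in seen_genus:
            seen_genus.add((read_id, genus))
            genus_counts[genus] += 1
        if (read_id, species) not in seen_species:
            seen_species.add((read_id, species))
            species_counts[genus][species] += 1

    return genus_counts, species_counts
```

Calling `genus_counts.most_common()` then gives a ranked list of genera, and `species_counts[genus].most_common()` the species within a genus, which could guide the choice of reference genomes for the stricter 1 to 2 mismatch mapping.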
Have you had a look at similar postings here? http://www.biostars.org/post/show/2941/finding-rrna-genes-in-metagenomic-data/ Yours seems to be a duplicate.
And by the way, how much time would BLAST against a microbial database actually take? Not doing this would be a waste of resources. A whole-metagenome shotgun sequencing project is still costly, so you (or the head of your lab) have in effect already committed to investing in the appropriate infrastructure to analyse your data.