Hello,
I have an Illumina-generated FASTA file that was created from rpoB sequence data. I would like to BLAST these sequences to find the best 3 hits for each one, so I can tell which organisms I am looking at. I know you can do something similar in QIIME using BLASTALL, but that relies on a 16S reference database.
Does anyone know of an rpoB database, or another way that I could accurately identify the sequences in my FASTA file?
Any help or advice would be greatly appreciated.
Best Regards, Paul
Hi Paul, you should add this information to your question above or post it as a comment on my answer rather than as a new answer. That will help others with similar questions follow your thread.
Sounds like you have good results clustering your OTUs. Excellent, half of the analysis is finished.
As far as identifying your OTUs... You need a sequence database with a corresponding taxonomy database for QIIME. If you're not able to access any previously published rpoB sequence sets, you can do as Michael mentioned above and create a sequence database from NCBI, EMBL, etc. This is easy to do. In addition to your curated rpoB sequences, you'll need to parse the sequence taxonomy (the names of the corresponding organisms) from NCBI into a separate taxonomy file for QIIME. If you're unsure of the text format, look at the existing databases in QIIME and make sure your text files are in the same format. Then, at the command line, substitute your rpoB database for the QIIME database you would normally use (Greengenes, the RDP database, etc.). You will then get taxonomy assignments, which you can map onto a phylogenetic tree (TopiaryExplorer), or you can use sequence divergence to look at alpha and beta diversity in your samples (UniFrac).
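To give you an idea of the taxonomy-file step, here is a rough Python sketch. It assumes each FASTA header looks like `>ID Genus species ...` and that the taxonomy file is one tab-separated `ID<TAB>taxonomy` line per sequence; both are assumptions on my part, so check them against the existing QIIME database files and your own headers. The function names (`parse_fasta_headers`, `guess_organism`, `write_id_to_taxonomy`) and the example records are made up for illustration.

```python
import io


def parse_fasta_headers(handle):
    """Yield (seq_id, description) for every FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            # Everything before the first space is the ID, the rest is free text.
            seq_id, _, description = header.partition(" ")
            yield seq_id, description


def guess_organism(description):
    """Assumption: the first two words of the description are genus and species."""
    words = description.split()
    return " ".join(words[:2]) if len(words) >= 2 else "Unclassified"


def write_id_to_taxonomy(fasta_handle, out_handle):
    """Write one 'seq_id<TAB>organism' line per record; return the record count."""
    count = 0
    for seq_id, description in parse_fasta_headers(fasta_handle):
        out_handle.write(f"{seq_id}\t{guess_organism(description)}\n")
        count += 1
    return count


# Hypothetical records standing in for a curated rpoB FASTA file.
example_fasta = io.StringIO(
    ">seq1 Escherichia coli rpoB gene, partial cds\nATGGTTTACTCC\n"
    ">seq2 Bacillus subtilis rpoB gene, partial cds\nTTGACAGGTCAA\n"
)
taxonomy_out = io.StringIO()
record_count = write_id_to_taxonomy(example_fasta, taxonomy_out)
print(taxonomy_out.getvalue(), end="")
```

With real data you would pass open file handles instead of the `io.StringIO` objects. The two-word organism guess is only a placeholder; if you want full lineages (phylum, class, and so on) you would need to look them up from the NCBI taxonomy rather than from the header text.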
Let me know if you have any other questions.