Hi all,
I want to get the 16S rRNA sequence of Staphylococcus aureus to build a phylogeny along with other similar strains. Can anyone please help me get the data? Is it available from NCBI or any other source?
In most cases, it is difficult to confidently resolve classifications for 16S data at the species or sub-species level. So, if you were specifically interested in types of Staphylococcus aureus, it might be best to use a different marker (which you could define from comparing whole genome sequences from GenBank).
That said, there are a couple of options for getting relatively straightforward taxonomy information for some reference 16S sequences:
1) The RDPclassifier provides the training data, where the taxonomy information is in the FASTA header: https://sourceforge.net/projects/rdp-classifier/files/RDP_Classifier_TrainingData/. The main RDPclassifier .jar file also has rm-dupseq and rm-partialseq functions to filter for unique sequences.
2) NCBI provides some 16S sequences in a BLAST database. It takes a little extra work to extract the FASTA sequence from the database files (using blastdbcmd -entry all -db 16SMicrobial > NCBI_16S.fa) and then download the taxonomy information, but BLCA provides some scripts for this pre-processing step: https://github.com/qunfengdong/BLCA
Last time I checked, the SILVA database had a different data format that was a little harder to parse, but taxonomy information is available in a format that can be used by mothur here: https://www.mothur.org/wiki/Silva_reference_files
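Once you have a FASTA file from either option, pulling out the Staphylococcus entries is mostly a matter of matching on the header. Here is a minimal Python sketch of that idea. The toy headers below are made up for illustration (the real RDP and NCBI headers are laid out differently), and read_fasta and select_by_keyword are hypothetical helper names, not part of any of the tools above.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_keyword(records, keyword: str) -> List[Tuple[str, str]]:
    """Keep records whose header contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [(h, s) for h, s in records if keyword in h.lower()]


# Invented example data; the header layout is an assumption.
example = """>seq1 Bacteria;Firmicutes;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus
ACGTACGT
ACGT
>seq2 Bacteria;Proteobacteria;Gammaproteobacteria;Escherichia
GGGTTTAA
"""

hits = select_by_keyword(read_fasta(example.splitlines()), "Staphylococcus")
for header, seq in hits:
    print(header.split()[0], len(seq))
```

For a real file, you would pass an open file handle to read_fasta instead of example.splitlines(). A plain substring match is crude, so check a few headers by eye before trusting the filter.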
Just search and filter from GenBank.
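If you would rather script the search than use the web page, NCBI's E-utilities accept the same kind of query. The sketch below only builds an esearch URL and does not send it. The search term (organism plus "16S ribosomal RNA" in the title plus a length window) is my assumption about what a reasonable filter looks like, and build_esearch_url is a hypothetical helper name, so adjust both to taste.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(organism: str, min_len: int = 1200,
                      max_len: int = 1600, retmax: int = 20) -> str:
    """Build an esearch URL for 16S rRNA records of one organism.

    The length window is an arbitrary guess meant to skip short
    fragments and whole genomes; it is not an official cutoff.
    """
    term = (
        f'"{organism}"[Organism] AND 16S ribosomal RNA[Title] '
        f"AND {min_len}:{max_len}[SLEN]"
    )
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url("Staphylococcus aureus")
print(url)
```

The response lists record IDs, which you would then pass to efetch (with rettype=fasta) to download the sequences. Check NCBI's usage guidelines on request rates before running this in a loop.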
...I normally use SILVA. Is GenBank better, in your opinion?