Hi everyone!
I am looking to taxonomically annotate a fasta sequence file and receive a fasta output with annotation. The original pacbio_otu.fasta has the id lines:
> consensus=Uniq2;size=24;seqs=2
GTTACCTTGTTACGACTTCACCCCAATCATCTATCCCACCTTAGGCGGCTGGCTCCAAAAGGTTACCTCACCGACTTCGG
To annotate pacbio_otu.fasta, the taxonomy database rdp_16s_v16_sp.fa has the id lines:
> EF599163_S000871589;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
GTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGAAACGACACTAACAATCCTTC
If possible, I would like to have taxonomy annotation (from rdp_16s_v16_sp.fa) on my pacbio_otu.fasta file to build my own taxonomy database in fasta format with the id lines like:
> consensus=Uniq2;size=24;seqs=2;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
GTTACCTTGTTACGACTTCACCCCAATCATCTATCCCACCTTAGGCGGCTGGCTCCAAAAGGTTACCTCACCGACTTCGG
Eventually, with this taxonomy database in fasta format, I would like to run usearch 'sintax' with other fasta data against it.
For my situation, are there any ways or scripts to produce my own taxonomy database in fasta format?
Many thanks, Zach
Hi Zach,
A fasta file is a file in which each record has one header line that starts with the sign >, followed by a sequence (DNA, RNA, protein), such as:
>OTU_1
ATCGATGCTAGCTACGATCGATCAGCTAGCTGATCGATCGATGCATCGATC
Therefore the two-header file that you're requesting is not in fasta format, because you would have: 1st line - header, 2nd line - taxonomy, and 3rd line - sequence. Thus, even if you create that strange format, usearch will probably complain and throw errors saying that your data is not in fasta format.
You have two options here: (1) stick with the file annotated like
Or (2) keep the fasta file untouched and put the taxonomy in a text file with 2 columns that match headers to taxonomy.
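If you later want to fold the 2-column text file from option (2) back into the header line, here is a minimal Python sketch. It assumes a tab-separated table (header, then taxonomy) whose first column matches the fasta header exactly. The function names load_taxonomy and annotate_fasta are made up for illustration and are not part of usearch, and the example data are just the header and taxonomy string from the question.

```python
def load_taxonomy(lines):
    """Parse 'header<TAB>taxonomy' lines into a dict (assumed table layout)."""
    tax = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        header, taxonomy = line.split("\t", 1)
        tax[header.strip()] = taxonomy.strip()
    return tax


def annotate_fasta(lines, tax):
    """Yield fasta lines, appending ';tax=...' to headers found in tax."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:].strip()
            if header in tax:
                # Avoid a doubled ';' if the header already ends with one.
                line = ">" + header.rstrip(";") + ";tax=" + tax[header]
            else:
                line = ">" + header  # no taxonomy known: header unchanged
        yield line


# Hypothetical in-memory example; in practice these would be lines read from files.
table = [
    'consensus=Uniq2;size=24;seqs=2\td:Bacteria,p:"Proteobacteria",'
    'c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae'
]
fasta = [
    ">consensus=Uniq2;size=24;seqs=2",
    "GTTACCTTGTTACGACTTCACCCCAATCATCTATCCCACCTTAGGCGGCTGGCTCCAAAAGGTTACCTCACCGACTTCGG",
]
annotated = list(annotate_fasta(fasta, load_taxonomy(table)))
print("\n".join(annotated))
```

The taxonomy stays on the header line, so the result is still ordinary fasta: one header line, then the sequence. Where the 2-column table comes from (for example, a classifier's output) is up to you; the sketch only does the merge.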
I hope this helps,
António
Do you know what this EF599163_S000871589 means or where it comes from?
My guess is that you should have a file from usearch matching EF599163_S000871589 with Uniq2, but I'm not sure. I haven't used usearch for a long time.
António
Hi Antonio,
Thanks for your response. I have previously done usearch-sintax with other fasta files on rdp_16s_v16_sp.fa as a database, without any problems.
What I want to do is annotate a PacBio fasta file of mine (pacbio_otu.fasta) to get a new taxonomy-annotated fasta file with lines like this:
I do not have an annotated fasta file like the above, and am looking to have that.
'EF599163_S000871589' should represent a particular OTU. The RDP taxonomy database (rdp_16s_v16_sp.fa) was obtained from https://drive5.com/usearch/manual/sintax_downloads.html
Cheers!