Hi bioinformatics community,
I would like to classify fungal 18S data (V7 region) with assignTaxonomy from dada2. For that I downloaded SILVA_132_SSURef_tax_silva.fasta.gz from the SILVA website and need to reformat it, which I do with some Linux command-line one-liners.
However, the entries in the database do not all have the same number of taxonomic levels. My question is how I can format the file (preferably on the Linux command line) so that only 6 taxonomic levels are left. How can I find out, for every entry in the database, which taxonomic levels are present, so that the others can be cut out to get a consistent number of levels? Could I use the tax_slv_ssu_132.txt file to decide which levels to cut?
This is the FASTA format of the taxonomy file that I would like to convert the SILVA database file into:
>Level1;Level2;Level3;Level4;Level5;Level6;
ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
>Level1;Level2;Level3;Level4;Level5;
CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC
This is the beginning of the SILVA database FASTA file:
$ less SILVA_132_SSURef_tax_silva.fasta
>GY187501.2.1421 Bacteria;Epsilonbacteraeota;Campylobacteria;Campylobacterales;Helicobacteraceae;Helicobacter;unidentified
GCAAGUCGAACGAUGAAGCCUAGCUUGCUAGGUUGAUUAGUGGCGCACGGGUGAGUAAUGCAUAGAUGACAUGCCCUUUA
GUUUGGAAUAGCCACUAGAAAUGGUGAUUAAUACCAAAUACUACCUUACGAGGGAAAGAUUUAUCGCUAAAGGAUUGGUC
UAUGUCCUAUCAGCUUGUUGGUGAGGUAAAGGCUCACCAAGGCUAUGACGGGUAUCCGGCCUGAGAGGGUGAGCGGACAC
ACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUAUUGCUCAAUGGGGGAAACCCUGAAGCAG
CAACGCCGCGUGGAGGAUGAAGGUUUUAGGAUCGUAAACUCCUUUUGUUAGAGAAGAUAAUGACGGUAUCUAACGAAUAA
GCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUACUCGGAAUCACUGGGCGUAAAGAGUG
CGUAGGCGGGAGGACAAGUCAGGUGUGAAAUCCUAUGGCUUAACCAUAGAACUGCAUUUGAAACUAUCCUUCUGGAGUGU
GGGAGAGGUAGGUGGAAUUCUUGGUGUAGGGGUAAAAUCCGUAGAGAUCAAGAGGAAUACUCAUUGCGAAGGCGACCUGC
UGGAACAUCACUGACGCUGAUUGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACG
AUGGAUGCUAGUUGUUGGGAGGCUUUGUCUUUCCAGUAAUGCAGCUAACGCCUUAAGCAUCCCGCCUGGGGAGUACGGUC
GCAAGAUUAAAACUCAAAGGAAUAGACGGGGACCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGUUACACGAAGA
ACCUUACCUAGGCUUGACAUUGAAGGAAUUCCCUAGAAAUAGGGGAGUGUCUAGCUUGCUAGACCCUGAAAACAGGUGCU
GCACGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUUUUCUUAGUUGCUAAC
AGGUUAUGCUGCGCACUCUAAGAAGACUGCCUGCGUAAGCAGGAGGAAGGUGAGGACGACGUCAAGUCAUCAUGGCCCUU
ACGCCUAGGGCUACACACGUGCUACAAUGGGGUGCACAAAGAGAUGCAAAGCCGCGAGGCAGAGCUAAUCUAUAAAACAC
CUCCUAGUUCGGAUUGCAGGCUGCAACUCGCCUGCAUGAAGCUGGAAUCGCUAGUAAUCGCAAAUCAGCUAUGUUGCGGU
GAAUACGUUCCCGGGUCUUGUACUCACCGCCCGUCACACCAUGGGAGUUGUGUUUGCCUUAAGUCAGGAUGCUAAAGCAG
CUACUGCCCACGGCACACACAGCGACUGGGGUGAAGUCGUAACAAGGUAACCCGGGCGGC
>GY194060.4884.6412 Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;unidentified
AGAGUUUGAUCCUGGCUCAGGACGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUAGAACGCUGAGGUUUGGUGUUUACA
CUAGACUGAUGAGUUGCGAACGGGUGAGUAACGCGUAGGUAACCUGCCUCAUAGCGGGGGAUAACUAUUGGAAACGAUAG
CUAAUACCGCAUAAGAGUAAUUAACACAUGUUAGUUAUUUAAAAGGAGCAAUUGCUUCACUGUGAGAUGGACCUGCGUUG
UAUUAGCUAGUUGGUGAGGUAAAGGCUCACCAAGGCGACGAUACAUAGCCGACCUGAGAGGGUGAUCGGCCACACUGGGA
CUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCGGCAAUGGACGGAAGUCUGACCGAGCAACGCC
GCGUGAGUGAAGAAGGUUUUCGGAUCGUAAAGCUCUGUUGUUAGAGAAGAACGUUGGUAGGAGUGGAAAAUCUACCAAGU
GACGGUAACUAACCAGAAAGGGACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUCCCGAGCGUUGUCCGGAU
UUAUUGGGCGUAAAGCGAGCGCAGGCGGUUCUUUAAGUCUGAAGUUAAAGGCAGUGGCUUAACCAUUGUACGCUUUGGAA
ACUGGAGGACUUGAGUGCAGAAGGGGAGAGUGGAAUUCCAUGUGUAGCGGUGAAAUGCGUAGAUAUAUGGAGGAACACCG
GUGGCGAAAGCGGCUCUCUGGUCUGUAACUGACGCUGAGGCUCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGU
AGUCCACGCCGUAAACGAUGAGUGCUAGGUGUUAGGCCCUUUCCGGGGCUUAGUGCCGCAGCUAACGCAUUAAGCACUCC
GCCUGGGGAGUACGACCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAU
UCGAAGCAACGCGAAGAACCUUACCAGGUCUUGACAUCCUUCUGACCGGCCUAGAGAUAGGCUUUCUCUUCGGAGCAGAA
GUGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUAUUG
UUAGUUGCCAUCAUUAAGUUGGGCACUCUAGCGAGACUGCCGGUAAUAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCA
UCAUGCCCCUUAUGACCUGGGCUACACACGUGCUACAAUGGUUGGUACAACGAGUCGCAAGCCGGUGACGGCAAGCUAAU
CUCUUAAAGCCAAUCUCAGUUCGGAUUGUAGGCUGCAACUCGCCUACAUGAAGUCGGAAUCGCUAGUAAUCGCGGAUCAG
CACGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGAGAGUUUGUAACACCCGAAGUCGGUG
AGGUAACCUUUUAGGAGCCAGCCGCCUAAGGUGGGAUAGAUGAUUGGGGUGAAGUCGUAACAAGGUAGCCGUAUCGGAAG
GUGCGGCUG
This is the beginning of the tax_slv_ssu_132.txt file from SILVA:
$ less tax_slv_ssu_132.txt
Archaea; 2 domain
Archaea;Altiarchaeota; 24228 phylum 132
Archaea;Altiarchaeota;Altiarchaeia; 24229 class 132
Archaea;Asgardaeota; 24230 phylum 132
Archaea;Asgardaeota;Heimdallarchaeia; 24231 class 132
Archaea;Asgardaeota;Lokiarchaeia; 24232 class 132
Archaea;Asgardaeota;Odinarchaeia; 24233 class 132
Archaea;Crenarchaeota; 6 phylum 119
Archaea;Crenarchaeota;Bathyarchaeia; 24234 class 132
Archaea;Crenarchaeota;Crenarchaeota Incertae Sedis; 24235 class 132
Archaea;Crenarchaeota;Crenarchaeota Incertae Sedis;Aigarchaeales; 24236 order 132
Archaea;Crenarchaeota;Crenarchaeota Incertae Sedis;Aigarchaeales;Caldiarchaeaceae; 24237 family 132
Archaea;Crenarchaeota;Crenarchaeota Incertae Sedis;Aigarchaeales;Caldiarchaeaceae;Candidatus Caldiarchaeum; 24238 genus 132
When I first put each sequence in the database on a single line instead of spread over several lines:
$ awk '/^>/ { print (NR==1 ? "" : RS) $0; next } { printf "%s", $0 } END { printf RS }' SILVA_132_SSURef_tax_silva.fasta > SILVA_132_SSURef_tax_silva_oneline.fasta
then run:
$ grep -A1 'Eukaryota' SILVA_132_SSURef_tax_silva_oneline.fasta --no-group-separator |sed 's/ /_/g' |sed 's/>_/> /g' | cut -f 1,3 -d " " | sed -E 's/[[:digit:]]_/ /g' |cut -f 2 -d " "|sed '1~2 s/^/>/g'| awk '/^>/ {$0=$0 ";"}1'> SILVA_132_taxonomy_assignment.fasta
and then translate all Us to Ts, I get close to the format I want:
$ awk 'BEGIN{RS=">";FS="\n"}NR>1{printf ">%s\n",$1;for (i=2;i<=NF;i++) {gsub(/U/,"T",$i); printf "%s\n",$i}}' SILVA_132_taxonomy_assignment.fasta| awk 'NF' > SILVA_132_taxonomy_assignment_formatted.fasta
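For readers following along, the three steps above (joining wrapped sequence lines, keeping only Eukaryota records with the accession removed from the header, and turning U into T) can also be sketched as a single awk pass. This is only a sketch under assumptions: the input file, accessions and taxon names below are made up so that it runs on its own, and it assumes every header has the form `>ACCESSION taxonomy;path;organism name`, as in the SILVA excerpt above.

```shell
#!/usr/bin/env bash
# Sketch: select Eukaryota records, join wrapped sequences, rewrite headers, U -> T.
work=$(mktemp -d)

# Made-up stand-in for SILVA_132_SSURef_tax_silva.fasta (all names are invented).
printf '%s\n' \
  '>ACC1.1.8 Eukaryota;CladeB;GenH;GenH species one' \
  'ACGU' \
  'UUGA' \
  '>ACC2.1.4 Bacteria;PhyZ;GenQ;unidentified' \
  'GGCU' \
  '>ACC3.1.4 Eukaryota;CladeB;GenY;uncultured thing' \
  'CCUU' > "$work/demo_silva.fasta"

awk '
function flush_record() {                      # print the buffered record if selected
    if (keep) print hdr "\n" seq
}
/^>/ {
    flush_record()
    tax  = substr($0, index($0, " ") + 1)      # everything after the accession
    keep = (index(tax, "Eukaryota;") == 1)     # taxonomy path starts with Eukaryota
    gsub(/ /, "_", tax)                        # replace spaces inside taxon names
    hdr  = ">" tax ";"                         # trailing ";" as in the target format
    seq  = ""
    next
}
{ gsub(/U/, "T"); seq = seq $0 }               # RNA -> DNA and join wrapped lines
END { flush_record() }
' "$work/demo_silva.fasta" > "$work/demo_euk.fasta"

cat "$work/demo_euk.fasta"
```

Only sequence lines are passed through `gsub(/U/, "T")`, so a taxon name that happens to contain a capital U is left alone. Lowercase `u` is not handled in this sketch.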
My formatted file looks like this:
$ tail SILVA_132_taxonomy_assignment_formatted.fasta
>Eukaryota;Opisthokonta;Nucletmycea;Fungi;Cryptomycota;Incertae_Sedis;Incertae_Sedis;Incertae_Sedis;Incertae_Sedis;Paramicrosporidium;uncultured_fungus;
CTAGAGCTAATACATGCAAAGAGCCCGACTTGTGGAAGGGCTGTATTTATTAGATAAAAAACCAGGGCCAGAAATGGCAGATGGGGAGTCATAATAACTGAGCGGATCGCAGGGCCTTGTGCCGGCGACGCATCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATCGGGGCCTACCATGGTTTTGACGGGTAACGGGGAATTGGGGTTCGATTCCGGAGAGGGAGCCTGAGAGATGGCTACCACTTCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGACACAGGGAGGTAGTGACAGGAAATAACAATGCAGGGCCCTCTGGGTCTTGCAATTGGAATGAGAAGAATATAAAACGCTCATCGAGCAGCAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAAGGTGTGTGTATTGGGATTGTTGGCTTTGCGGCTGTACAATGACAAGCACAATAGGGAGGAGTGCACTTGTGCAGTTCAGCCCGGGTTACTGTGAAGAAATTAGAGTGTTTAAAGCAGGCATAGGCTTGGATACATCAGCATGGAATAACAAGAGGGACTAGGGACCTTTTGTTGGTTTGGGGGCTTTGGTAATGATTAATAGGAACAGTTGGGGGCATTAGTATTCAGCAGTCAGAGGTGAAATTCTTGGATTTGGTGAAGACTAACTAGTGCGAAGGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTAACCATAAACGATGCCGACTCGGGATCGGTGGGGGTCTGATGACACCATCGGGACCGTGTGAGAAATCAAAGTTTTTGGGTTCTGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCAGGTCCGGACATAGTAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCTGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGATAACGAACGAGACCTTAACCTGCTAACTAGTTGGCAGAGTGAGAACTCTGCGGGACTTCTTAGAGGGACTGACTGCGGATAGCAGTCGGAAGTTTGAGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCTGCACGCGCGCTACACTGACACAGGCAGCGAGTCTTGCCTTGGCCGAGAGGTCTGGGGAATCTTGGGAAACTGTGTCGTGCTGGGGATAGATCATTGCAATTATTGGTCTTGAACGAGGAATTCCTAGTAAGCGCAAGTCATCAGCTTGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAATGGCTTAGTGAGGCGTTTGGAGTGGTGGTTTTGGCTTTGGTTGGAACTAGTGCAAAATCCGTCAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGA
>Eukaryota;Archaeplastida;Chloroplastida;Charophyta;Phragmoplastophyta;Streptophyta;Embryophyta;Tracheophyta;Spermatophyta;Magnoliophyta;Liliopsida;Poales;Triticum;Triticum_aestivum_(bread_wheat);
CCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGCAAGTATGAACCAATTTGAACTNNNNNNNNNNGAATGGCTCATTAAATCAGTTATAGTTTGTTTGATGGTACGTGCTACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTGCAACAAACCCCGACTTCTGGGAGGGGCGCATTTATTAGATAAAAGGCTGACGCGGGCTCTGCTCGCTGATCCGATGATTCATGATAACTCGACGGATCGCACGGCCTTCGTGCCGGCGACGCATCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATAGGGGCCTACCATGGTGGTGACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGACACGGGGAGGTAGTGACAATAAATAACAATACCGGGCGCATTAGTGTCTGGTAATTGGAATGAGTACAATCTAAATCCCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGACCTTGGGCCGGGTCGGCCGGTCCGCCTCACGGCGAGCACCGACCTACTCGACCCTTCGGCCGGCATCGCGCTCCTAGCCTTAATTGGCCGGGTCGTGTTTCCGGCATCGTTACTTTGAAGAAATTAGAGTGCTCAAAGCAAGCCATCGCTCTGGATACATTAGCATGGGATAACATCATAGGATTCCGGTCCTATTGTGTTGGCCTTCGGGATCGGAGTAATGATTAATAGGGACAGTCGGGGGCATTCGTATTTCATAGTCAGAGGTGAAATTCTTGGATTTATGAAAGACGAACAACTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCTCGAAGACGATCAGATACCGTCCTAGTCTCAACCATAAACGATGCCGACCAGGGATCGGCGGATGTTGCTTATAGGACTCCGCCGGCACCTTATGAGAAATCAAAGTCTTTGGGTTCCGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGCGTGGAGCCTGCGGCTTAATTTGACTCAACAC
>Eukaryota;Archaeplastida;Chloroplastida;Charophyta;Phragmoplastophyta;Streptophyta;Embryophyta;Tracheophyta;Spermatophyta;Magnoliophyta;Liliopsida;Poales;Triticum;Triticum_aestivum_(bread_wheat);
CCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGCAAGTATGAACCAATTTGAACTGTGAAACTGCGAATGGCTCATTAAATCAGTTATAGTTTGTTTGATGGTACGTGCTACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTGCAACAAACCCCGACTTCTGGGAGGGGCGCATTTATTAGATAAAAGGCTGACGCGGGCTCTGCTCGCTGATCCGATGATTCATGATAACTCGACGGATCGCACGGCCTTCGTGCCGGCGACGCATCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATAGGGGCCTACCATGGTGGTGACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGACACGGGGAGGTAGTGACAATAAATAACAATACCGGGCGCATTAGTGTCTGGTAATTGGAATGAGTACAATCTAAATCCCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGACCTTGGGCCGGGTCGGCCGGTCCGCCTCACGGCGAGCACCGACCTACTCGACCCTTCGGCCGGCATCGCGCTCCTAGCCTTAATTGGCCGGGTCGTGTTTCCGGCATCGTTACTTTGAAGAAATTAGAGTGCTCAAAGCAAGCCATCGCTCTGGATACATTAGCATGGGATAACATCATAGGATTCCGGTCCTATTGTGTTGGCCTTCGGGATCGGAGTAATGATTAATAGGGACAGTCGGGGGCATTCGTATTTCATAGTCAGAGGTGAAATTCTTGGATTTATGAAAGACGAACAACTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCTCGAAGACGATCAGATACCGTCCTAGTCTCAACCATAAACGATGCCGACCAGGGATCGGCGGATGTTGCTTATAGGACTCCGCCGGCACCTTATGAGAAATCAAAGTCTTTGGGTTCCGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGCGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGGAAACTTACCAGGT
>Eukaryota;Archaeplastida;Chloroplastida;Charophyta;Phragmoplastophyta;Streptophyta;Embryophyta;Tracheophyta;Spermatophyta;Magnoliophyta;Liliopsida;Poales;Triticum;Triticum_aestivum_(bread_wheat);
CCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGCAAGTATGAACCAATTTGAACTGTGAAACTGCGAATGGCTCATTAAATCAGTTATAGTTTGTTTGATGGTACGTGCTACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTGCAACAAACCCCGACTTCTGGGAGGGGCGCATTTATTAGATAAAAGGCTGACGCGGGCTCTGCTCGCTGATCCGATGATTCATGATAACTCGACGGATCGCACGGCCTTCGTGCCGGCGACGCATCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATAGGGGCCTACCATGGTGGTGACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGACACGGGGAGGTAGTGACAATAAATAACAATACCGGGCGCATTAGTGTCTGGTAATTGGAATGAGTACAATCTAAATCCCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGACCTTGGGCCGGGTCGGCCGGTCCGCCTCACGGCGAGCACCGACCTACTCGACCCTTCGGCCGGCATCGCGCTCCTAGCCTTAATTGGCCGGGTCGTGTTTCCGGCATCGTTACTTTGAAGAAATTAGAGTGCTCAAAGCAAGCCATCGCTCTGGATACATTAGCATGGGATAACATCATAGGATTCCGGTCCTATTGTGTTGGCCTTCGGGATCGGAGTAATGATTAATAGGGACAGTCGGGGGCATTCGTATTTCATAGTCAGAGGTGAAATTCTTGGATTTATGAAAGACGAACAACTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCTCGAAGACGATCAGATACCGTCCTAGTCTCAACCATAAACGATGCCGACCAGGGATCGGCGGATGTTGCTTATAGGACTCCGCCGGCACCTTATGAGAAATCAAAGTCTTTGGGTTCCGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGCGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGGAAACTTACCAGGTCCAGACATAGCAAGGATTGACAGACTGAGAGCTCTTTCTTGATTCTATGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGCGATTTGTCTGGTTAATTCCGTTAACGAACGAGACCTCAGCCTGCTAACTAGCTATGCGGAGCCATCCCTCCGCAGCTAGCTTCTTAGAGGGACTATCGCCGTTTAGGCGACGGAAGTTTGAGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGATGTATTCAACGAGTATATAGCCTTGGCCGACAGGCCCGGGTAATCTTGGGAAATTTCATCGTGATGGGGATAGATCATTGCAATTGTTGGTCTTCAACGAGGAATGCCTAGTAAGCGCGAGTCATCAGCTCGCGTTGACTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAATGGTCCGGTGAAGTGTTCGGATCGCGGCGACGGGGGCGGTTCGCCGCCCCCGACGTCGCGAGAAGTCCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAG
The problem now is that I still have an inconsistent number of taxonomic levels in my FASTA header lines. I want 6 (or fewer), so that I end up with the taxonomic levels:
"Kingdom","Class","Order","Family","Genus","Species"
I want to use these to assign taxonomy to my 18S data, which looks like this:
$ tail seqtab.nochim.csv
"TTTGTCTGGTTAATTCCGATAACGAGCGAGACCTCCATGGTGCAATGGGCTGTTTTGTTGGTACAGAGCAGCTCTCTTCACCAGACTAGCCATGTTTAGTGGCAGGAAGCTGGAGGCAATAACAGGTCTGTGA"
"TTTGTCTGGTTAATTCCGATAACGAACGAGACCTCTACATACTAAATAGATCCGTAAATCTTCTAAAGGGTGAAGTTGTGAGACAGTACTAGTTCTCTAGTGCTTCATGCAATGGATCCCAGATGAATTTACGAGATAATATAGATTTTCTTAGATGGACTGTTTGCGAAAGTAAACGGAAGTAAGAGGCAATAACAGGTCTGTGA"
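One possible way to use tax_slv_ssu_132.txt for this, sketched here under stated assumptions rather than as a tested recipe: each line of that file gives a taxonomy path and its rank, so the rank of every level in a FASTA header can be looked up by building the path prefix by prefix, and only the wanted ranks are kept. The sketch assumes the file is tab-separated with the path in column 1 and the rank in column 3, and that it is run on the original SILVA headers (with accession and spaces), because the paths in the tax file contain spaces. The demo inputs, taxon names and the rank list in `want` are invented for illustration. The rank names you pass must match what actually appears in the rank column for your taxa.

```shell
#!/usr/bin/env bash
# Sketch: keep only selected ranks in SILVA-style FASTA headers via a path -> rank table.
work=$(mktemp -d)

# Made-up stand-in for tax_slv_ssu_132.txt: path <TAB> taxid <TAB> rank
printf '%s\t%s\t%s\n' \
  'DomA;' 1 domain \
  'DomA;CladeB;' 2 major_clade \
  'DomA;CladeB;KingC;' 3 kingdom \
  'DomA;CladeB;KingC;PhyD;' 4 phylum \
  'DomA;CladeB;KingC;PhyD;ClassE;' 5 class \
  'DomA;CladeB;KingC;PhyD;ClassE;OrdF;' 6 order \
  'DomA;CladeB;KingC;PhyD;ClassE;OrdF;FamG;' 7 family \
  'DomA;CladeB;KingC;PhyD;ClassE;OrdF;FamG;GenH;' 8 genus \
  'DomA;CladeB;KingC;PhyD;OrdX;' 9 order \
  'DomA;CladeB;KingC;PhyD;OrdX;GenY;' 10 genus > "$work/demo_tax.txt"

# Made-up stand-in for the SILVA FASTA (one record has no class level).
printf '%s\n' \
  '>ACC1.1.4 DomA;CladeB;KingC;PhyD;ClassE;OrdF;FamG;GenH;GenH species one' \
  'ACGU' \
  '>ACC2.1.4 DomA;CladeB;KingC;PhyD;OrdX;GenY;uncultured thing' \
  'GGCU' > "$work/demo.fasta"

awk -F'\t' -v want="kingdom,class,order,family,genus" '
BEGIN { nw = split(want, w, ",") }
NR == FNR { rank[$1] = $3; next }              # first file: taxonomy path -> rank
/^>/ {
    n = split(substr($0, index($0, " ") + 1), t, ";")   # drop accession, split levels
    split("", found)                           # reset per-record rank -> name table
    path = ""
    for (i = 1; i < n; i++) {                  # last field is the organism name
        path = path t[i] ";"
        if (path in rank) found[rank[path]] = t[i]
    }
    out = ">"
    complete = 1
    for (j = 1; j <= nw; j++) {                # stop at the first missing wanted rank
        if (!(w[j] in found)) { complete = 0; break }
        out = out found[w[j]] ";"
    }
    if (complete) out = out t[n] ";"           # organism name as the species level
    gsub(/ /, "_", out)
    print out
    next
}
{ gsub(/U/, "T"); print }                      # sequence lines: RNA -> DNA
' "$work/demo_tax.txt" "$work/demo.fasta" > "$work/demo_6levels.fasta"

cat "$work/demo_6levels.fasta"
```

The header is truncated at the first wanted rank that is missing, so the remaining levels always sit in the same position (the second demo record, which has no class, ends up as `>KingC;`). That is a design choice of this sketch. Filling gaps with a placeholder name would be another option. A record in which even the first wanted rank is missing would get an empty `>` header and should be filtered out separately.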
I hope this helps to clarify what I want to do and what I have already done.
Thankful for any suggestions! Maria
Hi!
Please first share with us an example of what the data looks like, the command line you're running, and the problem you're getting. If you show us what you've tried and what is going wrong, it becomes easier for the community to help you.
Best,
António
Please, whenever you need to add more information, just edit your original post and do not add answers below unless you have found the solution to your problem. I tried to edit your original post to include the information you gave below.
I was checking the DADA2 taxonomy website, and they include a version 132. Does this version not have fungal sequences? (see links below)
DADA2 taxonomy
SILVA v132 DADA2 formatted data base
António
Thank you for editing the post and for your answer.
Sadly, the linked formatted database does not contain Fungi: when I grep for 'Fungi', nothing comes up, and the same is true for the species assignment file.
I should add that I also want to assign species. But the formatting I did worked for me.
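A check like the grep described above can be run directly on a gzipped FASTA without unpacking it. The sketch below uses a small made-up file so that it runs on its own. `count_headers` is a hypothetical helper name, and the demo file stands in for whichever formatted database you downloaded.

```shell
#!/usr/bin/env bash
# Sketch: count header lines that mention a taxon in a gzipped FASTA.
work=$(mktemp -d)

# Made-up, gzipped stand-in for a dada2-formatted training FASTA.
printf '%s\n' '>DomA;KingC;' 'ACGT' '>DomA;Fungi;PhyD;' 'GGCT' \
  | gzip -c > "$work/demo_train_set.fa.gz"

# Print the number of matching headers; prints 0 instead of failing when none match.
count_headers() {   # usage: count_headers PATTERN FILE.gz
    gzip -dc "$2" | grep '^>' | grep -c -- "$1" || true
}

count_headers 'Fungi' "$work/demo_train_set.fa.gz"
count_headers 'Metazoa' "$work/demo_train_set.fa.gz"
```

Restricting the search to header lines avoids counting accidental matches elsewhere in the file.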
mbuenner: Please take a moment to properly format the original post. Highlighting text and then using the 101010 code button allows you to keep monospace fonts and formatting. The Quote button should only be used for quoting text in replies etc. You do not need to include entire sequences; those can be shortened to make the post concise. I formatted the actual code portion of your posts but did not touch the rest since I am not sure exactly what it needs to look like.