Protein db vs. nucleotide db
Hi,
Very basic question, forgive me if it might seem too naive.
In whole genome sequencing and metagenomics, does it make a difference (for example, in terms of the accuracy of the results) to classify against a protein database, like nr, rather than a nucleotide database, like nt?
Thanks
classification
database
metagenomics
Difficult to answer without more details. If you are lucky and your sequences are similar to genomes already present in public databases, you may not need either nt or nr. Instead, you may want to try sendsketch.sh from the BBTools package. It searches your sequence remotely against either nucleotide or protein databases and returns a quick answer in 5-10 seconds.
Here is an example for one of my metagenomic bins versus the nucleotide database:
Query: group_045.fa DB: nt SketchLen: 9793 Seqs: 530 Bases: 3608495 gSize: 3143572 GC: 0.305 File: group_045.fa
WKID KID ANI SSU Complt Contam Matches Unique TaxID gSize gSeqs taxName
56.25% 28.50% 97.93% 84.42% 100.00% 0.46% 2791 2757 795359 1612118 4 Thermodesulfobacterium geofontis OPF15
0.22% 0.12% 80.08% 83.35% 100.00% 28.84% 12 5 2234087 1746157 2 Thermodesulfobacterium sp. TA1
0.15% 0.08% 78.74% 83.53% 100.00% 28.88% 8 0 289377 1703141 5 Thermodesulfobacterium commune DSM 2178
And versus the protein database:
Query: group_045.fa DB: ProkProt SketchLen: 17939 Seqs: 3586 SeqLen: 1090784 gSize: 705574 File: group_045.fa
WKID KID AAI SSU Complt Contam Matches Unique TaxID gSize gSeqs taxName
76.12% 50.00% 97.39% 84.35% 100.00% 7.07% 8970 6334 795359 459769 1511 Thermodesulfobacterium geofontis OPF15
10.26% 7.01% 80.26% 83.87% 73.09% 50.06% 1257 74 161156 482978 1548 Thermodesulfobacterium hydrogeniphilum
7.64% 5.56% 77.99% 83.28% 66.59% 51.51% 998 45 2234087 510876 1642 Thermodesulfobacterium sp. TA1
7.83% 5.44% 78.19% 83.48% 69.63% 51.63% 976 14 289377 486048 3236 Thermodesulfobacterium commune DSM 2178
7.43% 5.15% 77.76% 82.75% 69.39% 51.92% 924 6 1123372 484331 1613 Thermodesulfobacterium hveragerdense DSM 12571
7.25% 5.20% 77.58% 82.92% 67.12% 51.87% 933 4 1123373 500560 1645 Thermodesulfobacterium thermophilum DSM 1276
4.87% 3.50% 74.65% 82.87% 64.72% 53.58% 627 20 1653476 503318 1637 Caldimicrobium thiodismutans
1.74% 1.51% 67.60% 82.88% 51.44% 55.57% 270 6 999894 606333 2026 Thermosulfurimonas dismutans
1.36% 1.22% 66.02% 81.94% 49.56% 55.86% 218 0 667014 627712 2121 Thermodesulfatator indicus DSM 15286
1.36% 1.20% 66.00% 81.76% 49.74% 55.87% 216 0 1795632 627034 2097 Thermodesulfatator autotrophicus
1.28% 1.19% 65.61% 82.06% 47.49% 55.88% 214 0 1123371 656772 2128 Thermodesulfatator atlanticus DSM 21156
0.79% 0.60% 62.61% 81.40% 57.30% 56.47% 107 17 1871110 535441 1838 Thermodesulfovibrio sp. N1
0.80% 0.64% 62.67% 81.74% 55.00% 56.44% 114 4 86166 558478 1839 Thermodesulfovibrio aggregans
0.67% 0.55% 61.54% 81.32% 52.43% 56.52% 99 2 1123375 584489 1928 Thermodesulfovibrio islandicus DSM 12570
0.67% 0.54% 61.62% 81.25% 54.74% 56.54% 96 0 2580394 557606 1808 Thermodesulfovibrio sp. Kuro-1
0.67% 0.54% 61.59% 81.37% 54.38% 56.54% 96 0 289376 561402 1876 Thermodesulfovibrio yellowstonii DSM 11347
0.64% 0.48% 61.33% 80.65% 56.95% 56.59% 87 1 1123376 535566 1738 Thermodesulfovibrio thiophilus DSM 17215
0.42% 0.40% 58.86% 79.21% 44.94% 56.67% 72 0 1156395 680300 2090 Dissulfuribacter thermophilus
0.31% 0.27% 57.19% 81.34% 38.17% 56.74% 51 0 39841 800456 2579 Thermodesulforhabdus norvegica
0.28% 0.26% 56.50% 79.35% 45.96% 56.81% 47 1 1621989 663655 2241 Candidatus Desulfofervidus auxilii
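If you want to compare the two tables programmatically rather than by eye, here is a minimal Python sketch. It assumes the results are whitespace-separated, that the first line is the column header, and that taxName is the last column and the only one containing spaces. The function names parse_sketch_table and pct are made up for illustration and are not part of BBTools. The sample rows are copied from the nt result above.

```python
# Sketch only: parse sendsketch-style result rows pasted as text.
# Assumption: columns are whitespace-separated and taxName is the last
# column, so only taxName may contain spaces.

def parse_sketch_table(text):
    """Return one dict per hit, keyed by the header column names."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    n_fixed = len(header) - 1  # every column except the trailing taxName
    rows = []
    for ln in lines[1:]:
        # Split at most n_fixed times so the taxon name stays in one piece.
        parts = ln.split(None, n_fixed)
        rows.append(dict(zip(header, parts)))
    return rows


def pct(value):
    """Convert a string such as '97.93%' to the float 97.93."""
    return float(value.rstrip("%"))


# Header and first two hits from the nt search shown earlier in this answer.
nt_result = """
WKID KID ANI SSU Complt Contam Matches Unique TaxID gSize gSeqs taxName
56.25% 28.50% 97.93% 84.42% 100.00% 0.46% 2791 2757 795359 1612118 4 Thermodesulfobacterium geofontis OPF15
0.22% 0.12% 80.08% 83.35% 100.00% 28.84% 12 5 2234087 1746157 2 Thermodesulfobacterium sp. TA1
"""

rows = parse_sketch_table(nt_result)
best = max(rows, key=lambda r: pct(r["ANI"]))
print(best["taxName"], best["ANI"], "unique matches:", best["Unique"])
```

The protein table can be parsed the same way; its identity column is named AAI instead of ANI, so the sort key would need to change accordingly.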
Thanks for your reply.
I found the answer to my question in this paper, where it is explained very nicely: https://www.nature.com/articles/ncomms11257