How to do KOG classification and KEGG analysis based on nucleotide sequences?
9.6 years ago
Kurban ▴ 230

Hello guys,

I have more than 400 up-regulated transcript sequences from my insect transcriptome data. If I want to do KOG classification and KEGG analysis based on their sequences, how should I do that? Could you give me some tips?

Thanks in advance

KOG-classification • 3.8k views

Have you read this?


thanks @5heikki,

I downloaded the Kog_LE.tar.gz file from ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ and ran rpsblast:

rpsblast -i golf-4-down-and-DEseq-down-intersection.fasta -d Kog -m 2 -e 0.01 -a 3 -o golf-4-down-and-DEseq-down-intersection_Kog_results

and got the result:

kurban@kurban-X550VC:~/Desktop/kog_classification$ more golf-4-down-and-DEseq-down-intersection_Kog_results
RPS-BLAST 2.2.26 [Sep-21-2011]
Database: KOG.v1.0
4825 sequences; 2,331,573 total letters
Searching..................................................done
Query= TR24847|c0_g1_i1
(6508 letters)
Score E
Sequences producing significant alignments: (bits) Value
gnl|CDD|231029 KOG3091, KOG3091, KOG3091, Nuclear pore complex, ... 53 1e-06
gnl|CDD|229161 KOG1219, KOG1219, KOG1219, Uncharacterized conser... 52 4e-06
gnl|CDD|230324 KOG2385, KOG2385, KOG2385, Uncharacterized conser... 43 0.002
gnl|CDD|231646 KOG3714, KOG3714, KOG3714, Meprin A metalloprotea... 40 0.009
1_0 70 TGATGCCAAAAGGCGATTGGGATCGTTAAAAGCCAATCTTTAATTCGTTTAGTTCAGAAA 129
231029 5 SGSTSgfAAGStGASGgqGNGTSttTSASGGGAfGSqpTTgTAT---TglfGANqAGGTG 62
\
|
F
231029 3 GASgSTSGfAAG---STgASgGqgNfGTSTTtSAsGGGAf 39
231029 3 GASG 6
1_0 130 TTCATTGTCCGGATTTCATTGCTTGGTCGGTGATGGGAATGACTTGCACAAGTGCTCACA 189
231029 63 fgtASTGTAAGSGfgTsTATGAggGGSqSGqqlgGlGG 102
\
|
FF
231029 40 gSqpTTGT----ATTglfgAN-qAGGTgfGT-AStGtAAgSGfgTStAtGAGgGffgSqS 93
231029 5 SGsTSGfAAGSTGASGGqGnfGTsTTtsA-----------S 34
231029 7 STsGfAAgs-tGASggqGNfGtSTttSASGgGAfGSqpTTGTATTGlf-----------G 54
1_0 190 AGTCGGGCAGCCTACTGGAAAAACCAGTGCACTTGTAATGATGATTTAGACTGAGGAGCC 249
231029 3 GASGST--SGfAAGsTgASGGqGnfGTSTTTSA--SGGGAf-- 39
231029 94 GqqlGGlgGt 103
--More--(0%)
kurban@kurban-X550VC:~/Desktop/kog_classification$ rpsblast -i golf-4-down-and-DEseq-down-intersection.fasta -d Kog -m 9 -e 0.01 -a 3 -o golf-4-down-and-DEseq-down-intersection_Kog_results
kurban@kurban-X550VC:~/Desktop/kog_classification$ more golf-4-down-and-DEseq-down-intersection_Kog_results
# RPSBLAST 2.2.26 [Sep-21-2011]
# Query: TR24847|c0_g1_i1
# Database: Kog
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
TR24847|c0_g1_i1 gnl|CDD|231029 34.74 95 50 3 776 868 3 87 1e-06 52.5
TR24847|c0_g1_i1 gnl|CDD|231029 30.69 101 64 3 70 167 5 102 3e-06 51.4
TR24847|c0_g1_i1 gnl|CDD|231029 28.57 126 63 4 2286 2410 3 102 3e-06 51.4
TR24847|c0_g1_i1 gnl|CDD|231029 25.74 101 73 1 5481 5579 3 103 5e-06 50.6
TR24847|c0_g1_i1 gnl|CDD|231029 24.64 138 89 4 5230 5359 3 133 9e-06 49.8
TR24847|c0_g1_i1 gnl|CDD|231029 37.21 86 48 3 5470 5555 6 85 1e-05 49.5
TR24847|c0_g1_i1 gnl|CDD|231029 33.33 93 58 2 5460 5550 13 103 2e-05 48.7
TR24847|c0_g1_i1 gnl|CDD|231029 28.30 106 64 3 764 867 8 103 3e-05 48.3
TR24847|c0_g1_i1 gnl|CDD|231029 29.06 117 79 2 2616 2732 3 115 5e-05 47.5
TR24847|c0_g1_i1 gnl|CDD|231029 28.35 127 64 4 934 1060 3 102 6e-05 47.1
TR24847|c0_g1_i1 gnl|CDD|231029 33.75 80 49 1 378 457 11 86 8e-05 46.8
TR24847|c0_g1_i1 gnl|CDD|231029 24.82 137 76 3 2270 2406 3 112 1e-04 46.4
TR24847|c0_g1_i1 gnl|CDD|231029 31.96 97 64 2 3203 3298 12 107 1e-04 46.0
TR24847|c0_g1_i1 gnl|CDD|231029 29.66 118 66 6 207 324 3 103 2e-04 45.6
TR24847|c0_g1_i1 gnl|CDD|231029 24.59 122 71 2 5655 5776 3 103 2e-04 45.6
TR24847|c0_g1_i1 gnl|CDD|231029 18.78 197 138 5 888 1084 4 178 2e-04 45.2
TR24847|c0_g1_i1 gnl|CDD|231029 24.50 151 90 4 1974 2124 4 130 2e-04 45.2
TR24847|c0_g1_i1 gnl|CDD|231029 26.36 110 70 2 5904 6013 5 103 3e-04 44.8
TR24847|c0_g1_i1 gnl|CDD|231029 31.82 110 66 4 90 199 3 103 4e-04 44.4
TR24847|c0_g1_i1 gnl|CDD|231029 31.43 105 52 3 3275 3378 7 92 4e-04 44.4
TR24847|c0_g1_i1 gnl|CDD|231029 32.00 100 61 3 4949 5045 7 102 4e-04 44.4
TR24847|c0_g1_i1 gnl|CDD|231029 26.36 129 67 2 6230 6358 3 103 4e-04 44.4
TR24847|c0_g1_i1 gnl|CDD|231029 30.10 103 69 3 2647 2748 3 103 4e-04 44.4
TR24847|c0_g1_i1 gnl|CDD|231029 31.33 83 54 1 3566 3648 6 85 6e-04 44.1
TR24847|c0_g1_i1 gnl|CDD|231029 30.49 82 51 2 3790 3865 3 84 6e-04 44.1
TR24847|c0_g1_i1 gnl|CDD|231029 19.27 109 79 2 5823 5931 3 102 6e-04 43.7
TR24847|c0_g1_i1 gnl|CDD|231029 26.53 98 58 3 2265 2362 11 94 6e-04 43.7
TR24847|c0_g1_i1 gnl|CDD|231029 30.00 100 62 2 5267 5363 8 102 7e-04 43.7
TR24847|c0_g1_i1 gnl|CDD|231029 27.27 99 70 1 584 682 3 99 9e-04 43.3
TR24847|c0_g1_i1 gnl|CDD|231029 29.36 109 63 3 5588 5691 4 103 0.001 43.3
TR24847|c0_g1_i1 gnl|CDD|231029 23.16 95 69 2 335 427 11 103 0.001 42.9
TR24847|c0_g1_i1 gnl|CDD|231029 25.25 99 70 2 2087 2185 4 98 0.001 42.9
TR24847|c0_g1_i1 gnl|CDD|231029 27.84 97 59 2 2052 2140 10 103 0.001 42.9
TR24847|c0_g1_i1 gnl|CDD|231029 23.89 113 71 2 3458 3568 3 102 0.001 42.9
TR24847|c0_g1_i1 gnl|CDD|231029 25.22 115 69 2 5711 5825 6 103 0.001 42.5
TR24847|c0_g1_i1 gnl|CDD|231029 25.47 106 73 2 5192 5296 3 103 0.002 42.5
--More--(2%)

The query file contains 60 sequences, but most of them aligned to gnl|CDD|231029 (Nuclear pore complex) with the lowest e-value and highest score. My questions are: did I do the KOG classification correctly? If so, what is the proper interpretation of the result file? Or is there something else I should do to make the rpsblast results a little more understandable?


thanks @Michael Dondrup,

Yes, I did use nucleotide sequences as the query file, and after I added the -p argument it worked fine.

Hey Michael, when you run KOG, what e-value do you use to get acceptable results?


1e-6 is often used, but alignments at that level can be short and poor; 1e-10 is a more conservative cutoff. Many insects have been sequenced and Drosophila is a model organism, so I'd expect to see a lot of hits with 1e-50 or better.
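To apply a cutoff like this to the tabular (-m 9) output shown earlier in the thread, something along these lines might work. It is only a minimal sketch: it assumes the 12 whitespace-separated columns listed in the "# Fields:" header, and the function name `best_hits` and the `example` lines are made up for illustration (the two data rows are copied from the output above).

```python
def best_hits(lines, max_evalue=1e-10):
    """Keep, per query, the hit with the highest bit score among hits
    whose e-value is at or below max_evalue.

    Assumes the 12-column tabular layout from the '# Fields:' header:
    query id, subject id, % identity, alignment length, mismatches,
    gap openings, q. start, q. end, s. start, s. end, e-value, bit score.
    """
    best = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip anything that is not a 12-column hit line (e.g. pager residue).
        if len(fields) != 12:
            continue
        query, subject = fields[0], fields[1]
        try:
            evalue = float(fields[10])
            bits = float(fields[11])
        except ValueError:
            continue
        if evalue > max_evalue:
            continue
        if query not in best or bits > best[query][2]:
            best[query] = (subject, evalue, bits)
    return best


# Illustrative input: a header line plus two hit lines from the output above.
example = [
    "# Query: TR24847|c0_g1_i1",
    "TR24847|c0_g1_i1 gnl|CDD|231029 34.74 95 50 3 776 868 3 87 1e-06 52.5",
    "TR24847|c0_g1_i1 gnl|CDD|231029 30.69 101 64 3 70 167 5 102 3e-06 51.4",
]

if __name__ == "__main__":
    for query, (subject, evalue, bits) in best_hits(example, max_evalue=1e-6).items():
        print(query, subject, evalue, bits)
```

With the stricter 1e-10 default, neither of these example hits would pass, which fits the point that hits around 1e-6 are weak.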

9.6 years ago
Michael 55k

It seems like you did an untranslated nucleotide search with rpsblast against a protein db, so your result is bogus. You are using legacy rpsblast which had an option:

 -p  Query sequence is protein  [T/F]  Optional
    default = T

Try setting this to F; then it might do the translation.
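To illustrate what "translation" means here: a nucleotide query has to be turned into protein in all six reading frames before it can be compared meaningfully against protein profiles. The snippet below is a rough sketch of that idea using the standard genetic code. It is not how rpsblast does it internally, and the function names are hypothetical.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


def six_frames(seq):
    """Return translations for frames +1..+3 and -1..-3 as a dict."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCC").items():
        print(name, protein)
```

Without this step the search treats the letters A, C, G, T as amino acids, which is why the untranslated alignments above look like low-complexity matches.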

For KEGG analysis you should use KAAS (KEGG Automatic Annotation Server), which can handle nucleotide input.
