How do I get a single nucleotide from my BLAST alignment?
1
1
Entering edit mode
5.8 years ago
suvratha ▴ 70

Hello, I've my BLAST alignment as below:

Query  1        AGTAAAGCCGACTCGGCTATCCATGGGTGAGAACCTAAAGCCGAGTCGGCTTTAAGTTCT  60
                |||||||||||||||||| |||||||||| ||||||||||||||||||||||||||||||
Sbjct  2913022  AGTAAAGCCGACTCGGCTGTCCATGGGTGGGAACCTAAAGCCGAGTCGGCTTTAAGTTCT  2913081

Query  61       GGAAAGTCCCATTTGTCCAGCAGGAAAAGCCGACTCGGCTTTCCTGGTGTTGGGGCAAAA  120
                 ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
Sbjct  2913082  AGAAAGTCCCATTTGTCCAGCAGGAAAAGCCGACTCCGCTTTCCTGGTGTTGGGGCAAAA  2913141

Query  121      GCCGACTCGGCTTTTTCCTCTGTTATGAGC**R**TTGGtttttttCCCGTTTTCTTTGAGTAA  180
                ||||||||||| |||||||||||||||||| ||||| ||||| |||||||||||||||||
Sbjct  2913142  GCCGACTCGGCCTTTTCCTCTGTTATGAGC**G**TTGGTCTTTTTTCCGTTTTCTTTGAGTAA  2913201

Query  181      TTGCTTTGGATTCTTTCACTTACGGTTCTTGATTTGTAGAGTTATAAGGGAGTATTAAGG  240
                |||||||||||||||||||||| || ||||||||||| ||||||||||||||||||||||
Sbjct  2913202  TTGCTTTGGATTCTTTCACTTATGGCTCTTGATTTGTGGAGTTATAAGGGAGTATTAAGG  2913261

Query  241      AGAATAATACTCATGAATGGCGTTGAATTGGATGATCATCAATATGATCATTAAGAGTGA  300
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2913262  AGAATAATACTCATGAATGGCGTTGAATTGGATGATCATCAATATGATCATTAAGAGTGA  2913321

Query  301      T  301
                |
Sbjct  2913322  T  2913322

I want the nucleotide thats exactly aligning with the letter R in the query sequence. As you can see the R aligns with a G in the database.

Is there a tool in Biopython or R that can help me code and get me the output?

I've 26101 such sequences where I need the nucleotide that matches with the letters W,R,K,Y,M,N,S in the query sequence. Manually browsing through the alignment and deciding the nucleotide thats matching with it will not cut it as you can see. Any help would be appreciated. Thanks!

alignment R sequence BLAST Biopython • 1.3k views
ADD COMMENT
3
Entering edit mode
5.8 years ago
5heikki 11k

Not the most beautiful solution..

awk '{if(/^Q/ || /^S/){print $3}}' input.file \
    | paste - - \
    | awk 'BEGIN{OFS=FS="\t"}{for(i=1;i<length($1);i++){L=substr($1,i,1); if(L!="A" && L!="T" && L!="G" && L!="C" && L!="a" && L!="t" && L!="g" && L!="c"){print substr($1,i,1),substr($2,i,1)}}}'

Output:

*   *
*   *
R   G
*   *
*   *
ADD COMMENT
0
Entering edit mode

Thanks a lot for this! there's a small issue with this, this code fails if there is a gap in the alignment. e.g - in the below alignment -

Query 1 TCATCAATGGAGGAAGGATAGTGTAAATGAGTTTTTGAAATTGAAAGGGTAATTCTTTTT 60 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 4648813 TCATCAATGGAGGAAGGATAGTGTAAATGAGTTTTTGAAATTGAAAGGGTAATTCTTTTT 4648754

Query 61 GAAGATGAAAGGG-TAGTTTTTATTATTTTGTGAAGCTGAATTaaaaaaaaaggtaaaaa 119 ||||||||||||| || |||||||||||| |||||||||||||||||||||||||||||| Sbjct 4648753 GAAGATGAAAGGGATA-TTTTTATTATTTGGTGAAGCTGAATTAAAAAAAAAGGTAAAAA 4648695

Query 120 aaaTTAATCACCAAAAAACTTGTTGTAGCCGRTCATTGAGGTTGTGGGTCAGTTTTGAGC 179 ||||||||||||||||||||||||||||||| ||| |||||||||||||||||||||||| Sbjct 4648694 AAATTAATCACCAAAAAACTTGTTGTAGCCGGTCAGTGAGGTTGTGGGTCAGTTTTGAGC 4648635

Query 180 AAATTCGATGAATATTTAGGATTATTAAAAAATAATCGATGGGTGGGATTATTTTTGGGC 239 |||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||| Sbjct 4648634 AAATTCGATGAATATTTACGATTATTAAAAAATAATCGATGGGTGGGATTATTTTTGGGC 4648575

Query 240 AACGGATGATAGTTGGTATTATTCTTGGGAAACTTTCCTATAAACAAAAGGAATGGCGTT 299 |||||||||||||||||||||||||||| ||| ||||||||||||||||||||||||||| Sbjct 4648574 AACGGATGATAGTTGGTATTATTCTTGGCAAATTTTCCTATAAACAAAAGGAATGGCGTT 4648515

I'm supposed get the output as - R G

but instead the output i'm getting is - - A

this is the first instance where there is a gap in the alignment and its reporting just that instead of the letters R,W,K,Y,M,N,S.

ADD REPLY
0
Entering edit mode

Just change

..if(L!="A" && ..

into

..if(L!="-" && L!="A" && ..

ADD REPLY
0
Entering edit mode

Works wonders! just what i wanted! thank you!

ADD REPLY
0
Entering edit mode

NP, you could accept the answer then..

ADD REPLY

Login before adding your answer.

Traffic: 2098 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6