Hi all, I have some results retrieved from the ProDom database, and I need to analyze the output for 3 inputs to check whether they share the same conserved domains. The results all point to the same description, 'ribonucleoprotein', so intuitively we can conclude that they share the same domains, but I need to automate this check in a script by writing some algorithm.
The inputs are:
#QUERY:
>Contig4 60S ribosomal protein L6
CGGCACGAGGCCCAGACGACGACACCAAAACGAAACACCCACCCGCTCCCTCCGCTGAAA
ACTTGGACCTCGCACTCCTACTTTTCTTCTTTTCCACATCTTCGTCACAATAGCCAATAT
GTCGGCGACTTTGGACTCAAAGTCTTTTGGCCAGACGAAGAAGTTTGGAAAGGGCGAGAG
GACCATCCCCAGCCAAAAGGCTTCAAAGTGGTATCCCACTGAGGACGAGCCACAGCCAAA
GAAAGTCCGCAAGACTATCCACCCAGCGAAGCCCCGGGCCAGTCTTCAACCGGGCACCAT
CCTCATCCTCCTCGCTGGTCGCTTCCGAGGCAAACGTGTTGTGCTCCTTAAGCACCTCCA
CCAAGGTGTTCTCCTCGTTACCGGTCCTTTCAAACTCAATGGCGTCCCTCTTCGGAGAGT
AAACGCCAGATACGTCATTGCCACCAGCTCGAAGGTGGACCTCACGGGAATTGACGACAA
GGTTCTGGAGAAAGCCTCGGAATCCGAGTACTTCACTCGTGAGAAGAAGGCCGAGAAGAA
GGGAGAGGAAGCTTTCTTCAAACAAGGAGAGAAGCCAGAGAAGAAGAAGGTCGTCAGCGC
CCGTGCCAACGACCAAAAGGCCATAGATCGGCCATTGTTGGCCACCATTAAGAAGGAGCA
GTTCCTCGCCAGCTACCTCTCCACCAGCTTCAGCCTCCGGAAAGGCGACAAGCCTCATGA
AATGAAGTGGTAAAACCGGCGGGTTGTCTACGTTGATGGCATAGAAAGAATGGGGATGTT
TTTTCCTTATTTTCAATTGTTTTGTTTCTTGGACGGGTCCAACGGGGGAAATTTTCTTTT
CAGGAAATAAATGGAAAAAAACATTAAAATATACAAACACTTCCAAAGGTCCTTTTTCAA
TGAAATAGTGTACATGGATTGGCCGATATTTCCCCTCGACTGATAATTTCAACGAAAACC
CTTCTAAATTCCCCACCTTTTCCGCAGG
NR-fungi: http://www.ncbi.nlm.nih.gov/protein/295667850?report=fasta
Swissprot: http://www.uniprot.org/uniprot/P05739.fasta
The results produced are:
Query:
NR-Fungi:
Swissprot:
The online results produced the following window arrangement of domain families:
"The following is the graphical representation of the HSP found by BLAST. Please note that HSPs are sorted from highest to lowest scores, so that lower scoring HSPs may be hidden."
Query: [Length: 988]
Position Frame ProDom domain Score E value
119-277 +2 #PDA1O244 283 4e-24
281-520 +2 #PD006079 307 6e-27
242-697 +2 #PDA5M562 149 1e-08
578-730 +2 #PDB001A6 258 3e-21
NR-Fungi: [Length: 204]
Position ProDom domain Score E value
1-53 #PDA1O244 276 2e-23
55-134 #PD006079 321 1e-28
154-204 #PDB001A6 263 7e-22
Swissprot: [Length: 176]
Position ProDom domain Score E value
3-31 #PDA1O244 148 1e-08
20-89 #PDA5M562 187 4e-13
33-107 #PD006079 275 3e-23
58-122 #PD605103 258 2e-21
26-176 #PDB0T0G5 140 1e-07
126-175 #PDB001A6 248 4e-20
Looking up the ProDom accessions, we can arrange the domain families along the length of each of the three inputs using the alignment start and alignment end information.
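To make the arrangement step concrete, here is a minimal sketch that reads hit lines in the layout pasted above and sorts them by alignment start. The names `Hit`, `HIT_RE` and `parse_hits` are my own, and the regular expression assumes the plain-text layout shown in this post (position, optional frame such as +2, accession, score, E value), not any official ProDom export format.

```python
import re
from typing import NamedTuple

class Hit(NamedTuple):
    start: int
    end: int
    domain: str
    score: int
    evalue: float

# Assumed layout: "119-277 +2 #PDA1O244 283 4e-24" (the frame column is optional).
HIT_RE = re.compile(r"^(\d+)-(\d+)\s+(?:[+-]\d\s+)?#(\w+)\s+(\d+)\s+(\S+)$")

def parse_hits(text):
    """Parse hit lines and return them sorted by alignment start, then end."""
    hits = []
    for line in text.strip().splitlines():
        match = HIT_RE.match(line.strip())
        if match is None:
            continue  # skip headers and anything that is not a hit line
        start, end, domain, score, evalue = match.groups()
        hits.append(Hit(int(start), int(end), domain, int(score), float(evalue)))
    return sorted(hits, key=lambda h: (h.start, h.end))

QUERY_TABLE = """
Position ProDom domain Score E value
119-277 +2 #PDA1O244 283 4e-24
281-520 +2 #PD006079 307 6e-27
242-697 +2 #PDA5M562 149 1e-08
578-730 +2 #PDB001A6 258 3e-21
"""

for hit in parse_hits(QUERY_TABLE):
    print(hit.start, hit.end, hit.domain, hit.score, hit.evalue)
```

Sorting by start position alone keeps overlapping hits such as PDA5M562, so this only gives the raw layout along the sequence, not a final domain order.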
Ordering the domains from the best score to the lowest, we can see the complete window for all 3 results. I also laid out the results above by their relative start and end positions within each input. This is shown in the figure below:
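The relative positioning can also be computed instead of drawn by hand. The sketch below is only an illustration: `relative_span` and `draw_track` are hypothetical helper names, and it assumes that dividing by the reported length is enough to compare the inputs, since the query positions appear to be nucleotide coordinates (length 988, frame +2) while the other two are protein coordinates.

```python
def relative_span(start, end, length):
    """Return the hit span as fractions (0..1) of the sequence length."""
    return (start - 1) / length, end / length

def draw_track(hits, length, width=60):
    """Draw one text row per hit, scaled to `width` characters."""
    rows = []
    for start, end, domain in hits:
        left = (start - 1) * width // length
        right = max(left + 1, end * width // length)  # at least one character wide
        bar = " " * left + "=" * (right - left) + " " * (width - right)
        rows.append(bar + "  " + domain)
    return "\n".join(rows)

# Hits copied from the NR-Fungi table above (length 204).
NR_FUNGI = [(1, 53, "PDA1O244"), (55, 134, "PD006079"), (154, 204, "PDB001A6")]

print(draw_track(NR_FUNGI, 204))
for start, end, domain in NR_FUNGI:
    lo, hi = relative_span(start, end, 204)
    print(domain, round(lo, 2), round(hi, 2))
```

Working in fractions of the length sidesteps the nucleotide versus amino acid difference, at the cost of ignoring untranslated regions at the ends of the contig.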
From all these results, I can infer that the correct order is [PDA1O244, PD006079, PDB001A6] and conclude that they share the same conserved domains, but in practice it can't be inferred this way.
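One heuristic that reproduces this order from the three tables is a greedy selection: take hits from the highest score down, keep a hit only if it does not overlap one already kept, then read the kept hits from left to right. This is a sketch of that idea and not an established ProDom procedure; `select_architecture` is a name I made up, and treating any overlap at all as a conflict is an assumption that may be too strict for real data.

```python
def select_architecture(hits):
    """Greedy, score-ordered selection of non-overlapping hits.

    hits: iterable of (start, end, domain, score) tuples.
    Returns the kept domain accessions ordered by start position.
    """
    chosen = []
    for hit in sorted(hits, key=lambda h: h[3], reverse=True):
        start, end = hit[0], hit[1]
        # Keep the hit only if it lies entirely before or after every kept hit.
        if all(end < c[0] or start > c[1] for c in chosen):
            chosen.append(hit)
    return [h[2] for h in sorted(chosen, key=lambda h: h[0])]

# Hits copied from the three tables above.
QUERY = [(119, 277, "PDA1O244", 283), (281, 520, "PD006079", 307),
         (242, 697, "PDA5M562", 149), (578, 730, "PDB001A6", 258)]
NR_FUNGI = [(1, 53, "PDA1O244", 276), (55, 134, "PD006079", 321),
            (154, 204, "PDB001A6", 263)]
SWISSPROT = [(3, 31, "PDA1O244", 148), (20, 89, "PDA5M562", 187),
             (33, 107, "PD006079", 275), (58, 122, "PD605103", 258),
             (26, 176, "PDB0T0G5", 140), (126, 175, "PDB001A6", 248)]

architectures = [select_architecture(h) for h in (QUERY, NR_FUNGI, SWISSPROT)]
same_order = all(a == architectures[0] for a in architectures)
print(architectures[0], same_order)
```

On these three inputs the lower-scoring hits that span several others (PDA5M562, PD605103, PDB0T0G5) are dropped, and all three sequences end up with the same ordered list.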
So I need to automate this process to check whether the query shares the same conserved domains with an optimal alignment retrieved from the BLAST databases, but I don't know how to analyze these results correctly.
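A simpler check that ignores order and overlap is to compare the sets of accessions directly. The sketch below intersects the three sets and computes a Jaccard index between the query and each hit. The function name `jaccard` is mine, and any cutoff for calling two sequences "the same" would be an arbitrary choice that needs tuning on real data.

```python
def jaccard(a, b):
    """Jaccard index of two collections of domain accessions."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)

# Accessions copied from the three tables above.
QUERY = ["PDA1O244", "PD006079", "PDA5M562", "PDB001A6"]
NR_FUNGI = ["PDA1O244", "PD006079", "PDB001A6"]
SWISSPROT = ["PDA1O244", "PDA5M562", "PD006079",
             "PD605103", "PDB0T0G5", "PDB001A6"]

shared = set(QUERY) & set(NR_FUNGI) & set(SWISSPROT)
print(sorted(shared))
print(jaccard(QUERY, NR_FUNGI), jaccard(QUERY, SWISSPROT))
```

The intersection gives the domains common to all inputs, while the Jaccard value shows how much each pair differs outside that common core.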