Question

How to analyze the ProDom output? I have some results to support the analysis, but I don't have the answers.


10.8 years ago

dssouzadan ▴ 30

Hi all, I have some results retrieved from the ProDom database, and I need to analyze the output for 3 inputs to check whether they share the same conserved domains. The results all point to the same description, 'ribonucleoprotein', so intuitively we can conclude that they share the same domains, but I need to automate this check in a script, which requires an explicit algorithm.

The inputs are:

#QUERY:
>Contig4  60S ribosomal protein L6
CGGCACGAGGCCCAGACGACGACACCAAAACGAAACACCCACCCGCTCCCTCCGCTGAAA
ACTTGGACCTCGCACTCCTACTTTTCTTCTTTTCCACATCTTCGTCACAATAGCCAATAT
GTCGGCGACTTTGGACTCAAAGTCTTTTGGCCAGACGAAGAAGTTTGGAAAGGGCGAGAG
GACCATCCCCAGCCAAAAGGCTTCAAAGTGGTATCCCACTGAGGACGAGCCACAGCCAAA
GAAAGTCCGCAAGACTATCCACCCAGCGAAGCCCCGGGCCAGTCTTCAACCGGGCACCAT
CCTCATCCTCCTCGCTGGTCGCTTCCGAGGCAAACGTGTTGTGCTCCTTAAGCACCTCCA
CCAAGGTGTTCTCCTCGTTACCGGTCCTTTCAAACTCAATGGCGTCCCTCTTCGGAGAGT
AAACGCCAGATACGTCATTGCCACCAGCTCGAAGGTGGACCTCACGGGAATTGACGACAA
GGTTCTGGAGAAAGCCTCGGAATCCGAGTACTTCACTCGTGAGAAGAAGGCCGAGAAGAA
GGGAGAGGAAGCTTTCTTCAAACAAGGAGAGAAGCCAGAGAAGAAGAAGGTCGTCAGCGC
CCGTGCCAACGACCAAAAGGCCATAGATCGGCCATTGTTGGCCACCATTAAGAAGGAGCA
GTTCCTCGCCAGCTACCTCTCCACCAGCTTCAGCCTCCGGAAAGGCGACAAGCCTCATGA
AATGAAGTGGTAAAACCGGCGGGTTGTCTACGTTGATGGCATAGAAAGAATGGGGATGTT
TTTTCCTTATTTTCAATTGTTTTGTTTCTTGGACGGGTCCAACGGGGGAAATTTTCTTTT
CAGGAAATAAATGGAAAAAAACATTAAAATATACAAACACTTCCAAAGGTCCTTTTTCAA
TGAAATAGTGTACATGGATTGGCCGATATTTCCCCTCGACTGATAATTTCAACGAAAACC
CTTCTAAATTCCCCACCTTTTCCGCAGG

NR-fungi: http://www.ncbi.nlm.nih.gov/protein/295667850?report=fasta

Swissprot: http://www.uniprot.org/uniprot/P05739.fasta

The results produced are:

[Figures: ProDom graphical output for the Query, NR-Fungi, and Swissprot inputs]

The online results produced the following window arrangement of domain families:

"The following is the graphical representation of the HSP found by BLAST. Please note that HSPs are sorted from highest to lowest scores, so that lower scoring HSPs may be hidden."

Query: [Length: 988]

Position  Frame  ProDom domain   Score E value
 119-277   +2    #PDA1O244         283 4e-24
 281-520   +2    #PD006079         307 6e-27
 242-697   +2    #PDA5M562         149 1e-08
 578-730   +2    #PDB001A6         258 3e-21

NR-Fungi: [Length: 204]

Position  ProDom domain   Score E value
    1-53  #PDA1O244         276 2e-23
  55-134  #PD006079         321 1e-28
 154-204  #PDB001A6         263 7e-22

Swissprot: [Length: 176]

Position  ProDom domain   Score E value
    3-31  #PDA1O244         148 1e-08
   20-89  #PDA5M562         187 4e-13
  33-107  #PD006079         275 3e-23
  58-122  #PD605103         258 2e-21
  26-176  #PDB0T0G5         140 1e-07
 126-175  #PDB001A6         248 4e-20

Looking up the ProDom accessions, we can arrange the domain families along the length of the three inputs using the alignment start and end positions.
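As a minimal sketch of that arrangement step, the hit lines can be parsed and sorted by alignment start. This assumes the text layout pasted in this post (position range, an optional frame such as `+2`, a `#`-prefixed accession, score, E value); the real ProDom output may differ. The names `Hit`, `HIT_LINE` and `parse_hits` are illustrative only.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    start: int
    end: int
    domain: str
    score: int
    evalue: float


# Assumed line layout: "start-end  [frame]  #accession  score  evalue"
HIT_LINE = re.compile(
    r"^\s*(\d+)-(\d+)\s+(?:[+-]\d\s+)?#(\S+)\s+(\d+)\s+(\S+)\s*$"
)


def parse_hits(text: str) -> list[Hit]:
    """Parse hit lines and return them sorted by alignment start, then end."""
    hits = []
    for line in text.splitlines():
        match = HIT_LINE.match(line)
        if match:  # header and blank lines do not match and are skipped
            start, end, domain, score, evalue = match.groups()
            hits.append(Hit(int(start), int(end), domain, int(score), float(evalue)))
    return sorted(hits, key=lambda h: (h.start, h.end))


query_table = """\
Position       ProDom domain   Score E value
 119-277   +2  #PDA1O244         283 4e-24
 281-520   +2  #PD006079         307 6e-27
 242-697   +2  #PDA5M562         149 1e-08
 578-730   +2  #PDB001A6         258 3e-21
"""

query_hits = parse_hits(query_table)
for hit in query_hits:
    print(hit.start, hit.end, hit.domain, hit.score, hit.evalue)
```

Sorting by position alone keeps overlapping hits such as PDA5M562, so it only gives the raw layout along the sequence, not a final domain order.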

Ordering the domains from the best score to the lowest gives the complete window for all 3 results. I have also listed the results above with each domain's start and end position relative to the input. This is shown in the figure below:

From all these results, I can infer that the correct domain order is [PDA1O244, PD006079, PDB001A6] and conclude that the three inputs share the same conserved domains, but in practice it cannot simply be inferred this way.
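One possible way to make this inference explicit is a greedy heuristic: visit the hits from the highest score to the lowest, keep a hit only if it does not overlap any hit already kept, then read the kept domains in positional order. This is only a sketch of one heuristic, not ProDom's own method. The function names are hypothetical, and the hit tuples are copied from the tables above. The query coordinates are in nucleotides and the other two in amino acids, so only the order is compared, not the positions.

```python
def overlaps(a, b):
    """True if the inclusive ranges a[0]..a[1] and b[0]..b[1] intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def domain_order(hits):
    """Greedily keep the best-scoring non-overlapping hits, ordered by position.

    Each hit is a tuple (start, end, domain, score).
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h[3], reverse=True):
        if not any(overlaps(hit, other) for other in kept):
            kept.append(hit)
    return [domain for _, _, domain, _ in sorted(kept)]


query = [
    (119, 277, "PDA1O244", 283),
    (281, 520, "PD006079", 307),
    (242, 697, "PDA5M562", 149),
    (578, 730, "PDB001A6", 258),
]
nr_fungi = [
    (1, 53, "PDA1O244", 276),
    (55, 134, "PD006079", 321),
    (154, 204, "PDB001A6", 263),
]
swissprot = [
    (3, 31, "PDA1O244", 148),
    (20, 89, "PDA5M562", 187),
    (33, 107, "PD006079", 275),
    (58, 122, "PD605103", 258),
    (26, 176, "PDB0T0G5", 140),
    (126, 175, "PDB001A6", 248),
]

for name, hits in [("Query", query), ("NR-Fungi", nr_fungi), ("Swissprot", swissprot)]:
    print(name, domain_order(hits))
```

On these three tables the heuristic returns ['PDA1O244', 'PD006079', 'PDB001A6'] in every case. A greedy choice can discard a lower-scoring hit that is still biologically meaningful, so the result depends on how overlaps are treated.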

So I need to automate this process to check whether the query shares the same conserved domains with the best alignments retrieved from the BLAST databases, but I don't know how to analyze these results correctly.
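A rough sketch of such an automated check could compare the sets of domains that pass an E-value cutoff and report their Jaccard similarity. The cutoff values and the function names here are arbitrary assumptions for illustration, not a recommended standard. The (accession, E value) pairs are copied from the tables above.

```python
def significant_domains(hits, max_evalue):
    """Return the set of domain accessions with E value <= max_evalue."""
    return {domain for domain, evalue in hits if evalue <= max_evalue}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


query = [("PDA1O244", 4e-24), ("PD006079", 6e-27),
         ("PDA5M562", 1e-08), ("PDB001A6", 3e-21)]
nr_fungi = [("PDA1O244", 2e-23), ("PD006079", 1e-28), ("PDB001A6", 7e-22)]
swissprot = [("PDA1O244", 1e-08), ("PDA5M562", 4e-13), ("PD006079", 3e-23),
             ("PD605103", 2e-21), ("PDB0T0G5", 1e-07), ("PDB001A6", 4e-20)]

for cutoff in (1e-5, 1e-10):  # hypothetical cutoffs, chosen only to show the effect
    q = significant_domains(query, cutoff)
    for name, hits in [("NR-Fungi", nr_fungi), ("Swissprot", swissprot)]:
        other = significant_domains(hits, cutoff)
        print(cutoff, name, sorted(q & other), round(jaccard(q, other), 3))
```

With the stricter cutoff, PDA1O244 falls out of the Swissprot set because its E value there is 1e-08, so the similarity to the query drops even though the same domain family is present. The answer is sensitive to the threshold, and a set comparison like this ignores domain order.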

blast algorithm ProDom domains alignment • 2.3k views

updated 3.4 years ago by Ram 45k • written 10.8 years ago by dssouzadan ▴ 30