Forum: Computer Science Perspective Comparison: BLAST vs. Clustal Omega?
9.7 years ago
Bara'a ▴ 270

Hello :)

I'm a computer information systems student working on a thesis that tracks gene functionality through evolution in some plant species, and I have done two separate experiments so far using two well-known bioinformatics applications.

BLAST and Clustal Omega were used to measure the percentage similarity between genes across species, so that those genes can be investigated further for changes in functionality.

Unfortunately, my supervisor wasn't really satisfied with the results, despite the hard work I put into those experiments :\

Today, he asked me to compare the results I have (i.e. BLAST results vs. Clustal Omega results) from a computer science perspective, using some algorithms! :\

Honestly, he didn't clarify what he wants me to do, nor do I know exactly how to do it!

I'm terribly confused and don't know what to do!! :S

Is it even possible to compare local alignment results against global multiple alignment results?

If so, how can this be done using computer science algorithms?

I'm open to discussing the possibilities in this vague situation.

Any idea what "computer science perspective" means here?

clustal-omega blast • 10k views

What exactly did you mean when you wrote that you used BLAST and Clustal Omega to measure the similarity percentage between genes across species? Did you have a set of orthologous genes from multiple genomes and run ortholog-vs-ortholog comparisons between all pairs? If you did that by hand, it was probably a lot of work. But if you're a CS student, then surely you automated everything and did very little by hand, so where's all the hard work? If you describe your results files in more detail, maybe people here can suggest something.


@5heikki... Yes, I measured the similarity between a set of orthologous genes from different species.

Although I didn't do it by hand, it was still a lot of work, because I am a programmer, NOT a bioinformatician: I had to grasp the field's basic concepts first and read articles for nearly a year and a half before starting the actual work!!

Why was it hard work? Because the main purpose of my thesis requires a very long pipeline of file preprocessing, actual processing, filtering, comparing, analyzing and combining.

Basically, it went in the following order:

  1. Searched for the best bioinformatics application for extracting SSRs from the species (I used SciRoKo).
  2. Extracted flanking regions based on the SSRs (resulting in two files for each species: left and right flanking regions).
  3. Calculated distance matrices between each pair of species in the data set using Clustal Omega. I performed four separate runs in this step, each over all pairs of species: first, left vs. left; second, left vs. right reverse complement; third, right vs. right; and fourth, right vs. left reverse complement. Some files in each set took 30 hours on average to finish!!
  4. Filtered the distance matrices with a similarity threshold on Right >= 90 and a difference threshold on Left >= 50 (as percentages, produced with Clustal Omega's --percent-id flag).
  5. Combined the filtered files for comparison as follows: for each pair of species, two sets were constructed, (Left vs. Right Reverse Complement) and (Right vs. Left Reverse Complement).
  6. Found exact matches in each set and stored them along with their corresponding information (Query ID, Subject ID, Left Similarity, Right Similarity, Query SSR, Subject SSR, Query Sequence, Subject Sequence).
  7. Tracked highly similar genes in different locations within and across species for changes in functionality (using InterProScan).
  8. Reported the evolutionary path of those genes as a final step.
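
As an illustration of the filtering in step 4, here is a minimal Python sketch. It is not the script used in the thesis. It assumes the percent-identity matrix is a plain-text file whose first line is the number of sequences, followed by one row per sequence (name, then one value per column), and it assumes that "difference >= 50" means `100 - percent identity >= 50`. The function names and the example values are made up for the illustration.

```python
def parse_percent_id_matrix(text):
    """Parse a square matrix: first line = sequence count, then 'name v1 v2 ...' rows.

    The layout is an assumption about the matrix file, not a documented guarantee.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    n = int(lines[0])
    names, rows = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    return names, rows


def filter_pairs(names, rows, keep):
    """Return (name_i, name_j, value) for every unordered pair whose value passes keep()."""
    hits = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if keep(rows[i][j]):
                hits.append((names[i], names[j], rows[i][j]))
    return hits


# Made-up example values, for illustration only.
example = """3
seqA 100.0 93.5 41.0
seqB 93.5 100.0 38.2
seqC 41.0 38.2 100.0
"""

names, rows = parse_percent_id_matrix(example)
# Similarity threshold: percent identity >= 90.
similar = filter_pairs(names, rows, lambda pid: pid >= 90.0)
# Difference threshold, read here as (100 - percent identity) >= 50.
different = filter_pairs(names, rows, lambda pid: 100.0 - pid >= 50.0)
print(similar)
print(different)
```

The nested loop visits each unordered pair once, so the filter costs O(n^2) for n sequences, which is negligible next to the alignment step that produced the matrix.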

Isn't that hard enough work for an absolute novice in bioinformatics, not to mention the details of the BLAST experiment?!

Anyway, thank you for your comment :)

It would be highly appreciated if you could answer my question now that I have provided the full details of my thesis.

7.6 years ago
Bioaln ▴ 360

Hello there. I realize this is a relatively late answer, as this question is quite old now, but here are some pointers anyway.

The comparison is definitely possible, but it can be tricky, because such cases rarely come up in real-life applications. Normally one chooses the method according to the task, not the other way around. Local alignment methods provide an efficient way to compare PARTS of sequences, whereas global alignment is concerned with whole inputs.

From an algorithmic point of view, there are many differences, which is probably what your mentor expected from you. Try to compare the following aspects:

  1. Computational time complexity.
  2. Computational space complexity.
  3. Known biases due to, e.g., sequence order or similar effects.
  4. You can always prove your point by, e.g., taking sequences from very distant species and performing both a global and a local alignment, in order to demonstrate the redundancy of the global approach in such a case (local will outperform global, as there are similar REGIONS, yet overall the sequences might differ!).
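
The fourth suggestion above can be tried on a toy example. The sketch below uses the textbook dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local) as stand-ins. It is not what BLAST or Clustal Omega run internally: BLAST is a heuristic seed-and-extend search and Clustal Omega builds a progressive multiple alignment. The scoring values, the function name and the two sequences are made up for the illustration.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-1):
    """Best pairwise alignment score with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch recurrence).
    local=True:  local alignment (Smith-Waterman recurrence).
    Only two rows of the table are kept: O(len(a) * len(b)) time, O(len(b)) space.
    """
    cols = len(b) + 1
    # First row: zeros for local, accumulated gap penalties for global.
    prev = [0] * cols if local else [j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr[j] = score
        prev = curr
    return best if local else prev[-1]


# Two made-up sequences that share only the region GATTACA.
seq_a = "AAAAAAGATTACAGGGGGG"
seq_b = "CCCCCCGATTACATTTTTT"

print(alignment_score(seq_a, seq_b, local=True))   # 7: the shared region alone
print(alignment_score(seq_a, seq_b, local=False))  # -5: mismatched flanks dominate
```

Both recurrences fill the same table, so for a single pair their time and space complexity are the same. The practical differences come from the heuristics and the multiple-alignment machinery that the real tools add on top.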

Hope this helps.


Although I have already finished this work, I'm still interested in the answer!!

Thanks a lot for the reply

9.7 years ago
Bara'a ▴ 270

I still need the answer, please!

Is it logically possible to compare Local Alignment Results against Global Multiple Alignment ones?!
