Evolutionary distance in eukaryotes
9.1 years ago
disimos89 • 0

Hello everyone,

I have a project in which I will build an algorithm that takes a nucleotide sequence and decides whether or not it is a splice site, so it is a classification problem. Using n-grams, we compute the similarity of the test sequence to the positive class and to the negative class, which gives me two features for the classifier.
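A minimal sketch of how the two n-gram features could be computed, assuming the n-grams are overlapping k-mers (k=3 is an arbitrary choice), that each class is summarized by the mean of its k-mer frequency vectors (its centroid), and that similarity is cosine similarity. The post does not say which similarity measure is used, so cosine is only a stand-in, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Normalized frequency vector over all 4^k ACGT k-mers (overlapping n-grams)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other symbols (e.g. N) are ignored
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(profiles):
    """Unweighted mean of a list of equal-length vectors."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def ngram_features(test_seq, positive_seqs, negative_seqs, k=3):
    """Two features: similarity of test_seq to the positive and to the negative centroid."""
    t = kmer_profile(test_seq, k)
    pos = centroid([kmer_profile(s, k) for s in positive_seqs])
    neg = centroid([kmer_profile(s, k) for s in negative_seqs])
    return cosine(t, pos), cosine(t, neg)
```

For example, `ngram_features("AAGGTAAGT", ["CAGGTAAGT", "AAGGTGAGT"], ["TTTACCTTA", "GCGCGCATA"])` gives a larger first value, and the second is 0.0 because the test sequence shares no 3-mers with the negative examples.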

I would like to find a new feature. How could I calculate the evolutionary distance between some eukaryotic species (H. sapiens, D. rerio, D. melanogaster, C. elegans, A. thaliana)? I need this feature so that, when I compute the centroids of the negative and positive classes, some species (and therefore their sequences) get a higher weight than others.

There is a source domain and a target domain: in my algorithm, the source domain is used for training and the target domain for testing. For instance, the source domain could be H. sapiens, D. rerio and D. melanogaster, and the target domain only A. thaliana.
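One way the species weighting could enter the centroid computation, sketched under stated assumptions: each source species has a distance to the target species (obtained from Clustal, TREE-PUZZLE/PHYLIP or TimeTree), that distance is turned into a weight with exp(-d / tau) (a hypothetical decay function with a hypothetical `tau` parameter; 1 / (1 + d) would work just as well), and every sequence inherits the weight of its species. Sequences are represented as feature vectors such as k-mer profiles.

```python
import math


def species_weights(distances_to_target, tau=1.0):
    """Map each source species' distance to the target onto a weight in (0, 1].
    exp(-d / tau) is an illustrative choice: closer species get larger weights."""
    return {sp: math.exp(-d / tau) for sp, d in distances_to_target.items()}


def weighted_centroid(profiles_by_species, weights):
    """Weighted mean of feature vectors; every vector uses its species' weight.
    profiles_by_species: dict species -> list of equal-length vectors."""
    acc = None
    total_w = 0.0
    for sp, profiles in profiles_by_species.items():
        w = weights[sp]
        for vec in profiles:
            if acc is None:
                acc = [0.0] * len(vec)
            for i, x in enumerate(vec):
                acc[i] += w * x
            total_w += w
    if acc is None or total_w == 0:
        raise ValueError("no profiles with non-zero weight")
    return [x / total_w for x in acc]
```

With equal weights this reduces to the ordinary centroid; as `tau` shrinks, the centroid is increasingly dominated by the source species closest to the target.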

  1. Should I look for a conserved gene or protein? If so, which gene or protein is appropriate?

Reading this paper (the authors take one gene and do a phylogenetic analysis), I realized that I have to choose a gene that is present in all the species under study.

First approach (easy way)

After choosing a gene (FASTA file), we create the multiple sequence alignment with Clustal, which also has an option to output the evolutionary distances as a matrix.
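On the easy path, a pairwise distance can also be computed directly from the aligned sequences. The sketch below uses the p-distance (fraction of mismatching sites, skipping gap columns) with the Jukes-Cantor (JC69) correction. This is a simple nucleotide model chosen for illustration, not necessarily the measure Clustal reports, and it does not apply to protein alignments. The alignment is assumed to be a dict of name to aligned sequence (for example parsed from Clustal's FASTA output); the function names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return diffs / compared


def jukes_cantor(p):
    """JC69 correction for a nucleotide p-distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment):
    """alignment: dict name -> aligned sequence.
    Returns a dict (name1, name2) -> JC69 distance, symmetric, zero on the diagonal."""
    d = {(n, n): 0.0 for n in alignment}
    for n1, n2 in combinations(alignment, 2):
        dist = jukes_cantor(p_distance(alignment[n1], alignment[n2]))
        d[(n1, n2)] = d[(n2, n1)] = dist
    return d
```

The correction matters mostly for distant pairs: for small p the JC69 distance is close to p, while near saturation it grows without bound.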

Second approach (hard way)

Starting from the multiple sequence alignment, we feed the result into the TREE-PUZZLE program, from which we take some values for the PHYLIP programs. Where could I find a tutorial for this approach?
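On the hard path, the values passed between the programs are typically a pairwise distance matrix in PHYLIP format, which PHYLIP's distance-based programs read and which TREE-PUZZLE is expected to be able to write (check the output files your version produces). A minimal parser for the square layout is sketched below, assuming taxon names contain no whitespace; strict PHYLIP pads names to a fixed 10-character field, so names with spaces would need fixed-width parsing instead. Rows that wrap across lines are handled because the parser reads a stream of tokens. `parse_phylip_distances` and `distances_to` are hypothetical helpers.

```python
def parse_phylip_distances(text):
    """Parse a square PHYLIP-style distance matrix:
    first token = number of taxa n, then for each taxon its name followed by n distances.
    Assumes whitespace-free names; rows may wrap across lines."""
    tokens = text.split()
    n = int(tokens[0])
    pos = 1
    names, rows = [], []
    for _ in range(n):
        if pos + n >= len(tokens) + 0 and pos + 1 + n > len(tokens):
            raise ValueError("truncated distance matrix")
        names.append(tokens[pos])
        rows.append([float(t) for t in tokens[pos + 1:pos + 1 + n]])
        pos += 1 + n
    return names, rows


def distances_to(target, names, rows):
    """Distances from one taxon (e.g. the target species) to every other taxon."""
    i = names.index(target)
    return {name: rows[i][j] for j, name in enumerate(names) if j != i}
```

For example, after parsing a matrix that includes the target species, `distances_to("A_thaliana", names, rows)` returns its distance to each source species, which can then be turned into centroid weights.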

Similar problem here


Would TimeTree be of interest to you?


Absolutely, TimeTree is a solution to my problem, and some related works have already used it to solve similar problems. However, I would like to solve this problem using phylogenetic analysis.
