Question

Protein Secondary Structure Prediction

6.1 years ago

leila.khalatbari • 0

Hi everyone. Here are my questions. I have come up with a machine-learning-based method for predicting protein secondary structure. I evaluated my method on the publicly available RS126 dataset. However, as it is a little old, I decided to evaluate my method on a few more recent datasets as well. I read the recent articles and noticed that most of them have employed the CASP13, CASP12 and CASP11 datasets. I downloaded them from predictioncenter.org. Many files are included, but as far as I understand, what I need is the sequence file (the amino acid chains). What I don't understand is why there is no secondary structure class label for the residues of the corresponding sequences. Can anyone explain why? And does anyone know of any other popular, publicly available and recent datasets for evaluating protein secondary structure prediction? Thanks heaps.

sequence • 1.4k views

updated 6.0 years ago by Biostar 20 • written 6.1 years ago by leila.khalatbari • 0

There are many secondary structure prediction tools, but what they give you is only a prediction.

For example: http://download.igb.uci.edu/Bioinformatics-2014-Magnan.pdf

Why not use proteins from the PDB structural databank?

Each PDB entry lists the secondary structure elements of the determined structure.

For example, HELIX denotes an alpha-helix, although there are many points of view on how such elements should be defined.
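To illustrate how per-residue labels could be read from a PDB entry, here is a minimal Python sketch. It assumes the fixed-column layout of HELIX and SHEET records in the legacy PDB text format, collapses every helix class into a single "H" state, ignores insertion codes, and treats every unlisted residue as coil ("C"). The function names are hypothetical, and the three-state H/E/C scheme is an assumption about what your method expects.

```python
def secondary_structure_labels(pdb_lines):
    """Map (chain_id, residue_number) -> 'H' or 'E' using HELIX/SHEET records."""
    labels = {}
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "HELIX":
            # Columns 20, 22-25 and 34-37 (1-based): chain, first and last residue.
            chain, start, end, state = line[19], line[21:25], line[33:37], "H"
        elif record == "SHEET":
            # Columns 22, 23-26 and 34-37 (1-based): chain, first and last residue.
            chain, start, end, state = line[21], line[22:26], line[33:37], "E"
        else:
            continue
        for resnum in range(int(start), int(end) + 1):
            labels[(chain, resnum)] = state
    return labels


def three_state_string(labels, chain, first, last):
    """Build an H/E/C string for residues first..last; unlisted residues become 'C'."""
    return "".join(labels.get((chain, n), "C") for n in range(first, last + 1))
```

You would call `secondary_structure_labels(open("entry.pdb"))` on a downloaded file (the file name is a placeholder) and then `three_state_string(labels, "A", first, last)` for the chain and residue range you are interested in.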

For example, this GitHub repository provides a standardized data set for machine learning of protein structure:

https://github.com/aqlaboratory/proteinnet

It may be more useful to you.

Concerning your question about nucleotides instead of proteins...

If you know the genetic code for a particular species, you can easily translate a nucleotide sequence into a protein sequence, but not back, because of the redundancy of the code, right? And the genetic code itself may vary slightly between organisms.

In that sense, nucleotide sequences look more reliable.
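The point about redundancy can be sketched in a few lines of Python. This example assumes the standard genetic code (NCBI translation table 1) and a DNA sequence read in frame from its first base; other organisms or organelles may need a different table. The names `translate` and `codons_for` are hypothetical helpers written for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


def codons_for(amino_acid):
    """List every codon that encodes the given amino acid (shows the redundancy)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
```

Here `translate("ATGGCTTAA")` and `translate("ATGGCCTAA")` give the same protein, and `codons_for("L")` returns six codons, which is why a protein sequence cannot be mapped back to a unique nucleotide sequence.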

6.0 years ago by natasha.sernova ★ 4.0k