Estimated protein size as classification parameter

8.1 years ago

Macherki M E ▴ 120

I used coding DNA sequences (CDS) to estimate the average protein size of E. coli K12 from a sample provided by the seqinr R package (dataset ec999). The length of each sequence is stored in the variable x. I plotted two histograms to examine the distribution: one of x and one of its base-10 logarithm.

I suggest that the log linear distribution is more reasonable. The average E can then be calculated as:

E1=mean(x)/3

E2=exp(mean(log(x)))/3

I have to classify CDS data from different genomes. Should I use E2 and its standard deviation as parameters?
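To make the two estimates concrete, here is a minimal sketch in base R. The vector x is made up for illustration and is not the ec999 data; the commented lines show how x could presumably be obtained from seqinr. The spread measures sd_log and gsd are one possible way to pair a standard deviation with E2 on the log scale, not something the post itself defines.

```r
# x: CDS lengths in nucleotides. With seqinr this could be obtained as
#   library(seqinr); data(ec999); x <- sapply(ec999, length)
# A small made-up vector stands in here so the sketch runs without seqinr.
x <- c(300, 450, 600, 900, 1200, 1500, 2400, 3600)  # hypothetical lengths

# Arithmetic mean of protein size in codons, as in E1
E1 <- mean(x) / 3

# Geometric mean of protein size in codons, as in E2
E2 <- exp(mean(log(x))) / 3

# The geometric mean does not depend on the base of the logarithm,
# so base-10 logs (as used for the histogram) give the same value
E2_log10 <- 10^mean(log10(x)) / 3

# Spread on the log scale: SD of log protein size, and the
# multiplicative (geometric) standard deviation derived from it
sd_log <- sd(log(x / 3))
gsd <- exp(sd_log)

print(c(E1 = E1, E2 = E2, sd_log = sd_log, gsd = gsd))
```

Because the geometric mean never exceeds the arithmetic mean, E2 will be at most E1 for any set of positive lengths; the gap between them grows with the skew of the distribution.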

R genome • 1.5k views

"I suggest that the log linear distribution is more reasonable."

Reasonable for what?

Classification with respect to what kind of annotation or classes? If the CDS is correct, then length(AA) == length(CDS)/3. You can also simply use the known protein lengths from the genome annotation.
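The length relation can be checked with a short base R sketch. The sequence below is a made-up example, not taken from ec999. One assumption worth flagging: if the CDS includes its terminal stop codon, the translated protein is one residue shorter than length(CDS)/3, which the sketch accounts for.

```r
# Hypothetical CDS including start and stop codons (not real data)
cds <- "ATGGCTAAATGA"

n_nt <- nchar(cds)

# A complete CDS should have a length divisible by 3
stopifnot(n_nt %% 3 == 0)
n_codons <- n_nt / 3

# Split the sequence into codons
codons <- substring(cds, seq(1, n_nt, by = 3), seq(3, n_nt, by = 3))

# Standard genetic code stop codons
has_stop <- codons[n_codons] %in% c("TAA", "TAG", "TGA")

# Protein length: number of codons, minus the stop codon if present
n_aa <- n_codons - as.integer(has_stop)

print(c(nucleotides = n_nt, codons = n_codons, residues = n_aa))
```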

Reply by Michael 55k, 8.1 years ago
