PWM scores oscillating at highly regular intervals after ATG start codon
8.1 years ago
ahtmatrix • 0

I have an R script that reads a FASTA file containing all human coding sequences (CDS). For each CDS, I extracted the 9 bases before the start codon, the start codon itself, and the 1 base after it, giving a 13-base window. Here is an example sequence: "ATCGTAGCT ATG A".

I then used Biostrings in R to construct a PWM as shown below.

library(Biostrings)

#make a pwm of length 13
human = readDNAStringSet('extracted_rna.fasta')
human.kozak = DNAStringSet(human)
human.pwm    = PWM(human.kozak, type = 'log2probratio')

I then used the human mRNA PWM to score the first 99 start positions of each sequence in a FASTA file of viral coding sequences, as shown below.

subject.fasta = readDNAStringSet('viral.fasta')

pwm.score.dataframe = NULL

#loop over every sequence in the subject DNAStringSet (one per FASTA record)
for (i in seq_along(subject.fasta)) {

  #using the human pwm score using a sliding window of 13 as defined by the PWM until position 99 in subject seq
  scores <- PWMscoreStartingAt(human.pwm, subject.fasta[[i]], starting.at = 1:99)

  pwm.score.dataframe = cbind(pwm.score.dataframe, scores)
}

write.table(pwm.score.dataframe, file = "humanPWM_viral.csv")
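To make the sliding-window step concrete, here is a minimal base R sketch of what scoring a sequence with a weight matrix at successive start positions amounts to: for each window, add up the matrix entry for the observed base in each column. The 3-column `toy.pwm`, its values, the helper `score_window`, and the test sequence are all made up for illustration; this is not the Biostrings implementation, and a matrix built with `PWM()` would hold log-ratio weights rather than 0/1 values.

```r
# Toy 4 x 3 weight matrix (made-up values); rows are named by base.
toy.pwm <- matrix(c(1, 0, 0,
                    0, 0, 0,
                    0, 0, 1,
                    0, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: score the window that begins at `start` by summing
# the matrix entry for the observed base in each column.
score_window <- function(pwm, seq.chars, start) {
  window <- seq.chars[start:(start + ncol(pwm) - 1)]
  rows <- match(window, rownames(pwm))
  sum(pwm[cbind(rows, seq_len(ncol(pwm)))])
}

# Slide the window along a short made-up sequence, one base at a time.
seq.chars <- strsplit("CCATGACC", "")[[1]]
starts <- seq_len(length(seq.chars) - ncol(toy.pwm) + 1)
toy.scores <- vapply(starts,
                     function(s) score_window(toy.pwm, seq.chars, s),
                     numeric(1))
print(toy.scores)
```

Each start position yields one score, so consecutive scores come from windows that overlap in all but one base.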

In Microsoft Excel, I averaged the scores at each position across all sequences and plotted position vs. mean score (below). Regardless of which subset of viral sequences I used, the PWM scores after the ATG spike oscillate with a period of 3 positions. What is causing this behavior? I would have expected that after the ATG start codon there would be enough random noise to cancel out fluctuations in the PWM score.

[Plot: mean PWM score vs. start position]
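The averaging step, and a quick check of whether the oscillation really has period 3, can also be done in R rather than Excel. The sketch below is only illustrative: `scores` is a simulated stand-in for `pwm.score.dataframe` (99 start positions by 5 sequences), and the per-frame offsets in `frame.effect` are invented so that the example has a period-3 signal to detect. With real data, the matrix built by the loop above would take its place.

```r
# Simulated stand-in for pwm.score.dataframe: rows are start positions,
# columns are sequences. All values here are made up.
set.seed(1)
n.pos <- 99
n.seq <- 5
frame.effect <- c(0.10, 0.00, -0.10)   # invented offsets for frames 0, 1, 2
scores <- matrix(rnorm(n.pos * n.seq, mean = 0.5, sd = 0.02),
                 nrow = n.pos, ncol = n.seq) +
  rep(frame.effect, length.out = n.pos)

# Mean score at each start position, averaged across sequences.
mean.score <- rowMeans(scores)

# Frame of each start position relative to position 1 (0, 1 or 2).
frame <- (seq_len(n.pos) - 1) %% 3

# Mean score per frame; a clear spread between the three values
# points to a period-3 pattern rather than random noise.
frame.means <- tapply(mean.score, frame, mean)
print(frame.means)
```

If the three frame means differ consistently across different subsets of sequences, the oscillation is tied to position modulo 3 from the first scored base.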

R biostrings biopython pwm position weight matrix • 2.1k views
The result looks really strange to me. What were your reasons to do this analysis and what were you expecting to see?

You say:

I then used the human mRNA PWM

But in your code I see:

#using the human pwm score using a sliding window of 13 as defined by the PWM until position 99 in subject seq
  scores <- PWMscoreStartingAt(virus.pwm, subject.fasta[[i]], starting.at = 1:99)

It seems instead of using human.pwm you are using virus.pwm. Could that be the source of your problems?

Ah, that's a typo. I reran it with the correction and it gives the same result.
