R functions that extract the ORF from a sequence
9.3 years ago
peter.durr • 0

Hi everyone

I am working within R and need to extract the open reading frame (ORF) from a number of viral sequences.

Somewhat to my surprise, I have not yet come across functions in any R package that find ORFs and readily extract them.

Can anyone point me to R functions that will do these tasks?

Thanks

sequence • 9.5k views

Why in R? There are many other possible and straightforward solutions available (bedtools, EMBOSS, etc)


Yes, you are certainly correct: for sequence manipulation there are better tools.

I am trying to do things in R because

  1. of the downstream tools - especially for phylogenetics
  2. I can code the total workflow into one replicable file

But in this case maybe R is not yet mature enough, and I will need to do the sequence manipulation outside of R and then work on a clean alignment for the analysis.


Are you attempting de novo prediction of all ORFs, or do you want to extract only the ORFs from known/annotated viruses?


I am extracting from known viruses, specifically segments of influenza viruses.

The challenge arises when I download a lot of them from GenBank: the segments are of variable lengths.

the five starting scenarios are:

  1. complete segment length (about 1741 nt for segment 4)
  2. complete coding sequence (about 1704 nt for segment 4)
  3. missing regions to the left - with no start codon
  4. missing regions to the right - with no stop codon
  5. missing left and right - no start and stop codon

I am hoping to develop a workflow that can classify the sequences into these 5 groups, and I was hoping I could build on existing code.

thanks
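
A rough base R sketch of how the five scenarios listed above could be told apart. It is a heuristic only, under these assumptions: forward strand, standard stop codons, and a start codon is only trusted if the sequence begins with ATG or if an in-frame stop lies upstream of it (which indicates that 5' flanking sequence is present). The function names and the toy sequences are made up for illustration. Scenario 1 here means "CDS plus some flanking sequence"; it cannot tell whether the flanks are full length.

```r
STOPS <- c("TAA", "TAG", "TGA")  # standard genetic code

# Split a DNA string into codons for one forward frame (offset 0, 1 or 2).
codons_in_frame <- function(dna, offset) {
  n <- (nchar(dna) - offset) %/% 3
  if (n < 1) return(character(0))
  pos <- offset + 3 * (seq_len(n) - 1) + 1
  substring(dna, pos, pos + 2)
}

# Longest stop-free run of codons; records whether an in-frame stop
# bounds the run on the left and on the right.
longest_open_run <- function(codons) {
  n <- length(codons)
  bounds <- c(0, which(codons %in% STOPS), n + 1)  # 0 and n + 1 mark the sequence ends
  lens <- diff(bounds) - 1
  k <- which.max(lens)
  list(first = bounds[k] + 1, last = bounds[k + 1] - 1, len = lens[k],
       left_stop = bounds[k] > 0, right_stop = bounds[k + 1] <= n)
}

classify_segment <- function(dna) {
  dna <- toupper(dna)
  runs <- lapply(0:2, function(off) {
    run <- longest_open_run(codons_in_frame(dna, off))
    run$offset <- off
    run
  })
  # Take the frame with the longest open run as the coding frame.
  best <- runs[[which.max(vapply(runs, function(r) r$len, numeric(1)))]]
  codons <- codons_in_frame(dna, best$offset)
  open <- if (best$len > 0) codons[best$first:best$last] else character(0)
  atg <- which(open == "ATG")
  has_start <- length(atg) > 0 &&
    (best$left_stop || (best$offset == 0 && atg[1] == 1))
  has_stop <- best$right_stop
  if (has_start && has_stop) {
    cds_start <- best$offset + 3 * (best$first + atg[1] - 2) + 1
    cds_end <- best$offset + 3 * (best$last + 1)  # last base of the stop codon
    flanked <- cds_start > 1 || cds_end < nchar(dna)
    scenario <- if (flanked) 1L else 2L
  } else if (has_stop) {
    scenario <- 3L   # stop codon but no trusted start: missing on the left
  } else if (has_start) {
    scenario <- 4L   # start codon but no stop: missing on the right
  } else {
    scenario <- 5L   # neither: missing on both sides
  }
  list(scenario = scenario, frame = best$offset + 1)
}

# Toy sequences (invented, not real influenza data).
cds  <- "ATGCTAACTGACGCAGCAGCACTAACTGACGCAGCATAA"
full <- paste0("TAGGCA", cds, "GGCC")
classify_segment(full)$scenario
classify_segment(substr(cds, 7, 33))$scenario
```

A sequence whose 5' flank happens to contain no in-frame stop would be reported as missing on the left, so an expected-length check (for example against the roughly 1704 nt CDS of segment 4) would be a sensible extra test.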


This might be a good starting point:

http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter7.html

The seqinr package has a number of functions that deal with the prediction of reading frames:

https://cran.r-project.org/web/packages/seqinr/seqinr.pdf
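
For orientation, here is a minimal base R sketch of the underlying idea: scan the three forward frames and return the longest ATG-to-stop stretch. It deliberately does not use seqinr or the functions from the book chapter, so nothing about their interfaces is assumed; `find_longest_orf` is a hypothetical name, and the reverse strand and alternative genetic codes are ignored.

```r
find_longest_orf <- function(dna, stops = c("TAA", "TAG", "TGA")) {
  dna <- toupper(dna)
  best <- list(start = NA, end = NA, frame = NA, seq = "")
  for (offset in 0:2) {
    n <- (nchar(dna) - offset) %/% 3
    if (n < 2) next                       # need at least a start and a stop codon
    pos <- offset + 3 * (seq_len(n) - 1) + 1
    codons <- substring(dna, pos, pos + 2)
    start_idx <- NA
    for (i in seq_len(n)) {
      if (is.na(start_idx)) {
        # The first ATG after the previous stop opens a candidate ORF.
        if (codons[i] == "ATG") start_idx <- i
      } else if (codons[i] %in% stops) {
        orf_len <- pos[i] + 2 - pos[start_idx] + 1   # nt, stop codon included
        if (orf_len > nchar(best$seq)) {
          best <- list(start = pos[start_idx], end = pos[i] + 2,
                       frame = offset + 1,
                       seq = substr(dna, pos[start_idx], pos[i] + 2))
        }
        start_idx <- NA                   # close the ORF and look for the next one
      }
    }
  }
  best
}

# Toy example (invented sequence): 6 nt of 5' flank, a short ORF, 4 nt of 3' flank.
toy <- "TAGGCAATGCTAACTGACGCAGCAGCACTAACTGACGCAGCATAAGGCC"
orf <- find_longest_orf(toy)
orf$start
orf$end
```

An ORF that runs off the end of the sequence without a stop codon is not reported, so truncated GenBank records need separate handling.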

5.4 years ago
hauken_heyken ▴ 130

The Bioconductor R package ORFik has all you need: it is implemented in C++ and even handles circular genomes.
