9.3 years ago
peter.durr
Hi everyone,
I am working in R and need to extract the open reading frame (ORF) from a number of viral sequences.
Somewhat to my surprise, I have not yet come across an R package with functions that find ORFs and readily extract them.
Can anyone point me to R functions that will do these tasks?
Thanks
Why in R? There are many other straightforward solutions available (bedtools, EMBOSS, etc.).
Yes, you are certainly correct: for sequence manipulation there are better tools.
I am trying to do things in R because
But in this case maybe R is not yet mature enough, and I will need to do the sequence manipulations outside of R and then work on a clean alignment for the analysis.
Are you attempting de novo prediction of all ORFs, or do you want to extract only the ORFs from known/annotated viruses?
I am extracting from known viruses, specifically segments of influenza viruses.
The challenge is that when I download a lot of them from GenBank, the segments are of variable lengths.
the five starting scenarios are:
I am hoping to develop a workflow that can classify the sequences into the 5 groups, and I was hoping I could build on existing code.
Thanks
This might be a good starting point:
http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter7.html
The seqinr package has a number of functions that deal with the prediction of reading frames:
https://cran.r-project.org/web/packages/seqinr/seqinr.pdf
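If it helps to see the idea without any package dependency, here is a minimal base R sketch of an ORF finder. It is not taken from seqinr or from the book chapter above. The function name `find_longest_orf` is hypothetical, and it assumes the sequence is a single DNA string in coding (mRNA) sense, that an ORF runs from the first ATG in a frame to the next in-frame stop codon, and that only the three forward frames matter. It does not handle spliced products or ORFs that are truncated at either end.

```r
# Hypothetical helper (not part of any package): returns the longest
# ATG...stop ORF found in the three forward reading frames of a DNA string.
find_longest_orf <- function(dna) {
  dna <- toupper(dna)
  stops <- c("TAA", "TAG", "TGA")
  n <- nchar(dna)
  best <- list(start = NA_integer_, end = NA_integer_, seq = "")

  for (frame in 0:2) {
    if (n - frame < 3) next                    # no complete codon in this frame
    pos <- seq(frame + 1, n - 2, by = 3)       # start position of each codon
    codons <- substring(dna, pos, pos + 2)
    open_at <- NA                              # position of the current ATG, if any

    for (i in seq_along(codons)) {
      if (is.na(open_at) && codons[i] == "ATG") {
        open_at <- pos[i]                      # open a candidate ORF
      } else if (!is.na(open_at) && codons[i] %in% stops) {
        end <- pos[i] + 2                      # last base of the stop codon
        if (end - open_at + 1 > nchar(best$seq)) {
          best <- list(start = open_at, end = end,
                       seq = substr(dna, open_at, end))
        }
        open_at <- NA                          # close the ORF, keep scanning
      }
    }
  }
  best
}

# Toy example (made-up sequence, not real influenza data)
res <- find_longest_orf("ccATGAAATTTTAGgg")
res$start  # 3
res$end    # 14
res$seq    # "ATGAAATTTTAG"
```

When no complete ATG...stop ORF is present, the function returns an empty `seq` with `NA` coordinates, which could serve as one signal for flagging partial segments in a classification workflow.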