Question

Fetching Genbank records and parsing them to be DNAStringset objects

0

Entering edit mode

6.5 years ago

peter.durr • 0

Hi everyone

I am trying to develop a workflow in R that lets me fetch Genbank sequences and then undertakes multiple sequence alignments.

Where I am at, is that I can now use efetch{reutils} to successfully fetch a sequence as a character, "efetch" object.

RABV.fa <- efetch("EU293116", db="nucleotide", retmode="text", rettype="fasta")

the next step is to convert this into a DNAStringSet from the Biostrings library

RABV2.fa <- readDNAStringSet(RABV.fa)

However, I get the error:

> Error in .normarg_input_filepath(filepath) :    'filepath' must be a character vector with no NAs

I have played around with the options for readDNAStringSet - but without success.

Thanks

PS. This question revisits one asked in BioStars a few years ago - R: Parsing Fasta From Strings In R? - but the solution offered by Michael Dondrup no longer works (for me at least. An alternative was provided by Neilfws, but this produces a SeqFastadna file.

R Genbank XStringSet Biostrings • 2.0k views

ADD COMMENT • link updated 6.5 years ago by Gungor Budak ▴ 270 • written 6.5 years ago by peter.durr • 0

score 1 · Accepted Answer · 2018-05-27

It seems readDNAStringSet requires a path to the FASTA file rather than directly working with the object returned by efetch. A workaround I can think of is to write the content of the object to a file then read it using readDNAStringSet as follows;

RABV.fa <- efetch("EU293116", db="nucleotide", retmode="text", rettype="fasta")
# here write the content to a file
write(RABV.fa$content, 'rabv.fa')
RABV2.fa <- readDNAStringSet('rabv.fa')

Or a temporary file can be used as also suggested in the linked post (note that destfile argument is now outfile);

tmp = tempfile()
RABV.fa <- efetch("EU293116", db="nucleotide", retmode="text", rettype="fasta", outfile=tmp)
RABV2.fa <- readDNAStringSet(tmp)