Hey, I've been working on extracting sequences from a FASTA file for some days now and finally decided to post my problem here. I have a FASTA file containing approximately 15,000 sequences, and I'd like to split these sequences based on a certain pattern. I'm using the Biostrings package in R with the following functions:
library(Biostrings)
fasta <- readDNAStringSet("~/Downloads/NGS/IgsubPrimer/fasta", "fasta")
matched <- vmatchPattern("GACCACGTTCCCATCT", fasta, max.mismatch=1)
In the docs I found that vmatchPattern returns an MIndex object, from which it should be possible to extract all matching sequences and write them to a new FASTA file. I've tried
extractAllMatches(fasta, matched)
which seems to work only with XString objects and not in this case. I've also tried the following line from some documentation:
x <- lapply(seq(along=fasta), function(x) as.character(Views(fasta[[x]], start(matched[[x]]), end(matched[[x]]))))
which gives:
Error in fromXStringViewsToStringSet(x, out.of.limits = out.of.limits, :
'x' has "out of limits" views
I would be very grateful for a hint, I'm really stuck at the moment... :(
Thanks a lot, and apologies for my non-reproducible problem! I can reproduce your suggestions with no problem at all and get my final FASTA of matched sequences. But if I apply this to my data, where fasta is the subject and IgA is the MIndex object returned from vmatchPattern:
So I'm still getting the error message and I can't explain why. Any idea?
I'm not sure. Try running
traceback()
after you get the error. And can you run
Views(fasta[[1]], IgA[[1]])
without errors? If so, drop lapply and run a loop with a print statement and see where it goes out of limits.
Views(fasta[[1]], IgA[[1]])
is working fine. When I run traceback() it returns:
Rerunning with Debug gives me:
I'm not really sure how to interpret that, because I'm just getting started with R and the Biostrings package. I'm also a bit lost with your recommendation to run a loop with print statements. Please excuse my ignorance!
I don't know either. The experts on the BioC mailing list may be able to help more, but you should post a reproducible example there. You could also try using toString instead of as.character. Also, to change lapply to a loop, try this to catch where the error occurs.
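A minimal sketch of such a loop, assuming the same object names as in this thread (fasta for the subject, IgA for the MIndex). The two toy sequences and the `failed` vector are made up for illustration; with real data, only the loop itself would be needed. Instead of stopping at the first error, it reports the index of each failing sequence and prints the ranges that caused it:

```r
library(Biostrings)

# Toy stand-ins for the real data (assumption: the real 'fasta' comes from
# readDNAStringSet() and 'IgA' from vmatchPattern() as in the question).
fasta <- DNAStringSet(c(a = "TTTTGACCACGTTCCCATCTTTTT",
                        b = "ACCACGTTCCCATCTAAAA"))
IgA <- vmatchPattern("GACCACGTTCCCATCT", fasta, max.mismatch = 1)

failed <- integer(0)  # hypothetical name: indices whose extraction errored
for (i in seq_along(fasta)) {
  res <- tryCatch(
    as.character(Views(fasta[[i]], start = start(IgA[[i]]), end = end(IgA[[i]]))),
    error = function(e) {
      # Report which sequence failed and show its match ranges
      message("sequence ", i, " failed: ", conditionMessage(e))
      print(IgA[[i]])
      NULL
    }
  )
  if (is.null(res)) failed <- c(failed, i)
}
failed
```

Looking at the printed ranges for a failing index should show whether a start is below 1 or an end is past the length of that sequence.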
Thank you so much for your advice! With toString instead of as.character I get the same error message. And if I loop over it like you suggested, I also get the same error when it reaches sequence 1574.
When I take a look at that specific sequence with Views(), it looks exactly like all the others before it. So maybe the function cannot handle so much data!? To get the matched sequences anyway, I did a little workaround. I'll post it as an answer below...
Can you attach the output from
fasta[[1574]]
and
IgA[[1574]]
?
When I compare it to the sequences before, it doesn't look different...
With a start at 0, you'll get a space in the first position and then an error converting the View to a string. You could probably check for that in the loop or better yet track down why vmatchPattern returns a start at 0.
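One way to do that check is to drop any match whose range falls outside its sequence before converting the views to strings. The following is only a sketch under assumptions: the toy sequences and the name `hits` are invented for illustration, and it relies on the behaviour described above, namely that vmatchPattern with max.mismatch > 0 can report matches that hang over the start or end of a sequence.

```r
library(Biostrings)

# Toy data (assumption): seq1 lacks the leading G of the pattern, so a
# one-mismatch hit there may be reported as starting before position 1.
fasta <- DNAStringSet(c(seq1 = "ACCACGTTCCCATCTAAAA",
                        seq2 = "TTTTGACCACGTTCCCATCTTTTT"))
matched <- vmatchPattern("GACCACGTTCCCATCT", fasta, max.mismatch = 1)

hits <- lapply(seq_along(fasta), function(i) {
  r <- matched[[i]]
  # Keep only ranges that lie fully inside sequence i
  keep <- start(r) >= 1 & end(r) <= length(fasta[[i]])
  r <- r[keep]
  as.character(Views(fasta[[i]], start = start(r), end = end(r)))
})
names(hits) <- names(fasta)
hits
```

If the partial matches are wanted too, trim() on the Views object is another option worth trying, since it clips out-of-limits views to the sequence boundaries. The kept matches could then be written out with writeXStringSet(DNAStringSet(unlist(hits)), ...) to a file of your choice.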