Hi,
I've been using seqinr successfully for a while now, but ran into an odd problem today. I'm using read.fasta to read in FASTA-format sequences, and for over 40,000 files it has done this repeatedly without problems.
However, for 6 files, which are not unusual or different from the other files in any way that I can see, it is not reading them in correctly. Even though the sequences are over 100 bp (or at least over 50), it returns only 4-20 base pairs when it reads them in. Perhaps the most puzzling bit is that what it gives me is often not even in the original sequences - I have no idea where it is getting these bases from!
Is this a common seqinr problem? Are there any steps I can take? It can still read a random selection of the other files perfectly normally.
I'm running Windows 7 and R 2.11.0
Edit: basically it didn't work even when I simply did
read.fasta("CG13569_CG13569_FBtr0072283.fas")
However... now I'm trying to replicate it and it's reading correctly. Perhaps it was a memory error somehow?
Can you edit your post to add an example of the code you're using? Without that info, it's hard to figure out exactly what's going on.
I agree with Chris, and you could also paste a couple of the sequences (and headers) that are giving you problems.
Hi Emma. Maybe a random shot, but have the problematic files been prepared differently? Maybe on a Mac or a different system than the one you use? Different systems (Linux, Mac or Windows) use different 'end of line' characters, and many of the problems I solve for the people around me are caused by a file being saved on a Mac in a format that is not fully compatible. In case this helps ;)
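One way to test the end-of-line idea is to count the line terminators in a problem file and compare them with a file that reads fine. Below is a minimal sketch in base R. The helper name eol_style is made up for illustration and is not part of seqinr. It assumes the usual conventions: CR+LF on Windows, LF alone on Linux and modern Mac, and CR alone on classic Mac OS.

```r
# Hypothetical helper (not part of seqinr): count the end-of-line
# styles found in a file by inspecting its raw bytes.
eol_style <- function(path) {
  # Read the whole file as raw bytes so nothing is translated on the way in
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  n <- length(bytes)

  is_cr <- bytes == as.raw(0x0d)  # carriage return, "\r"
  is_lf <- bytes == as.raw(0x0a)  # line feed, "\n"

  # Shifted copies let us look at the neighbouring byte
  next_is_lf <- c(is_lf[-1], FALSE)
  prev_is_cr <- c(FALSE, is_cr[-n])

  c(crlf    = sum(is_cr & next_is_lf),   # Windows style
    lone_cr = sum(is_cr & !next_is_lf),  # classic Mac OS style
    lone_lf = sum(is_lf & !prev_is_cr))  # Linux / modern Mac style
}

# Example call, using the file name from the original post:
# eol_style("CG13569_CG13569_FBtr0072283.fas")
```

If the six problem files show a different mix of counts from the files that read correctly (for example only lone_cr, or a mixture of styles), line endings are a plausible suspect. If the counts look the same, the cause is probably elsewhere.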
Agreed, need to see a sample sequence. We could download e.g. FBtr0072283 from FlyBase, but that may not reflect what you have on your hard drive.