Question

Error reading FASTQ files with readDNAStringSet

2.9 years ago

a.krassnig ▴ 20

I am trying to read a fastq-file with readDNAStringSet and am having quite some trouble doing so.

I need the names as well as the quality scores. Right now I am using:

 readDNAStringSet(myFastqFile, format="fastq", use.names= TRUE, with.qualities = TRUE)

But here I get the error:

 "@" expected at beginning of line 1

While troubleshooting, I found that the error does not appear when I read only nrec = 3014734 records (the file actually contains 3835928 sequences).

I have looked closely at the sequences in the fastq-file from the point where reading fails, but I cannot find anything wrong with the file.

Something else was quite weird: if I set seek.first.rec = TRUE, I also get an error when I read more than the number of sequences mentioned above, but a different one:

no FASTQ record found

I have uploaded the fastq-file to WeTransfer; perhaps someone has an idea what's wrong with it.

https://wetransfer.com/downloads/b652fcf9b9f6b4b6a07dac79b31b2c6820211221092436/4311e50dd963fc93538979f107182d2420211221092458/6e526c

I really hope someone has an idea, since I am out of knowledge:)

DNAStringSet Biostrings readDNAStringSet R • 1.7k views

ADD COMMENT • link 2.9 years ago by a.krassnig ▴ 20

I bet that your file was truncated during download. Try downloading it again with a safer method, e.g. wget or curl. Ask the sender to use a more reliable transfer method, ideally a compressed archive with a checksum, or a different file sharing provider. Why are you using WeTransfer? We cannot access this download anyway. After downloading, verify the file against the checksum, and use another tool such as FastQC to check the file.
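As a rough illustration of the checksum step, the sketch below compares MD5 sums with `tools::md5sum()` from base R. The small temporary files are stand-ins for the real FASTQ and its copies, and the variable names are made up for this example.

```r
# Illustrative only: a small temporary file stands in for the real FASTQ.
original <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "IIII"), original)

# An intact copy should have the same MD5 sum as the original.
intact <- tempfile(fileext = ".fastq")
stopifnot(file.copy(original, intact))

# A truncated copy (last line missing) should have a different sum.
truncated <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+"), truncated)

# md5sum() returns a character vector named by file path; drop the names.
sums <- unname(tools::md5sum(c(original, intact, truncated)))
intact_matches    <- identical(sums[1], sums[2])
truncated_matches <- identical(sums[1], sums[3])
```

In practice the sender would publish the sum of the original file, and the receiver would compare it with the sum computed on the downloaded copy.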

Another problem could be that the number of records the package can parse is limited by an integer variable. Please try the ShortRead package instead, as recommended previously.
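A minimal sketch of what reading with ShortRead could look like, assuming the package is installed. The tiny demo file and the chunk size are placeholders, and streaming in chunks with `FastqStreamer()` is shown only as one way to avoid loading every record in a single call.

```r
# Illustrative only: a two-record demo file stands in for the real FASTQ.
demo_fastq <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "GGCC", "+", "IIII"), demo_fastq)

if (requireNamespace("ShortRead", quietly = TRUE)) {
  suppressPackageStartupMessages(library(ShortRead))

  # Read everything at once into a ShortReadQ object.
  reads <- readFastq(demo_fastq)
  seqs  <- sread(reads)              # sequences as a DNAStringSet
  quals <- quality(quality(reads))   # quality strings as a BStringSet
  ids   <- ShortRead::id(reads)      # read names as a BStringSet

  # Alternatively, stream the file in chunks (chunk size is an arbitrary choice).
  strm <- FastqStreamer(demo_fastq, n = 1e6)
  n_total <- 0L
  while (length(chunk <- yield(strm)) > 0L) {
    n_total <- n_total + length(chunk)
  }
  close(strm)
}
```

If the streamed record count stops short of the expected number, that would point to a problem in the file rather than in one particular reader.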

ADD REPLY • link 2.9 years ago by Michael 55k

Thank you for the answer.

I am actually creating these files myself, so I did not download them. (Well, I did download them with wget and then merged them.)

I wasn't sure how to share them, so I chose WeTransfer.

Thank you for all the recommendations; I will certainly use them when looking into this problem.

ADD REPLY • link 2.9 years ago by a.krassnig ▴ 20

Oh, I see. So how did you merge them, and why? Did you merge unrelated files or create an interleaved fastq file from paired-end sequences? I am guessing the file somehow got corrupted during the process.
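One way to test for this kind of corruption is to check the four-line record layout directly. The sketch below uses base R only; `check_fastq_layout` is a hypothetical helper written for this illustration, not part of Biostrings, and it assumes records are not wrapped over multiple lines. It loads the whole file with `readLines()`, so it needs enough memory for all lines.

```r
# Hypothetical helper: report the first record that breaks the 4-line layout
# (header starting with "@", sequence, separator starting with "+", qualities).
check_fastq_layout <- function(path) {
  lines <- readLines(path, warn = FALSE)
  n <- length(lines)
  if (n == 0L) {
    return(list(n_lines = 0L, complete = FALSE, first_bad_record = NA_integer_))
  }
  starts <- seq(1L, n, by = 4L)
  header_ok <- startsWith(lines[starts], "@")
  plus_ok   <- startsWith(lines[starts + 2L], "+")
  # Sequence and quality strings must have the same length.
  len_ok    <- nchar(lines[starts + 1L]) == nchar(lines[starts + 3L])
  ok <- header_ok & plus_ok & len_ok
  ok[is.na(ok)] <- FALSE   # lines missing from a truncated last record count as bad
  list(n_lines = n,
       complete = n %% 4L == 0L,
       first_bad_record = which(!ok)[1])
}

# Demo file: the third record is missing the "@" in its header.
demo <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "GGCC", "+", "IIII",
             "read3",  "TTAA", "+", "IIII"), demo)
res <- check_fastq_layout(demo)
```

Here `res$first_bad_record` is the index of the first offending record, which can then be compared with the nrec value at which reading starts to fail.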

ADD REPLY • link 2.9 years ago by Michael 55k

I merged them with PEAR. The files were related (forward and reverse reads of the same sample).

Since I came to the conclusion that the corruption happens while reading the file with readDNAStringSet, I came up with a workaround: reading the lines of the file and saving the sequences and qualities separately into the StringSet object. This works just fine and does not give me an error, even when I read all sequences.
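The exact code is not shown in this thread, so the following is only a sketch of how such a workaround might look. `read_fastq_lines` is a hypothetical helper, the demo file is a placeholder, and storing the qualities in a `qualities` metadata column is an assumption chosen to mirror what `with.qualities = TRUE` returns.

```r
# Hypothetical helper: split a 4-line-per-record FASTQ into names,
# sequences and quality strings using base R only.
read_fastq_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  stopifnot(length(lines) > 0L, length(lines) %% 4L == 0L)
  starts <- seq(1L, length(lines), by = 4L)
  list(names = sub("^@", "", lines[starts]),
       seqs  = lines[starts + 1L],
       quals = lines[starts + 3L])
}

# Illustrative only: a two-record demo file stands in for the real FASTQ.
demo_fastq <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "GGCC", "+", "IIII"), demo_fastq)
rec <- read_fastq_lines(demo_fastq)

if (requireNamespace("Biostrings", quietly = TRUE)) {
  # Build the DNAStringSet and attach the qualities as a metadata column.
  dna <- Biostrings::DNAStringSet(rec$seqs)
  names(dna) <- rec$names
  S4Vectors::mcols(dna) <- S4Vectors::DataFrame(
    qualities = Biostrings::PhredQuality(rec$quals)
  )
}
```

The helper stops if the number of lines is not a multiple of four, which acts as a cheap guard against a truncated file.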

Thank you very much for your help!

ADD REPLY • link 2.9 years ago by a.krassnig ▴ 20