Trimming DNAStringSet
3.0 years ago
a.krassnig ▴ 20

Hello,

I am currently trying to read a FASTQ file with "readDNAStringSet", trim the sequences, and then write them into a new FASTQ file.

Reading the FASTQ file with "readDNAStringSet" works just fine.

I then try to trim a fixed number of bases from the left side of each sequence (e.g. 10 bases). Right now I use "subseq(my_Stringset, start =10)".
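A note on the coordinates, as a small sketch with made-up reads: `subseq()` uses 1-based, inclusive positions, so `start = 10` keeps base 10 and removes only 9 bases. Removing exactly 10 bases needs `start = 11`. The sequences below are illustrative only.

```r
library(Biostrings)

# Two made-up reads, for illustration only
reads <- DNAStringSet(c(read1 = "ACGTACGTACGTAC", read2 = "TTTTGGGGCCCCAAAA"))

n_trim <- 10  # number of bases to drop from the left (5') end

# Positions are 1-based and inclusive, so start = n_trim + 1 drops exactly n_trim bases
trimmed <- subseq(reads, start = n_trim + 1)

width(reads)                          # 14 16
width(subseq(reads, start = n_trim))  # 5 7  (only 9 bases removed)
width(trimmed)                        # 4 6
```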

But when I try to write the FASTQ file, it seems that the quality scores have not been trimmed, and I get the error message "'x' and 'quality' must have the same width".
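One way to avoid the width mismatch, sketched under the assumption that the reads were loaded with `with.qualities = TRUE` (which stores the quality strings as a `BStringSet` in the `qualities` metadata column, `mcols(x)$qualities`): apply the same `subseq()` call to the qualities and pass them explicitly through the `qualities` argument of `writeXStringSet()`. To stay self-contained, the example builds a tiny read set and quality set by hand instead of reading a file.

```r
library(Biostrings)

# Made-up reads and Phred+33 quality strings of matching widths.
# With real data these would come from:
#   reads <- readDNAStringSet("in.fastq", format = "fastq", with.qualities = TRUE)
#   quals <- mcols(reads)$qualities
reads <- DNAStringSet(c(read1 = "ACGTACGTACGTAC", read2 = "TTTTGGGGCCCCAAAA"))
quals <- BStringSet(c("IIIIIIIIIIIIII", "IIIIIIIIIIII####"))

n_trim <- 10

# Trim reads and qualities with identical coordinates so their widths stay equal
reads_trim <- subseq(reads, start = n_trim + 1)
quals_trim <- subseq(quals, start = n_trim + 1)

# Write the trimmed reads together with their trimmed qualities
out <- tempfile(fileext = ".fastq")
writeXStringSet(reads_trim, out, format = "fastq", qualities = quals_trim)
```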

I have been looking for a while now for a way to trim the sequences as well as the quality scores, but I just can't find a solution (it should also be fast, since I am working with very large files).
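For very large files, one option is to avoid loading everything into memory at once. The sketch below swaps in the ShortRead package (recommended in the answer further down) and its `FastqStreamer()`, which yields the file in chunks of `n` reads as `ShortReadQ` objects. `narrow()` on such an object trims sequences and qualities together, and `writeFastq(..., mode = "a")` appends each trimmed chunk to the output. The toy input file and the chunk size of 2 are only for illustration.

```r
library(ShortRead)

# Toy input file (illustrative); in practice this would be your large FASTQ file
fq <- tempfile(fileext = ".fastq")
writeLines(rep(c("@r", "ACGTACGTACGTAC", "+", "IIIIIIIIIIIIII"), 5), fq)

out <- tempfile(fileext = ".fastq")
strm <- FastqStreamer(fq, n = 2)  # reads per chunk; something like 1e6 is more realistic
while (length(chunk <- yield(strm)) > 0) {
    # chunk is a ShortReadQ: narrow() trims reads and qualities in one step
    writeFastq(narrow(chunk, start = 11), out, mode = "a", compress = FALSE)
}
close(strm)
```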

Is there a correct way to do what I am trying to do?

Thank you in advance.

I am using R 4.1.0


It works fine here (R 4.0, Biostrings 2.58.0). Please post an example FASTQ file.

[Screenshots: fig1, fig2]


Hey, thank you for your answer,

As far as I can see, "readDNAStringSet" only keeps the qualities if you set the parameter "with.qualities = TRUE". Otherwise the quality values are not stored at all, and they are not written to your FASTQ file either. Am I correct here?

I think that is why the quality is ";" for all bases. I just tried it, and it does the same with my example.
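The behaviour described here can be checked with a small sketch that writes a one-read FASTQ file and reads it back both ways. Only the call with `with.qualities = TRUE` keeps the quality string, in the `qualities` metadata column. The file content is made up for the example.

```r
library(Biostrings)

# A one-read FASTQ file with made-up content
fq <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGTACGTAC", "+", "IIIII#####"), fq)

no_q   <- readDNAStringSet(fq, format = "fastq")                         # qualities dropped
with_q <- readDNAStringSet(fq, format = "fastq", with.qualities = TRUE)  # qualities kept

is.null(mcols(no_q)$qualities)  # TRUE
mcols(with_q)$qualities         # BStringSet holding the quality string of read1
```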

Have a great day :)

3.0 years ago
Michael 55k

I am not sure how well Biostrings supports FASTQ. Please use the ShortRead package instead, together with its built-in trimming functions. Are you sure you want to trim a fixed number of bases off the reads? This shouldn't be required if the reads were processed properly. I suspect this could be motivated by a misinterpretation of FastQC reports. Normally, one would only trim if there is adapter content or low quality within a window, and otherwise leave the reads alone.
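A minimal sketch of the ShortRead route, using a made-up two-read file: `readFastq()` returns a `ShortReadQ` object that keeps reads, qualities and IDs together, so `narrow()` trims them consistently and `writeFastq()` writes a valid file. For quality-based trimming, ShortRead also provides functions such as `trimTailw()` and `trimTails()`; they are not shown here.

```r
library(ShortRead)

# Made-up two-read FASTQ file so the example runs on its own
fq <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGTACGTACGTAC",   "+", "IIIIIIIIIIIIII",
             "@read2", "TTTTGGGGCCCCAAAA", "+", "IIIIIIIIIIII####"), fq)

sr <- readFastq(fq)                # ShortReadQ: sread(), quality() and id() in one object
sr_trim <- narrow(sr, start = 11)  # drops the first 10 bases of every read and its qualities

out <- tempfile(fileext = ".fastq")
writeFastq(sr_trim, out, compress = FALSE)
```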


Dear Michael,

Thank you for your answer. Actually, the trim length is not the same for every sequence; I just thought a fixed length would make my question easier. In reality I go through the sequences and trim each one depending on a calculation and an alignment to another sequence.
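`subseq()` is vectorized over `start`, so per-read trim lengths can be applied in a single call without a loop. In this sketch `trim_left` is a hypothetical vector standing in for the result of your own calculation and alignment; the same vector is applied to the reads and to their qualities, so their widths stay equal.

```r
library(Biostrings)

# Made-up reads and matching Phred+33 quality strings
reads <- DNAStringSet(c(read1 = "ACGTACGTACGTAC", read2 = "TTTTGGGGCCCCAAAA"))
quals <- BStringSet(c("IIIIIIIIIIIIII", "IIIIIIIIIIII####"))

# Hypothetical per-read trim lengths (one value per read, each <= that read's width),
# e.g. computed from an alignment to another sequence
trim_left <- c(3L, 8L)

# One vectorized call per set; reads and qualities are cut at the same positions
reads_trim <- subseq(reads, start = trim_left + 1L)
quals_trim <- subseq(quals, start = trim_left + 1L)
```

With ShortRead, passing the same vector as `start` to `narrow()` on a `ShortReadQ` object should have the same effect on reads and qualities at once.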

I have found ShortRead but was not sure whether I should use it instead of Biostrings. Is it possible to read the sequences with Biostrings and manipulate them with ShortRead?
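ShortRead's `ShortReadQ` class is built on Biostrings classes, which suggests that moving between the two packages is straightforward. A sketch with made-up data: the `ShortReadQ()` constructor takes a `DNAStringSet`, the qualities wrapped in `FastqQuality()`, and the IDs as a `BStringSet`; the accessors `sread()` and `id()` return ordinary Biostrings objects again.

```r
library(ShortRead)

# Biostrings objects, e.g. as obtained from readDNAStringSet(..., with.qualities = TRUE)
reads <- DNAStringSet(c("ACGTACGTACGTAC", "TTTTGGGGCCCCAAAA"))
quals <- BStringSet(c("IIIIIIIIIIIIII", "IIIIIIIIIIII####"))
ids   <- BStringSet(c("read1", "read2"))

# Biostrings -> ShortRead
sr <- ShortReadQ(sread = reads, quality = FastqQuality(quals), id = ids)

# Manipulate with ShortRead, e.g. fixed-length trimming of reads and qualities
sr <- narrow(sr, start = 11)

# ShortRead -> Biostrings
reads_back <- sread(sr)  # DNAStringSet
ids_back   <- id(sr)     # BStringSet
```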


I haven't tried this yet, and I don't think you can convert the objects between the packages, but it should be easy to test.
