Question

Minimum length of reads

1

8.3 years ago

deepti1rao ▴ 60

I used a trimmer to trim my reads based on their quality. My raw reads were 126-150 bp. I did not set a minimum length to filter out short reads, so my processed file contains reads as short as 0-4 bp. I need to do a reference-based assembly using these paired-end reads. I am afraid of losing pairing information by filtering out the shorter reads. What should my minimum length be? I have 40x coverage with the raw reads.

Ngs Reads Processing Trimming Quality • 7.8k views

ADD COMMENT • link updated 2.3 years ago by AISHA ▴ 140 • written 8.3 years ago by deepti1rao ▴ 60

0

Run FastQC on your files and see where along the read the quality drops off. I'm assuming you have 2x150 bp reads, so your quality will start to drop off around 125 bp. You will lose some pairs when filtering on both quality and length. Ideally, keep only the pairs that pass both QC steps, and then align to your reference.

ADD REPLY • link 8.3 years ago by st.ph.n ★ 2.7k

0

Thanks for your reply! I used the fastq quality trimmer from the FASTX-Toolkit. I trimmed just one file, which had my forward reads, using a quality threshold of 30. And yes, my reads are Illumina reads, 150 bp in length. What do you think is a good length to set as a filter?

ADD REPLY • link 8.3 years ago by deepti1rao ▴ 60

0

It sounds like you may have trimmed too severely. What quality threshold did you use, with which program, and what are you planning to use the data for?

ADD REPLY • link 8.3 years ago by Brian Bushnell 20k

0

Thanks for your reply, Brian! I used the fastq quality trimmer that came with the FASTX-Toolkit and trimmed with a quality threshold of 30. I want to use these reads for a reference-based genome assembly.

ADD REPLY • link 8.3 years ago by deepti1rao ▴ 60

0

30 is too high for almost all purposes. I would suggest trying a much lower threshold, like Q20 or Q15, and seeing how it impacts assembly. I also do not recommend fastX, as it is (as you have found) quite slow and cannot handle paired reads.
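To see why the threshold matters so much for read length, here is a minimal Python sketch of 3'-end quality trimming. It is a simplified illustration only, not the algorithm used by fastX or BBDuk; the function name `trim_3prime` and the example read are hypothetical, and Phred+33 quality encoding is assumed.

```python
def trim_3prime(seq, qual, threshold, offset=33):
    """Remove trailing bases whose Phred score is below `threshold`.

    Assumes Phred+33 encoded quality strings (an assumption; check
    your own data). Returns the trimmed (sequence, quality) pair.
    """
    end = len(seq)
    # Walk back from the 3' end while the last base is below threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: 5 bases at Q40 ('I'), 5 at Q20 ('5'), 2 at Q2 ('#').
seq = "ACGTACGTACGT"
qual = "IIIII55555##"

seq_q30, _ = trim_3prime(seq, qual, threshold=30)
seq_q20, _ = trim_3prime(seq, qual, threshold=20)
print(len(seq_q30), len(seq_q20))
```

With this toy read, trimming at Q30 leaves 5 bases while trimming at Q20 leaves 10, which is the kind of difference that produces many very short reads at a strict threshold.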

ADD REPLY • link 8.3 years ago by Brian Bushnell 20k

score 0 · Answer 1 · 2017-08-08

0

8.3 years ago

GenoMax 154k

I am afraid of losing paired information by filtering out the shorter reads.

Having very short reads may make the assembly difficult or unusable. You could use reformat.sh from the BBMap suite to filter your trimmed paired-end reads while keeping them in sync:

reformat.sh in1=read1.fq in2=read2.fq out1=filt1.fq out2=filt2.fq ml=30

(change ml= as needed).
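The idea of length filtering while keeping mates in sync can be sketched in plain Python. This is not a reimplementation of reformat.sh: the helper names `read_fastq` and `filter_pairs` are hypothetical, and the rule used here (drop the whole pair if either mate is shorter than the minimum) is an assumption made for illustration. Check the tool's own help text for how it actually treats pairs.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        name = handle.readline().rstrip("\n")
        if not name:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield name, seq, qual


def filter_pairs(reads1, reads2, min_len=30):
    """Yield (read1, read2) pairs in which both mates have >= min_len bases.

    The two inputs are walked in lockstep, so the outputs stay in sync.
    """
    for r1, r2 in zip(reads1, reads2):
        if len(r1[1]) >= min_len and len(r2[1]) >= min_len:
            yield r1, r2


# Tiny in-memory example with hypothetical reads.
fq1 = io.StringIO("@a/1\n" + "A" * 40 + "\n+\n" + "I" * 40 + "\n"
                  "@b/1\nACG\n+\nIII\n")
fq2 = io.StringIO("@a/2\n" + "C" * 40 + "\n+\n" + "I" * 40 + "\n"
                  "@b/2\n" + "G" * 40 + "\n+\n" + "I" * 40 + "\n")

kept = list(filter_pairs(read_fastq(fq1), read_fastq(fq2), min_len=30))
print([r1[0] for r1, _ in kept])
```

Here pair "b" is dropped because its forward read is only 3 bases long, even though its mate is full length, so the two output files would never go out of step.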

ADD COMMENT • link 8.3 years ago by GenoMax 154k

0

Thanks for your suggestion! From your post, I infer that reformat.sh would retain only the reads that come in pairs, with a minimum length of 30 (if ml=30). Is this right?

Does BBMap have a quality-trimming tool as well?

ADD REPLY • link 8.3 years ago by deepti1rao ▴ 60

2

Yes to the first question, and yes to the second as well. bbduk.sh is the scan/trim program in the BBMap suite. If you are doing reference-based assembly, then you can afford to trim at a lower Q threshold. Q20 or better may be fine.

ADD REPLY • link 8.3 years ago by GenoMax 154k

0

Can you compare the speeds of bbduk.sh and the fastX quality trimmer? I found fastX to be pretty slow. Perhaps I'll try both Q20 and Q30 separately and see how good my depth and breadth of coverage are.

ADD REPLY • link 8.3 years ago by deepti1rao ▴ 60

2

bbduk.sh is multithreaded and can use more than one core if you have them available. It will be significantly faster than fastX.

ADD REPLY • link 8.3 years ago by GenoMax 154k