Question

Filtering fastq (by quality score and length), optimum criteria?

1

9.1 years ago

dina.hesham139 ▴ 170

For filtering FASTQ files of RNA-seq data by quality score and length in Galaxy, what are the optimum criteria?

Specifically: the minimum and maximum read length, the minimum and maximum quality score, and the maximum number of bases allowed outside the quality range.

My datasets are from human samples, sequenced on a HiSeq 2000 in a paired-end experiment (2 separate files per sample).

RNA-Seq galaxy • 4.0k views

updated 2.3 years ago by Ram 44k • written 9.1 years ago by dina.hesham139 ▴ 170

Ram · Answer 1 · 2015-10-25

2

9.1 years ago

Sean Davis 27k

Assuming that you are using some alignment process (and not de novo assembly), we generally do not filter reads, and we trim tails only very lightly. The alignment process itself is a great filter.

updated 5.0 years ago by Ram 44k • written 9.1 years ago by Sean Davis 27k

0

Thank you a lot.

9.1 years ago by dina.hesham139 ▴ 170

score 0 · Answer 2 · 2015-10-28

0

9.1 years ago

Ian 6.1k

As a core facility, we generally run our sequences through Trimmomatic to remove adaptor sequence (most important, to avoid mapping errors) and to trim reads when a sliding 4-nucleotide window has a mean quality score below 20.
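To make the window rule concrete, here is a minimal Python sketch of the idea, not Trimmomatic's actual code. It assumes Phred+33 quality encoding, and the function names `phred33_to_scores` and `sliding_window_trim` are hypothetical, chosen only for illustration. It scans from the 5' end and cuts the read at the start of the first 4-base window whose mean quality is below 20.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window=4, min_mean=20):
    """Cut a read at the first window whose mean quality is below min_mean.

    Simplified illustration: everything from the start of the first
    failing window onward is dropped. Reads shorter than one window
    are returned unchanged.
    """
    scores = phred33_to_scores(quality_string)
    # Slide the window one base at a time from the 5' end.
    for start in range(len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean:
            return sequence[:start], quality_string[:start]
    # No window fell below the threshold, so keep the whole read.
    return sequence, quality_string


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    seq, qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
    print(seq, qual)
```

For paired-end data, both mates would need to be processed together so that the two files stay in sync; that bookkeeping is left out of this sketch.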
