Question

Trimming or masking?

0

9.8 years ago

Fadel ▴ 20

Hi all,

I'm trying to do quality control on raw data for a genomics class. Here is the FastQC report for Read1.

My questions are:

In this case, should I do trimming or masking?

If trimming, should I trim positions 14-15 and 54-55? I can understand the logic behind trimming the 3' end, but how can I trim in the middle of a read? I'm sorry if my question is quite basic, but I'm really struggling to understand this concept, and if you could point me to any reference to read I would highly appreciate it.

P.S. I'm an undergrad bioinformatics student taking his first steps in NGS data analysis.

fastqc • 2.9k views

updated 3.0 years ago by Ram 45k • written 9.8 years ago by Fadel ▴ 20

Ram · Accepted Answer · 2015-10-19

3

9.8 years ago

Carlo Yague 9.0k

No, you shouldn't trim the middle of the reads. You could perhaps mask the problematic positions (I don't know much about masking), but it might be better to remove whole reads that have base calls falling below a certain threshold. Even if you have a lot of low-quality base calls at positions 14-15 and 54-55, there are also many good calls there (the median quality is always in the green zone).
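To make the two options concrete, here is a minimal Python sketch of masking versus whole-read filtering. It is only an illustration, not what any particular tool does: the function names, the Phred threshold of 20, the 10% cutoff and the toy read are all assumptions, and it assumes Phred+33 quality encoding.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mask_low_quality(seq: str, quality: str, min_q: int = 20) -> str:
    """Masking: replace every base whose Phred score is below min_q with 'N'."""
    scores = phred_scores(quality)
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, scores))


def keep_read(quality: str, min_q: int = 20, max_low_fraction: float = 0.1) -> bool:
    """Filtering: keep a read only if at most max_low_fraction of its bases are below min_q."""
    scores = phred_scores(quality)
    low = sum(1 for q in scores if q < min_q)
    return low <= max_low_fraction * len(scores)


# Toy read (made up): 'I' encodes Phred 40, '#' encodes Phred 2.
seq = "ACGTACGT"
qual = "IIII#III"

masked = mask_low_quality(seq, qual)  # the single low-quality base becomes 'N'
kept = keep_read(qual)                # 1 of 8 bases is low, which exceeds 10%
```

Masking keeps the read length and the good calls around the bad position, while filtering throws the whole read away, so the right choice depends on what the downstream tool does with `N` bases.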

Don't forget to remove adapter sequences as well:

Overrepresented sequences: TruSeq Adapter, Index 27 (97% over 44bp)

As a resource, have a look at this tool: Trimmomatic.

updated 5.7 years ago by Ram 45k • written 9.8 years ago by Carlo Yague 9.0k

0

Thanks Carlo, I spent my day today with Trimmomatic :D I used the FASTX tools before, but I find the results after trimming the adapters with Trimmomatic are much better. I still get overrepresented sequences in Read2, though. I couldn't work out how to remove them, and I don't know whether I should remove them at all. I really appreciate your help!

Here are the FastQC results before and after using Trimmomatic.

And here is the command line I used to get the result:

trimmomatic PE -threads 8 Read1.fastq.gz Read2.fastq.gz \
                          R1Paired.fastq.gz R1UnPaired.fastq.gz \
                          R2Paired.fastq.gz R2UnPaired.fastq.gz \
                          ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
                          LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50
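
For anyone wondering what `SLIDINGWINDOW:4:15` in that command means: the idea is to scan from the 5' end and cut the read once the average quality in a 4-base window drops below 15. The Python sketch below is a simplified illustration of that idea only. It is not Trimmomatic's actual implementation, the function name and toy read are made up, and Phred+33 encoding is assumed.

```python
def sliding_window_trim(seq, quality, window=4, min_avg_q=15, offset=33):
    """Cut the read at the start of the first window whose mean Phred score is below min_avg_q."""
    scores = [ord(ch) - offset for ch in quality]
    for start in range(0, max(len(scores) - window + 1, 0)):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_avg_q:
            # Everything from this window onwards is discarded.
            return seq[:start], quality[:start]
    # No window fell below the threshold, so the read is returned unchanged.
    return seq, quality


# Toy read (made up): six good bases ('I' = Phred 40) followed by four bad ones ('#' = Phred 2).
seq = "ACGTACGTAC"
qual = "IIIIII####"

trimmed_seq, trimmed_qual = sliding_window_trim(seq, qual)
```

This also shows why such trimming only shortens a read from one end: once the window fails, the rest of the read goes, so isolated bad positions in the middle are either tolerated or end the read there.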

Thanks :)

updated 5.7 years ago by Ram 45k • written 9.8 years ago by Fadel ▴ 20

1

Sorry for the late answer.

I suggest you try BLASTing your overrepresented sequences against the nucleotide collection. Then you'll know where they come from and what to do with them!
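
If there are many of them, one way to prepare the BLAST input is to pull the sequences out of FastQC's text report and write them as FASTA. Here is a minimal Python sketch, assuming the usual `fastqc_data.txt` layout (a `>>Overrepresented sequences` module with tab-separated rows, closed by `>>END_MODULE`). The function name, the FASTA header format and the example row are made up for illustration.

```python
def overrepresented_to_fasta(fastqc_text: str) -> str:
    """Convert the 'Overrepresented sequences' module of a FastQC report into FASTA text."""
    records = []
    in_module = False
    for line in fastqc_text.splitlines():
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
            continue
        if in_module and line.startswith(">>END_MODULE"):
            break
        # Skip the '#Sequence ...' column header and any blank lines.
        if in_module and line and not line.startswith("#"):
            fields = line.split("\t")
            sequence, count = fields[0], fields[1]
            records.append(f">overrep_{len(records) + 1} count={count}\n{sequence}")
    return "\n".join(records) + ("\n" if records else "")


# Made-up example in the assumed fastqc_data.txt layout.
example = (
    ">>Overrepresented sequences\twarn\n"
    "#Sequence\tCount\tPercentage\tPossible Source\n"
    "ACGTACGTACGT\t1200\t0.6\tNo Hit\n"
    ">>END_MODULE\n"
)

fasta = overrepresented_to_fasta(example)
```

The resulting FASTA text can be saved to a file and pasted or uploaded into a nucleotide BLAST search.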

written 9.8 years ago by Carlo Yague 9.0k