Question

FastQC per base sequence content failed for WES data

0

2.0 years ago

tanbiswas6 ▴ 10

Hi

I am analyzing WES data, and FastQC reports a failure for the per base sequence content module. It also shows some sequence duplication. Below is a snapshot of my data.

[Image: FastQC snapshot]

Please let me know how to process this file.

Thank you.

DNA-seq QC WES FASTQC • 2.3k views

updated 2.0 years ago by GenoMax 148k • written 2.0 years ago by tanbiswas6 ▴ 10

0

You have no adenine at the 4th base position across all sequences. In general, the first 7 to 8 base positions look bad. If that's an option for you, just trim them off or ignore them.
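To look at the skew in the first cycles directly, outside of the FastQC plot, you can tally the bases at each position. This is only a minimal sketch with awk, assuming an uncompressed FASTQ with strict 4-line records on stdin; the function name base_composition and the file names are placeholders, not part of FastQC or any other tool.

```shell
# Count A/C/G/T/N at each of the first N positions across all reads.
# Assumption: strict 4-line FASTQ records (no wrapped sequence lines) on stdin.
base_composition() {
    awk -v n="${1:-10}" '
        NR % 4 == 2 {                                  # sequence line of each record
            len = (length($0) < n) ? length($0) : n
            for (i = 1; i <= len; i++)
                count[i, toupper(substr($0, i, 1))]++
        }
        END {
            print "pos\tA\tC\tG\tT\tN"
            for (i = 1; i <= n; i++)
                printf "%d\t%d\t%d\t%d\t%d\t%d\n", i,
                    count[i, "A"], count[i, "C"], count[i, "G"],
                    count[i, "T"], count[i, "N"]
        }'
}

# Usage (file names are placeholders):
#   base_composition 10 < your.fastq
#   gzip -dc your.fastq.gz | base_composition 10
```

A column of zeros for A at position 4 would match what the plot shows.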

2.0 years ago by lennykovac ▴ 110

0

Thanks for the suggestion. Can you please suggest how to remove the first 7-8 reads without disturbing any other reads in the file?

Thanks.

2.0 years ago by tanbiswas6 ▴ 10

0

There is likely no need to do any processing at this point. If a problem with the data turns up in downstream analysis, you can come back and dig into this more. FastQC limits are designed for plain genomic sequencing. Depending on the kind of experiment, there may be "failures" on one or more tests. This does not automatically mean that the data has a problem or is bad.

While it is a bit odd to have majority T's at cycle 4 the data may still be fine.

2.0 years ago by GenoMax 148k

0

Yes, that is my concern. I know that other reads are fine, but if I use this file without removing those reads, won't there be some problem during data analysis or publishing?

2.0 years ago by tanbiswas6 ▴ 10

1

Most likely not, but if you want to be absolutely safe you can trim away the first 7 bases of all reads; tools like seqtk can do that.
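For reference, this is what such a trim does to each record: the first 7 bases go, together with the matching 7 quality characters, and every read stays in the file. A dependency-free sketch with awk, assuming strict 4-line FASTQ records on stdin; trim_left and the file names are placeholders of mine.

```shell
# Drop the first N bases and the matching N quality characters from every read.
# Assumption: strict 4-line FASTQ records on stdin.
# Reads with N bases or fewer end up with empty sequence and quality lines.
trim_left() {
    # Lines 2 and 4 of each record (sequence, quality) are the even-numbered lines.
    awk -v n="${1:-7}" 'NR % 2 == 0 { print substr($0, n + 1); next } { print }'
}

# Usage (file names are placeholders):
#   gzip -dc your.fastq.gz | trim_left 7 | gzip > trimmed.fastq.gz
#
# With seqtk installed, the equivalent call should be:
#   seqtk trimfq -b 7 your.fastq > trimmed.fastq
```

For paired-end data, run the same trim on both files so that the read order stays in sync.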

2.0 years ago by ATpoint 86k

0

I know that other reads are fine

How do you know that? Since FastQC sub-samples your data (it does not look at every read in your file), you at least have enough reads with that pattern in the sample it takes.

You can use reformat.sh from the BBMap suite to trim the first 7-8 bases like so:

reformat.sh -Xmx2g in=your.fastq.gz out=trimmed.fastq.gz forcetrimleft=7
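A quick sanity check after trimming is to compare the read-length distribution of the input and output files: the number of reads should be unchanged, and every length should be 7 shorter. A minimal sketch, assuming strict 4-line FASTQ records; read_lengths is a placeholder name, not a BBMap command.

```shell
# Print "length<TAB>read_count" for a FASTQ stream, sorted by length.
# Assumption: strict 4-line FASTQ records on stdin.
read_lengths() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) printf "%d\t%d\n", l, n[l] }' | sort -n
}

# Usage (file names match the command above):
#   gzip -dc your.fastq.gz    | read_lengths
#   gzip -dc trimmed.fastq.gz | read_lengths
```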

2.0 years ago by GenoMax 148k

score 1 · Answer 1 · 2022-12-20

As this is WES, you are less likely to get deleterious/pathogenic variants at these positions, assuming that your sequences still carry adapters (not trimmed). You can use fastp to trim your adapters automatically; otherwise, please find out the chemistry from your service provider, trim the reads accordingly, and go ahead. - Prash