Question

Per base sequence content failed in FastQC


3.1 years ago

yliueagle ▴ 290

In the FastQC report for my ChIP-seq data (paired-end, H3K27me3, Sanger / Illumina 1.9 encoding, sequence length 42), the Per base sequence content module failed (see figure below), while other modules such as Adapter Content passed. What does this indicate, and do you have any suggestions for pre-processing the data? Many thanks!

I have ChIP-seq reads on

[Figure: FastQC Per base sequence content plot, flagged as failed]


score 2 · Answer 1 · 2021-10-25


3.1 years ago

GenoMax 147k

Please see this blog post from the FastQC authors about the non-random base composition you see at the beginning of reads. That pattern is likely due to the tagmentation method used to make the libraries.

You should probably move on with the rest of your analysis. Aligners should be able to handle bases that don't align by soft-clipping them.
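If you want to confirm that the aligner is absorbing the biased first bases, one option is to count how many mapped reads carry a soft clip at their 5' end. The sketch below uses hypothetical helpers (`soft_clipped_bases`, `five_prime_clip_rate`), not part of any tool mentioned in this thread; it assumes plain SAM text, such as the output of `samtools view`, with FLAG in column 2 and CIGAR in column 6. Because SAM stores reverse-strand reads in reference orientation, the 5' end of a read with flag 0x10 is the trailing end of its CIGAR.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may flank soft clips, so ignore them when inspecting the ends.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:  # e.g. "*" for unmapped reads
        return 0, 0
    leading = core[0][0] if core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing


def five_prime_clip_rate(sam_lines):
    """Fraction of primary mapped reads with a soft clip at the read's 5' end."""
    clipped = total = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        leading, trailing = soft_clipped_bases(cigar)
        # Reverse-strand reads (0x10) are stored reverse-complemented, so their
        # 5' end corresponds to the trailing end of the CIGAR.
        five_prime = trailing if flag & 0x10 else leading
        total += 1
        if five_prime > 0:
            clipped += 1
    return clipped / total if total else 0.0
```

To run it on real data, you could pipe `samtools view` output into a small script that passes `sys.stdin` to `five_prime_clip_rate`; a high rate concentrated in the first few bases would match the start-of-read pattern described in the blog post.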



Thanks for the comment. Yes, the variation at the beginning of the reads is explained in the blog post. What concerns me more is the middle part of the reads, where the G/C and A/T percentages differ by around 10%, which leads to the "failure" in the FastQC report.

3.1 years ago by yliueagle ▴ 290
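To check how close the mid-read imbalance is to FastQC's cut-offs, the per-position percentages can be recomputed directly from the reads. As I understand FastQC's documentation, this module warns when the difference between A and T, or between G and C, exceeds 10 percentage points at any position and fails above 20; both limits are configurable through FastQC's limits.txt file. The sketch below mirrors that rule under those assumptions. It is not FastQC's own code, and it excludes N calls from the denominator, which may differ from how FastQC counts them.

```python
from collections import Counter


def per_base_content(seqs):
    """Percent A, C, G and T at each read position; N and other symbols are ignored."""
    counts = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):  # grow the table when a longer read appears
                counts.append(Counter())
            counts[i][base] += 1
    table = []
    for c in counts:
        acgt = sum(c[b] for b in "ACGT")
        table.append({b: (100.0 * c[b] / acgt if acgt else 0.0) for b in "ACGT"})
    return table


def sequence_content_status(table, warn=10.0, error=20.0):
    """Apply the assumed FastQC rule: worst |A-T| or |G-C| gap at any position."""
    worst = max(
        (max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"])) for p in table),
        default=0.0,
    )
    if worst > error:
        return "fail", worst
    if worst > warn:
        return "warn", worst
    return "pass", worst
```

Feeding it the sequence lines of a FASTQ file (the second line of every four-line record) and inspecting the per-position rows would show whether the imbalance is a roughly constant offset along the read, which would be more consistent with a genuine composition bias in the enriched fragments, or something confined to particular positions.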


A failure in a FastQC report is not immediately indicative of bad data. The discrepancy you note is real, but perhaps it is because your ChIP-seq is enriching AT-rich sequences. You will not know that until you analyze the data. If the data does not make sense after analysis, you can retrace your steps to see where things may have gone wrong.