Per base sequence content failed in FastQC
3.1 years ago
yliueagle ▴ 290

In the FastQC report for my ChIP-seq data (paired-end, H3K27me3, Sanger / Illumina 1.9 encoding, sequence length 42), the Per base sequence content module failed (see figure below), while other modules such as Adapter Content were fine. What does this indicate, and are there any suggestions for pre-processing the data? Many thanks!

Figure: FastQC "Per base sequence content" plot (module failed)

3.1 years ago
GenoMax 147k

Please see this blog post from the authors of FastQC about the non-random pattern you see at the beginning of the reads. It is likely due to the tagmentation method used to make the libraries.

You should probably move on with the rest of your analysis. Aligners should be able to deal with bases that don't align by soft-clipping them.
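To check after alignment whether the biased leading bases were in fact soft-clipped, you can look at the CIGAR strings in the SAM/BAM output. Below is a minimal sketch, assuming standard SAM CIGAR syntax where `S` marks soft-clipped bases; the function name `soft_clipped_ends` is only illustrative and is not part of any aligner or library. Whether soft-clipping happens at all depends on the aligner and the mode it is run in.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_ends(cigar):
    """Return (left, right) counts of soft-clipped bases for one CIGAR string.

    Hypothetical helper for illustration. Hard clips (H) can sit outside
    the soft clips, so they are skipped before looking at the read ends.
    An unavailable CIGAR ("*") yields (0, 0).
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    core = [item for item in ops if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


if __name__ == "__main__":
    # Example CIGAR strings (made up for illustration, 42 bp reads).
    for cigar in ["42M", "5S37M", "3S36M3S"]:
        print(cigar, soft_clipped_ends(cigar))
```

If many reads show a few soft-clipped bases at the 5' end, the aligner is already handling the biased start of the reads and no extra trimming is needed.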

Thanks for the comment. Yes, the variation at the beginning of the reads is explained in the blog post. I am more concerned about the middle part of the reads, where the G vs. C and A vs. T percentages differ by around 10%, which leads to the "failure" in the FastQC report.

Failures in FastQC reports are not immediately indicative of bad data. Yes, there is the discrepancy you note, but perhaps it is because your ChIP-seq is enriching AT-rich sequences. You will not know until you analyze the data. If the data do not make sense after analysis, you can retrace your steps to see where things may have gone wrong.
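If you want to see which positions actually trigger the flag, you can recompute the per-base composition yourself. Below is a minimal sketch, assuming a plain (uncompressed) four-line-per-record FASTQ file; the function names are hypothetical, and this is not FastQC's own code. The thresholds follow the FastQC documentation for this module: a warning when A vs. T or G vs. C differ by more than 10% at any position, a failure when they differ by more than 20%.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed FASTQ file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of every 4-line record
                yield line.strip()


def per_base_content(reads):
    """Return, per read position, the percentage of A, C, G and T (N is ignored)."""
    counts = []
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    content = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        content.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return content


def flag_positions(content, warn=10.0, fail=20.0):
    """Label each 1-based position by the larger of the |A-T| and |G-C| differences."""
    flags = []
    for pos, pct in enumerate(content, start=1):
        diff = max(abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"]))
        if diff > fail:
            status = "fail"
        elif diff > warn:
            status = "warn"
        else:
            status = "pass"
        flags.append((pos, round(diff, 1), status))
    return flags


if __name__ == "__main__":
    # Tiny made-up example; for real data use read_fastq_sequences("reads.fastq").
    reads = ["AAA", "AAA", "AAA", "TAA", "TTA", "TTT", "GGG", "GGG", "CCC", "CCC"]
    for pos, diff, status in flag_positions(per_base_content(reads)):
        print(pos, diff, status)
```

With these thresholds, a difference of around 10% in the middle of the reads would only produce a warning, so it is worth checking whether the failure comes from the first few positions instead.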

Thanks for the comment.
