In the FastQC report of my ChIP-seq data (paired-end, H3K27me3, Sanger / Illumina 1.9 encoding, sequence length 42), the Per base sequence content module failed (see figure below), while other modules such as Adapter Content passed. What does this indicate, and do you have any suggestions for pre-processing the data? Many thanks!
Thanks for the comment. Yes, the variation at the beginning of the reads is explained in the blog post. Here I am more concerned about the middle part of the reads, where the difference between the G and C (and between the A and T) percentages is around 10%, which leads to the "failure" in the FastQC report.
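To see exactly which positions trip the threshold, one option is to recompute the per-position base composition yourself. Below is a minimal Python sketch (not FastQC's own code) that reads uncompressed FASTQ lines, computes %A/C/G/T at each read position, and flags positions where |A-T| or |G-C| exceeds 10% (warn) or 20% (fail). Those two thresholds are an assumption based on how the FastQC documentation describes this module's defaults; adjust them if your FastQC version or config differs.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed to mirror FastQC's documented defaults for
# "Per base sequence content": warn above 10%, fail above 20%.
WARN_DIFF = 10.0
FAIL_DIFF = 20.0


def read_sequences(fastq_lines: Iterable[str]) -> Iterable[str]:
    """Yield the sequence line (2nd line of every 4-line FASTQ record)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip().upper()


def per_base_content(sequences: Iterable[str]) -> List[Dict[str, float]]:
    """Return the percentage of A, C, G and T at each read position.

    N and any other non-ACGT characters are left out of the denominator.
    """
    counts: List[Counter] = []
    for seq in sequences:
        for pos, base in enumerate(seq):
            if pos == len(counts):
                counts.append(Counter())
            if base in "ACGT":
                counts[pos][base] += 1
    content = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        content.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return content


def flag_positions(content: List[Dict[str, float]]) -> List[Tuple[int, float, float, str]]:
    """Return (1-based position, |A-T|, |G-C|, status) for every position."""
    flags = []
    for pos, pct in enumerate(content, start=1):
        at_diff = abs(pct["A"] - pct["T"])
        gc_diff = abs(pct["G"] - pct["C"])
        worst = max(at_diff, gc_diff)
        if worst > FAIL_DIFF:
            status = "fail"
        elif worst > WARN_DIFF:
            status = "warn"
        else:
            status = "pass"
        flags.append((pos, at_diff, gc_diff, status))
    return flags
```

Usage would be something like `flag_positions(per_base_content(read_sequences(open("reads_R1.fastq"))))`, where `reads_R1.fastq` is a hypothetical file name (use `gzip.open(path, "rt")` for `.fastq.gz`). Plotting the |A-T| and |G-C| columns across the 42 positions makes it easy to tell whether the imbalance is a constant offset along the read, which points to overall sample composition rather than a position-specific artifact.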
Failures on FastQC reports are not immediately indicative of bad data. Yes, there is the discrepancy you note, but perhaps it is because your ChIP-seq is enriching for AT-rich sequences. You will not know that until you analyze the data. If the data do not make sense after analysis, you can retrace your steps to see where things may have gone wrong.
Thanks for the comment.