I obtained sequenced reads (Illumina) from a ChIP-seq run that seems to have gone bad, judging by the FastQC output. The FastQC report is shared here: http://dl.dropbox.com/u/17931758/Run18_s_4_sequence_fastqc/fastqc_report.html Does anyone have any ideas on what could have gone wrong? I asked them to prepare the libraries again and resequence, but the results are similar.
What is it that particularly concerns you, and why? There doesn't seem to be a technical problem with the sequencing; your quality scores look good across the read. You do understand that ChIP-seq is going to enrich for certain sequences in the result set, don't you? Hence you should expect a certain level of duplication, which can affect overall base composition, k-mer composition and overall GC content.
I would expect the GC content and k-mer composition to be affected. What I wasn't expecting is for the GC content to vary so strongly from base to base along the read. I would expect the GC content along the read to be relatively constant, even if it differs from that of the underlying genome. Also, on aligning, I found that only a small fraction of the reads (~1%) align to the target genome.
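For anyone who wants to look at the base-to-base GC variation outside the FastQC report, here is a minimal sketch that computes the GC fraction at each read position. It assumes plain, uncompressed FASTQ with exactly four lines per record; the function name `per_position_gc` and the toy reads are made up for illustration and are not taken from the run above.

```python
import io


def per_position_gc(fastq_handle):
    """Return the GC fraction at each read position (0-based)."""
    gc_counts = []     # number of G or C calls seen at each position
    base_counts = []   # number of A/C/G/T calls seen at each position
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 1:   # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        # Grow the counters if this read is longer than any seen so far.
        while len(base_counts) < len(seq):
            gc_counts.append(0)
            base_counts.append(0)
        for pos, base in enumerate(seq):
            if base not in "ACGT":   # skip N and other ambiguity codes
                continue
            base_counts[pos] += 1
            if base in "GC":
                gc_counts[pos] += 1
    return [g / n if n else 0.0 for g, n in zip(gc_counts, base_counts)]


# Toy input (hypothetical reads), only to show the call.
# For real data, pass an open file handle instead of the StringIO object.
toy_fastq = io.StringIO(
    "@r1\nGCAT\n+\nIIII\n"
    "@r2\nGGAT\n+\nIIII\n"
    "@r3\nACA\n+\nIII\n"
)
profile = per_position_gc(toy_fastq)
print(profile)
```

A roughly flat profile would be the usual expectation; sharp swings from one position to the next suggest that many reads share the same sequence at the same offset.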
That information would have been useful in the question. Do your non-aligning reads map to anything else?
Well, the alignment was still underway when I posted the question, but yes, it is useful information. Next I will check whether the non-aligning reads map to anything else.
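One possible way to start that check is to list the most frequent sequences among the non-aligning reads and search a few of them against a sequence database by hand. The sketch below counts exact duplicate reads only. It assumes the non-aligning reads have already been written to a plain four-line FASTQ file (how to do that depends on the aligner), and the name `most_common_reads` and the toy reads are hypothetical.

```python
import io
from collections import Counter


def most_common_reads(fastq_handle, top_n=10):
    """Return the top_n most frequent read sequences as (sequence, count)."""
    counts = Counter()
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 == 1:   # the sequence is the 2nd line of each record
            counts[line.strip().upper()] += 1
    return counts.most_common(top_n)


# Toy input (hypothetical reads), only to show the call.
toy_fastq = io.StringIO(
    "@r1\nAAAA\n+\nIIII\n"
    "@r2\nCCCC\n+\nIIII\n"
    "@r3\nAAAA\n+\nIIII\n"
)
for seq, count in most_common_reads(toy_fastq, top_n=5):
    print(seq, count)
```

If a handful of sequences account for a large share of the reads, they are good candidates for an adapter or contaminant check.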