I am working with ChIP-seq paired-end data where there is concern that one or more of the biological replicates may not be very good, but it is unknown which replicate may have a problem (I suspect there has to be at least one poor replicate in the data). The first part of my question is very simple: what do you recommend that I do to find this replicate to either toss it or fix it with some quality trimming on the ends?
For the moment, I tried using bowtie2 to trim 10 bp from the 5' and 3' ends of the reads in each of my samples just to see whether this fixes my problem. To define what I mean by "problem": basically, my final results (gene list) does not come out as I would expect it to come out (there are no genes of a certain type that I am looking for based on my biological intuition for what I should be seeing). When I run bowtie2 with the trimming options set, I do indeed get my .sam files okay, but my error file tells me:
(ERR): bowtie2-align died with signal 2 (INT)
20172305 reads; of these:
20172305 (100.00%) were paired; of these:
3064536 (15.19%) aligned concordantly 0 times
13699211 (67.91%) aligned concordantly exactly 1 time
3408558 (16.90%) aligned concordantly >1 times
----
3064536 pairs aligned concordantly 0 times; of these:
807633 (26.35%) aligned discordantly 1 time
----
2256903 pairs aligned 0 times concordantly or discordantly; of these:
4513806 mates make up the pairs; of these:
3111731 (68.94%) aligned 0 times
617853 (13.69%) aligned exactly 1 time
784222 (17.37%) aligned >1 times
92.29% overall alignment rate
I am not sure whether I can be happy with this alignment for this ChIPseq experiment (even though I have heard that ChIPseq experiments sometimes give at best 50-60% alignment). My overall alignment rate is always greater than 90% in each sample so maybe I have nothing to fear (or maybe this is too much?). Perhaps I shouldn't be trimming reads in the first place? So this is where my other concerns come in: is this alignment "good enough" to proceed further in my analysis (ultimately culminating in a new gene list)? Or should I run bowtie2 with other options like --qc-filter
(filter out reads that are bad according to QSEQ filter)? Perhaps there is something else I can try besides for experimenting with bowtie2 parameters?
EDIT: I examined my error file for one of my samples for the cases of 1) no trim at all 2) trim 10 bp 3) trim 20 bp. In each case, the error file is literally identical to the one above, the only little difference being a very slight increase in the "aligned >1 times" category (which makes biological sense since the reads get shorter and thus less specific in terms of mapping to different places on the reference genome). Also, the resultant sam file becomes smaller the more base pairs you trim from both sides.