Hi, I am getting a low alignment rate with 150 bp paired-end ChIP-seq data.
1. I trimmed the raw FASTQ files using Trimmomatic 0.38 with the following parameters:
java -jar $trimmomatic \
PE \
-threads $cpu \
-phred33 \
$fq1 \
$fq2 \
${SampleName}.paired.fq1.gz \
${SampleName}.unpaired.fq1.gz \
${SampleName}.paired.fq2.gz \
${SampleName}.unpaired.fq2.gz \
ILLUMINACLIP:${AdapterFa}:2:30:10 \
LEADING:3 \
TRAILING:3 \
SLIDINGWINDOW:4:15 \
MINLEN:36
The quality of the trimmed FASTQ files seems good, with 94.448832% of bases passing Q30.
2. I aligned the trimmed ChIP-seq data (150 bp paired-end) to the mouse genome mm10 using bowtie2 with the following parameters:
bowtie2 \
-x ${bowtie2_index} \
-1 ${input_fastq1} \
-2 ${input_fastq2} \
-p ${thread}
3. The mapping rate turned out to be as low as 7.44%. Below is the log:
28860689 reads; of these:
28860689 (100.00%) were paired; of these:
27047041 (93.72%) aligned concordantly 0 times
1386827 (4.81%) aligned concordantly exactly 1 time
426821 (1.48%) aligned concordantly >1 times
----
27047041 pairs aligned concordantly 0 times; of these:
261851 (0.97%) aligned discordantly 1 time
----
26785190 pairs aligned 0 times concordantly or discordantly; of these:
53570380 mates make up the pairs; of these:
53426312 (99.73%) aligned 0 times
31984 (0.06%) aligned exactly 1 time
112084 (0.21%) aligned >1 times
7.44% overall alignment rate
4. I then tried setting the -X parameter to 100 or 700, which gave similar alignment rates of 6.12% and 7.44%, respectively.
5. I tried giving read 1 and read 2 to bowtie2 separately, but the results were similar: the mapping rates were 7.47% and 7.46%, respectively.
6. I applied BWA to the paired-end FASTQ files with the following parameters:
bwa mem \
-t $threads \
$bwa_index \
$fq1 \
$fq2
The alignment rate increased to ~18%.
7. To check for contamination, I aligned the trimmed read 1 FASTQ to the human genome hg19, and the alignment rate was 1.38%, so contamination with human DNA seems unlikely.
28860689 reads; of these:
28860689 (100.00%) were unpaired; of these:
28463066 (98.62%) aligned 0 times
339985 (1.18%) aligned exactly 1 time
57638 (0.20%) aligned >1 times
1.38% overall alignment rate
8. To get the unmapped reads for BLAST, I extracted them from the BAM file with the following commands:
bowtie2 \
-x ${bowtie2_index} \
-1 ${input_fastq1} \
-2 ${input_fastq2} \
-p ${thread} \
| samtools view -h -S - -bo ${out_dir}/${sample_name}.bam
samtools view -h ${out_dir}/${sample_name}.bam | grep -v chrM | samtools view -h -S - -bo ${out_dir}/${sample_name}.rm.bam
samtools view -F4 -h ${out_dir}/${sample_name}.rm.bam | samtools view -h -S - -bo ${out_dir}/${sample_name}.mapped.bam
samtools view -f4 -h ${out_dir}/${sample_name}.rm.bam | samtools view -h -S - -bo ${out_dir}/${sample_name}.unmapped.bam
9. I converted only the chr10 reads from the BAM to FASTA using the "samtools fasta" command and submitted 10 reads to NCBI online blastn. Two sequences matched a Mus musculus BAC library or ncRNA, and two matched an Oryctolagus cuniculus clone.
Could the low alignment rate be caused by the library construction? Any advice would be appreciated. Thank you in advance!
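For reference, the 7.44% overall rate in the bowtie2 log can be reproduced from the per-category counts: each concordantly or discordantly aligned pair contributes two aligned mates, and the mates that aligned on their own are added on top. Below is a minimal sketch of that arithmetic with awk, using the numbers copied from the log in this post. The function and variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Recompute the bowtie2 "overall alignment rate" from the paired-end log counts.
# All numbers are copied from the log in this post; names are illustrative only.
recompute_overall_rate() {
    awk 'BEGIN {
        total_pairs = 28860689
        conc_once   = 1386827   # pairs aligned concordantly exactly 1 time
        conc_multi  = 426821    # pairs aligned concordantly >1 times
        discordant  = 261851    # pairs aligned discordantly 1 time
        mate_once   = 31984     # single mates aligned exactly 1 time
        mate_multi  = 112084    # single mates aligned >1 times

        # Each aligned pair counts as two aligned mates.
        aligned_mates = 2 * (conc_once + conc_multi + discordant) + mate_once + mate_multi
        printf "%.2f%%\n", 100 * aligned_mates / (2 * total_pairs)
    }'
}

recompute_overall_rate   # prints 7.44%
```

Put the other way round, 53426312 of the 57721378 mates (about 92.56%) did not align to mm10 at all, so the problem is not limited to pairing or insert size.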
What did you ChIP? Is the antibody known for poor IP results? Did you add anything like spike-ins or carrier DNA? Please give some details on how the library was made. Is this standard ChIP-seq or something special? Is the material primary or cell line?
Thank you for your reply. H3K4me1, H3K27ac and transcription factor ChIP-seq were performed, and they all have a low alignment rate. Yes, we added yeast spike-in DNA. The libraries were made according to the CUT&RUN protocol and prepared using the KAPA Hyper Prep kit. The material was cells isolated in vivo.
I would see what the alignment rate to the yeast genome is. If that doesn't account for the majority of the unaligned reads, the next thing to do would be to check for some kind of non-human DNA contamination, maybe bacterial. Try using the NCBI BLAST web tool and blast some of the reads against the nucleotide collection ("nr/nt") database to see what they hit.
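To blast a handful of reads, they first need to be in FASTA format. Here is a minimal sketch that takes the first N records of a FASTQ stream and prints them as FASTA, assuming the usual 4-line FASTQ layout. The function name `fastq_head_to_fasta` is hypothetical and the input reads are toy data, not reads from this experiment.

```shell
#!/usr/bin/env bash
# Print the first N records of a FASTQ stream as FASTA (e.g. to paste into web BLAST).
# Assumes 4 lines per FASTQ record; the function name is hypothetical.
fastq_head_to_fasta() {
    local n="$1"
    awk -v n="$n" '
        NR % 4 == 1 { if (++count > n) exit; print ">" substr($0, 2) }  # header: "@id" -> ">id"
        NR % 4 == 2 { print }                                           # sequence line
    '
}

# Toy input (made-up reads) to show the conversion; keeps the first 2 of 3 records.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n@read3\nTTAA\n+\nIIII\n' \
    | fastq_head_to_fasta 2
```

On real data this could be fed from a compressed file, for example `gzip -dc ${SampleName}.paired.fq1.gz | fastq_head_to_fasta 10`. The first reads in a file are not a random sample, so it may be worth taking reads from further into the file as well.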
Thank you for your reply. I blasted some reads with NCBI BLAST, and the hits pointed to the rabbit genome. I then mapped the FASTQ file to the rabbit genome, which gave a ~87% alignment rate. I don't know whether this is related to the fact that the host species of the antibody is rabbit. It is a monoclonal antibody, by the way.
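When the same reads are aligned against several references (mm10, hg19, yeast, rabbit), it can help to pull the overall rate out of each bowtie2 log and put the values side by side. Below is a minimal sketch, assuming each log ends with the standard "overall alignment rate" line. The function name `overall_rate` is hypothetical, and the example log is the hg19 run quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Extract the "overall alignment rate" percentage from a bowtie2 log read on stdin.
# The function name is hypothetical; the log text is the hg19 run quoted in this thread.
overall_rate() {
    awk '/overall alignment rate/ { sub(/%$/, "", $1); print $1 }'
}

hg19_log='28860689 reads; of these:
28860689 (100.00%) were unpaired; of these:
28463066 (98.62%) aligned 0 times
339985 (1.18%) aligned exactly 1 time
57638 (0.20%) aligned >1 times
1.38% overall alignment rate'

# Print a tab-separated "reference<TAB>rate" line for this log.
printf 'hg19\t%s\n' "$(printf '%s\n' "$hg19_log" | overall_rate)"
```

bowtie2 writes this summary to stderr, so it can be saved per reference with a redirect such as `2> sample.mm10.log` and then read back with `overall_rate < sample.mm10.log` (the file name here is only an example).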