Question

Low alignment rate with bowtie2 for 150 bp paired-end ChIP-seq data

5.2 years ago

Shuang He • 0

Hi, I am getting a very low alignment rate with 150 bp paired-end ChIP-seq data.

1. I trimmed the raw FASTQ files using trimmomatic-0.38 with these parameters:

java -jar $trimmomatic \
    PE \
    -threads $cpu \
    -phred33 \
    $fq1 \
    $fq2 \
    ${SampleName}.paired.fq1.gz \
    ${SampleName}.unpaired.fq1.gz \
    ${SampleName}.paired.fq2.gz \
    ${SampleName}.unpaired.fq2.gz \
    ILLUMINACLIP:${AdapterFa}:2:30:10 \
    LEADING:3 \
    TRAILING:3 \
    SLIDINGWINDOW:4:15 \
    MINLEN:36

The quality of the trimmed FASTQ files looks good: 94.448832% of bases pass Q30.

2. I aligned the trimmed ChIP-seq data (150 bp paired-end) to the mouse genome mm10 using bowtie2 with these parameters:

bowtie2 \
  -x ${bowtie2_index} \
  -1 ${input_fastq1} \
  -2 ${input_fastq2} \
  -p ${thread}

3. The mapping rate was only 7.44%. Here is the log:

28860689 reads; of these:
  28860689 (100.00%) were paired; of these:
    27047041 (93.72%) aligned concordantly 0 times
    1386827 (4.81%) aligned concordantly exactly 1 time
    426821 (1.48%) aligned concordantly >1 times
    ----
    27047041 pairs aligned concordantly 0 times; of these:
      261851 (0.97%) aligned discordantly 1 time
    ----
    26785190 pairs aligned 0 times concordantly or discordantly; of these:
      53570380 mates make up the pairs; of these:
        53426312 (99.73%) aligned 0 times
        31984 (0.06%) aligned exactly 1 time
        112084 (0.21%) aligned >1 times

7.44% overall alignment rate
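
The 7.44% figure can be reproduced from the log itself. In paired-end mode bowtie2 counts mates, so the overall rate is one minus the mates that "aligned 0 times" divided by twice the number of pairs: 1 - 53426312 / (2 x 28860689), which is about 7.44%. A minimal sketch in shell/awk; the function name overall_rate is hypothetical, and it assumes the log layout shown in this post:

```shell
# Recompute bowtie2's "overall alignment rate" from its summary log (read on stdin).
# Paired-end: each mate counts, so the denominator is 2 x pairs, and the only line
# ending in "aligned 0 times" is the per-mate count of reads that never aligned.
# Single-end (unpaired) logs: the denominator is simply the number of reads.
overall_rate() {
  awk '
    /reads; of these:/ { total = $1 }
    /were paired/      { paired = 1 }
    /aligned 0 times$/ { unaligned = $1 }
    END {
      n = paired ? 2 * total : total
      printf "%.2f\n", 100 * (1 - unaligned / n)
    }'
}
```

For example, if bowtie2 is run with 2> sample.bowtie2.log (bowtie2 writes its summary to stderr), then overall_rate < sample.bowtie2.log should print 7.44 for the mouse run above; the same function also handles the single-end hg19 log and the rabbit log later in this thread.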

4. I then tried setting -X to 100 and to 700; the alignment rates were similar, 6.12% and 7.44% respectively.
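
For context, -X sets the maximum fragment length bowtie2 accepts for a concordant pair (the default is 500). One way to choose a value is to look at the fragment lengths of the pairs that did align properly. A sketch, assuming SAM text on stdin; fragment_lengths and median are hypothetical helper names:

```shell
# Print the fragment (template) length of each properly paired read pair from SAM
# text on stdin. Only records with flag bit 0x2 ("properly paired") and a positive
# TLEN (column 9, i.e. the leftmost mate) are used, so each pair is reported once.
# Bit tests use integer division because POSIX awk has no bitwise operators.
fragment_lengths() {
  awk -F '\t' '!/^@/ && int($2 / 2) % 2 == 1 && $9 > 0 { print $9 }'
}

# Median of the numbers on stdin (one per line); exits with status 1 if there are none.
median() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
```

Usage: samtools view -f 2 ${out_dir}/${sample_name}.bam | fragment_lengths | median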

5. I also gave read 1 and read 2 to bowtie2 separately, as single-end input, but the results were similar: 7.47% and 7.46% respectively.

6. I ran BWA on the paired-end FASTQ files with these parameters:

bwa mem \
  -t $threads \
  $bwa_index \
  $fq1 \
  $fq2

The alignment rate increased to ~18%.
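
bwa mem does not print a bowtie2-style summary, so a rate like ~18% has to be computed from the SAM/BAM output; samtools flagstat is the usual tool for this. As a sketch, the hypothetical function mapped_rate below computes the percentage of primary records that are mapped from SAM text on stdin. It skips secondary and supplementary records so each mate is counted once, which makes it roughly comparable to bowtie2's per-mate overall rate:

```shell
# Percentage of primary SAM records (from stdin) that are mapped.
# Secondary (0x100) and supplementary (0x800) records are skipped so each read is
# counted once; a record is unmapped when flag bit 0x4 is set.
mapped_rate() {
  awk -F '\t' '
    /^@/ { next }
    int($2 / 256) % 2 == 1 || int($2 / 2048) % 2 == 1 { next }
    { total++; if (int($2 / 4) % 2 == 0) mapped++ }
    END { if (total) printf "%.2f\n", 100 * mapped / total }'
}
```

Usage: bwa mem -t $threads $bwa_index $fq1 $fq2 | mapped_rate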
7. To check for contamination, I aligned the trimmed read 1 FASTQ to the human genome hg19, and the alignment rate was 1.38%. So contamination with human DNA seems unlikely.

28860689 reads; of these:
  28860689 (100.00%) were unpaired; of these:
    28463066 (98.62%) aligned 0 times
    339985 (1.18%) aligned exactly 1 time
    57638 (0.20%) aligned >1 times
1.38% overall alignment rate

8. To BLAST the unmapped reads, I saved the alignments as a BAM file and extracted the unmapped reads with the following commands:

bowtie2 \
  -x ${bowtie2_index} \
  -1 ${input_fastq1} \
  -2 ${input_fastq2} \
  -p ${thread} \
  | samtools view -h -S - -bo ${out_dir}/${sample_name}.bam

samtools view -h ${out_dir}/${sample_name}.bam | grep -v chrM \
  | samtools view -b -o ${out_dir}/${sample_name}.rm.bam -
samtools view -b -F 4 -o ${out_dir}/${sample_name}.mapped.bam ${out_dir}/${sample_name}.rm.bam
samtools view -b -f 4 -o ${out_dir}/${sample_name}.unmapped.bam ${out_dir}/${sample_name}.rm.bam
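
Note that grep -v chrM drops every line containing the string chrM anywhere, including reads that only have their mate on chrM (RNEXT, column 7) and the @SQ header line for chrM. If the intent is to remove only reads aligned to chrM, a sketch that tests just the RNAME column (drop_chrM is a hypothetical name):

```shell
# Keep all header lines and every record whose reference name (RNAME, column 3)
# is not chrM. Reads whose mate is on chrM, and the @SQ line for chrM, are kept.
drop_chrM() {
  awk -F '\t' '/^@/ || $3 != "chrM"'
}
```

Usage: samtools view -h ${out_dir}/${sample_name}.bam | drop_chrM | samtools view -b -o ${out_dir}/${sample_name}.rm.bam -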

9. I converted only the chr10 reads from the BAM to FASTA using the "samtools fasta" command and submitted 10 reads to NCBI online blastn. Two sequences matched Mus musculus BAC library or ncRNA sequences, and 2 matched Oryctolagus cuniculus (rabbit) clones.
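
Ten reads from a single chromosome are a small, possibly unrepresentative sample. A sketch that writes every k-th unmapped read (flag 0x4) as FASTA so the BLAST queries are spread over the whole file; the function name unmapped_fasta_every and the choice of k are illustrative assumptions. SEQ is written as stored in the SAM record; blastn searches both strands by default, so orientation does not change which organism is hit.

```shell
# Write every k-th unmapped record (flag bit 0x4) from SAM text on stdin as FASTA.
# k is the first argument. The running count is appended to the read name so that
# the two mates of an unmapped pair get distinct FASTA IDs.
unmapped_fasta_every() {
  awk -F '\t' -v k="$1" '
    /^@/ { next }
    int($2 / 4) % 2 == 1 {
      n++
      if (n % k == 0) printf ">%s_%d\n%s\n", $1, n, $10
    }'
}
```

Usage, taking 10 reads (20 FASTA lines): samtools view -f 4 ${out_dir}/${sample_name}.rm.bam | unmapped_fasta_every 1000 | head -n 20 > blast_queries.fa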

Could the low alignment rate be caused by the library construction? Any advice would be very helpful. Thank you in advance!

ChIP-Seq bowtie2 BWA low alignment rate

What did you ChIP? Is the antibody known for poor IP results? Did you add anything like spike-ins or carrier DNA? Please give some details on how the library was made. Is this standard ChIP-seq or something special? Is the material primary or cell line?

5.2 years ago by ATpoint 85k

Thank you for your reply. We performed ChIP-seq for H3K4me1, H3K27ac and a transcription factor, and all of them have a low alignment rate. Yes, we added yeast spike-in DNA. The libraries were made according to the published CUT&RUN protocol and prepared using the KAPA Hyper Prep kit. The material was cells isolated in vivo.

5.2 years ago by Shuang He

I would see what the alignment rate to the yeast genome is. If that doesn't account for the majority of the unaligned reads, the next thing to do would be to check for some kind of non-human DNA contamination, maybe bacterial. Try using the NCBI-blast web tool and blast some of the reads against the "nr" database to see what it hits.

5.2 years ago by colin.kern ★ 1.1k
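
Following this suggestion, the same read pairs can be aligned to several genomes in turn (mouse, the yeast spike-in, any suspected contaminant) and the overall rates compared. A sketch, assuming bowtie2 indexes have already been built for each genome; screen_genomes is a hypothetical name and the index paths in the usage line are placeholders. Reads that match more than one genome are counted for each, so the rates need not add up to 100%.

```shell
# Align the same read pairs against several bowtie2 indexes and print
# "<index name><TAB><overall alignment rate>" for each one.
# The body runs in a subshell ( ... ), so r1/r2/threads do not leak into the caller.
screen_genomes() (
  # usage: screen_genomes R1.fq.gz R2.fq.gz THREADS INDEX [INDEX...]
  r1=$1 r2=$2 threads=$3
  shift 3
  for idx in "$@"; do
    # bowtie2 writes SAM to stdout (discarded) and its summary to stderr (kept).
    rate=$(bowtie2 -p "$threads" -x "$idx" -1 "$r1" -2 "$r2" 2>&1 >/dev/null \
      | awk '/overall alignment rate/ { print $1 }')
    printf '%s\t%s\n' "$(basename "$idx")" "$rate"
  done
)
```

Usage: screen_genomes ${SampleName}.paired.fq1.gz ${SampleName}.paired.fq2.gz 8 /path/to/mm10 /path/to/yeast /path/to/rabbit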

Thank you for your reply. I BLASTed some reads with NCBI BLAST, and they pointed to the rabbit genome. When I mapped the FASTQ files to the rabbit genome, the alignment rate was ~87%. I don't know whether this is related to the fact that the host species of the antibody is rabbit; it is a monoclonal antibody, by the way.

28860689 reads; of these:
  28860689 (100.00%) were paired; of these:
    12216933 (42.33%) aligned concordantly 0 times
    13863221 (48.03%) aligned concordantly exactly 1 time
    2780535 (9.63%) aligned concordantly >1 times
    ----
    12216933 pairs aligned concordantly 0 times; of these:
      7086173 (58.00%) aligned discordantly 1 time
    ----
    5130760 pairs aligned 0 times concordantly or discordantly; of these:
      10261520 mates make up the pairs; of these:
        7481731 (72.91%) aligned 0 times
        415800 (4.05%) aligned exactly 1 time
        2363989 (23.04%) aligned >1 times
87.04% overall alignment rate

5.2 years ago by Shuang He