Hi all,
I have been processing ATAC-seq datasets recently, and I ran FastQC before and after adapter trimming with cutadapt v2.5. The datasets were downloaded from GEO (GSE119453) and are 75 bp paired-end reads. The FastQC report shows a read length of 76 for both R1 and R2 before running cutadapt. After trimming, however, the report shows a length distribution of 0-76 and the distribution looks weird (see the shared images of one example report before and after cutadapt for R1).
I am not sure whether there is a problem with the adapter removal step. Is there an error in the adapter sequences I passed to cutadapt?
Length before cutadapt:
Adapter content detected before cutadapt:
Length after cutadapt:
No adapter content detected after cutadapt:
The cutadapt options I used:
cutadapt \
-a CTGTCTCTTATACACATCTCCGAGCCCACGAGAC \
-A CTGTCTCTTATACACATCTGACGCTGCCGACGA \
-o /n/scratch2/yy220/downloaded/ATAC_seq_datasets/2_cudadapt/${prefix}_R1.fastq.gz \
-p /n/scratch2/yy220/downloaded/ATAC_seq_datasets/2_cudadapt/${prefix}_R2.fastq.gz \
${prefix}_OTHER_1.fastq.gz ${prefix}_OTHER_2.fastq.gz \
--cores=20 \
--quality-cutoff 10 \
-m 20 \
--pair-filter=both
Searching the first 800 lines (200 reads) of R1 for my adapter:
zcat SRR7784432_GSM3374850_Myeloid_dendritic_cells_sample_1_Homo_sapiens_ATAC-seq_1.fastq.gz | head -n 800 | grep -E CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
Only four reads contain it:
GCCCCTCCTAGTGGTCTCCATGCTCCCCTCTCATGACCCCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAA
GTGAGAAACGGAGCAGGAGAGCAGGGGGGGAGGCCCCAGACCTGTCTCTTATACACATCTCCGAGCCCACGAGACT
GTCTCAGCTCACTACAACCTCCCCCTCCCGGCTTCAGGCCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAA
CAGTAGATATCCTTAAACCCATAGTAAGTTCCATAACCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAGG
The sequence of the Nextera adapter from the Illumina website
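One caveat about the grep check above: searching for the full 34-base adapter only finds reads in which the whole adapter fits. cutadapt also trims partial adapter matches at the 3' end of a read, so searching for just the start of the adapter gives a closer picture of how many reads are affected. A minimal sketch, where the function names, the choice of a 19-base prefix, and the toy reads are illustrative assumptions rather than anything from the dataset:

```shell
# Count reads (not lines) whose sequence contains a given string.
# In FASTQ the sequence is line 2 of every 4-line record.
# $1 = sequence to search for; FASTQ text is read from stdin.
count_reads_with() {
    awk -v pat="$1" 'NR % 4 == 2 && index($0, pat) { n++ } END { print n + 0 }'
}

# Helper for the toy input: print one FASTQ record with a dummy quality string.
# $1 = read name, $2 = sequence
make_record() {
    printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"
}

full=CTGTCTCTTATACACATCTCCGAGCCCACGAGAC   # full R1 adapter from the post
core=CTGTCTCTTATACACATCT                  # first 19 bases, shared by the -a and -A adapters

# Toy reads (invented for illustration only)
demo=$(
    make_record r1 "ACGTACGTAC${full}"         # whole adapter present
    make_record r2 "ACGTACGTACGTACGT${core}"   # only the adapter start fits in the read
    make_record r3 "ACGTACGTACGTACGTACGTACGT"  # no adapter
)

n_full=$(printf '%s\n' "$demo" | count_reads_with "$full")
n_core=$(printf '%s\n' "$demo" | count_reads_with "$core")
echo "full adapter: $n_full read(s); adapter start: $n_core read(s)"
```

On real data the same function could be fed from the original pipeline, for example `zcat file.fastq.gz | head -n 800 | count_reads_with "$core"`.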
This is normal and expected. Fragment lengths in ATAC-seq vary by the nature of the experiment, so reads from fragments shorter than the read length end up shorter after adapter trimming.
But why is there a peak at 75 bp after cutadapt? It seems that many of the reads were not trimmed by cutadapt.
Which again is a fine result. That means those reads don't contain any of the adapter or other contamination that you scanned and trimmed for.
These fragments are long enough that, at a 75 bp read length, the read never reaches the adapter. This is a totally fine result; it always looks like that in ATAC-seq, and I have processed dozens of these over the last few years.
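The reasoning above can be sketched with a little arithmetic. Assuming adapter trimming only (no quality trimming), a read from a fragment shorter than the read length is cut back to the fragment length, while a longer fragment leaves the read at full length, which is why the full-length peak remains. The function name and the fragment lengths below are made up for illustration; 76 is the read length FastQC reported in the question.

```shell
# Expected read length after adapter trimming (ignores quality trimming).
# $1 = fragment (insert) length, $2 = sequenced read length
expected_read_len() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"   # read runs into the adapter, so it is trimmed back to the fragment
    else
        echo "$2"   # fragment fills the whole read, nothing to trim
    fi
}

read_len=76   # read length reported by FastQC in the question
for frag in 30 50 75 150 300; do   # illustrative fragment lengths
    echo "fragment ${frag} bp -> read $(expected_read_len "$frag" "$read_len") bp"
done
```

Every fragment of 76 bp or more lands in the same full-length bin, so that bin dominates the histogram even though the fragments themselves differ widely in size.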
Thank you for your help @ATpoint @genomax. It's quite a relief to know that there is no problem with this step. Another question: which alignment tool should I choose, given that the read lengths are variable (both < 50 bp and > 50 bp)? I know that BWA and Bowtie1 are more sensitive for short reads under 50 bp, and Bowtie2 for reads longer than 50 bp. But what about this situation? About 20% of the reads are shorter than 50 bp.
Use bowtie2 for everything, it will manage just fine.
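For reference, a minimal sketch of what that could look like for the trimmed pairs. The wrapper name, index path, thread count, and the `--very-sensitive -X 2000` settings are assumptions for illustration (a commonly used choice for ATAC-seq, not something stated in this thread). The input names follow the `${prefix}_R1.fastq.gz` / `${prefix}_R2.fastq.gz` pattern from the cutadapt command above.

```shell
# Hypothetical wrapper: align one trimmed paired-end sample with bowtie2.
# $1 = bowtie2 index basename, $2 = sample prefix (both placeholders)
align_atac() {
    # --very-sensitive : slower but more thorough end-to-end preset
    # -X 2000          : allow paired fragments up to 2000 bp (default is 500)
    # -p 8             : number of threads, adjust to your machine
    bowtie2 --very-sensitive -X 2000 -p 8 \
        -x "$1" \
        -1 "${2}_R1.fastq.gz" -2 "${2}_R2.fastq.gz" \
        -S "${2}.sam"
}

# Example call (paths are placeholders, adjust to your setup):
# align_atac /path/to/bowtie2_index/genome "$prefix"
```

No length-specific options are set here; the same call is used for short and long reads alike.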