Hi new user here although I've been lurking trying to find answers to my issue!
I'm relatively new to bioinformatics but have been working on analyzing data from an ATAC-Seq experiments in Arabidopsis.
I'm having issues with my data not piling up around TSS like ATAC-Seq data should.... I've tried everything I can and asked 4 colleagues and no one has any idea what happened to my data.
Here's the basic pipeline for my analysis:
Trimming reads:
bbduk.sh in=read1.fq.gz in2=read2.fq.gz out=Read1.trimmed.fq.gz out2=Read2.fq.gz ftr=49 tossbrokenreads
Result:
Input: 71468752 reads 10720312800 bases.
FTrimmed: 71468752 reads (100.00%) 7146875200 bases (66.67%)
Total Removed: 0 reads (0.00%) 7146875200 bases (66.67%)
Result: 71468752 reads (100.00%) 3573437600 bases (33.33%)
Alignment:
./bowtie -X 1000 -t -p 10 -v 2 -m 3 -y -a tair10 -1 Read1.trimmed.fq.gz -2 Read2.trimmed.fq.gz -S aligned.sam > aligned.txt
Result:
reads processed: 44774133
reads with at least one reported alignment: 33483782 (74.78%)
reads that failed to align: 11290351 (25.22%)
reads with alignments suppressed due to -m: 2648269 (5.91%)
Reported 41506698 paired-end alignments
Sorted, removed chloroplastic and mitochondrial reads, and then removed PCR duplicates with Samtools
Converted .bam files to .bed files with bamToBed
Shifted reads for Tn5 integration sites:
awk 'BEGIN {OFS = "\t"} ; {if ($6 == "+") print $1, $2 + 4, $2 + 5, $6; else print $1, $3 - 6, $3 - 5, $6}' FILE.rmdup.bed > FILE.Tn5.bed
Called peaks using MACS2:
macs2 callpeak -t FILE.Tn5.bed -f BED -g 150000000 --nomodel --keep-dup all --extsize 150 --shift -75 --qvalue 0.01 -n FILE --outdir /macsoutput
I've included a picture of the TSS graph generated using ChipSeeker in R.
Does anyone have ANY idea why this happened? It looks like what I would expect from RNA Seq data...
I'm definitely no ATAC-seq expert, but I'm not familiar with the "shifting" step, what is the purpose of this step? I've always had pretty good luck just following this pipeline from ENCODE:
https://www.encodeproject.org/documents/0eb389f9-d23d-4053-b25b-1e2826ee5a86/@@download/attachment/ATACpipelineV7.pdf
The ends of the reads are where Tn5 bound chromatin and inserted sequencing adapters, giving rise to the observed sequences. Thus, sequencing reads flank accessible chromatin. Tn5 binds 9 bp of DNA. That’s why we adjust the coordinates +4 for positive strand reads, and -5 for negative strand reads.