Hi.
I'm doing ATAC-seq analysis of colon tissue.
My workflow was: 1) QC -> 2) mapping -> 3) post-alignment processing (removing mitochondrial, duplicate, and multi-mapped reads) -> 4) peak calling.
However, when I calculated FRiP after peak calling with MACS2, the score was very low.
No major problems were apparent at any step before peak calling. After post-alignment processing, the BAM file contained a total of 50,212,020 mapped reads (paired-end sequencing).
Below is the code I used for peak calling and FRiP calculation.
macs2 callpeak -t Input.bam --outdir Peak -f BAM -n ID -g hs -q 0.05 --nomodel --shift -100 --extsize 200 --keep-dup all
awk 'BEGIN {OFS="\t"} {print $1"-"$2+1"-"$3, $1, $2+1, $3, "."}' Peak/ID_peaks.narrowPeak > ID_peaks.saf
featureCounts -p -a ID_peaks.saf -F SAF -o ID_fcount.txt Input.bam
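For reference, here is a minimal sketch of how FRiP can be derived from the featureCounts output. It assumes featureCounts wrote its usual summary table next to the count file (ID_fcount.txt.summary, a tab-separated file with a "Status" header row and one row per category such as "Assigned"), and that only one BAM file was counted. The helper name frip_from_summary is hypothetical, not part of featureCounts. Whether the counts are reads or fragments depends on the featureCounts version and options, so treat the result as an approximation.

```shell
# Hypothetical helper: compute FRiP from a featureCounts ".summary" file.
# Assumption: a single BAM column, so column 2 holds the counts.
# FRiP = Assigned / (sum of all status rows).
frip_from_summary() {
    awk -F'\t' '
        NR == 1 { next }                    # skip the "Status <bam>" header row
        { total += $2 }                     # every row is one count category
        $1 == "Assigned" { in_peaks = $2 }  # reads/fragments falling in peaks
        END {
            if (total > 0) printf "%.4f\n", in_peaks / total
            else print "NA"
        }
    ' "$1"
}

# Usage (file name follows the -o argument used above):
# frip_from_summary ID_fcount.txt.summary
```

The denominator here is the sum of every category in the summary, so any unmapped or otherwise unassigned reads still present in the BAM file count toward the total.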
My FRiP score is 0.037 (3.7%). The result was the same when I calculated FRiP with samtools instead of featureCounts.
Can anyone tell me what the problem is?
Thanks.
You might want to bring up your ATAC-seq data and peaks in a genome browser (like IGV) for a quick eye test. Sometimes it's pretty obvious whether ATAC-seq worked or not just by looking at the signal near genes.
Thank you for the quick reply!
So, are you saying that even if FRiP is too low, it doesn't matter as long as a signal (peak) is observed near genes?
FRiP is just the number of reads overlapping peaks divided by the total number of reads. It does not tell you anything about the distribution or position of peaks.
That's why I'm worried.
I think that even if a signal is observed near the genes, there must have been a problem in the experimental process because FRiP is too low.
As far as I am aware, there is no rule that signal must come from "near genes". Most peaks will be intergenic, many even far away from genes.