I am analyzing ATAC-seq data and am trying to confirm that my reads are enriched around TSSs. However, I am seeing enrichment between the TSS and TES for all my samples in two cell lines.
The code I ran:
computeMatrix reference-point \
-S m7_rep1.bw \
-R hg38.TSS.bed \
--beforeRegionStartLength 1000 \
--regionBodyLength 2000 \
--afterRegionStartLength 1000 \
--binSize 100 \
-o m7_rep1_matrix.gz
plotHeatmap \
-m m7_rep1_matrix.gz \
-out m7_rep1_TSS_enrichment_heatmap.png
I generated hg38.TSS.bed with:
awk '$3 == "transcript"
{
if ($7 == "+") print $1 "\t" $4-1000 "\t" $4+1000;
else if ($7 == "-") print $1 "\t" $5-1000 "\t" $5+1000;}
' hg38.refGene.gtf > hg38.TSS.bed
I am also not sure about the black genes.
What am I doing wrong, or what could be going wrong?
I appreciate any help, thank you.
To troubleshoot, maybe try separating the transcripts by strand, instead of mixing the two. In your awk statement, for instance, write one or the other strand if-else case and visualize that. This may help highlight the problem.
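As a rough sketch of that idea, the helper below writes the TSSs for one strand at a time. The function name tss_by_strand is made up for illustration, and the single-base, six-column BED layout (with the strand in column 6) is my assumption rather than something from your original command. It also assumes a tab-separated GTF with 1-based coordinates.

```shell
# Hypothetical helper: print single-base TSS intervals for one strand as BED6.
# Usage: tss_by_strand <gtf> <strand>     (strand is + or -)
tss_by_strand() {
    awk -v s="$2" 'BEGIN { FS = OFS = "\t" }
        $3 == "transcript" && $7 == s {
            # The TSS is the start ($4) on the + strand and the end ($5) on the - strand.
            tss = (s == "+") ? $4 : $5
            # GTF is 1-based inclusive; BED is 0-based half-open.
            print $1, tss - 1, tss, ".", 0, s
        }' "$1"
}
```

For example, `tss_by_strand hg38.refGene.gtf + > hg38.TSS.plus.bed` and `tss_by_strand hg38.refGene.gtf - > hg38.TSS.minus.bed` (output file names are placeholders) would give one file per strand to plot separately. Because each interval is a single base, the flanks would come from computeMatrix itself (for instance -b 1000 -a 1000 in reference-point mode) rather than being baked into the BED file.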
I concluded that I had configured the TSSs incorrectly in the first place; thank you so much for your helpful reply.