Which BED file should be used for deepTools computeMatrix?
Asked 3 months ago by QX

Hi all,

I am trying to make a heatmap for ATAC-seq data using the deepTools computeMatrix function. For the --regionsFileName (-R) option, should I use the BED file of peaks from peak calling, or a BED file of the genes annotated by those peaks?

For TSS region analysis, a BED file of gene regions makes more sense to me; however, I have seen suggestions that use regions other than genes, or the peaks themselves.

Can anyone explain this, and which option should be used?

Tags: atac-seq, deeptools
Answer by LChart, 3 months ago

The TSS matrix is generally used only for computing TSS enrichment (TSSe) and displaying the corresponding plots. This is a quality control metric for establishing the extent of open chromatin (which should be present at highly transcribed TSSs) versus random/uniform background.
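To make the TSSe idea concrete, here is a minimal toy sketch in Python. It assumes Tn5 insertion sites and strand-aware TSS positions are already available as plain tuples; the function name `tss_enrichment` and the ±2000 bp window with 100 bp flanks are illustrative choices, loosely following ENCODE-style definitions. Real tools differ in window size, smoothing, and whether they report the value at the TSS or the profile maximum. The score here is the aggregate signal at the TSS divided by the mean signal in the outermost flanks.

```python
from collections import Counter


def tss_enrichment(insertions, tss_sites, window=2000, flank=100):
    """Toy TSS enrichment (TSSe) score (hypothetical helper, not a library API).

    insertions: list of (chrom, pos) Tn5 insertion sites, 0-based.
    tss_sites:  list of (chrom, pos, strand) TSS positions, 0-based.
    Returns the aggregate insertion count at the TSS divided by the mean
    count over the outermost `flank` positions on both sides.
    """
    # Index insertion counts per chromosome and position
    by_chrom = {}
    for chrom, pos in insertions:
        by_chrom.setdefault(chrom, Counter())[pos] += 1

    # Aggregate counts at each offset in [-window, +window] around every TSS
    profile = [0] * (2 * window + 1)
    for chrom, tss, strand in tss_sites:
        counts = by_chrom.get(chrom, Counter())
        for i, offset in enumerate(range(-window, window + 1)):
            # Mirror minus-strand genes so "upstream" is always on the left
            pos = tss + offset if strand == "+" else tss - offset
            profile[i] += counts[pos]  # Counter returns 0 for missing keys

    # Background: mean of the outermost flank positions on both ends
    background = (sum(profile[:flank]) + sum(profile[-flank:])) / (2 * flank)
    if background == 0:
        return float("inf") if profile[window] > 0 else 0.0
    return profile[window] / background
```

A strong ATAC-seq library shows a sharp peak of insertions at TSSs relative to the flanks, so this ratio is well above 1; a library dominated by background gives a flat profile and a ratio near 1.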

I can't think of a case where genic BED files were used with ATAC-seq.

Peak .bed files are used to compute the count matrix for downstream processing with, e.g., SnapATAC. However, Signac and SnapATAC/SnapATAC2 can both run directly from a fragment .tsv.gz, which would be better than the summary .bw. If you can avoid working from the coverage .bw (which is lower resolution), you should do so.
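The peak-by-cell count matrix mentioned above can be sketched as follows. This is a hedged toy, not how Signac or SnapATAC implement it: it assumes a 10x-style fragments file (tab-separated `chrom start end barcode count`, 0-based half-open) and counts each fragment once per peak it overlaps. Some tools count Tn5 insertion sites instead of whole fragments. The function name `count_matrix` is hypothetical, and the linear scan over preceding peaks is for clarity rather than speed.

```python
import bisect
from collections import defaultdict


def count_matrix(fragment_lines, peaks):
    """Sketch: count fragments overlapping each peak, per cell barcode.

    fragment_lines: iterable of lines "chrom\tstart\tend\tbarcode\tcount"
                    (in practice e.g. gzip.open("fragments.tsv.gz", "rt")).
    peaks: list of (chrom, start, end), 0-based half-open BED coordinates.
    Returns {barcode: {peak_index: count}}.
    """
    # Group peaks by chromosome and sort by start for a binary-search cutoff
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append((start, end, idx))
    for chrom in by_chrom:
        by_chrom[chrom].sort()
    starts = {c: [p[0] for p in v] for c, v in by_chrom.items()}

    matrix = defaultdict(lambda: defaultdict(int))
    for line in fragment_lines:
        if line.startswith("#"):
            continue  # skip header comment lines
        # Column 5 (duplicate count) is ignored: each fragment counts once
        chrom, fstart, fend, barcode = line.rstrip("\n").split("\t")[:4]
        fstart, fend = int(fstart), int(fend)
        # Only peaks that start before the fragment ends can overlap it
        hi = bisect.bisect_left(starts.get(chrom, []), fend)
        for start, end, idx in by_chrom.get(chrom, [])[:hi]:
            if end > fstart:  # half-open interval overlap test
                matrix[barcode][idx] += 1
    return matrix
```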


Sorry, it's not clear to me. Is the TSS the start of the gene, and can ATAC-seq capture whether it is "open" or not?

I don't know whether the TSS is the start column of the gene BED file (e.g., the 2nd column in the gene BED file below) or the start of the peak BED file.

```
1       23778418        23788232        PITHD1
1       26234200        26277687        CEP85
1       26472440        26476642        HMGN2
1       75724431        75735094        ACADM
1       91949343        91977138        BRDT
```

For peak calling, yes, I only use the peak BED file for further downstream analysis.


The TSS is the "transcription start site" - where polymerase initiates transcription of the gene. When MEDIATOR/Pol2 has assembled at the promoter and initiated transcription, it tends to preclude occupancy of other proteins, meaning that these positions should be accessible ("open" chromatin) for genes that are highly transcribed.

Genes can be transcribed in the (+) or (-) direction, so even if your BED file lovingly selected the appropriate transcript for each gene, without a strand column you don't know which genes have the TSS at the start coordinate and which have it at the end. So the BED file you have simply isn't sufficient.
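For illustration, here is a minimal sketch of turning a strand-aware gene BED into TSS intervals. It assumes a BED6 file (chrom, start, end, name, score, strand) in the usual 0-based half-open convention, so a (+) gene's TSS is base `start` and a (-) gene's TSS is base `end - 1`. The helper `tss_bed` and its `flank` padding are hypothetical; the 4-column file shown in the question would be rejected because it has no strand column.

```python
def tss_bed(gene_lines, flank=0):
    """Convert BED6 gene records to 1-bp TSS intervals (hypothetical helper).

    Returns (chrom, start, end, name, strand) tuples, optionally padded by
    `flank` bp on each side and clipped at position 0.
    """
    out = []
    for line in gene_lines:
        fields = line.split()
        # Without column 6 there is no way to tell which end is the TSS
        if len(fields) < 6 or fields[5] not in ("+", "-"):
            raise ValueError(f"need a strand column (BED6): {line!r}")
        chrom, start, end, name, strand = (
            fields[0], int(fields[1]), int(fields[2]), fields[3], fields[5]
        )
        # 0-based half-open: (+) TSS is the first base, (-) TSS is the last base
        tss = start if strand == "+" else end - 1
        out.append((chrom, max(0, tss - flank), tss + 1 + flank, name, strand))
    return out
```

The gene names in the usage below are placeholders, not real annotations.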

You haven't specified what you want to accomplish with the outputs of computeMatrix, so it's not clear whether answering these questions is getting you any closer to your ultimate goal.


Hi, true. I didn't think of that before. Thank you!

I'm just trying to do some exploratory data analysis without a specific purpose. Do you have any suggestions for next steps after peak calling and DESeq2? It would be highly appreciated!


I'll plug Ming Tommy Tang, who put together this: https://github.com/crazyhottommy/ChIP-seq-analysis. It's for ChIP-seq, but most of it will apply to ATAC-seq. The one ATAC-specific thing that's really missing (meaning a direct link to a tool rather than just a paper) is activity-by-contact (ABC) modeling, which requires Hi-C contacts from the same or related cells/tissues.
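As a rough illustration of what activity-by-contact computes, here is a toy sketch for a single gene. It assumes the formulation of the published ABC model: an element's activity is the geometric mean of its accessibility (ATAC/DNase) signal and its H3K27ac signal, and its ABC score is activity × Hi-C contact divided by the sum of that product over all candidate elements near the gene (a 5 Mb window in the original work). The function `abc_scores` and its inputs are hypothetical; this is not the ABC pipeline's code.

```python
import math


def abc_scores(elements, contacts):
    """Toy activity-by-contact scores for ONE gene (hypothetical helper).

    elements: {element_id: (atac_signal, h3k27ac_signal)} for candidate
              elements in a window around the gene.
    contacts: {element_id: Hi-C contact frequency with the gene's promoter}
    Returns {element_id: ABC score}; the scores for one gene sum to 1.
    """
    # Activity = geometric mean of accessibility and H3K27ac signal,
    # weighted by the element's contact with the promoter
    weighted = {
        e: math.sqrt(atac * k27) * contacts.get(e, 0.0)
        for e, (atac, k27) in elements.items()
    }
    total = sum(weighted.values())
    # Each element's share of the total activity x contact for this gene
    return {e: (w / total if total > 0 else 0.0) for e, w in weighted.items()}
```

The normalization is the key design choice: an element's score reflects its share of all activity reaching the promoter, so the same enhancer can score high for a gene with few other contacts and low for a gene surrounded by strong elements.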


Thanks! If you want to stay in R to compute the matrix and visualize it as a line plot or heatmap, read this: https://divingintogeneticsandgenomics.com/publication/2017-08-01-biostarhandbook/
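For readers who want to see the idea without R or deepTools, here is a toy Python sketch of a reference-point matrix: for each region, signal is averaged in fixed-size bins upstream and downstream of a reference position (a TSS or a peak center), and the column means give the line-plot profile. It loosely mirrors what computeMatrix does in reference-point mode, but it is an assumption-laden illustration, not deepTools' implementation. The per-base `coverage` dict stands in for a bigWig, and the function names are hypothetical.

```python
def reference_point_matrix(coverage, regions, upstream=1000, downstream=1000,
                           bin_size=100):
    """Toy reference-point signal matrix (hypothetical helper).

    coverage: {chrom: list of per-base signal values} (stand-in for a bigWig)
    regions:  list of (chrom, ref_pos, strand); ref_pos is 0-based
    Returns one row per region with the mean signal in each bin; positions
    outside the chromosome contribute 0.
    """
    n_bins = (upstream + downstream) // bin_size
    matrix = []
    for chrom, ref, strand in regions:
        signal = coverage.get(chrom, [])
        row = []
        for b in range(n_bins):
            total = 0.0
            for k in range(bin_size):
                offset = -upstream + b * bin_size + k
                # Mirror minus-strand regions so upstream is always on the left
                pos = ref + offset if strand != "-" else ref - offset
                if 0 <= pos < len(signal):
                    total += signal[pos]
            row.append(total / bin_size)
        matrix.append(row)
    return matrix


def average_profile(matrix):
    """Column means of the matrix: the line-plot (profile) summary."""
    return [sum(col) / len(col) for col in zip(*matrix)]
```

Each row of the matrix becomes one line of a heatmap, and `average_profile` gives the curve usually drawn above it.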
