4.7 years ago
BrunoGiotti
Hello!
As per title, does anybody know any software/package for calling CNVs using ATAC-seq data?
Much appreciated
Do you mean correcting ATAC-seq peak counts by copy number?
No, I mean inferring CNVs from fragment abundance. Of course that would be confined to the regulatory regions, but it should still work.
It is difficult enough to infer CNVs reliably even with good WGS data, where the algorithm can build a background from a large number of windows. You can see this from the fact that there are dozens, if not hundreds, of different CNV tools out there. I would not trust any results from ATAC-seq/ChIP-seq in this regard. Coverage in these assays is punctate, since in many cases they cover only about 1-5% of the genome. It is also highly uneven both within and outside of peaks, and their purpose is simply not to call CNVs.
I see, thanks for the comment. However, I would still like to try. Here is a paper where they do this kind of analysis using scATAC: https://www.biorxiv.org/content/10.1101/610550v1.full.pdf
So they do it 'manually', but I was wondering if there is some package out there to do the job.
You will always find papers that did all kinds of fancy things. I personally always check whether they did validation. In this paper they only very briefly discuss the whole CNV aspect, and it does not seem they really did an in-depth analysis of the false-positive and false-negative rates of the method. They state that there was some agreement with known CNVs, but maybe they also found plenty of false positives while missing many true events. The methods section is also short and without details, with no apparent statistics etc. Decide for yourself whether you think it is robust enough to invest time into understanding the method plus implementing it. Also decide whether you have the possibility to really validate whatever results this is going to produce. In genome-wide analysis you often get significant results, but whether these findings are biologically meaningful is a whole different topic.
I agree with you, which is why I was looking for a package I could plug in to see whether the results make sense in a short amount of time. I already know which are the transformed cell populations, and inferCNV on matching scRNA-seq supports that. I appreciate your concerns and comments, but my intention here, once again, was to find out whether such a package exists. Anyway, I'll see if I can implement their method.
This is from the supplementary text (page 15) of the TCGA ATAC-seq paper by Corces et al., Science 2018:
"ATAC-seq data analysis – Inferring copy number amplification To infer DNA copy number amplifications from ATAC-seq data, we first tiled the genome into 2-Mbp windows using “tile” of genomic ranges for chromosome sizes in R. These window positions were then filtered against regions with known artefactual mapping issues using the ENCODE blacklist with the “setdiff” function in R. Then, the number of insertions within each filtered window was determined using “countOverlaps”. Next, the insertions per bp was determined within each filtered 2-Mbp window. Then, the percent GC content was computed for each filtered 2-Mbp window using the hg38 BSgenome in R. To estimate if a region is amplified, for each window we took the 100 nearest neighbors based on GC content and computed the average log2(fold change). If this was above 1.5 we considered this region as a candidate for amplification. This window size best captured smaller known amplifications, but added more false positives compared to 10-Mbp windows."
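The GC-matched normalisation step in the quoted text could be sketched roughly as below. This is only a minimal illustration in Python/NumPy, not the authors' R code. It assumes the per-window insertion counts, window widths and GC fractions have already been computed (tiling, blacklist filtering and counting are not shown). The function name `gc_neighbor_log2fc`, the pseudocount, and the reading of "average log2(fold change)" as the mean of log2(window rate / neighbour rate) over the 100 GC-nearest windows are all assumptions, since the supplement does not spell these details out.

```python
import numpy as np

def gc_neighbor_log2fc(insertions, widths, gc, k=100, pseudocount=1.0):
    """Mean log2 fold change of each window's insertion rate versus its
    k nearest windows by GC content (hypothetical helper, see lead-in)."""
    insertions = np.asarray(insertions, dtype=float)
    widths = np.asarray(widths, dtype=float)
    gc = np.asarray(gc, dtype=float)
    n = gc.size
    if n < 2:
        raise ValueError("need at least two windows")
    k = min(k, n - 1)  # cannot have more neighbours than other windows

    # Insertions per bp on a log2 scale; the pseudocount (an assumption)
    # avoids log2(0) for empty windows.
    log_rate = np.log2((insertions + pseudocount) / widths)

    out = np.empty(n)
    for i in range(n):
        dist = np.abs(gc - gc[i])
        dist[i] = np.inf  # a window is not its own neighbour
        nearest = np.argpartition(dist, k - 1)[:k]  # k smallest GC distances
        out[i] = np.mean(log_rate[i] - log_rate[nearest])
    return out

# Windows above the threshold quoted in the text would be candidates:
# candidates = np.flatnonzero(gc_neighbor_log2fc(counts, widths, gc) > 1.5)
```

Note that a mean log2 fold change above 1.5 corresponds to roughly a 2.8-fold excess of insertions, so a sketch like this would only flag strong amplifications and says nothing about deletions or modest gains.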
Yes, this is the methods section we discussed above ;-) It does not sound too convincing to me, probably lots of false calls, but it is also not the focus of the paper and rather a gimmick, so they got away with it.