motif search with ATAC-seq
6.2 years ago

I have ATAC-seq data for two yeast species. I have called peaks with MACS2 and did occupancy and affinity analysis of peaks with DiffBind. Now I need to find motifs of TF binding sites in the peaks and compare those motifs between two species.

There is a ton of software and databases for this task, so I am a bit confused about how to start the analysis. Can anybody share their experience with motif search and motif comparison, specifically which tools are considered "best practice" in the field?

Thanks

motif-search ATAC-seq

What exactly are you confused about? You imply you've already done some research on how to do this, so it's difficult to figure out how we could help you. Have you tried any of the "ton of software and databases" and found them lacking?

6.2 years ago

There are basically two genres of tools you can use: those that search for motifs in peaks and those that do footprinting. The former group is well represented by HOMER and the MEME suite (MEME-ChIP in particular). The latter group is mostly represented by Wellington. I'm generally not a fan of footprinting with ATAC-seq data; the coverage needed to do it properly is just absurdly high. Given that, one of HOMER/MEME/etc. would be my preference. I generally find HOMER annoying to use, so I personally prefer MEME, but that's more of a personal preference than a best practice.
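
For the motif-in-peaks route, MEME-ChIP is usually given equal-length sequences centred on peak summits. Below is a minimal sketch, assuming MACS2 narrowPeak input (column 10 holds the summit offset from chromStart, or -1 if absent); the function name `summit_windows` and the 50 bp half-width are illustrative assumptions, not MEME-ChIP requirements.

```python
from typing import List, Tuple

def summit_windows(narrowpeak_lines: List[str], half_width: int = 50) -> List[Tuple[str, int, int, str]]:
    """Return BED-style (chrom, start, end, name) windows centred on each peak summit.

    Assumes MACS2 narrowPeak input (0-based, half-open BED coordinates), where
    column 10 is the summit offset relative to chromStart, or -1 if absent.
    """
    windows = []
    for line in narrowpeak_lines:
        # Skip blank lines and UCSC track/browser/comment headers
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        offset = int(fields[9])
        # Fall back to the peak midpoint when no summit offset is reported
        summit = start + offset if offset >= 0 else (start + end) // 2
        # Clip at 0; clipping at the chromosome end would need a chrom-sizes file
        windows.append((chrom, max(0, summit - half_width), summit + half_width, name))
    return windows


if __name__ == "__main__":
    example = ["chrI\t100\t400\tpeak1\t50\t.\t5.0\t10.0\t8.0\t150"]
    for chrom, s, e, name in summit_windows(example):
        print(f"{chrom}\t{s}\t{e}\t{name}")  # BED line, e.g. for bedtools getfasta
```

The resulting BED lines could then be converted to FASTA with a tool such as `bedtools getfasta` before running MEME-ChIP.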


Thanks for the input, Devon! Regarding footprinting, do you think it's fine to merge the ATAC-seq replicates to increase the depth (the replicates are very reproducible)? Regarding the motifs, say I have found the motifs in the peaks: what kind of comparison between motifs of different species do you think can be done? Sequence comparison, copy-number comparison, others? I am new to this field and want to understand conceptually what makes sense and what does not. Thank you
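
For footprinting, "merging" replicates amounts to pooling per-base Tn5 insertion counts. A minimal sketch, assuming reads are already available as (chrom, start, end, strand) tuples rather than read from a BAM, and using the commonly cited +4/-5 Tn5 shift; off-by-one conventions for the minus strand differ between tools, so treat the exact offsets as an assumption.

```python
from collections import Counter
from typing import Iterable, Tuple

# (chrom, start, end, strand) for one aligned read, BED-style 0-based half-open
Read = Tuple[str, int, int, str]

def tn5_cut_sites(reads: Iterable[Read]) -> Counter:
    """Count Tn5 insertion positions per base using a +4/-5 shift."""
    counts: Counter = Counter()
    for chrom, start, end, strand in reads:
        if strand == "+":
            pos = start + 4   # 5' end of a plus-strand read, shifted +4
        else:
            pos = end - 5     # 5' end of a minus-strand read (BED end), shifted -5
        counts[(chrom, pos)] += 1
    return counts

def pool_replicates(*replicates: Iterable[Read]) -> Counter:
    """Sum per-base cut counts across replicates, i.e. 'merge' them."""
    pooled: Counter = Counter()
    for reads in replicates:
        pooled.update(tn5_cut_sites(reads))  # Counter.update adds counts
    return pooled
```

Pooling raises the per-base counts a footprinting tool sees, which is the point of merging; it does not, however, tell you whether the replicates agree, so checking reproducibility separately is still worthwhile.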


Sure, you can merge replicates. Regarding the comparisons that make sense, that depends on the biological question. Note that comparing across species is rife with issues.


I apologize for interrupting another thread, but I would usually encourage the use of replicates.

While Devon is right that the number of reads needed per ATAC-seq sample can be high, if you already have replicates with high-coverage samples, I think you should take advantage of that.

Also, with yeast, getting high coverage should be less of an issue than with an organism with a larger genome, such as human or mouse (I am assuming you are studying a yeast species with few introns, but I admittedly don't know how the largest yeast genome compares to a vertebrate genome).
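
The genome-size point can be made concrete with average fold-coverage (total sequenced bases divided by genome size). This sketch uses the ~12 Mb genome and ~20 million reads per replicate reported in this thread; the 50 bp read length and the ~3.1 Gb human genome size are assumptions for illustration.

```python
def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold-coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

# Thread numbers: ~12 Mb yeast genome, ~20 million reads per replicate.
# Assumed for illustration: 50 bp reads, ~3.1 Gb human genome.
yeast = mean_depth(20_000_000, 50, 12_000_000)
human = mean_depth(20_000_000, 50, 3_100_000_000)
print(f"yeast: {yeast:.1f}x, human: {human:.2f}x")  # yeast: 83.3x, human: 0.32x
```

Because ATAC-seq reads are concentrated in accessible regions, this genome-wide average is only a rough guide; the depth inside peaks, which is what matters for footprinting, is typically much higher.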


Hi Charles, the genome size is ca. 12 Mb and the ATAC-seq replicates have around 20 million reads each (the coverage is much higher than for the human genome, though I don't really understand what coverage means in the ATAC-seq context). Regarding the replicates, I have used them, for example, in the occupancy analysis with DiffBind, and most of the peaks are shared between the replicates. So maybe I can try merging the replicates and focusing only on the common peaks.
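
"Focusing only on the common peaks" can be sketched as keeping peaks from one replicate that overlap a peak in the other. This is a naive illustration under assumed inputs (lists of (chrom, start, end) tuples, a hypothetical `min_overlap` threshold in bp); for real peak files, `bedtools intersect -u` or DiffBind's own consensus-peak options do the same job more efficiently.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open

def reproducible_peaks(rep1: List[Interval], rep2: List[Interval],
                       min_overlap: int = 1) -> List[Interval]:
    """Keep rep1 peaks that overlap at least one rep2 peak by >= min_overlap bp."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, s, e in rep2:
        by_chrom.setdefault(chrom, []).append((s, e))
    kept = []
    for chrom, s, e in rep1:
        # Naive O(n*m) scan; fine for a sketch, slow for genome-wide peak sets
        for s2, e2 in by_chrom.get(chrom, []):
            if min(e, e2) - max(s, s2) >= min_overlap:
                kept.append((chrom, s, e))
                break
    return kept
```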


It's not uncommon to remove duplicate reads before peak calling with ATAC-seq. If you do this, the unique-read coverage can be considerably lower than the original coverage (which I think is what Devon was referring to in the original comment).
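
To see how much coverage deduplication removes, one can collapse fragments with identical coordinates and report the duplicate fraction. A minimal sketch, assuming paired-end fragments given as (chrom, start, end) tuples; this is a simplified version of the positional criterion used by tools such as Picard MarkDuplicates (single-end data would also need the strand in the key).

```python
from typing import Iterable, List, Tuple

Fragment = Tuple[str, int, int]  # (chrom, start, end) of a paired-end fragment

def deduplicate(fragments: Iterable[Fragment]) -> Tuple[List[Fragment], float]:
    """Collapse fragments with identical coordinates; return unique list and duplicate fraction."""
    seen = set()
    unique: List[Fragment] = []
    total = 0
    for frag in fragments:
        total += 1
        if frag not in seen:
            seen.add(frag)
            unique.append(frag)  # keep the first occurrence, preserving input order
    dup_frac = 1 - len(unique) / total if total else 0.0
    return unique, dup_frac
```

Multiplying the original depth by `1 - dup_frac` gives a rough idea of the unique-read coverage left after deduplication.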

However, if you are getting reasonable results with your current strategy using replicates, I think that is OK (and arguably that is what matters most, as long as you have some way to assess your results biologically). Knowing about possible troubleshooting strategies (such as removing or keeping duplicate reads, or using read counts from programs like htseq-count/featureCounts with DESeq2/edgeR/limma-voom) is probably not a bad idea, and should make you more comfortable when responding to reviewers. You may also find that some strategies work better for your particular dataset than others; what works best for your data may not be 100% identical to what is most popular (if you can even define that), but some novelty in your analysis strategy may also add significance to your paper for higher-impact publications :)
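
To count reads in peaks with featureCounts, peaks are commonly supplied in SAF format (GeneID, Chr, Start, End, Strand; 1-based, inclusive). A minimal sketch converting BED-style peaks (0-based, half-open) to SAF; the function name is hypothetical, and strand is written as "+" on the assumption that counting is unstranded (featureCounts' default), in which case it is ignored.

```python
from typing import Iterable, Tuple

def peaks_to_saf(peaks: Iterable[Tuple[str, int, int, str]]) -> str:
    """Convert BED-style (chrom, start, end, name) peaks to featureCounts SAF text."""
    rows = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for chrom, start, end, name in peaks:
        # BED start is 0-based; SAF start is 1-based. The end coordinate is unchanged.
        rows.append(f"{name}\t{chrom}\t{start + 1}\t{end}\t+")
    return "\n".join(rows) + "\n"
```

The file could then be passed to something like `featureCounts -F SAF -a peaks.saf -o counts.txt sample1.bam sample2.bam`, and the resulting count matrix used with DESeq2/edgeR/limma-voom.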


Note that comparing across species is rife with issues.

Yes, especially when they have 30% genome divergence :)

6.2 years ago
ATpoint

Depending on your exact question, you might consider chromVAR. It takes as input a set of peaks (e.g. the combined peak set of your two species) and the aligned BAM files, and infers differential motif accessibility. In the end, you get a list of motifs that are differentially accessible between conditions. chromVAR does this by computing a variability score for each motif: it first matches a set of motifs (e.g. from JASPAR) to the peaks and then checks whether regions with a certain motif are more or less accessible in condition 1 vs. condition 2. Even though it may have been developed primarily for single-cell ATAC-seq, I have had good success with it on bulk ATAC-seq data so far, producing results that made biological sense and were supported by other experiments.
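
The core idea behind chromVAR's per-motif score can be sketched as a "raw deviation": reads observed in motif-containing peaks for a sample, minus the number expected from each peak's share of all reads, divided by that expectation. This is a simplified illustration only; it omits chromVAR's background-peak bias correction and z-scoring, and the input layout (a peaks-by-samples count list and a boolean motif-hit list) is assumed.

```python
from typing import List

def raw_deviations(counts: List[List[float]], motif_hits: List[bool]) -> List[float]:
    """Per-sample raw accessibility deviation for one motif (no bias correction).

    counts[i][j]: reads in peak i for sample j
    motif_hits[i]: True if peak i contains a match to the motif
    """
    n_peaks, n_samples = len(counts), len(counts[0])
    total = sum(sum(row) for row in counts)
    # Expected share of reads in each peak, pooled across all samples
    peak_frac = [sum(row) / total for row in counts]
    sample_totals = [sum(counts[i][j] for i in range(n_peaks)) for j in range(n_samples)]
    devs = []
    for j in range(n_samples):
        observed = sum(counts[i][j] for i in range(n_peaks) if motif_hits[i])
        expected = sum(peak_frac[i] * sample_totals[j] for i in range(n_peaks) if motif_hits[i])
        # No motif-containing peaks (or no reads in them) gives an undefined deviation
        devs.append((observed - expected) / expected if expected else float("nan"))
    return devs
```

A negative value means motif-containing peaks are less accessible in that sample than expected; a positive value means more accessible.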


Thanks, the software looks promising.


Hi ATpoint, peaks from ATAC-seq or DNase-seq data may contain several footprint sites, each with a potential motif. If we use sequences centered on the peak summit, will this hamper the discovery of each motif in a MEME-ChIP analysis? At least CentriMo may be affected. Or should we first characterise each footprint, then retrieve ~300 bp around each footprint center, and then run the MEME analysis?
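
The footprint-centred alternative can be sketched as: centre a ~300 bp window on each footprint, then merge overlapping windows so that sequence shared by neighbouring footprints is not fed to MEME twice (duplicated sequence can inflate motif counts). The input format and the merging step are assumptions for illustration, not a documented MEME-ChIP recommendation.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open

def footprint_windows(footprints: List[Interval], half_width: int = 150) -> List[Interval]:
    """Centre a window on each footprint, then merge overlapping windows per chromosome."""
    windows = sorted(
        (chrom, max(0, (s + e) // 2 - half_width), (s + e) // 2 + half_width)
        for chrom, s, e in footprints
    )
    merged: List[Interval] = []
    for chrom, s, e in windows:
        if merged and merged[-1][0] == chrom and s <= merged[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], e))
        else:
            merged.append((chrom, s, e))
    return merged
```

The trade-off is that merged windows are no longer of equal length or centred on a single site, which CentriMo-style centrality analysis assumes; keeping unmerged windows for CentriMo and merged ones for de novo discovery is one possible compromise.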

Thank you!

Aifu.


When analysing bulk ATAC-seq data (e.g. from two different tissues), are there any parameters to pay attention to? Thank you.


Sorry for the late reply. You can have a look at my basic script for pairwise comparisons on GitHub.


Hi ATpoint, could you share your basic script for pairwise comparisons? The GitHub link is not working.

Thanks!

6.2 years ago

You might want to take a look at this post: A: How can I find motifs under individual ATAC-peaks?

However, i-cisTarget doesn't have yeast annotations (at least as far as I can tell).

For general peak enrichment, I think the species is also a limitation for Broad-Enrich or GREAT, but perhaps you can look at papers citing them (or the papers themselves) to get some other ideas (though that may be a little off target for your motif question).


Thank you Charles, very useful! One of the species I analyze is non-model, so I guess there will indeed be limitations. So I will either need to do de novo motif discovery or search based on S. cerevisiae motifs.
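
Searching with known S. cerevisiae motifs boils down to scanning peak sequences with a position weight matrix on both strands. A minimal sketch, assuming a count matrix per motif (e.g. from JASPAR's fungal collection), a pseudocount of 0.5 and a uniform background; the toy matrix below is hypothetical, not a real JASPAR entry. In practice, FIMO from the MEME suite does this with proper p-values.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_pwm(counts: Dict[str, List[float]], pseudocount: float = 0.5,
                 background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Turn a count matrix (base -> counts per position) into log2-odds scores."""
    bg = background or {b: 0.25 for b in BASES}
    pwm = []
    for pos in range(len(counts["A"])):
        total = sum(counts[b][pos] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((counts[b][pos] + pseudocount) / total / bg[b]) for b in BASES})
    return pwm

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def scan(seq: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, str, float]]:
    """Report (start, strand, score) for every window scoring >= threshold on either strand."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        word = seq[i:i + width]
        if any(b not in BASES for b in word):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", word), ("-", reverse_complement(word))):
            score = sum(col[base] for col, base in zip(pwm, site))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits

# Hypothetical toy motif with consensus TTGAC (not a real JASPAR entry)
TOY_COUNTS = {
    "A": [0, 0, 0, 10, 0],
    "C": [0, 0, 0, 0, 10],
    "G": [0, 0, 10, 0, 0],
    "T": [10, 10, 0, 0, 0],
}
```

For a diverged, non-model species, known-motif scanning only finds sites that still resemble the S. cerevisiae matrices, which is one reason to complement it with de novo discovery.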

6.2 years ago
afli

Hi, you can try HINT (http://www.regulatory-genomics.org/hint/tutorial/). It can do the comparison, and it works well for me.


Thanks for contributing!


Unfortunately HINT works only for some vertebrates.


You can download the JASPAR motifs and modify them; then you can do the analysis for other species. See this link: https://groups.google.com/forum/#!category-topic/rgtusers/general-discussion--rgt-core-classes/6ioEaNXEeeA.
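
The exact modification HINT expects is described in the linked thread, not here, so this sketch covers only a likely first step: reading JASPAR-format count matrices (a ">ID NAME" header followed by A/C/G/T rows such as "A [ 4 19 0 ]") into Python. The function name and return layout are assumptions; any conversion to HINT's own motif files would still have to follow the RGT documentation.

```python
import re
from typing import Dict, List, Tuple

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def parse_jaspar(text: str) -> List[Tuple[str, str, Dict[str, List[float]]]]:
    """Parse JASPAR-format matrices into (matrix_id, name, {base: counts}) tuples."""
    motifs: List[Tuple[str, str, Dict[str, List[float]]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: matrix ID, then an optional name (space- or tab-separated)
            parts = line[1:].split(None, 1)
            name = parts[1].strip() if len(parts) > 1 else ""
            motifs.append((parts[0], name, {}))
        elif motifs and line[0].upper() in BASES_SET:
            # Row looks like "A  [ 4 19 0 ]"; the brackets are optional
            motifs[-1][2][line[0].upper()] = [float(x) for x in NUMBER.findall(line[1:])]
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return motifs

BASES_SET = set("ACGT")
```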

Hope this helps!

Aifu.


Thanks for pointing to this!


Hi Aifu, which mapping software did you use for your data?


I use bowtie2 to map reads to the genome. Sorry for my late reply.
