14 months ago
pb11
Hello everyone
I need some tips on analyzing my data and making heatmaps like panels C and G of this figure. I have already downloaded the RepeatMasker list. I want to compare my chromatin data with repeat elements and look at their enrichment.
I am not sure how to define the summit point, since repeat elements are spread all over the genome and may not be captured in the right format. I would be grateful for any help or suggestions.
Thanks
PB
Can you elaborate on exactly the question you are trying to answer with a figure? As ATpoint mentioned, assuming you've taken an alignment approach that survives the repeat mapping problem, what are you comparing to what? Chromatin data in heatmaps is often defined around a biological feature. Do you have a biological feature of interest? Is a functional biological mark relevant to your question? What kinds of repeats are you referring to (there are many, many different kinds, with different characteristics)? Is the data in the figure above just an example of a heatmap, or is this figure specifically relevant to your analysis? If so, what paper is it from? When you say "compare my chromatin data", I would ask how many questions you have. State each one in a sentence, and go from there.
Yes, we are looking at this from an enhancer perspective.
Yes, we see enrichment of the enhancer mark H3K27ac at ERV loci.
"What kinds of repeats are you referring to?" LINEs, SINEs, and ERVs.
"If so, what paper is it from?" You can see the paper here: https://academic.oup.com/nar/article/51/10/4745/7067945
I have a lot of questions :P.
Yeah, I get that. I was wondering how to make these plots. I have made similar plots for genes, but I am not sure how to make them for repeat elements, considering they are spread randomly across the genome.
Use your ChIP-seq (or ATAC-seq) peaks that are enriched for the TEs/repeats you are interested in.
A basic way to do that is to take your peaks and use bedtools to keep the ones that overlap RepeatMasker annotations. You could use other "enriched" thresholds as well, but either way you will be left with the peaks that are enriched for repeat elements. Repeat element analysis is usually RepeatMasker based.
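To make the overlap step concrete, here is a minimal pure-Python sketch of the same idea. It assumes peaks and repeats are already loaded as BED-style (chrom, start, end) tuples; the function name `peaks_overlapping_repeats` and the `min_overlap` parameter are illustrative, not part of bedtools. For real genome-scale data bedtools will be far faster; this only shows the logic.

```python
from collections import defaultdict


def peaks_overlapping_repeats(peaks, repeats, min_overlap=1):
    """Keep peaks that overlap at least one repeat by >= min_overlap bp.

    peaks, repeats: lists of (chrom, start, end), 0-based half-open (BED style).
    """
    # Group repeats per chromosome and sort by start so we can stop scanning early.
    by_chrom = defaultdict(list)
    for chrom, start, end in repeats:
        by_chrom[chrom].append((start, end))
    for chrom in by_chrom:
        by_chrom[chrom].sort()

    kept = []
    for chrom, p_start, p_end in peaks:
        for r_start, r_end in by_chrom.get(chrom, []):
            if r_start >= p_end:
                break  # repeats are sorted by start: none further can overlap
            overlap = min(p_end, r_end) - max(p_start, r_start)
            if overlap >= min_overlap:
                kept.append((chrom, p_start, p_end))
                break  # report each peak once
    return kept


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    repeats = [("chr1", 150, 300), ("chr2", 300, 400)]
    print(peaks_overlapping_repeats(peaks, repeats))
```

If you first filter the RepeatMasker list to one class or family (for example only ERVs), the kept peaks are the ones you would then center your heatmap on.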
I'm not sure if there was something else you thought was different about this analysis from what you've done before?
I tried that method and I am unable to capture that. I see in my RNA-seq that repeat elements are enriched, and when I look at the ChIP-seq marks over them I can see they are marked by H3K27ac. I just want to build aggregate plots and heatmaps to show this.
Unable to capture what?
If you see you have H3K27ac over repeats, then when you derive the H3K27ac peaks that are overlapped by the repeat you want, you will see the heatmap/metaplot of enrichment. This must be true since you are pre-defining H3K27ac peaks.
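As a rough illustration of what the heatmap/metaplot amounts to, here is a small Python sketch. It assumes you already have a signal matrix with one row per peak-centered window and one column per bin (the kind of matrix tools such as deepTools computeMatrix produce); the function names are hypothetical.

```python
def aggregate_profile(matrix):
    """Column-wise mean of a rows-by-bins signal matrix (the metaplot line)."""
    if not matrix:
        return []
    n_rows = len(matrix)
    n_bins = len(matrix[0])
    return [sum(row[b] for row in matrix) / n_rows for b in range(n_bins)]


def sort_rows_for_heatmap(matrix):
    """Order rows by total signal, strongest first (a common heatmap ordering)."""
    return sorted(matrix, key=sum, reverse=True)


if __name__ == "__main__":
    # Two windows, three bins each; the middle bin is the peak center.
    matrix = [[0, 2, 0], [0, 4, 2]]
    print(aggregate_profile(matrix))
    print(sort_rows_for_heatmap(matrix))
```

Because the windows are centered on pre-selected H3K27ac peaks, the profile will peak in the middle by construction, which is the point made above.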
You should check the methods. It looks like they may have centered on peaks that overlap a repeat element, or on the LTR regions. I'm guessing the former, since they show an internal region, and I can't imagine you would get such nice k27ac enhancer peaks by basing the summit on any sort of annotation rather than on the peaks. I wouldn't be surprised if they even specifically identified certain subsets of peaks.
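If you go with centering on peaks, the windows can be built like this. This is only a sketch under stated assumptions: peaks are (chrom, start, end, summit_offset) tuples, where summit_offset follows the narrowPeak convention (offset from start, or -1 when no summit was called, in which case the midpoint is used). The function name `summit_windows` and the `flank` default are illustrative, and this is not necessarily what the paper did.

```python
def summit_windows(peaks, flank=2000):
    """Build equal-width windows of +/- flank bp around each peak summit.

    peaks: list of (chrom, start, end, summit_offset).
    summit_offset is relative to start; -1 means "no summit", use the midpoint.
    Windows that would start before position 0 are dropped so that all
    remaining windows have the same width.
    """
    windows = []
    for chrom, start, end, summit_offset in peaks:
        if summit_offset >= 0:
            center = start + summit_offset
        else:
            center = (start + end) // 2
        if center - flank < 0:
            continue  # would run off the chromosome start
        windows.append((chrom, center - flank, center + flank))
    return windows


if __name__ == "__main__":
    peaks = [("chr1", 1000, 2000, 300), ("chr1", 100, 300, -1)]
    print(summit_windows(peaks, flank=500))
```

Note the sketch does not clip against chromosome ends; with real data you would also check windows against the chromosome sizes.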
When doing your own repeat analysis, I think it's important to question your assumptions. The obvious example: if you keep multimappers and see signal over an element, it may or may not be real. Likewise, if you keep only unique mappers, signal over repeats will be depleted. What does that mean for your comparisons?
I like the idea of keeping one randomly chosen alignment per multimapper (the default bowtie2 behavior, I believe) for TE analysis. Just be careful: if you subset a family of TEs to compare them, the subset may not be representative. It is also hard to compare different TEs. For example, LTR8 and MER50 show different k9me3 signatures, but that might be due to the nature of the repeats rather than a real event. Comparing the same TE family between conditions should be okay, though. Just keep your analysis global, since reads at individual loci may be falsely mapped, but those reads would be expected to stay within the family, if that makes sense.
Another consideration would be to analyze the flanks of your repetitive regions, since chromatin marks may spread out into uniquely mappable regions.
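A minimal sketch of deriving those flanks in Python, assuming repeats are BED-style (chrom, start, end) tuples. The function name `flanking_regions` and the 1 kb default are illustrative choices, and the downstream flank is not clipped to the chromosome end because no chromosome sizes are assumed here.

```python
def flanking_regions(repeats, flank=1000):
    """Return upstream and downstream flanks for each repeat interval.

    repeats: list of (chrom, start, end), 0-based half-open.
    Output: list of (chrom, start, end, side) with side "up" or "down".
    Upstream flanks are clipped at 0; empty flanks are skipped.
    """
    flanks = []
    for chrom, start, end in repeats:
        up_start = max(0, start - flank)
        if up_start < start:
            flanks.append((chrom, up_start, start, "up"))
        # Assumption: chromosome length is unknown, so no clipping on this side.
        flanks.append((chrom, end, end + flank, "down"))
    return flanks


if __name__ == "__main__":
    print(flanking_regions([("chr1", 5000, 6000), ("chr1", 200, 400)], flank=500))
```

Signal in these flanks comes from reads that are more likely to map uniquely, so it can serve as a sanity check on the signal seen inside the repeat itself.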
Also, it's not clear what regions are represented in G, but it looks like they identified MER50 regions that are shared between or specific to cell types. What's the enrichment in E and F?
As one more consideration, what if in G the signal over those hTSC-specific regions actually comes from just one highly enriched element, and then falsely gets distributed to all the other identical elements (there could be thousands!)?
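One quick way to probe that worry is to ask how much of the total signal comes from the single strongest locus, and whether the mean is far above the median. The Python sketch below assumes you have one summed signal value per element; `dominance_summary` is a hypothetical helper. It cannot tell you whether reads were mis-assigned between identical copies, only whether the aggregate is driven by a few loci.

```python
from statistics import mean, median


def dominance_summary(signal_per_locus):
    """Summarize how concentrated the signal is across loci.

    signal_per_locus: list of non-negative numbers, one per element.
    Returns the share of total signal from the top locus, plus mean and median.
    """
    if not signal_per_locus:
        return {"top_share": 0.0, "mean": 0.0, "median": 0.0}
    total = sum(signal_per_locus)
    top_share = max(signal_per_locus) / total if total else 0.0
    return {
        "top_share": top_share,
        "mean": mean(signal_per_locus),
        "median": median(signal_per_locus),
    }


if __name__ == "__main__":
    # One locus carries almost all the signal: mean is high, median is low.
    print(dominance_summary([1, 1, 1, 97]))
```

A high top share together with a mean well above the median suggests the metaplot is being carried by a handful of loci rather than the family as a whole.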
From the figure captions, it is clear they center their data on their enhancer regions