I'm new to bioinformatics and am using ChromHMM, and I'm feeling a little lost on how to annotate the 12-state model I got from the 5 marks I'm checking in 4 cell types. I came across the GitHub repository https://github.com/ernstlab/full_stack_ChromHMM_annotations, where they provide a BED file for annotation. Does anyone have any idea how to use it?
What you have found there is not a way to annotate the states you got from your own run of ChromHMM. It is the output of their run of ChromHMM using the histone mark datasets from Roadmap Epigenomics. You would use it instead of the results of your own experiment if your cell types (or similar ones) were included in the Ernst lab run.
Annotating the 12 states you get out of ChromHMM is as much art as science. You need to combine each state's enrichment for particular genomic features with your biological knowledge of the function of the marks you surveyed.
For example, states that largely overlap annotated expressed genes are likely to be transcribed states. States that overlap gene start sites and have high H3K4me3 are likely to be promoter states. States with high H3K4me1 (or H3K4me3) and/or H3K27ac that don't overlap the start sites of annotated genes are more likely to be enhancer states, and so on.
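To make the "enrichment for genome features" step concrete, here is a minimal Python sketch (not ChromHMM's code) that computes a fold enrichment per state for one set of feature intervals, for example windows around annotated TSSs. The fold enrichment is the fraction of feature bases covered by a state divided by the fraction of the genome covered by that state, so values well above 1 mean the state is concentrated at that feature. It assumes the segmentation is already parsed into non-overlapping (chrom, start, end, state) tuples and the features into (chrom, start, end) tuples; the function names and toy data are hypothetical. ChromHMM also ships an OverlapEnrichment command that does this kind of calculation for you at genome scale. This sketch only illustrates the idea.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals per chromosome so no base is counted twice."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def state_fold_enrichment(segments, features):
    """Fold enrichment per state:
    (state bases inside features / feature bases) / (state bases / genome bases).

    segments: non-overlapping (chrom, start, end, state) tuples, BED-style half-open.
    features: (chrom, start, end) tuples, e.g. hypothetical TSS windows.
    The total segmented length is used as the genome size.
    Naive nested loop: fine for a toy example, too slow for a whole genome.
    """
    feats = merge_intervals(features)
    feature_bp = sum(e - s for ivs in feats.values() for s, e in ivs)
    genome_bp = sum(end - start for _, start, end, _ in segments)
    state_bp = defaultdict(int)
    overlap_bp = defaultdict(int)
    for chrom, start, end, state in segments:
        state_bp[state] += end - start
        for f_start, f_end in feats.get(chrom, []):
            overlap = min(end, f_end) - max(start, f_start)
            if overlap > 0:
                overlap_bp[state] += overlap
    return {
        state: (overlap_bp[state] / feature_bp) / (state_bp[state] / genome_bp)
        for state in state_bp
    }


if __name__ == "__main__":
    # Toy segmentation with three hypothetical states and one feature window.
    segments = [("chr1", 0, 1000, "E1"), ("chr1", 1000, 2000, "E2"), ("chr1", 2000, 4000, "E3")]
    features = [("chr1", 950, 1150)]
    print(state_fold_enrichment(segments, features))
    # → {'E1': 1.0, 'E2': 3.0, 'E3': 0.0}
```

In the toy run, E2 covers a quarter of the genome but three quarters of the feature bases, so it gets a fold enrichment of 3. With real data you would repeat this for several feature sets (TSSs, gene bodies, CpG islands and so on) and read the resulting table alongside the emissions heatmap.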
Thank you for your reply. So basically I have to do a thorough literature survey and manually annotate the states I found in my emissions heatmap. Their BED file is only of use if I have the same cell types they used in their study; is that what you meant? Some of the data I'm using for ChromHMM was taken from the Roadmap Epigenomics Project itself (hESC and NPC data). So can I use their BED file, and if so, how do I use it technically? Any answer would be a great help.
The data set you have linked basically processed all the data together to produce a 100-state model. While you could use that, it's probably not the best thing to do.
Instead, I'd look at this data set: https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html
They have links to a variety of files, including this one: https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/all.mnemonics.bedFiles.tgz
When unpacked (tar -xvf all.mnemonics.bedFiles.tgz), this gives you one file per cell type. Each file has 4 columns: Chromosome, Start, Stop, State, so if the file says:
chr1 100000 100100 Enh
it means that ChromHMM predicts an enhancer between bases 100000 and 100100 on chromosome 1.
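Once unpacked, each per-cell-type file can be read with a few lines of Python. The sketch below assumes a plain 4-column (Chromosome, Start, Stop, State) whitespace-separated file. It also opens gzip-compressed files if the name ends in .gz, since I'm not certain whether the files in the archive are compressed. It totals how many bases each state covers and pulls out the segments with a given label, which is a reasonable starting point for plotting. The function names are hypothetical, and the file name in the usage line is a placeholder; check the actual names in the archive.

```python
import gzip
from collections import Counter


def read_segments(path):
    """Yield (chrom, start, end, state) from a 4-column ChromHMM segmentation BED file.

    BED coordinates are 0-based and half-open, so 'chr1 100000 100100 Enh'
    covers 100 bases (bases 100001-100100 in 1-based terms).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments and UCSC track/browser header lines.
            if len(fields) < 4 or fields[0].startswith(("track", "browser", "#")):
                continue
            chrom, start, end, state = fields[:4]
            yield chrom, int(start), int(end), state


def bases_per_state(segments):
    """Total number of bases assigned to each state."""
    totals = Counter()
    for _, start, end, state in segments:
        totals[state] += end - start
    return totals


def segments_with_state(segments, label):
    """Keep segments whose state label contains `label`.

    This is a substring match, so 'Enh' also matches longer labels
    that contain it (e.g. 'EnhG'), if the file uses such labels.
    """
    return [seg for seg in segments if label in seg[3]]


if __name__ == "__main__":
    segs = list(read_segments("E_placeholder_mnemonics.bed"))  # hypothetical file name
    print(bases_per_state(segs).most_common())
    enhancers = segments_with_state(segs, "Enh")
```

Reading everything into a list keeps the sketch simple. For a whole genome that is still only a few hundred thousand to a few million lines per file, which fits comfortably in memory.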
Okay, thanks Ian for your input, and I appreciate you replying to my query. I'll have a look at the links you suggested and see if I can figure something out. I'll try to plot using these BED files.