Hello everyone:
I am looking for negative data for my classifier. I am trying to find a specific enhancer (STAT1) in the human genome, so I need human regions that are neither regulatory regions nor associated with histone modifications.
I would appreciate it if someone could suggest such negative regions for hg18.
Thanks.
Thanks for replying. I have the coordinates of a specific enhancer (STAT1) for HeLa cells. How can I get a list of regions that are not regulatory regions? Could you explain more about the tissue-specific approach?
I am trying to identify STAT1 regions based on histone marks, but my classifier does not predict well after training (my negative data is random sequence). Thanks.
If I understand you correctly, you are trying to identify all STAT1 enhancer regions on the basis of histone marks. I'm not sure how that is going to work. Nonetheless, to answer your question:
"How can I get a list of regions which are not regulatory regions?" You can use complementBed to get all regions that don't overlap with the list of regulatory regions you source from ENCODE, FANTOM, in-house data, etc. (https://bedtools.readthedocs.org/en/latest/content/tools/complement.html)
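To make the complement step concrete, here is a minimal Python sketch of the same idea: everything on each chromosome that is not covered by a regulatory interval. It is not how bedtools is implemented. The function name `complement`, the toy coordinates, and the chromosome sizes are all made up for illustration, and intervals are assumed to be 0-based, half-open as in BED. Like bedtools complement, it needs the chromosome lengths in addition to the intervals.

```python
from collections import defaultdict


def complement(intervals, chrom_sizes):
    """Return the parts of each chromosome not covered by any interval.

    intervals: iterable of (chrom, start, end), 0-based half-open (BED style).
    chrom_sizes: dict mapping chromosome name -> length.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    gaps = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet covered by an interval
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)  # handles overlapping intervals
        if cursor < size:
            gaps.append((chrom, cursor, size))
    return gaps


# Toy data only; these are NOT real hg18 coordinates or chromosome sizes.
regulatory = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600)]
sizes = {"chr1": 1000, "chr2": 400}

negatives = complement(regulatory, sizes)
for chrom, start, end in negatives:
    print(chrom, start, end, sep="\t")
```

With real data you would read the regulatory regions from your BED file and the chromosome lengths from a chromosome-sizes file for hg18 instead of the hard-coded lists.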
I don't think I understood your objective for the classifier, and since you are working in HeLa cells, what I was saying about comparing against enhancer regions from other tissues doesn't really hold. But the idea was that if you are building a classifier for enhancer regions in, say, liver tissue, you might want to use histone signals from the enhancer regions of an entirely different cell type, say blood cells, as a negative control, since we know enhancers are related to cell identity.
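The cell-type idea could be sketched roughly as follows: take enhancer regions from an unrelated cell type and keep them as negative candidates. Dropping candidates that overlap an enhancer in the target cell type is my own assumption here, not something established above, and the function names and coordinates are hypothetical toy values.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def negatives_from_other_cell_type(target_enhancers, other_enhancers):
    """Keep enhancers of the other cell type that overlap no target enhancer.

    Assumption: shared enhancers are excluded so that the negative set does
    not contain regions that are also positives.
    """
    return [
        region
        for region in other_enhancers
        if not any(overlaps(region, t) for t in target_enhancers)
    ]


# Toy coordinates for illustration only.
liver = [("chr1", 100, 200), ("chr2", 50, 150)]
blood = [("chr1", 150, 250), ("chr1", 400, 500), ("chr2", 150, 300)]

neg = negatives_from_other_cell_type(liver, blood)
for chrom, start, end in neg:
    print(chrom, start, end, sep="\t")
```

The pairwise check is quadratic, which is fine for a toy list but slow for genome-wide sets; for those, an interval tool such as bedtools or Galaxy's interval operations is the better route.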
Thanks so much, that is very helpful.
Yes, I am trying to identify STAT1 regions on the basis of histone marks. I am training the classifier on the sequence content of the histone-marked regions. My cell line is HeLa.
Is there any online tool like complementBed that provides a list of non-overlapping regulatory regions? That tool works on Linux and OS X machines, but I am a Windows user. I just want the non-overlapping regions in hg18.
Thanks again.
If you don't have access to a Unix machine, you can try using Galaxy (https://usegalaxy.org/).
Could you tell me how I can generate non-overlapping regulatory regions in Galaxy?
Thanks for your help.
As I mentioned, you can use complementBed in Galaxy. What part is not clear? If you open the link I provided in my answer and browse through the options on the left-hand side, you will see "Operate on Genomic Intervals", under which there is an option called "Complement intervals of dataset". I am happy to help if you get stuck at some point, but it feels like you did not research this on your own at all. A simple Google search would have landed you at https://wiki.galaxyproject.org/Learn/IntervalOperations