Question

Tool: Annotation of noncoding variants and transcription factor motifs


5.9 years ago

husensofteng ▴ 410

funMotifs is a web-based tool designed for annotating noncoding variants and genomic regions.

We have collected ChIP-seq, DNase-I, and other assays from ENCODE, FANTOM, Roadmap Epigenomics, and other data sources. We have used the data to annotate motifs of 510 transcription factors in 14 tissue types.

You can upload a list of variants or genomic coordinates, and the tool will report the overlapping TF motifs and their annotations in a selected tissue type. A typical use case is to annotate mutations from a certain cancer type with annotations from the corresponding tissue type. To summarize the annotations, we have applied a logistic regression model that enables prioritization of the variants and motifs.
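As a rough illustration of the idea (not the actual funMotifs code), the sketch below finds the motifs that overlap each variant position and ranks them with a logistic score. The motif records, feature names (`chipseq`, `dnase`), weights, and intercept are all made up for the example; the real model and its features may look quite different.

```python
import math

# Hypothetical motif records: (chrom, start, end, tf_name, features).
# Coordinates are treated as 1-based and inclusive in this sketch.
MOTIFS = [
    ("chr1", 100, 110, "CTCF", {"chipseq": 1.0, "dnase": 1.0}),
    ("chr1", 105, 120, "SP1", {"chipseq": 0.0, "dnase": 1.0}),
    ("chr2", 50, 60, "GATA1", {"chipseq": 1.0, "dnase": 0.0}),
]

# Made-up coefficients standing in for a fitted logistic regression model.
WEIGHTS = {"chipseq": 2.0, "dnase": 1.0}
INTERCEPT = -1.5


def overlapping_motifs(chrom, pos, motifs=MOTIFS):
    """Return all motifs on `chrom` whose interval contains `pos`."""
    return [m for m in motifs if m[0] == chrom and m[1] <= pos <= m[2]]


def logistic_score(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Map the weighted sum of annotation features to a score in (0, 1)."""
    z = intercept + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def annotate(variants):
    """Annotate (chrom, pos) variants and sort hits by descending score."""
    rows = []
    for chrom, pos in variants:
        for _, _start, _end, tf, feats in overlapping_motifs(chrom, pos):
            rows.append((chrom, pos, tf, logistic_score(feats)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    for row in annotate([("chr1", 107), ("chr2", 10)]):
        print(row)
```

A linear scan over the motif list is only suitable for a toy example; with millions of motifs an indexed lookup is needed.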

The tool is open source and the pipeline is implemented in Python (source on GitHub). PostgreSQL is used to store and index the data, which allows very fast annotation retrieval.
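To give a feel for why an index makes retrieval fast, here is a minimal sketch of a position lookup against an indexed motif table. It uses Python's built-in `sqlite3` as a stand-in so that it runs without a database server; the table name, column names, and index are hypothetical and do not reflect the actual funMotifs PostgreSQL schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical table layout: one row per motif instance.
conn.execute(
    "CREATE TABLE motifs (chrom TEXT, start INTEGER, stop INTEGER, tf TEXT)"
)
conn.executemany(
    "INSERT INTO motifs VALUES (?, ?, ?, ?)",
    [
        ("chr1", 100, 110, "CTCF"),
        ("chr1", 105, 120, "SP1"),
        ("chr2", 50, 60, "GATA1"),
    ],
)

# A composite index lets the database narrow the search by chromosome and
# start position instead of scanning every row.
conn.execute("CREATE INDEX motifs_pos_idx ON motifs (chrom, start, stop)")


def motifs_at(connection, chrom, pos):
    """Return the TF names of motifs whose interval contains `pos`."""
    cursor = connection.execute(
        "SELECT tf FROM motifs "
        "WHERE chrom = ? AND start <= ? AND stop >= ? "
        "ORDER BY start",
        (chrom, pos, pos),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    print(motifs_at(conn, "chr1", 107))
```

The same query shape should carry over to PostgreSQL with a driver of your choice, although placeholder syntax and index types differ between the two systems.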

The pipeline can also be run on a local computer to regenerate the annotations, so that larger sets of variants can be annotated through a programming interface. Your comments and suggestions are appreciated.

Here is a link to the website: http://bioinf.icm.uu.se/funmotifs/

TF-Motifs regulatory noncoding-variants • 1.5k views

updated 17 months ago by Ram 44k • written 5.9 years ago by husensofteng ▴ 410


Hi husensofteng, please allow me to make a couple of comments that are not meant to be impolite but rather to improve the overall appearance of the linked content:

In the ABOUT section of the tool, there are at least three typos in the first paragraph alone (fcator, futher, variatns).
Some links are broken in the Manual section.
The statement "HiC contacting domains from GTEx and ENCODE" is misleading because GTEx does not contain Hi-C data (to the best of my knowledge; maybe I am wrong), and the linked paper (the link is broken) refers to Rao 2014, which I think is associated with ENCODE rather than GTEx.
There are also quite a few typos on the MANUAL page.
Which genome version is expected? I assume hg19. Do you support hg38?
As a suggestion, you might consider incorporating ATAC-seq data, which is becoming more and more prominent compared to DNaseI-seq.

Again, please consider these comments as suggestions to improve your project. The thing is that there are already a number of tools to predict the impact of noncoding variants, e.g. FunSeq2, and people need motivation to choose your approach over the other available ones. Currently, because of the points above, I would be reluctant to choose your tool, especially because of the lack of hg38 support (correct me if you do support this version).

written 5.8 years ago by ATpoint 85k


Thank you so much, ATpoint, for these valuable points. I have gone through them all and have tried to fix the errors. I just learned that it's possible to do spell checking directly in vim (:set spell spelllang=en_us) :)

We hope to rerun the pipeline on hg38 and to incorporate the more recent datasets that have been generated, as well as ATAC-seq experiments. This would give us the chance to add more tissue types too.

I think the main motivation for using our tool, in comparison to the pre-existing ones, is the tissue-type-specific scoring system, which provides a summary score for over 80 million motifs across 15 tissue types. I will get back to you as soon as I have some news on hg38 support and the inclusion of more assay types.

I would be glad to receive further suggestions, and thanks for the comments.

written 5.7 years ago by husensofteng ▴ 410