Question

Merging/Intersecting Different Gene Annotations - Should I Extend Coordinates?

11.8 years ago

PoGibas 5.1k

I want to build as large a gene data set as possible, so I am combining several gene annotations. However, the same gene often appears in more than one annotation, with overlapping coordinates. To reduce bias, I intersect the annotations and, where genes overlap, keep only one of them.

Question:

To make sure that the same gene is detected as overlapping across annotations, I was thinking of extending the gene coordinates. Is this necessary? If so, how large should the extension be (5 bp? 100 bp?)?

Example:

I want to create a lncRNA data set (in later steps it will be used to search for genomic features).
Input:

GENCODE lncRNA annotation (version 18 - 04/09/2013);
Cabili lncRNA annotation (Cabili et al., 2011 (CSHLP)).

Workflow:

Extract GENCODE genes start/end coordinates;
Extract Cabili genes start/end coordinates;
Extend Cabili coordinates (-/+ n bp);
Use BedTools intersect;
If genes intersect, keep the GENCODE gene, as it is the newer annotation (though this choice is admittedly subjective).
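The workflow above can be sketched in plain Python to make the effect of the extension explicit. This is only an illustration under stated assumptions: the gene names and coordinates are made up, the helper names (`extend`, `overlaps`, `merge_annotations`) are hypothetical, intervals are treated as BED-style half-open ranges, and strand is ignored. In practice the same steps would typically be done with `bedtools slop` and `bedtools intersect`.

```python
def extend(interval, n_bp):
    """Pad an interval by n_bp on both sides (start is clamped at 0)."""
    chrom, start, end, name = interval
    return (chrom, max(0, start - n_bp), end + n_bp, name)


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def merge_annotations(gencode, cabili, n_bp=0):
    """Keep every GENCODE gene, plus each Cabili gene whose padded
    coordinates do not overlap any GENCODE gene."""
    kept = list(gencode)
    for gene in cabili:
        padded = extend(gene, n_bp)
        if not any(overlaps(padded, g) for g in gencode):
            kept.append(gene)  # store the original, unpadded coordinates
    return sorted(kept)


# Hypothetical example data: (chrom, start, end, name)
gencode = [("chr1", 1000, 5000, "GENCODE_A"),
           ("chr1", 20000, 25000, "GENCODE_B")]
cabili = [("chr1", 4500, 6000, "CABILI_1"),   # overlaps GENCODE_A directly
          ("chr1", 5050, 7000, "CABILI_2"),   # starts 50 bp after GENCODE_A ends
          ("chr2", 1000, 2000, "CABILI_3")]   # no GENCODE gene nearby

for n_bp in (0, 100):
    names = [g[3] for g in merge_annotations(gencode, cabili, n_bp)]
    print(n_bp, names)
# 0 ['GENCODE_A', 'CABILI_2', 'GENCODE_B', 'CABILI_3']
# 100 ['GENCODE_A', 'GENCODE_B', 'CABILI_3']
```

In this toy case, CABILI_2 is kept without extension but discarded with a 100 bp extension, so the choice of n directly changes which genes end up in the final set.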

I realize that the answer depends on the situation and on how reliable each annotation is, but I still hope that someone can suggest something.

bedtools merge • 4.3k views

ADD COMMENT • link updated 10.4 years ago by Biostar 20 • written 11.8 years ago by PoGibas 5.1k

What do you plan on doing with this dataset?

ADD REPLY • link 11.8 years ago by Damian Kao 16k

I updated my question: "in the following steps it will be used to search for genomic features"

ADD REPLY • link 11.8 years ago by PoGibas 5.1k

You should think about exactly what you want to do with these features. Is it for RNA-seq? For wet-lab work (primers, probes, etc.)? For phylogenetic studies? Your strategy for merging the features may differ depending on the purpose. There probably isn't a single method of merging these annotations that works well for every purpose.

ADD REPLY • link 11.8 years ago by Damian Kao 16k

This should simply be an enrichment analysis for any feature (e.g., sequence motif, chromatin modification, repeat count).

ADD REPLY • link 11.8 years ago by PoGibas 5.1k

score 1 · Answer 1 · 2013-10-13

My first instinct is that arbitrarily extending coordinates to try to resolve differences between two annotations is a dangerous practice. You wouldn't want to accidentally combine two nearby features with unrelated functions just because of their proximity. I'm not sure what kind of organism you are working with, but there are annotation combiners that are specifically designed to use various forms of evidence from several programs to build a final, comprehensive annotation. JIGSAW comes to mind, and a quick web search found this link, but you should search for other combiners to fit your needs. For example, I think JIGSAW is only for eukaryotes, while something like GenePRIMP is only for prokaryotes.
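To illustrate the proximity concern, here is a minimal sketch that counts how many distinct clusters remain after padding and merging features on one chromosome. The coordinates are invented and `count_clusters` is a hypothetical helper, not part of any tool mentioned here; intervals are assumed to be half-open (start, end) pairs.

```python
def count_clusters(intervals, pad):
    """Pad each (start, end) interval by `pad` bp on both sides, then count
    the groups of overlapping intervals (single chromosome assumed)."""
    padded = sorted((max(0, s - pad), e + pad) for s, e in intervals)
    clusters = 0
    current_end = None
    for start, end in padded:
        if current_end is None or start >= current_end:
            clusters += 1          # gap before this interval: new cluster
            current_end = end
        else:
            current_end = max(current_end, end)  # overlaps the current cluster
    return clusters


# Hypothetical features: the first two are separate genes only 50 bp apart.
features = [(1000, 2000), (2050, 3000), (10000, 11000)]

for pad in (0, 5, 100):
    print(pad, count_clusters(features, pad))
# 0 3
# 5 3
# 100 2
```

Because both sides are padded, any two features closer than twice the padding collapse into one, whether or not they are the same gene.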