Question

How to match gene names between two datasets with different annotation versions?

Wayne Lee · 7.2 years ago

Hello everyone!

If I have two gene expression datasets with different gene annotation versions, how can I match gene names between the two datasets?

Thanks, Wayne

Tags: RNA-Seq, gene annotation

Edited 7.1 years ago by h.mon

Please post some example data from both datasets and an example of the output you need.

Reply by cpad0112 · 7.2 years ago

For instance, I have two gene expression datasets in which each row represents a gene and each column a sample, and I want to merge them by row (gene). Because they use different gene annotation versions, some rows do not match: some gene names in dataset 1 are not present in dataset 2, and vice versa. How should I deal with these mismatched genes? Should I just remove them, or is there some processing that would reduce the information loss?

Thanks

Reply by Wayne Lee · 7.1 years ago
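The "just remove them" option amounts to an inner join on gene name. A minimal sketch in plain Python, assuming each dataset is held as a dict of gene name -> list of per-sample values (a hypothetical in-memory layout, not any particular file format), that keeps only shared genes but also reports the genes dropped from each side, so the information loss is at least visible:

```python
def merge_by_gene(table_a, table_b):
    """Inner-join two expression tables keyed by gene name.

    Each table maps gene name -> list of per-sample values
    (rows = genes, columns = samples). Returns the merged table
    plus the gene names found in only one input.
    """
    shared = sorted(set(table_a) & set(table_b))
    # Concatenate the sample columns of A and B for every shared gene.
    merged = {gene: table_a[gene] + table_b[gene] for gene in shared}
    # Genes that an inner join silently drops from each dataset.
    only_a = sorted(set(table_a) - set(table_b))
    only_b = sorted(set(table_b) - set(table_a))
    return merged, only_a, only_b


# Made-up toy values; OLDGENE1 and NEWGENE1 are placeholder names.
dataset1 = {"TP53": [5, 7, 6], "GAPDH": [100, 98, 110], "OLDGENE1": [3, 2, 4]}
dataset2 = {"TP53": [8, 9], "GAPDH": [120, 115], "NEWGENE1": [1, 0]}

merged, only_1, only_2 = merge_by_gene(dataset1, dataset2)
print(merged)  # {'GAPDH': [100, 98, 110, 120, 115], 'TP53': [5, 7, 6, 8, 9]}
print(only_1)  # ['OLDGENE1']
print(only_2)  # ['NEWGENE1']
```

Inspecting `only_1` and `only_2` before discarding them helps show whether the mismatches are a handful of renamed or retired genes or something larger, such as the two files using different identifier types altogether.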

Please be more specific. What are the genome versions? What are the annotation versions? Where did you obtain this count data?

Reply by h.mon · 7.1 years ago

Wayne Lee: One additional thing to consider is whether the two sets came from different experimental techniques or sources. In that case, you should not merge them as is.

Reply by GenoMax · 7.1 years ago

Answer · 2018-10-07

It is more complex than just matching gene names between annotation versions. Even for the same underlying genome version, an annotation update can retire genes, add new genes, and change the genomic coordinates of existing ones. Ideally, all datasets should be quantified using the same genome and annotation versions.
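When re-quantifying is not possible, one partial workaround, assuming both datasets use versioned Ensembl/GENCODE-style gene IDs (e.g. `ENSG...` followed by `.N`) rather than gene symbols, is to match on the stable ID with the version suffix removed. The sketch below is illustrative only; the IDs and version numbers in the example are hypothetical. It does not recover retired or newly added genes, and a matched ID can still have different coordinates (and therefore different counts) between versions, which is why quantifying everything against one annotation remains the better option.

```python
def strip_version(gene_id):
    """Drop a trailing numeric '.N' version suffix from a gene ID."""
    base, sep, suffix = gene_id.rpartition(".")
    # Only strip when there is a dot and the part after it is purely numeric.
    return base if sep and suffix.isdigit() else gene_id


def match_ids(ids_a, ids_b):
    """Match two gene ID lists on their unversioned stable IDs.

    Assumes each list contains at most one version of a given gene;
    if not, later entries overwrite earlier ones in the lookup dicts.
    """
    key_a = {strip_version(g): g for g in ids_a}
    key_b = {strip_version(g): g for g in ids_b}
    shared = sorted(set(key_a) & set(key_b))
    # (original ID in A, original ID in B) for every shared stable ID.
    pairs = [(key_a[k], key_b[k]) for k in shared]
    # IDs with no counterpart, e.g. retired or newly added genes.
    unmatched_a = sorted(key_a[k] for k in set(key_a) - set(key_b))
    unmatched_b = sorted(key_b[k] for k in set(key_b) - set(key_a))
    return pairs, unmatched_a, unmatched_b


# Hypothetical IDs with made-up version suffixes.
ids_v1 = ["ENSG00000141510.16", "ENSG00000111640.14", "ENSG00000999991.1"]
ids_v2 = ["ENSG00000141510.17", "ENSG00000111640.14", "ENSG00000999992.1"]

pairs, unmatched_1, unmatched_2 = match_ids(ids_v1, ids_v2)
print(pairs)
# [('ENSG00000111640.14', 'ENSG00000111640.14'), ('ENSG00000141510.16', 'ENSG00000141510.17')]
print(unmatched_1)  # ['ENSG00000999991.1']
print(unmatched_2)  # ['ENSG00000999992.1']
```

Stripping only a purely numeric suffix is a conservative choice: IDs with other suffixes are left unchanged, so they only match an identical ID in the other dataset.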