Question

NCBI: how to obtain genomic context for arbitrary annotation release?

9.5 years ago

tsukanoffkirill ▴ 20

So I have a list of several hundred human gene IDs, one per line, and for each of them I need to determine whether it is transcribed from the plus (forward) strand of the GRCh37.p13 reference sequence or from the minus (reverse-complement) strand.

I submitted this list to Batch Entrez and selected the "Tabular (text)" format in Display Settings, and the output did include an "orientation" column reading "plus" or "minus". There is one problem, however: it always shows the orientation for the most recent annotation release (currently release 107, on genome assembly GRCh38.p2), while I need the information from annotation release 105 (GRCh37.p13). This matters because, for example, the NIPA1 gene is on the plus strand in release 107 but on the minus strand in release 105, and there are plenty of cases like that. I also tried adding AND GRCh37.p13[Assembly Name] to my search string, but it had no effect: the "Tabular (text)" view still shows the orientation from the latest annotation release.

Can anyone explain what to do in this situation? The solution doesn't have to be limited to Entrez; I can write a parser script, or even a web scraper, if that's what it takes.
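If staying within NCBI is preferable, one possible alternative to the Tabular view (not something suggested in this thread) is the Gene document summary returned by the E-utilities ESummary endpoint, which, as far as I know, carries a location history with coordinates per annotation release. This is a minimal sketch under stated assumptions: that the JSON docsum has a `locationhist` list whose entries contain `assemblyaccver`, `chrstart` and `chrstop`, and that minus-strand placements are reported with `chrstart` greater than `chrstop`. `GCF_000001405.25` is the RefSeq accession of GRCh37.p13; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
# RefSeq accession of the GRCh37.p13 assembly.
GRCH37_P13 = "GCF_000001405.25"


def strands_on_assembly(summary, assembly_acc=GRCH37_P13):
    """Return '+'/'-' for each placement of one gene on the given assembly.

    Assumption: the docsum has a "locationhist" list with "assemblyaccver",
    "chrstart" and "chrstop", and minus-strand placements have start > stop.
    """
    strands = []
    for loc in summary.get("locationhist", []):
        if loc.get("assemblyaccver") != assembly_acc:
            continue  # placement on some other assembly (e.g. GRCh38)
        start, stop = int(loc["chrstart"]), int(loc["chrstop"])
        strands.append("-" if start > stop else "+")
    return strands


def fetch_summaries(gene_ids):
    """POST a batch of NCBI Gene IDs to ESummary and return {uid: docsum}."""
    body = urllib.parse.urlencode(
        {"db": "gene", "id": ",".join(gene_ids), "retmode": "json"}
    ).encode()
    with urllib.request.urlopen(ESUMMARY_URL, data=body, timeout=120) as resp:
        result = json.load(resp)["result"]
    return {uid: result[uid] for uid in result.get("uids", [])}
```

Usage would be along the lines of looping over `fetch_summaries(ids).items()` and calling `strands_on_assembly` on each docsum. The function returns a list rather than a single value because a gene may appear more than once for the same assembly (several annotation releases on GRCh37.p13, or more than one placement), so it is worth checking the list rather than taking its first element blindly.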

Ram · Accepted Answer · 2015-05-29

9.5 years ago

Devon Ryan 104k

Just use BioMart; Ensembl has a separate BioMart database for GRCh37. You're looking for the "strand" column, where BioMart returns 1 for + and -1 for -.
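For a list of several hundred IDs, the same lookup can be scripted against BioMart's XML query interface instead of the web form. This is a minimal sketch under stated assumptions: that the GRCh37 archive serves BioMart at `https://grch37.ensembl.org/biomart/martservice`, that the dataset is `hsapiens_gene_ensembl`, and that `entrezgene` is the filter/attribute name for NCBI Gene IDs in that mart (these names have changed between Ensembl versions, so check them in the GRCh37 mart first). The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumption: Ensembl's GRCh37 archive serves BioMart at this martservice URL.
MART_URL = "https://grch37.ensembl.org/biomart/martservice"


def build_query(gene_ids, filter_name="entrezgene",
                attributes=("entrezgene", "external_gene_name", "strand")):
    """Build a BioMart XML query that returns one headerless TSV row per gene.

    Filter/attribute names are assumptions. Keep "strand" last, because
    parse_strands reads the strand from the final column.
    """
    attrs = "".join(f"<Attribute name={quoteattr(a)}/>" for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f"<Filter name={quoteattr(filter_name)} value={quoteattr(','.join(gene_ids))}/>"
        f"{attrs}"
        "</Dataset></Query>"
    )


def parse_strands(tsv_text):
    """Map the first column (gene ID) to '+' or '-' from the last column (1 / -1)."""
    strands = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Anything other than 1 / -1 (e.g. an error message) is marked "?".
        strands[fields[0]] = {"1": "+", "-1": "-"}.get(fields[-1].strip(), "?")
    return strands


def fetch_strands(gene_ids):
    """POST the query to the GRCh37 mart and return {gene_id: '+', '-' or '?'}."""
    body = urllib.parse.urlencode({"query": build_query(gene_ids)}).encode()
    with urllib.request.urlopen(MART_URL, data=body, timeout=120) as resp:
        return parse_strands(resp.read().decode("utf-8"))
```

For example, `fetch_strands(open("ids.txt").read().split())` would query every ID from a one-per-line file (`ids.txt` is a hypothetical filename). The query asks for headerless TSV so the parser can rely on column order rather than on BioMart's display names; if the server rejects a large query, splitting the list into smaller batches is a cautious fallback.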

I'm new to sequence analysis, so thanks for pointing me to BioMart; it seems like a great tool that's also easy to use. Poking around, I found that they also have a helpful timeline of genome assemblies for working out which versions to select.

ciclistadan ▴ 30 · 9.5 years ago

Yeah, you'll find BioMart and Ensembl to be your go-to resources for most things. I use both of them far more than NCBI.