Question

Convert list of rs IDs from one format to another

0

6.7 years ago

Mr Locuace ▴ 180

Hello, I downloaded a huge list of eQTLs from the GTEx portal:

https://www.gtexportal.org/home/datasets

There is a column of variant IDs in the format "1_115746_C_T_b37". How can I convert them to rs IDs of the form "rs1234567"? (These are just examples.) The documentation says these IDs correspond to rs IDs from dbSNP 147.

Thanks!

SNP identifiers eQTL GTEx • 2.8k views

ADD COMMENT • link updated 6.7 years ago by finswimmer 16k • written 6.7 years ago by Mr Locuace ▴ 180

0

Can you give context on the IDs? How did you get them? What's the biological context?

ADD REPLY • link 6.7 years ago by Hussain Ather ▴ 990

0

Hi Hussain Ather, I just edited my post

ADD REPLY • link 6.7 years ago by Mr Locuace ▴ 180

score 2 · Accepted Answer · 2018-05-05

2

6.7 years ago

finswimmer 16k

Hello Mr Locuace,

This "ID" looks to me like it encodes, in order:

the chromosome
the position
the reference allele
the alternative allele
the reference genome build (b37, i.e. GRCh37)

So to get the corresponding rs ID, one solution is to extract the information from the ID and convert it to a VCF file. You can then annotate the ID column. There was a similar thread some time ago.

Let's start by creating a VCF file. I assume you have a file id.txt that contains one ID per line:

awk -F_ -v OFS="\t" '{print $1,$2,".",$3,$4,".",".","."}' id.txt|sort -k1,1V -k2,2g > ids.vcf
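
A headerless file like the one produced above may not be accepted by every VCF tool. As a minimal sketch (not part of the original answer), the same conversion can be wrapped in a shell function that also writes a bare-bones header. The function name `ids_to_vcf` and the `VCFv4.2` version string are assumptions chosen for illustration, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helper (not from the original answer): read GTEx-style IDs
# such as 1_115746_C_T_b37 on stdin and write a minimal VCF to stdout.
ids_to_vcf() {
    # Minimal header: file format line plus the eight fixed VCF columns.
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    # Split each ID on "_"; skip lines with fewer than four fields
    # (for example a column header). Sort by chromosome, then position.
    awk -F_ -v OFS='\t' 'NF >= 4 {print $1, $2, ".", $3, $4, ".", ".", "."}' |
        sort -k1,1V -k2,2n
}

# Example with two made-up IDs; for real data use: ids_to_vcf < id.txt > ids.vcf
printf '2_200_A_G_b37\n1_115746_C_T_b37\n' | ids_to_vcf
```

The ID column is left as "." so that the annotation step can fill it in.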

Now you can take this vcf file and annotate it with e.g. SnpSift:

java -jar SnpSift.jar annotate -id dbSNP.vcf.gz ids.vcf > ids_annotated.vcf

Of course, you first have to download the dbSNP VCF file.
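
Once the annotation has run, the rs IDs sit in the third (ID) column of ids_annotated.vcf. As a rough sketch, assuming the ID suffix is always `_b37` as in the question, the original GTEx-style ID can be rebuilt from the other columns to get a two-column lookup table. The function name `vcf_to_rsid_table` and the rs number in the example are made up for illustration.

```shell
# Hypothetical helper: read an annotated VCF on stdin and print
# "<chr>_<pos>_<ref>_<alt>_b37 <TAB> <ID column>" for every variant line.
vcf_to_rsid_table() {
    # Skip header lines (starting with "#"); columns 1, 2, 4, 5 are
    # CHROM, POS, REF, ALT and column 3 is the ID filled in by the annotator.
    awk -F'\t' -v OFS='\t' '!/^#/ {print $1 "_" $2 "_" $4 "_" $5 "_b37", $3}'
}

# Example with one made-up record; for real data use:
#   vcf_to_rsid_table < ids_annotated.vcf > id_to_rsid.tsv
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t115746\trs1234567\tC\tT\t.\t.\t.\n' |
    vcf_to_rsid_table
```

Variants that found no match in dbSNP keep "." in the second column, so they are easy to filter out afterwards.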

fin swimmer

ADD COMMENT • link 6.7 years ago by finswimmer 16k

1

The dbSNP release must be based on the b37/GRCh37 build.
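
One quick way to check the build, sketched under the assumption that the file follows the NCBI dbSNP VCF convention of `##reference=` and `##dbSNP_BUILD_ID=` header lines, is to print those lines from the compressed file. The helper name `vcf_build_info` is hypothetical, and files from other sources may label the build differently.

```shell
# Hypothetical helper: print the reference-build and dbSNP-build header
# lines of a gzip-compressed VCF given as the first argument.
vcf_build_info() {
    # Decompress to stdout, print matching "##" header lines, and stop
    # reading as soon as the "##" meta-information block ends.
    gzip -dc "$1" | awk '/^##(reference|dbSNP_BUILD_ID)=/ {print} !/^##/ {exit}'
}

# Usage (file name as in the answer above):
#   vcf_build_info dbSNP.vcf.gz
```

If the reference line names a GRCh38 assembly, the positions will not match the b37 coordinates in the GTEx IDs.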

ADD REPLY • link 6.7 years ago by cpad0112 21k

0

Many thanks, finswimmer!! ;)

ADD REPLY • link 6.7 years ago by Mr Locuace ▴ 180