Convert list of rs IDs from one format to another
6.6 years ago
Mr Locuace ▴ 180

Hello, I downloaded a huge list of eQTLs from the GTEx portal:

https://www.gtexportal.org/home/datasets

There is a column of variant IDs in the format "1_115746_C_T_b37". I would like to convert them to rs IDs of the form "rs1234567" (these are just examples). According to the documentation, these IDs correspond to rs IDs from dbSNP 147.

Thanks !

SNP identifiers eQTL GTEx • 2.8k views

Can you give context on the IDs? How did you get them? What's the biological context?


Hi Hussain Ather, I just edited my post

6.6 years ago

Hello Mr Locuace,

this "ID" looks to me like it describes:

  • the chromosome
  • position
  • reference allele
  • alternative allele
  • reference genome
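
To make that structure concrete, here is a small sketch that splits one such ID on underscores. The field labels (chrom, pos, ref, alt, build) are my own names, and reading the last field as the reference build is inferred from the example ID, not taken from the GTEx documentation.

```shell
# Split one GTEx-style variant ID on "_" into its five parts.
id="1_115746_C_T_b37"
parsed=$(printf '%s\n' "$id" | awk -F_ '{printf "chrom=%s pos=%s ref=%s alt=%s build=%s", $1, $2, $3, $4, $5}')
echo "$parsed"
```

The same split works for longer alleles, as long as the alleles themselves contain no underscore.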

So to get the corresponding rs ID, one solution is to extract the information from the ID and convert it to a VCF file. You can then annotate the ID column. There was a similar thread some time ago.

Let's start by creating a VCF file. I assume you have a file id.txt that contains one ID per line:

{ printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'; awk -F_ -v OFS="\t" '{print $1,$2,".",$3,$4,".",".","."}' id.txt | sort -k1,1V -k2,2g; } > ids.vcf

Now you can take this VCF file and annotate it with, e.g., SnpSift:

java -jar SnpSift.jar annotate -id dbSNP.vcf.gz ids.vcf > ids_annotated.vcf

Of course, you first have to download the dbSNP VCF file.
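
Once the annotation has run, you probably want a lookup table from the original IDs to rs IDs. A minimal sketch, assuming the annotated file keeps the usual VCF column order and that every original ID ended in "_b37". The file names example_annotated.vcf and id_to_rs.tsv are made up for this example, and the rs ID in the sample data is only a placeholder, not a real lookup result.

```shell
# Tiny stand-in for ids_annotated.vcf; "rs1234567" is a placeholder rs ID.
printf '1\t115746\trs1234567\tC\tT\t.\t.\t.\n1\t200000\t.\tG\tA\t.\t.\t.\n' > example_annotated.vcf

# Rebuild the original ID from CHROM, POS, REF and ALT and pair it with the ID column.
# Header lines (starting with "#") are skipped; "." in the second column means no match.
awk -F'\t' -v OFS='\t' '!/^#/ {print $1 "_" $2 "_" $4 "_" $5 "_b37", $3}' example_annotated.vcf > id_to_rs.tsv
cat id_to_rs.tsv
```

Variants without a dbSNP match keep "." in the second column, so they are easy to filter out afterwards.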

fin swimmer


The dbSNP release must be based on the b37/GRCh37 build.
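
A quick way to confirm that all IDs really are on b37 before annotating is to count the ones whose last field says otherwise. This is only a sketch: example_ids.txt is a hypothetical file name, and its second ID is invented to show a mismatch.

```shell
# Example input with one ID per line; the "_b38" ID is made up for illustration.
printf '1_115746_C_T_b37\n1_115746_C_T_b38\n' > example_ids.txt

# Count IDs whose last underscore-separated field is not "b37".
not_b37=$(awk -F_ '$NF != "b37" {n++} END {print n+0}' example_ids.txt)
echo "IDs not on b37: $not_b37"
```

If the count is not zero, those IDs would need a dbSNP file for the matching build.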


Thank you very much, finswimmer!! ;)
