Convert between formats for names of indels
10.2 years ago
lillo.sim ▴ 50

Hi,

I have a list of INDELS in their rsid format, and I am trying to convert this list from the rsids to a format like the one coming out of the imputation from MACH/Minimac, i.e. chr:pos:ALLELES.

I have tried using biomaRt to find the chr, positions, and alleles corresponding to the rsids like this:

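library(biomaRt)
snpmart <- useMart("snp", dataset = "hsapiens_snp")
# Look up allele string, chromosome and start position for both example rsids
getBM(attributes = c("refsnp_id", "allele", "chr_name", "chrom_start"),
      filters = "snp_filter",
      values = c("rs146107628", "rs200623867"),
      mart = snpmart)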

But I get this for an INSERTION:

  refsnp_id   allele chr_name chrom_start
rs146107628      -/T       10   100002842

That I would like to convert to this format:

10:100002841:C_CT    I    R

While for a DELETION:

rs200623867    G/-       10   100003302

That I would like to convert to this:

10:100003301:AG_A    D    R

So it looks like I am missing the information about the other allele when using biomaRt.

Is there maybe a better approach to completing this conversion in R?

Thank you!

Simone

indels R biomart • 2.8k views
10.2 years ago
Emily 24k

You could get 1 base of upstream sequence with biomaRt.


Thank you for your reply! But even if I go one base before, I still have trouble finding the "other" allele using biomaRt. For example, how would I know to convert

rs146107628      -/T       10   100002842 --> 10:100002841:C_CT    I    R

if I don't know that the other allele is C and only get T from biomaRt?


Well, if you get one base upstream, that base would be C, so -/T with an upstream base of C becomes C_CT.
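
To make the rule above concrete, here is a minimal sketch in R of how the upstream base could be combined with the biomaRt allele string. The function name make_indel_id and its arguments are hypothetical (they are not part of biomaRt), and the sketch assumes the allele string has exactly two alleles, one of which is "-", and that the anchor position is chrom_start - 1, as in the two examples in the question.

```r
# Hypothetical helper (not a biomaRt function): build a chr:pos:ALLELES ID
# from one biomaRt-style indel record plus the reference base one position
# upstream, which has to be looked up separately.
make_indel_id <- function(chr, pos, allele, upstream_base) {
  parts <- strsplit(allele, "/", fixed = TRUE)[[1]]
  # Assumption: exactly two alleles, exactly one of them is "-"
  stopifnot(length(parts) == 2, sum(parts == "-") == 1)
  indel_seq <- parts[parts != "-"]
  # Assumption: the anchor base sits at chrom_start - 1, as in the examples
  anchor_pos <- as.integer(pos) - 1L
  if (parts[1] == "-") {
    # "-/T": insertion, so anchor alone versus anchor + inserted bases
    alleles <- paste0(upstream_base, "_", upstream_base, indel_seq)
    type <- "I"
  } else {
    # "G/-": deletion, so anchor + deleted bases versus anchor alone
    alleles <- paste0(upstream_base, indel_seq, "_", upstream_base)
    type <- "D"
  }
  list(id = sprintf("%s:%d:%s", chr, anchor_pos, alleles), type = type)
}

make_indel_id(10, 100002842, "-/T", "C")$id  # "10:100002841:C_CT"
make_indel_id(10, 100003302, "G/-", "A")$id  # "10:100003301:AG_A"
```

Records with more than two alleles, or alleles reported on the reverse strand, are not handled here and would need extra checks.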


Thanks Emily, this seems very clever. Do you know if there is a way using biomaRt to request the upstream position based on rsids in one go, or do you think I first need to find the chr:pos for each of the variants listed in the rsids object, and then in a second step find the alleles at their chr:pos-1?

I am just trying to understand the quickest way to convert these IDs inside an R script. Maybe biomaRt isn't the best way to go if it requires many steps, but I don't know of another package that could help with this...

Thanks again for your advice!


In the variant mart, there's a section called Sequences where you can specify the upstream sequences plus the alleles.
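
However the upstream base ends up being retrieved, the two-step conversion discussed in this thread could be wrapped up for a whole table of rsids roughly as follows. This is only a sketch: convert_indels and the get_ref_base argument are hypothetical names, and get_ref_base stands in for whatever lookup you use to fetch the reference base at a given chr:pos (it is not a biomaRt call). The column names match the getBM output shown in the question.

```r
# Hypothetical batch converter: takes a data frame shaped like the getBM
# output (allele, chr_name, chrom_start) and a user-supplied function
# get_ref_base(chr, pos) that returns the reference base at that position.
convert_indels <- function(df, get_ref_base) {
  allele <- as.character(df$allele)
  # Assumption: anchor base is at chrom_start - 1, as in the examples
  anchor_pos <- as.integer(df$chrom_start) - 1L
  anchor <- mapply(get_ref_base, df$chr_name, anchor_pos, USE.NAMES = FALSE)
  is_insertion <- startsWith(allele, "-/")
  # Keep only the inserted or deleted bases ("-/T" -> "T", "G/-" -> "G")
  indel_seq <- gsub("[-/]", "", allele)
  alleles <- ifelse(is_insertion,
                    paste0(anchor, "_", anchor, indel_seq),
                    paste0(anchor, indel_seq, "_", anchor))
  sprintf("%s:%d:%s", df$chr_name, anchor_pos, alleles)
}

# Usage with a stub lookup that hard-codes the two bases from the question
indels <- data.frame(refsnp_id = c("rs146107628", "rs200623867"),
                     allele = c("-/T", "G/-"),
                     chr_name = c(10, 10),
                     chrom_start = c(100002842, 100003302),
                     stringsAsFactors = FALSE)
stub_lookup <- function(chr, pos) if (pos == 100002841) "C" else "A"
convert_indels(indels, stub_lookup)
```

Passing the lookup in as a function keeps the string handling separate from the data source, so the same code works whether the base comes from the mart or from a local reference genome.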
