Why Are These SNPs on the Wrong Strand Compared to the Reference Genome?
10.8 years ago

Hi--

I've got some GWAS data I'd like to impute, but for that to work, every SNP needs to be aligned to the forward strand of the reference genome. This is not as simple as it sounds, because many SNPs are strand-ambiguous (A/T or C/G) combinations.

I've therefore looked at both the strand files for the chips and the SNP manifests, comparing them against the SNPs that are flipped, but I cannot see any pattern. What I'm looking for is a pattern in these files that explains why the alleles of the SNPs in the first list do not match the reference genome. If you see anything, or need more info, please do ask.

P.S. The data might have been botched by the researchers who originally used them (they have long since moved on).

Here is the head of a list of SNPs that are flipped compared to the reference (name, chr, position, a1, a2, reference nucleotide):

rs1774963       1 21703207 C T G
rs2257576       1 83736947 T C A
rs315041        1 77055775 A G C
rs3094315       1 752565 C T G
rs3737728       1 1021414 T C A
rs11721         1 1152630 T G C
rs2887286       1 1156130 G A T
rs3813199       1 1158276 T C G
rs3766186       1 1162434 T G C

Here are the corresponding entries from the strand file (http://www.well.ox.ac.uk/~wrayner/strand/):

rs1774963       1       21703208        99.1735537190083        +       AG
rs2257576       1       83736948        100     +       AG
rs315041        1       77055776        99.1735537190083        -       AG
rs3094315       1       752566  99.1735537190083        +       AG
rs3737728       1       1021415 100     +       AG
rs11721         1       1152631 99.1735537190083        +       AC
rs2887286       1       1156131 100     -       AG
rs3813199       1       1158277 99.1735537190083        +       AG
rs3766186       1       1162435 99.1735537190083        +       AC

Here are the corresponding entries from the SNP table/manifest:

Name    SNP     ILMN Strand     Customer Strand
rs1774963       [A/G]   TOP     BOT
rs2257576       [A/G]   TOP     BOT
rs315041        [T/C]   BOT     TOP
rs3094315       [T/C]   BOT     TOP
rs3737728       [A/G]   TOP     BOT
rs11721         [A/C]   TOP     BOT
rs2887286       [T/C]   BOT     TOP
rs3813199       [A/G]   TOP     BOT
rs3766186       [A/C]   TOP     BOT
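
One way to look for a pattern is to put the three tables side by side programmatically. The sketch below is only an illustration built from the rows quoted above. It assumes (this is an assumption, not something stated in the files) that a "-" in the strand file means the listed alleles must be complemented to obtain forward-strand alleles, and that a "+" means they are already on the forward strand. The names `GWAS`, `STRAND`, `MANIFEST`, `flip` and `relation` are made up for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def flip(alleles):
    """Return the complemented allele set."""
    return frozenset(COMPLEMENT[a] for a in alleles)

def relation(data, other):
    """Describe how the data alleles relate to another allele set."""
    if data == other:
        return "same"
    if data == flip(other):
        return "complement"
    return "other"

# name -> (position, a1 + a2) from the first list
GWAS = {
    "rs1774963": (21703207, "CT"), "rs2257576": (83736947, "TC"),
    "rs315041": (77055775, "AG"), "rs3094315": (752565, "CT"),
    "rs3737728": (1021414, "TC"), "rs11721": (1152630, "TG"),
    "rs2887286": (1156130, "GA"), "rs3813199": (1158276, "TC"),
    "rs3766186": (1162434, "TG"),
}
# name -> (position, strand, alleles) from the strand file
STRAND = {
    "rs1774963": (21703208, "+", "AG"), "rs2257576": (83736948, "+", "AG"),
    "rs315041": (77055776, "-", "AG"), "rs3094315": (752566, "+", "AG"),
    "rs3737728": (1021415, "+", "AG"), "rs11721": (1152631, "+", "AC"),
    "rs2887286": (1156131, "-", "AG"), "rs3813199": (1158277, "+", "AG"),
    "rs3766186": (1162435, "+", "AC"),
}
# name -> alleles from the manifest "SNP" column
MANIFEST = {
    "rs1774963": "AG", "rs2257576": "AG", "rs315041": "TC",
    "rs3094315": "TC", "rs3737728": "AG", "rs11721": "AC",
    "rs2887286": "TC", "rs3813199": "AG", "rs3766186": "AC",
}

rows = {}
for name, (pos, alleles) in GWAS.items():
    s_pos, strand, s_alleles = STRAND[name]
    # Assumption: "-" means the strand-file alleles need complementing.
    forward = frozenset(s_alleles) if strand == "+" else flip(s_alleles)
    data = frozenset(alleles)
    rows[name] = {
        "pos_offset": s_pos - pos,
        "vs_forward": relation(data, forward),
        "vs_manifest": relation(data, frozenset(MANIFEST[name])),
    }

for name, row in rows.items():
    print(name, row["pos_offset"], row["vs_forward"], row["vs_manifest"])
```

For these nine rows, and under the assumption above, every position in the first list is one lower than in the strand file, and every allele pair is the complement of the forward-strand alleles implied by the strand file. Against the manifest "SNP" column, eight of the nine are complements, and rs3094315 matches as written.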

What is the rule that explains why the SNPs in the first list are on the opposite strand from the reference genome? Or might these data be nonsensical?

snp gwas strand • 5.0k views
8.5 years ago
nadne ▴ 40

Did anyone resolve this?

I'd have thought that the alleles of the variants in the first list were reported on the reverse strand (e.g. rs3737728 in dbSNP) but on the forward strand elsewhere (e.g. Ensembl). I've not checked all of the variants above, but for the ones I did check, this seems to be the case. Check this FAQ.
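
The reverse-strand idea can be checked directly against the rows quoted in the question. This is a minimal sketch, not a full strand-alignment tool: it only asks whether the reference base is among the listed alleles, or among their complements. The helper name `classify` and its labels are invented for this example, and strand-ambiguous pairs (A/T, C/G) are reported separately because this test cannot resolve them.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# (name, a1, a2, reference base) copied from the first list in the question
SNPS = [
    ("rs1774963", "C", "T", "G"),
    ("rs2257576", "T", "C", "A"),
    ("rs315041", "A", "G", "C"),
    ("rs3094315", "C", "T", "G"),
    ("rs3737728", "T", "C", "A"),
    ("rs11721", "T", "G", "C"),
    ("rs2887286", "G", "A", "T"),
    ("rs3813199", "T", "C", "G"),
    ("rs3766186", "T", "G", "C"),
]

def classify(a1, a2, ref):
    """Label a SNP by where the reference base appears."""
    alleles = {a1, a2}
    flipped = {COMPLEMENT[a] for a in alleles}
    if alleles == flipped:
        return "ambiguous"   # A/T or C/G: complementing changes nothing
    if ref in alleles:
        return "forward"     # reference base is one of the alleles as given
    if ref in flipped:
        return "reverse"     # reference base appears only after complementing
    return "mismatch"        # neither orientation contains the reference base

results = {name: classify(a1, a2, ref) for name, a1, a2, ref in SNPS}
for name, status in results.items():
    print(name, status)
```

All nine rows quoted in the question come out as "reverse", which is consistent with the alleles having been reported on the reverse strand. A "mismatch" would instead point to a wrong position or genome build rather than a strand issue.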

8.5 years ago
nadne ▴ 40

Check out this resource for updating strands and A/B mapping.
