Hello,
I have extracted genome-wide association results from a consortium for a list of SNPs I'm interested in. The results are reported on the positive strand of build 36 (NCBI36/hg18). I have no problems updating rs numbers and positions using a combination of biomaRt and liftOver. However, I am unsure how to assign alleles. I know that biomaRt allele information is all on the positive strand. Assignment is easy for unambiguous SNPs (e.g. C/T, A/G). For strand-ambiguous SNPs (i.e. C/G and A/T) I can assign alleles by matching allele frequencies, but this becomes difficult for SNPs with minor allele frequencies close to 0.5, all the more so because biomaRt gives the overall MAF across all populations while the consortium MAFs are for Europeans only.
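To make the distinction between unambiguous and strand-ambiguous SNPs concrete, here is a minimal Python sketch. The function names (`is_strand_ambiguous`, `flip_alleles`, `harmonise`) are made up for illustration and do not come from biomaRt or any other package. It assumes biallelic SNPs with single-base alleles, and it deliberately refuses to decide for A/T and C/G SNPs, since alleles alone cannot settle those.

```python
# Hypothetical helpers for illustration only; not part of any library.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose allele pair reads the same on both strands."""
    return COMPLEMENT[a1.upper()] == a2.upper()


def flip_alleles(a1, a2):
    """Return the allele pair as read from the opposite strand."""
    return COMPLEMENT[a1.upper()], COMPLEMENT[a2.upper()]


def harmonise(study, reference):
    """Express the study alleles on the reference strand.

    study, reference: allele pairs such as ("C", "T").
    Returns the study pair on the reference strand, or None when the
    alleles alone cannot decide (ambiguous SNP or incompatible alleles).
    """
    study = tuple(a.upper() for a in study)
    reference = tuple(a.upper() for a in reference)
    if is_strand_ambiguous(*study):
        return None  # A/T or C/G: needs frequencies or strand tracking
    if set(study) == set(reference):
        return study  # already on the same strand
    flipped = flip_alleles(*study)
    if set(flipped) == set(reference):
        return flipped  # opposite strand, so use the complement
    return None  # alleles do not match on either strand
```

For example, `harmonise(("C", "T"), ("G", "A"))` gives the flipped pair, while any A/T or C/G SNP comes back as `None` and has to be resolved some other way.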
The ideal solution would be to know whether the strand has changed between builds for my list of SNPs (e.g. build 36 to 37 and/or build 37 to 38). Can such information be found? Am I missing an obvious solution to this problem?
Basically, what I'm looking for is a method to track SNP orientation changes between builds. It seems like something that should be available, like an orientation key relative to the previous build.
When converting between coordinates, if the input SNP BED file has strand information (6th column), then the output should also have strand information, so any strand changes can be determined. In short, using a 6-column BED file of input SNPs when lifting between assemblies should solve this problem. I use CrossMap; I made a dummy BED file and got the following output:
Chr1 1 2 a 10 + -> Chr1 1 2 a 10 +
Chr2 20 21 b 1000 + -> Chr2 20 21 b 1000 +
Chr3 22 23 c 1000 + -> Chr3 22 23 c 1000 +
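Checking such output for strand changes can be scripted. The following is only a sketch: it assumes every mapped line has the `input -> output` shape shown above, with the strand in the 6th field on each side, and the function name `strand_changes` is made up for illustration. If the conversion output is written in a different layout, the parsing would need adjusting.

```python
def strand_changes(lines):
    """Return (name, old_strand, new_strand) for entries whose strand changed.

    Assumes each mapped line looks like
    'chrom start end name score strand -> chrom start end name score strand'.
    """
    flipped = []
    for line in lines:
        if "->" not in line:
            continue  # skip blank lines or anything that was not mapped
        before, after = (side.split() for side in line.split("->", 1))
        if len(before) < 6 or len(after) < 6:
            continue  # no strand column on one side, nothing to compare
        if before[5] != after[5]:
            flipped.append((before[3], before[5], after[5]))
    return flipped


# The three dummy lines from the thread, plus one invented line with a flip.
example = [
    "Chr1 1 2 a 10 + -> Chr1 1 2 a 10 +",
    "Chr2 20 21 b 1000 + -> Chr2 20 21 b 1000 +",
    "Chr3 22 23 c 1000 + -> Chr3 22 23 c 1000 +",
    "Chr4 30 31 d 1000 + -> Chr4 95 96 d 1000 -",
]
print(strand_changes(example))
```

Any SNP reported here would need its alleles complemented to stay on the plus strand of the new build.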
I will install this and give it a try. Just to be clear, in my case I have the position but none of the other BED column information (i.e. column 4 'name', column 5 'score', column 6 'strand'). But your dummy example suggests that I can fill in dummy information in these columns and still get the correct information for the destination build in the CrossMap output?
Yes, I would say so. Since you mentioned that the reported SNPs are on the plus strand, that information can go in the 6th column, with the 4th and 5th columns filled with dummy data. I think the web liftOver can also do this if a 6-column BED file is provided. I have not done this for human data, but I was successful in assigning SNPs across builds with CrossMap for a plant species. A good indication that a strand change has occurred would be an entry in the conversion output where the strand is shown as negative.
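As a rough sketch of how such a 6-column BED file could be put together from a list of positions, the Python below takes 1-based SNP positions and writes BED rows with a dummy score and a plus strand. The function name `snps_to_bed6` and the example SNPs are made up. Using the rs number as the name column instead of a dummy value is a choice made here so that rows can be matched up again after lifting. Note that BED intervals are 0-based and half-open, so a SNP at 1-based position `pos` becomes start `pos - 1`, end `pos`.

```python
def snps_to_bed6(snps, strand="+"):
    """Build tab-separated BED6 rows from (chrom, pos_1based, snp_id) tuples.

    The score column is a dummy '0'; the strand defaults to '+',
    matching results reported on the positive strand.
    """
    rows = []
    for chrom, pos, snp_id in snps:
        start = pos - 1  # BED start is 0-based
        end = pos        # BED end is exclusive, so a SNP spans one base
        rows.append("\t".join([chrom, str(start), str(end), snp_id, "0", strand]))
    return rows


# Invented example SNPs, for illustration only.
my_snps = [("chr1", 1000, "rs0000001"), ("chr2", 25000, "rs0000002")]
bed_rows = snps_to_bed6(my_snps)
print("\n".join(bed_rows))
```

The resulting lines can be written to a file and used as the input for lifting.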
Thanks, I tried the web liftOver and the 4th, 5th and 6th columns seem to have no effect; the output has 3 columns only. I will work with CrossMap. It seems to have many advantages anyway, but it will take me some time to get it up and running. Thanks for all your help, microfuge.
Sweet, this worked just fine!