I have a list of SNPs from a small GWAS dataset, but I don't have their strand information. What I have is the chromosome number, position, and alleles. I also have their rs numbers.
In order to do imputation, I need the strand information for each of the SNPs, i.e. which strand (+/-) the recorded alleles come from. I found a webpage (http://www.well.ox.ac.uk/%7Ewrayner/strand/) that has strand information for some common chips, but unfortunately I don't know which chip was used to generate the results.
I could eventually download the reference genome and write a script to do that, but before that I was wondering whether there is a tool already available for this purpose. Thanks!
I don't understand why you need the strand information, or even care what the base is. Why not just pretend they are all on the plus strand? It won't affect positional association.
I don't know every detail of the imputation algorithm, but I think in order to impute more variants, the imputation tool needs to know the exact haplotype of each sample so that it can compare the sample haplotype with the reference genome and calculate the probabilities of a SNP in the unsequenced region. I guess that's why both imputation tools IMPUTE2 and SHAPEIT ask for strand information as input. I could be wrong. Do you have any idea? Thanks.
Downloading the reference genome FASTA and checking your alleles against the reference base at each position will certainly work, and can be done quickly. C/G or A/T SNPs may be ambiguous, because each of those pairs is its own complement and so looks the same on either strand. Checking your alleles against dbSNP may also be an option (they maintain a VCF). I'm not aware of a batch lookup tool that would do that for you without at least some scripting, and would be interested to know of one. Also see the thread "assign each SNP a strand information" for a samtools example of ref base lookup.
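For what it's worth, the comparison itself is only a few lines once you have the reference base for each position. Below is a minimal sketch, not a tested pipeline: the function name `infer_strand` and its return labels are made up for illustration, and it assumes you have already looked up the plus-strand reference base at each SNP position (e.g. from the FASTA) and that one of the two recorded alleles is expected to match the reference on one strand or the other.

```python
# Hypothetical helper: names and return labels are illustrative only.
# ref_base is assumed to be the plus-strand reference base at the SNP
# position, looked up beforehand (e.g. from the reference FASTA).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def infer_strand(ref_base, allele1, allele2):
    """Guess the strand of a SNP from its two alleles and the reference base."""
    ref = ref_base.upper()
    alleles = {allele1.upper(), allele2.upper()}

    # Anything other than plain A/C/G/T (N, indels, ...) is left undecided.
    if ref not in COMPLEMENT or not alleles <= set(COMPLEMENT):
        return "unknown"

    flipped = {COMPLEMENT[a] for a in alleles}

    # A/T and C/G SNPs are their own complement: strand cannot be decided
    # from the reference base alone.
    if alleles == flipped:
        return "ambiguous"
    if ref in alleles:
        return "+"
    if ref in flipped:
        return "-"
    # Neither strand matches the reference: wrong position or genome build?
    return "unknown"


print(infer_strand("A", "A", "G"))  # +
print(infer_strand("A", "T", "C"))  # -
print(infer_strand("A", "A", "T"))  # ambiguous
```

SNPs that come back as "unknown" are worth a second look, since a mismatch on both strands often means the positions are on a different genome build than the reference you downloaded. The "ambiguous" ones need another source of information (allele frequencies, or the chip's strand file if you ever find out which chip it was).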