I have a number of SNPs taken from a GWAS paper in which the authors listed previously known and novel SNPs (index SNPs) with position, gene, MAF, risk allele and normal allele, and nothing more.
I am analyzing linkage disequilibrium and want to find a set of credible SNPs using the index SNPs mentioned in the paper. After I identify the set (which can also include index SNPs), I want to annotate them using VEP, but I do not have strand information for the index SNPs. Is there any command-line approach that I can use to assign strand information to each index SNP based on position, MAF and alleles?
(I would rather not use ANNOVAR, as I also want to do a custom annotation with a BED file, and the ANNOVAR output is very weak: it is missing the peak information.)
I guess I was not clear enough. The authors of the paper I took the SNPs from provided only the following information: SNP, gene, position, MAF, risk allele and normal allele. That is all I have.
I can't think of a way to get accurate strand information without the reads. You could probably make an educated guess from the MAF to separate homozygous mutations from heterozygous ones, but that's about it.
I assume that ANNOVAR works out from its database whether the SNP is on the sense or antisense strand, since it assigns the right amino acid substitution (AS) for the corresponding variation. VEP, on the other hand, needs strand information, so I get different AS results if I give the same SNP a different strand. One of the two VEP results is always consistent with the ANNOVAR result, though.
If I'm understanding correctly, this could be as simple as checking the reference base at the position and seeing whether your alleles are given with respect to the positive strand. You could get the reference base with samtools; if your alleles only match after being complemented, you would assign the opposite strand. I think this is what ANNOVAR does.
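A minimal sketch of that check in Python, in case it helps. It assumes you have already looked up the reference base for each position (for example with `samtools faidx genome.fa chr:pos-pos`, where `genome.fa` is a placeholder for your reference FASTA in the same build the paper used). The function name `infer_strand` and its return labels are made up for illustration; this is not what ANNOVAR or VEP actually run internally.

```python
# Complement of each base; a single-base SNP only needs complementing,
# not reversing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def infer_strand(ref_base, allele1, allele2):
    """Guess the strand on which a biallelic SNP's alleles are reported.

    Returns '+', '-', 'ambiguous' (A/T or C/G SNPs, which read the same
    on both strands) or 'mismatch' (reference base is not A/C/G/T, e.g. N).
    Alleles must be single A/C/G/T bases; anything else raises KeyError.
    """
    ref = ref_base.upper()
    alleles = {allele1.upper(), allele2.upper()}
    flipped = {COMPLEMENT[a] for a in alleles}

    # Palindromic SNPs cannot be resolved from the reference base alone.
    if alleles == flipped:
        return "ambiguous"
    if ref in alleles:
        return "+"
    if ref in flipped:
        return "-"
    return "mismatch"


if __name__ == "__main__":
    print(infer_strand("A", "A", "G"))  # +
    print(infer_strand("T", "A", "G"))  # -
    print(infer_strand("A", "T", "A"))  # ambiguous
```

Two caveats. A/T and C/G SNPs look identical on both strands, so the reference base cannot settle them; for those you would have to compare the MAF against a reference panel, and even that gets unreliable when the MAF is close to 0.5. Also, for any other SNP the two alleles and their complements together cover all four bases, so this check always returns a strand and will not warn you if the coordinates come from a different genome build.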