Hi, I was wondering if there is any code or software in Ensembl that can help me get all SNPs annotated to a specific gene (from the 1000 Genomes Project), for example getting a VCF file of all SNPs for the gene LINC01435.
Thank you for your help.
bk11 Thank you. I was also wondering: if there is a list of genes I'm interested in, is it also possible to extract them using tabix like you did?
Absolutely, you can do it pretty easily. All you need to provide is the locus of each gene (CHROM:START-END). I am demonstrating with a simple for loop here:
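A minimal sketch of such a loop, not the exact code from this reply. It assumes a hypothetical tab-separated table genes.tsv (name, chromosome, start, end) and a bgzip-compressed, tabix-indexed VCF called chr10.vcf.gz; both file names, the DRY_RUN switch, and the second gene are placeholders for illustration. By default it only prints the tabix commands it would run.

```shell
#!/bin/sh
# Sketch: one tabix call per gene locus (CHROM:START-END).
# ASSUMPTIONS, not from the thread: the file names, the DRY_RUN switch,
# and the placeholder second gene.
set -eu

VCF="${VCF:-chr10.vcf.gz}"    # hypothetical bgzipped, tabix-indexed VCF
GENES="${GENES:-genes.tsv}"   # hypothetical table: name, chrom, start, end
DRY_RUN="${DRY_RUN:-1}"       # 1 = print commands only, 0 = run tabix
TAB="$(printf '\t')"

# Example gene table; replace these rows with your own genes of interest.
printf '%s\t%s\t%s\t%s\n' LINC01435 chr10 109517591 109871360 > "$GENES"
printf '%s\t%s\t%s\t%s\n' PLACEHOLDER_GENE chr10 1000 2000 >> "$GENES"

extract_genes() {
  while IFS="$TAB" read -r name chrom start end; do
    # The chromosome name must match the VCF ("chr10" vs "10").
    region="${chrom}:${start}-${end}"
    if [ "$DRY_RUN" = "1" ]; then
      echo "tabix -h $VCF $region > ${name}.vcf"
    else
      # -h keeps the VCF header in each per-gene output file.
      tabix -h "$VCF" "$region" > "${name}.vcf"
    fi
  done < "$GENES"
}

extract_genes
```

Running it with DRY_RUN=0 writes one VCF per row of the gene table, named after the gene.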
bk11 Thank you very much. I was also wondering: when I download the VCF file for all SNPs in a specific gene and read it in R (using the readVCF package), a lot of SNPs (some of which I need) are filtered out, for example:
and I get the warnings:
How can I prevent this from happening, and why is it occurring?
I am not sure what you are trying to do here, but I do not have any problem reading the VCF file generated for a specific locus using tabix.
bk11 So what I wanted to do is read the VCF file in R and then convert it to a data frame for easier usage, for example:
But after reading it I see that vcfR filtered many of the variants out:
So originally there were 9315 variants, but after filtering only 400 remain. I was wondering why it is filtering them out and how to prevent it.
It is because you are filtering while reading the VCF, using the criterion min_maf = 0.02. Without the filter you will see the same number as before.
bk11 Hi, I was wondering where you got the position of the gene. In GWAS it is LINC01435 10:107694973-108197849, but in the code you wrote chr10:109517591-109871360. I'm simulating data with the sim1000G package, and there seems to be a chromosome mismatch error between the genetic map and the VCF. The package uses GRCh37 coordinates, and I suspect I may not be using the correct database.
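One way to narrow down a mismatch like this is to look at what the VCF itself reports. The following is a minimal sketch and does not settle which assembly either coordinate pair above belongs to. It assumes an uncompressed VCF; example.vcf and its two records are made up so the commands can be run as written, and the helper names vcf_reference and vcf_chroms are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect which reference and chromosome names a VCF uses.
# ASSUMPTION: example.vcf is a tiny made-up file for illustration only.
set -eu

VCF_FILE="${VCF_FILE:-example.vcf}"

# Made-up two-record VCF so the commands below can be run as-is.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##reference=GRCh37\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '10\t100\t.\tA\tG\t.\tPASS\t.\n'
  printf '10\t200\t.\tC\tT\t.\tPASS\t.\n'
} > "$VCF_FILE"

# Header lines that may name the assembly (not every VCF has them).
vcf_reference() { grep -E '^##(reference|contig)=' "$1" || true; }

# Distinct chromosome names used in the records ("10" vs "chr10").
vcf_chroms() { grep -v '^#' "$1" | cut -f1 | sort -u; }

vcf_reference "$VCF_FILE"
vcf_chroms "$VCF_FILE"
```

If the records use "10" while the region or genetic map uses "chr10" (or the header names a different assembly than the gene coordinates), that alone could explain a mismatch. For a compressed file, the same checks can be run on the decompressed stream.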