I want to see the mutation in SCN5A (which is on chromosome 3:38548066-38649628) in Punjabi population and for this purpose I have downloaded the vcf file for chromosome 3 from here.
The file is very large and contain variants (on chromosome 3) for all the populations in 1000 genomes project and I do not know how to filter it based on population and region that's why I am not able to get variants for the region of interest.
How this vcf file can be filtered based on region and population ?
There are a lot of tools to subset a VCF file by region/sample - this is a routine requirement. Did you try searching the forum? GATK has a couple of tools, and so does vcftools/tabix/plinkseq.
You will need to generate the list of samples that form the population of your choice - that is, the samples being a part of a population is more of a logical grouping than something an algorithm can figure out.
But there is still a problem. I want to compare these blast results and i.e the G shown in yellow in part2 with the variants file for punjabi population from 1000 genomes project. The variant file is here. I want to check that whether at the same position this mutation exists in punjabi population or not. But I am not getting the right way to do it because the blast file has different numbers for position of bases than the vcf file. So what can be the way to compare these?
Do not add answers unless you're answering the question. This deserves to be a comment reply on an appropriate post. I'm moving it to a comment on the top level post now.
I tried to add it as a comment but it gave error that's why I used this option.
What was the exact error you faced?