Hello Javad,
Without a little work you will not reach your goal. The most difficult part here is to clarify whose definition of an exon you want to use: UCSC, NCBI, Ensembl, ...? And do you want just coding exons or all of them?
Let's assume you want all exons as defined by NCBI.
- Go to UCSC Table browser
- Choose hg19 in the assembly field, "Genes and Gene Predictions" in group and "NCBI RefSeq" in track.
- Choose BED as the output format and give the output file a name, e.g. exons.bed
- Click on get output
In the next dialog choose "Exons".
Now you have a file called exons.bed which contains the coordinates.
What we have to do now is sort this file by position and remove the "chr" from the chromosome names, because the 1000 Genomes VCF names its chromosomes without that prefix. You can do it like this:
cut -c4- exons.bed|sort -k1,1V -k2,2g -k3,3g > exons_sorted.bed
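To see what this step does, here is a small sketch on made-up coordinates (the four BED lines and the toy file names are invented for illustration, they are not real RefSeq exons). It uses sed 's/^chr//' as an alternative to cut -c4-, which strips the prefix only where it is actually present, and it assumes GNU sort for the -V option:

```shell
# Toy input: made-up coordinates, deliberately out of order.
workdir=$(mktemp -d)
printf 'chr2\t300\t400\nchr1\t500\t600\nchr10\t100\t200\nchr1\t100\t200\n' > "$workdir/toy.bed"

# Strip a leading "chr", then sort by chromosome (version order), start, end.
sed 's/^chr//' "$workdir/toy.bed" | sort -k1,1V -k2,2n -k3,3n > "$workdir/toy_sorted.bed"

cat "$workdir/toy_sorted.bed"
```

With version sort, chromosome 10 ends up after chromosome 2 instead of directly after chromosome 1, which is what a plain text sort would give.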
This file we can use to query the 1000 Genomes file directly on the ftp server using tabix:
tabix -R exons_sorted.bed ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz > exon_variants.vcf
If this is too slow, you have to download the compressed VCF file and the tabix index file to your PC and adapt the tabix command accordingly.
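The local variant could look roughly like this. It is only a sketch: it assumes the index sits next to the VCF on the server under the same name plus ".tbi", that wget and tabix are installed, and the function name query_local is made up here.

```shell
# Same release directory and file as in the remote query.
base=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502
vcf=ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz

query_local() {
    # Assumption: the index is available as "$vcf.tbi" in the same directory.
    # -c lets wget resume an interrupted download.
    wget -c "$base/$vcf" "$base/$vcf.tbi" &&
    # -h keeps the VCF header, -R restricts output to the regions in the BED file.
    tabix -h -R exons_sorted.bed "$vcf" > exon_variants.vcf
}
```

Calling query_local in the directory that holds exons_sorted.bed downloads both files once and then runs the query against the local copy.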
fin swimmer
The answer is probably bedtools
What is your next step, and why do you want a subset of the 1000 Genomes data? At the moment "I want this, and don't want to code" seems to me an unclear request.
Refer to dbSNP. In dbSNP, the kgvalidated and kgprod tags denote that a variant comes from the 1000 Genomes Project. Then filter by the syn, nsf, nsm, nsn, u3 and u5 tags. These tags mark coding and UTR variants with a calculated variant effect. For filtering you can use bcftools.
Another way is to intersect the dbSNP VCF with the exon coordinates.
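Both routes might be sketched like this. Everything specific here is an assumption: the exact spelling and case of the INFO tags differs between dbSNP builds (check the header with bcftools view -h first, since bcftools will complain about tags that are not defined there), and the function names are made up for illustration.

```shell
# Assumed tag spellings; adjust to what the header of your dbSNP VCF declares.
filter='(INFO/KGPROD=1 || INFO/KGValidated=1) && (INFO/SYN=1 || INFO/NSF=1 || INFO/NSM=1 || INFO/NSN=1 || INFO/U3=1 || INFO/U5=1)'

coding_kg_variants() {
    # $1: dbSNP VCF (bgzipped). Keeps records matching the tag filter.
    bcftools view -i "$filter" "$1"
}

exonic_variants() {
    # $1: dbSNP VCF (bgzipped and indexed), $2: BED file with exon coordinates.
    # Chromosome names in the BED file must match those in the VCF.
    bcftools view -R "$2" "$1"
}
```

For example, coding_kg_variants dbsnp.vcf.gz > coding.vcf would do the tag-based filtering, and exonic_variants dbsnp.vcf.gz exons_sorted.bed > exonic.vcf the coordinate-based one (dbsnp.vcf.gz is a placeholder file name).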
Javad : Don't forget to follow up on this thread.
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer as long as it works.

Yeah, but I didn't want to write scripts. I thought maybe this data was already stored somewhere. Thank you anyway.
Filtering by tags is one line of code if one knows how to use bcftools.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.
It's just a single command but okay.
I don't think it would be just a single command, because the coordinates of exons are not included in the VCF file. Am I missing something? Could you please give me some hints on how to go about it? Thanks
You would need a bed file of the targets of interest, essentially the exome. You can get those from UCSC.
In general, no VCF file will have exon coordinates. VCF files have coordinates for variants only. When you filter for variants in coding and UTR regions, this automatically covers the exonic regions, mostly.