1000Genomes Mapping Onto Protein Sequences?
Hi,
does the 1000genomes project provide any mapping onto protein sequence? Couldn't find any info on their website or data files. Could that information be retrieved somewhere else? I know that dbSNP does this, but data from 1kG may not be included entirely in dbSNP.
Thanks
Chris
Edit: To make this clearer: I'm interested in nsSNPs.
Tags: genome, protein, mapping, snp
written 13.1 years ago by Chris • updated 13.1 years ago by Laura
You can download the VCF files for 1000G from the latest release here. Then it is pretty straightforward to run SnpEff, ANNOVAR, or VariantAnnotation (Bioconductor) to map the variants to transcripts and see their effects on proteins.
You could try the Ensembl Variant Effect Predictor (VEP) or SnpEff to map those data.
Note: I'm currently writing a set of C++ tools doing this kind of task on the fly:
$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.sites.vcf.gz" |\
gunzip -c |\
grep -v '^##' | normalizechrom -c 1 |\
prediction -f hg19.fa |\
grep -E '(EXON|#CHROM)' | head | verticalize | cut -c 1-90
>>> 2
$1 #CHROM chr1
$2 POS 69511
$3 ID rs75062661
$4 REF A
$5 ALT G
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=607;AF=0.789;CB=UM,BI;EUR_R2=0.054;AFR_R2=0.247
$9 knownGene.name uc001aal.1
$10 knownGene.strand +
$11 knownGene.txStart 69090
$12 knownGene.txEnd 70008
$13 knownGene.cdsStart 69090
$14 knownGene.cdsEnd 70008
$15 prediction.type EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna 420
$17 prediction.pos_in_protein 141
$18 prediction.exon Exon 1
$19 prediction.intron .
$20 prediction.wild.codon ACA
$21 prediction.mut.codon GCA
$22 prediction.wild.aa T
$23 prediction.mut.aa A
$24 prediction.wild.prot MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLS
$25 prediction.mut.prot MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLS
$26 prediction.wild.rna ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTA
$27 prediction.mut.rna ATGGTGACTGAATTCATTTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTA
$28 prediction.splicing .
<<< 2
>>> 3
$1 #CHROM chr1
$2 POS 324822
$3 ID .
$4 REF A
$5 ALT T
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=1649;AF=0.005;CB=UM,BI;EUR_R2=0.141;AFR_R2=0.017
$9 knownGene.name uc009vjk.2
$10 knownGene.strand +
$11 knownGene.txStart 322036
$12 knownGene.txEnd 326938
$13 knownGene.cdsStart 324342
$14 knownGene.cdsEnd 325605
$15 prediction.type EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna 386
$17 prediction.pos_in_protein 129
$18 prediction.exon Exon 3
$19 prediction.intron .
$20 prediction.wild.codon GCA
$21 prediction.mut.codon GCT
$22 prediction.wild.aa A
$23 prediction.mut.aa A
$24 prediction.wild.prot MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$25 prediction.mut.prot MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$26 prediction.wild.rna ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$27 prediction.mut.rna ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$28 prediction.splicing .
<<< 3
>>> 4
$1 #CHROM chr1
$2 POS 324822
$3 ID .
$4 REF A
$5 ALT T
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=1649;AF=0.005;CB=UM,BI;EUR_R2=0.141;AFR_R2=0.017
$9 knownGene.name uc001aau.2
$10 knownGene.strand +
$11 knownGene.txStart 323891
$12 knownGene.txEnd 328580
$13 knownGene.cdsStart 324342
$14 knownGene.cdsEnd 325605
$15 prediction.type EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna 386
$17 prediction.pos_in_protein 129
$18 prediction.exon Exon 3
$19 prediction.intron .
$20 prediction.wild.codon GCA
$21 prediction.mut.codon GCT
$22 prediction.wild.aa A
$23 prediction.mut.aa A
$24 prediction.wild.prot MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$25 prediction.mut.prot MLLPPGSLSRPRTFSSQPLQTKLMTHNGLFRPIPYVTAASADEATASQQPPQAQLHRYNG
$26 prediction.wild.rna ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$27 prediction.mut.rna ATGCTCCTACCTCCCGGCAGCCTCTCCAGGCCCAGAACTTTCTCCAGTCAGCCTCTACAG
$28 prediction.splicing .
<<< 4
>>> 5
$1 #CHROM chr1
$2 POS 762085
$3 ID .
$4 REF G
$5 ALT A
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=428;AF=0.028;CB=BC,NCBI
$9 knownGene.name uc010nxx.1
$10 knownGene.strand -
$11 knownGene.txStart 761586
$12 knownGene.txEnd 762902
$13 knownGene.cdsStart 762079
$14 knownGene.cdsEnd 762571
$15 prediction.type EXON|EXON_STOP_GAINED
$16 prediction.pos_in_cdna 486
$17 prediction.pos_in_protein 163
$18 prediction.exon Exon 1
$19 prediction.intron .
$20 prediction.wild.codon CAG
$21 prediction.mut.codon TAG
$22 prediction.wild.aa Q
$23 prediction.mut.aa *
$24 prediction.wild.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing .
<<< 5
>>> 6
$1 #CHROM chr1
$2 POS 762109
$3 ID .
$4 REF C
$5 ALT T
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=2991;AF=0.009;CB=UM,BI,BC,NCBI;EUR_R2=0.652;AFR_R2=0.744
$9 knownGene.name uc010nxx.1
$10 knownGene.strand -
$11 knownGene.txStart 761586
$12 knownGene.txEnd 762902
$13 knownGene.cdsStart 762079
$14 knownGene.cdsEnd 762571
$15 prediction.type EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna 462
$17 prediction.pos_in_protein 155
$18 prediction.exon Exon 1
$19 prediction.intron .
$20 prediction.wild.codon GTG
$21 prediction.mut.codon ATG
$22 prediction.wild.aa V
$23 prediction.mut.aa M
$24 prediction.wild.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing .
<<< 6
>>> 7
$1 #CHROM chr1
$2 POS 762187
$3 ID .
$4 REF C
$5 ALT T
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=2788;AF=0.007;CB=UM,BI,BC,NCBI;AFR_R2=0.906
$9 knownGene.name uc010nxx.1
$10 knownGene.strand -
$11 knownGene.txStart 761586
$12 knownGene.txEnd 762902
$13 knownGene.cdsStart 762079
$14 knownGene.cdsEnd 762571
$15 prediction.type EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna 384
$17 prediction.pos_in_protein 129
$18 prediction.exon Exon 1
$19 prediction.intron .
$20 prediction.wild.codon GAG
$21 prediction.mut.codon AAG
$22 prediction.wild.aa E
$23 prediction.mut.aa K
$24 prediction.wild.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing .
<<< 7
>>> 8
$1 #CHROM chr1
$2 POS 762273
$3 ID rs3115849
$4 REF G
$5 ALT A
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=1202;AF=0.555;CB=UM,BI,BC;EUR_R2=0.636;AFR_R2=0.629
$9 knownGene.name uc010nxx.1
$10 knownGene.strand -
$11 knownGene.txStart 761586
$12 knownGene.txEnd 762902
$13 knownGene.cdsStart 762079
$14 knownGene.cdsEnd 762571
$15 prediction.type EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna 298
$17 prediction.pos_in_protein 100
$18 prediction.exon Exon 1
$19 prediction.intron .
$20 prediction.wild.codon CCT
$21 prediction.mut.codon CTT
$22 prediction.wild.aa P
$23 prediction.mut.aa L
$24 prediction.wild.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing .
<<< 8
>>> 9
$1 #CHROM chr1
$2 POS 762320
$3 ID rs75333668
$4 REF C
$5 ALT T
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=2030;AF=0.048;CB=UM,BI,BC,NCBI;EUR_R2=0.529;AFR_R2=0.709
$9 knownGene.name uc010nxx.1
$10 knownGene.strand -
$11 knownGene.txStart 761586
$12 knownGene.txEnd 762902
$13 knownGene.cdsStart 762079
$14 knownGene.cdsEnd 762571
$15 prediction.type EXON|EXON_CODING_SYNONYMOUS
$16 prediction.pos_in_cdna 251
$17 prediction.pos_in_protein 84
$18 prediction.exon Exon 1
$19 prediction.intron .
$20 prediction.wild.codon GTG
$21 prediction.mut.codon GTA
$22 prediction.wild.aa V
$23 prediction.mut.aa V
$24 prediction.wild.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing .
<<< 9
>>> 10
$1 #CHROM chr1
$2 POS 762330
$3 ID rs74045217
$4 REF G
$5 ALT T
$6 QUAL .
$7 FILTER PASS
$8 INFO DP=2132;AF=0.038;CB=UM,BC
$9 knownGene.name uc010nxx.1
$10 knownGene.strand -
$11 knownGene.txStart 761586
$12 knownGene.txEnd 762902
$13 knownGene.cdsStart 762079
$14 knownGene.cdsEnd 762571
$15 prediction.type EXON|EXON_CODING_NON_SYNONYMOUS
$16 prediction.pos_in_cdna 241
$17 prediction.pos_in_protein 81
$18 prediction.exon Exon 1
$19 prediction.intron .
$20 prediction.wild.codon CCA
$21 prediction.mut.codon CAA
$22 prediction.wild.aa P
$23 prediction.mut.aa Q
$24 prediction.wild.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$25 prediction.mut.prot MWLVFHRPHPRPSWPLRAALGFGRRQSSLRCFPVLPSARPYVSANPTLRGGRLRQDPESE
$26 prediction.wild.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$27 prediction.mut.rna ATGTGGCTTGTCTTCCATCGTCCCCACCCTCGCCCCTCTTGGCCCCTCAGGGCAGCCCTG
$28 prediction.splicing .
<<< 10
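Since the question is about nsSNPs, here is a minimal sketch of how the annotated rows could be narrowed down to non-synonymous changes. It assumes the output of `prediction` (before `verticalize`) is tab-delimited with the 28 columns in the order listed above; `nssnp_changes` is a hypothetical helper name, not part of the tools shown. It prints chromosome, position, transcript and a compact protein change such as T141A.

```shell
# Hypothetical helper: read tab-delimited annotated rows on stdin and keep
# only non-synonymous coding variants.
# Assumed column layout (taken from the verticalized output above):
#   $1 chrom, $2 pos, $9 knownGene.name, $15 prediction.type,
#   $17 prediction.pos_in_protein, $22 prediction.wild.aa, $23 prediction.mut.aa
nssnp_changes() {
  awk -F '\t' '
    $1 ~ /^#/ { next }                      # skip the #CHROM header line
    $15 ~ /EXON_CODING_NON_SYNONYMOUS/ {    # keep amino-acid-changing variants
      # chrom, pos, transcript, then wild aa + protein position + mutant aa
      printf "%s\t%s\t%s\t%s%s%s\n", $1, $2, $9, $22, $17, $23
    }'
}

# Usage (annotated.tsv is a placeholder for the saved output of `prediction`):
#   nssnp_changes < annotated.tsv
```

To also keep nonsense variants, the pattern could be extended to match EXON_STOP_GAINED as well. Note that a variant overlapping several transcripts yields one line per transcript, as with chr1:324822 above.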
It seems that you are interested in mapping variants discovered by 1000G onto proteins. If this is true, then please add a "SNP" or "genetic-variation" tag to your question. This is a good question and should have the best tags possible.
Larry: Good point. Done.