I want to collect missense SNPs, nonsense SNPs, SNPs in the 5' UTR and SNPs in the 3'UTR region for a particular gene in humans. How do I do that?
If you just have a handful of genes you can look them up in Ensembl, which hosts dbSNP data.
Go to Ensembl and search for your gene, e.g. STAR. Click on the result to go to the gene tab and find the Variant table in the left-hand navigation panel. You can use the filter options above the table to find the data you're interested in: click the Consequences button, turn all off and choose missense variant, stop gained (aka nonsense), 5 prime UTR variant and 3 prime UTR variant. Click OK to close the box and then click on Filter other columns > Source to choose dbSNP variants only.
If you have many genes you can use BioMart:
Step 1: Choose Ensembl Genes as your database, then choose human as the dataset.
Step 2: Choose filters by clicking on Filters in the left-hand navigation panel. Expand the GENE section and, under 'Input external references', upload a file or paste your gene names/IDs. Make sure they match the format selected in the drop-down box above the input form, e.g. if you have BRCA2 you need to choose Gene name from the drop-down box. Then expand the VARIANT section, set the variant source to dbSNP, and choose your variant consequence terms (e.g. missense). You can select multiple terms with ctrl+click or cmd+click.
Step 3: Choose the attributes you want by clicking on Attributes in the left-hand navigation panel. Choose Variant (germline) from the top of the page. Expand GENE to add the gene name or other values. Expand the GERMLINE VARIANT INFORMATION section to choose additional info about the variants.
Step 4: Get results by clicking on the Results button above the left-hand navigation panel.
Here's an example.
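If you'd rather script this than click through the interface, BioMart also accepts queries as XML. Below is a minimal sketch of my own (not the example linked above) that builds such a query for one gene symbol. The function name biomart_query_xml is made up, and the internal names hsapiens_gene_ensembl, external_gene_name, variation_name and synonymous_status are my best understanding of the current Ensembl Genes mart, so treat them as assumptions. The XML button on the BioMart results page shows the exact names for whatever you selected in the steps above. The sketch only filters by gene; the consequence column would still need filtering afterwards.

```shell
#!/bin/sh
# Sketch: print a BioMart XML query for one gene symbol.
# Dataset, filter and attribute names are ASSUMPTIONS; confirm them with
# the XML button in the BioMart web interface before relying on them.
biomart_query_xml() {
    gene="$1"
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="external_gene_name" value="${gene}"/>
    <Attribute name="external_gene_name"/>
    <Attribute name="variation_name"/>
    <Attribute name="synonymous_status"/>
  </Dataset>
</Query>
EOF
}

# Usage (needs network access; not run here):
#   curl -s --get --data-urlencode "query=$(biomart_query_xml BRCA2)" \
#        https://www.ensembl.org/biomart/martservice
```

Keeping the XML in a function means you can loop over a gene list and send one request per gene, which tends to be gentler on the server than one very large query.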
Using MySQL against the UCSC public server:
$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 -e 'select * from snp150 where chrom="chr1" and chromStart> 921694 and chromEnd < 921720 and func in ("untranslated-3","untranslated-5","nonsense","missense") '
+-----+-------+------------+----------+--------------+-------+--------+---------+---------+----------+---------+--------+---------+-------+---------+----------------+---------+--------+------------+----------------+------------------+-----------------+---------+----------+-------------+-----------+
| bin | chrom | chromStart | chromEnd | name | score | strand | refNCBI | refUCSC | observed | molType | class | valid | avHet | avHetSE | func | locType | weight | exceptions | submitterCount | submitters | alleleFreqCount | alleles | alleleNs | alleleFreqs | bitfields |
+-----+-------+------------+----------+--------------+-------+--------+---------+---------+----------+---------+--------+---------+-------+---------+----------------+---------+--------+------------+----------------+------------------+-----------------+---------+----------+-------------+-----------+
| 592 | chr1 | 921695 | 921696 | rs1045705904 | 0 | + | G | G | C/G | genomic | single | unknown | 0 | 0 | untranslated-3 | exact | 1 | | 1 | HUMAN_LONGEVITY, | 0 | | | | |
| 592 | chr1 | 921696 | 921697 | rs897840143 | 0 | + | T | T | C/T | genomic | single | unknown | 0 | 0 | untranslated-3 | exact | 1 | | 1 | HUMAN_LONGEVITY, | 0 | | | | |
| 592 | chr1 | 921718 | 921719 | rs977712348 | 0 | + | G | G | A/G | genomic | single | unknown | 0 | 0 | untranslated-3 | exact | 1 | | 1 | TOPMED, | 0 | | | | |
+-----+-------+------------+----------+--------------+-------+--------+---------+---------+----------+---------+--------+---------+-------+---------+----------------+---------+--------+------------+----------------+------------------+-----------------+---------+----------+-------------+-----------+
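To reuse the query above for other regions, the SQL can be generated by a small shell function. This is a sketch with a made-up function name (ucsc_snp_sql). It also swaps the `func in (...)` test for find_in_set(), on the assumption that func can hold several comma-separated values for one SNP (e.g. a variant annotated against more than one transcript), in which case an exact `in` match would skip those rows. Check the table schema if that matters for your gene.

```shell
#!/bin/sh
# Sketch: print the SQL for one region of the UCSC snp150 table.
# Arguments: chrom, start, end (same strict bounds as the query above).
ucsc_snp_sql() {
    chrom="$1"; start="$2"; end="$3"
    cond=""
    # One find_in_set() test per function class, joined with "or".
    for f in missense nonsense untranslated-3 untranslated-5; do
        cond="${cond:+$cond or }find_in_set(\"$f\", func)"
    done
    printf 'select chrom, chromStart, chromEnd, name, observed, func from snp150 where chrom="%s" and chromStart > %s and chromEnd < %s and (%s)\n' \
        "$chrom" "$start" "$end" "$cond"
}

# Usage (needs network access and the mysql client; not run here):
#   mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 \
#         -e "$(ucsc_snp_sql chr1 921694 921720)"
```

For a whole gene you would pass the gene's chromosome, start and end instead of the small window used here.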
You can also download the database as a VCF file and then filter that for your region of interest.
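A rough sketch of that filtering with plain awk, assuming an uncompressed VCF. The function names vcf_region and vcf_with_flags are made up. The chromosome label has to match whatever the file uses (recent dbSNP VCFs name chromosomes by RefSeq accession such as NC_000001.11 rather than chr1). The INFO flags NSM, NSN, U3 and U5 are what I'd expect a dbSNP VCF to use for missense, nonsense, 3' UTR and 5' UTR, but that is an assumption, so check the ##INFO lines in your file's header.

```shell
#!/bin/sh
# Sketch: keep header lines plus records on CHROM with START <= POS <= END.
# Arguments: chrom, start, end, [vcf file; defaults to stdin].
vcf_region() {
    awk -F '\t' -v c="$1" -v s="$2" -v e="$3" '
        /^#/ { print; next }
        $1 == c && $2 + 0 >= s + 0 && $2 + 0 <= e + 0 { print }
    ' "${4:--}"
}

# Sketch: keep header lines plus records whose INFO column (field 8)
# has at least one ";"-separated entry matching the given regex.
# Arguments: regex, [vcf file; defaults to stdin].
vcf_with_flags() {
    awk -F '\t' -v re="$1" '
        /^#/ { print; next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) if (info[i] ~ re) { print; next }
        }
    ' "${2:--}"
}

# Usage (file name and coordinates are placeholders):
#   vcf_region NC_000001.11 921695 921719 dbsnp.vcf \
#       | vcf_with_flags '^(NSM|NSN|U3|U5)$' > region_filtered.vcf
```

For a full-size compressed and indexed dbSNP file, an indexed lookup with tabix or bcftools would be much faster than scanning with awk; the awk version is just the simplest thing that needs no extra tools.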