On retrieving SNPs from dbSNP
16 months ago
Agenor Neto

Hello everyone!

I have been working on code that retrieves all the missense mutations in certain genes. Since I am interested in specific amino acid positions in the resulting proteins, I also filter these results by position. The only way I found to do this was by searching the HGVS annotations that come with each SNP, like this:

HGVS=NC_000012.12:g.68689538G>A,NC_000012.11:g.69083318G>A,NG_046600.2:g.7588G>A,NM_020401.4:c.106G>A,NM_020401.3:c.106G>A,NM_020401.2:c.106G>A,NM_001330192.2:c.-10G>A,NM_001330192.1:c.-10G>A,XM_005269037.5:c.106G>A,XM_005269037.4:c.106G>A,XM_005269037.3:c.106G>A,XM_005269037.2:c.106G>A,XM_005269037.1:c.106G>A,NP_065134.1:p.Ala36Thr,XP_005269094.1:p.Ala36Thr|SEQ=[G/A]|LEN=1|GENE=NUP107:57122
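
To make the structure of that line explicit, here is a minimal parsing sketch. It assumes, based only on the single example above, that fields are "KEY=value" pairs separated by "|" and that only the HGVS field holds a comma-separated list of "accession:change" entries. The function name parse_annotation is hypothetical, and the example string is a shortened copy of the line above.

```python
def parse_annotation(line):
    """Split a 'KEY=value|KEY=value' annotation line into a dict.

    The HGVS value is further split on commas into a list of
    'accession:change' strings; every other value is kept as a string.
    """
    fields = {}
    for part in line.strip().split("|"):
        key, _, value = part.partition("=")
        fields[key] = value.split(",") if key == "HGVS" else value
    return fields


# Shortened version of the example line above.
example = (
    "HGVS=NC_000012.12:g.68689538G>A,NM_020401.4:c.106G>A,"
    "NP_065134.1:p.Ala36Thr,XP_005269094.1:p.Ala36Thr"
    "|SEQ=[G/A]|LEN=1|GENE=NUP107:57122"
)
info = parse_annotation(example)

# Protein-level entries: one per RefSeq protein (NP_ curated, XP_ predicted).
protein_changes = [h for h in info["HGVS"] if ":p." in h]
print(protein_changes)  # ['NP_065134.1:p.Ala36Thr', 'XP_005269094.1:p.Ala36Thr']
```

Each protein accession carries its own numbering, which is why a position-only match can pick up the wrong residue.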

For instance, I use a regex to find all the SNPs that change the alanine at position 36 (I take the positions from the canonical sequences in UniProt). This approach does return every SNP that affects this residue at this position (the true positives), but it can also return SNPs that, in my reference sequence, would change the alanine at position 50: they match only because there is an isoform in which that Ala50 sits at position 36. In my research the surrounding amino acids are really important, so these hits are false positives for me.
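
A sketch of the problem and of one way around it, assuming the protein entries look like the one in the dbSNP line (three-letter residue codes, no parentheses, optional version suffix). The function hits_at and all accessions in the demo are made up for illustration. Filtering by accession only helps once you have chosen one reference protein per gene (for instance a MANE protein), which is exactly the missing piece.

```python
import re

# HGVS protein change as it appears in the dbSNP line, e.g. NP_065134.1:p.Ala36Thr
# (three-letter residue codes, no parentheses). Version suffix is optional.
PROTEIN_HGVS = re.compile(
    r"^(?P<acc>[NX]P_\d+)(?:\.\d+)?:p\."
    r"(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2})$"
)


def hits_at(hgvs_entries, ref, pos, accession=None):
    """Return the HGVS protein entries that change residue `ref` at `pos`.

    If `accession` (unversioned, e.g. "NP_065134") is given, only changes
    described on that protein count. Without it, any NP_/XP_ isoform can
    match, which is how a residue numbered 50 on the reference protein can
    appear as position 36 on another isoform.
    """
    hits = []
    for entry in hgvs_entries:
        m = PROTEIN_HGVS.match(entry)
        if m is None:
            continue  # genomic (g.), coding (c.) or unparsed entries
        if accession is not None and m["acc"] != accession:
            continue
        if m["ref"] == ref and int(m["pos"]) == pos:
            hits.append(entry)
    return hits


# Made-up accessions: Ala50 on the reference protein is Ala36 on an isoform.
entries = [
    "NC_000000.1:g.100G>A",
    "NP_999991.1:p.Ala50Thr",
    "XP_999992.1:p.Ala36Thr",
]
print(hits_at(entries, "Ala", 36))                         # ['XP_999992.1:p.Ala36Thr']
print(hits_at(entries, "Ala", 36, accession="NP_999991"))  # []
```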

I have thought about ways to solve this, but NCBI does not annotate a canonical sequence for each protein (that would help, since I could then just put its accession ID into the regex pattern), and I cannot curate this by hand because I have a lot of proteins (50+).

That is the main problem, but if anyone knows of a tool that can do the original task (retrieving missense SNPs at specific protein positions), I would be very happy to hear about it. Thank you!

uniprot ncbi genetics genomics snp • 549 views

You could look into MANE (Matched Annotation from NCBI and EMBL-EBI) transcripts: a set of curated transcripts that NCBI and Ensembl agree on. https://www.ncbi.nlm.nih.gov/refseq/MANE/

You can explode your HGVS strings by commas (one row per accession:change entry) and do an inner join against the MANE transcript IDs.
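
A stdlib-only sketch of that explode-then-join idea. It assumes you have already loaded the RefSeq accessions of the MANE set into a Python set (for example from the MANE summary table at the link above; check that file for the exact column names). The variant ID and the MANE set used here are illustrative only, not a claim about which protein is MANE for NUP107. Joining on the NP_ accession gives the protein change numbered on the MANE protein; it is worth confirming that this protein matches the UniProt canonical sequence you take positions from.

```python
def explode(records):
    """'Explode' step: one (variant_id, accession, change) row per HGVS entry."""
    rows = []
    for variant_id, hgvs_field in records:
        for entry in hgvs_field.split(","):
            accession, _, change = entry.partition(":")
            rows.append((variant_id, accession, change))
    return rows


def inner_join_mane(rows, mane_accessions, ignore_version=False):
    """Inner join on accession: keep only rows whose accession is in the MANE set.

    With ignore_version=True, 'NP_065134.1' and 'NP_065134.2' are treated as
    the same protein, which tolerates version drift between releases.
    """
    def key(acc):
        return acc.split(".")[0] if ignore_version else acc

    mane = {key(a) for a in mane_accessions}
    return [row for row in rows if key(row[1]) in mane]


# Hypothetical input: the HGVS field from the question, keyed by a made-up
# variant ID, and an illustrative MANE protein set.
records = [(
    "example_variant",
    "NC_000012.12:g.68689538G>A,NM_020401.4:c.106G>A,"
    "NP_065134.1:p.Ala36Thr,XP_005269094.1:p.Ala36Thr",
)]
mane_proteins = {"NP_065134.1"}

rows = explode(records)
joined = inner_join_mane(rows, mane_proteins)
print(joined)  # [('example_variant', 'NP_065134.1', 'p.Ala36Thr')]
```

In pandas the same idea is roughly a Series.str.split on commas, DataFrame.explode, then DataFrame.merge with how="inner".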
