Question

Finding the Genes corresponding to given SNPs using python

0

7.9 years ago

mms140130 ▴ 60

Hi,

My question is how to find the genes related to given SNPs. Part of the SNP data I have is as follows:

> rs987435        C       G       1       1       1       0       2
> rs345783        C       G       0       0       1       0       0
> rs955894        G       T       1       1       2       2       1
> rs6088791       A       G       1       2       0       0       1
> rs11180435      C       T       1       0       1       1       1
> rs17571465      A       T       1       2       2       2       2
> rs17011450      C       T       2       2       2       2       2
> rs6919430       A       C       2       1       2       2       2
> rs2342723       C       T       0       2       0       0       0
> rs11992567      C       T       2       2       2       2       2

The gene annotation data was downloaded from the UCSC website:

> DDX11L1   NR_046018   chr1    +   11873   14409   14409   14409
> WASH7P    NR_024540   chr1    -   14361   29370   29370   29370
> LINC01204 NR_104644   chrX    +   45364632    45386484    45386484    45386484
> LOC392232 NR_033867   chr8    -   73114986    73163869    73163869    73163869
> FBXL22    NM_203373   chr15   +   63889551    63894620    63889591    63893885
> LOC729737 NR_039983   chr1    -   134772  140566  140566  140566
> LOC100132287  NR_028322   chr1    +   323891  328581  328581  328581
> LOC100132062  NR_028325   chr1    +   323891  328581  328581  328581

where the column names are as follows:

> table refFlat
> "A gene prediction with additional geneName field."
>     (
>     string  geneName;           "Name of gene as it appears in Genome Browser."
>     string  name;               "Name of gene"
>     string  chrom;              "Chromosome name"
>     char[1] strand;             "+ or - for strand"
>     uint    txStart;            "Transcription start position"
>     uint    txEnd;              "Transcription end position"
>     uint    cdsStart;           "Coding region start"
>     uint    cdsEnd;             "Coding region end"
>     uint    exonCount;          "Number of exons"
>     uint[exonCount] exonStarts; "Exon start positions"
>     uint[exonCount] exonEnds;   "Exon end positions"
>     )

How can I use this data to find which gene each SNP is related to?

Thank you

SNP gene genome • 3.6k views

ADD COMMENT • link updated 7.9 years ago by mforde84 ★ 1.4k • written 7.9 years ago by mms140130 ▴ 60

score 3 · Accepted Answer · 2017-01-15

3

7.9 years ago

mforde84 ★ 1.4k

There are a lot of good annotation tools out there that you can use, including SnpEff, Variant Effect Predictor (VEP), ANNOVAR, Oncotator, SNP-nexus, etc.

ADD COMMENT • link 7.9 years ago by mforde84 ★ 1.4k

0

I don't understand what you mean.

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60

0

If you want to look up the annotation information for a given dbSNP ID, you need to query the dbSNP database directly. Alternatively, you can use one of the web-interface tools available for this purpose; e.g., SNP-nexus takes dbSNP IDs as input.

If you need to look up annotation information based on genomic location, like the data you'll see in a standard VCF file, you can use any of the tools above, which work directly with genomic-coordinate variant calls. Alternatively, you can download the annotations (GTF, GFF) corresponding to your sequence assembly and then iterate over your data, checking each variant against the annotation features.
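As a rough illustration of the "check each variant against the annotation features" route, here is a minimal Python sketch. It assumes the annotation is a refFlat-style table like the one in the question (geneName, name, chrom, strand, txStart, txEnd as the first six columns) and that you already know the chromosome and position of each SNP. The function names `load_genes` and `genes_at` are hypothetical, made up for this example. UCSC tables use 0-based, half-open coordinates, so a 1-based VCF position would need 1 subtracted first.

```python
import bisect
from collections import defaultdict


def load_genes(lines):
    """Group (txStart, txEnd, geneName) tuples by chromosome from refFlat-style rows."""
    genes = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated rows
        gene_name, _name, chrom, _strand, tx_start, tx_end = fields[:6]
        genes[chrom].append((int(tx_start), int(tx_end), gene_name))
    for chrom in genes:
        genes[chrom].sort()  # sorted by txStart so we can bisect
    return genes


def genes_at(genes, chrom, pos):
    """Return sorted gene names whose [txStart, txEnd) interval contains 0-based pos."""
    intervals = genes.get(chrom, [])
    # Only intervals that start at or before pos can contain it.
    hi = bisect.bisect_right(intervals, (pos, float("inf"), ""))
    return sorted({name for start, end, name in intervals[:hi] if start <= pos < end})
```

Calling `load_genes(open("refFlat.txt"))` once (the file name is an assumption) and then `genes_at(genes, "chr1", 12000)` per SNP keeps the annotation in memory. The scan over `intervals[:hi]` is linear in the worst case, so for millions of SNPs an interval tree or a dedicated tool such as the ones listed above is likely the better choice.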

ADD REPLY • link 7.9 years ago by mforde84 ★ 1.4k

0

Not exactly what OP was asking for (annotation using Python), but still the best/right answer for his issue :-)

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

1

HOWTO: Annotations using assembly :).

But honestly, you'd be surprised how often I see bioinformaticians using Python when all they're doing is making a bunch of system calls. It's both hilarious and disturbing. I get why someone would want to do this, as I occasionally do it in R scripts, but still.

ADD REPLY • link 7.9 years ago by mforde84 ★ 1.4k

0

The thing is, my data is really large, almost 9 million SNPs, which is why I asked about Python. I'm new to Python, so if anyone can help with that I would appreciate it. I have a file with the rs IDs for the SNPs.

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60

1

That's quite a lot indeed. Then I think the easiest path is to:

1. Download the dbSNP databank in VCF format.
2. Filter the VCF using your list of rs IDs (using grep -f).
3. Use SnpEff or similar for annotation of the SNPs.
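Since the question asked about Python, the filtering step could also be sketched as below. This is only an illustration, not a replacement for grep: it assumes a standard VCF where the third tab-separated column is the ID (possibly several IDs joined by semicolons) and a plain text file with one rs ID per line. The function names `read_ids` and `filter_vcf_by_id` are hypothetical.

```python
def read_ids(lines):
    """Collect non-empty, stripped identifiers into a set for fast exact lookups."""
    return {line.strip() for line in lines if line.strip()}


def filter_vcf_by_id(vcf_lines, wanted_ids):
    """Yield VCF header lines plus records whose ID column matches wanted_ids exactly."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the header so the output is still a VCF
            continue
        fields = line.split("\t", 3)
        if len(fields) < 3:
            continue  # malformed record
        # The ID column may hold several identifiers separated by semicolons.
        if any(rsid in wanted_ids for rsid in fields[2].split(";")):
            yield line
```

Both functions take any iterable of lines, so an open file handle works, and `gzip.open(path, "rt")` can be used for a compressed VCF. The file is streamed line by line, so only the set of wanted IDs is held in memory, and matching is exact, so rs123 does not also pull in rs1234.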

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

Thank you. Can you explain how to get started with SnpEff? I was searching the whole day and got tired, so would you please give me the steps? I appreciate your help.

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60

0

I think the manual is very clear. What did you try, and what doesn't work? Try to be specific when asking questions.

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

I tried SCAN (http://www.scandb.org/newinterface/index_v1.html), but my data is really large, which is why it doesn't work. The same goes for SNP-nexus; it didn't work either.

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60

0

I listed the steps here: C: Finding the Genes corresponding to given SNPs using python
What's wrong? Where are you stuck?

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

I used the following:

curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp147.txt.gz" | gunzip > datasnp

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60

0

Can you please tell me where I can find the dbSNP databank in VCF format? Or can I use the txt format, as I mentioned above?

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60

0

Now I have the dbSNP database in txt format, and I also have my SNP IDs in a txt file called pre_snpinfo_tumor (4th column).

What is the code to filter the SNP database?

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60

1

You would need a file containing just the dbSNP identifiers; you can extract these using awk or cut. This file would be DesiredIdentifiers.txt. Then you can use a simple grep, such as

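grep -w -F -f DesiredIdentifiers.txt dbSNPfile.vcf

(-F treats each identifier as a fixed string and -w matches whole words only, so rs123 does not also match rs1234.)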

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

I'm really confused. Could you please tell me where to find the VCF file for the SNP database? What I downloaded from the UCSC website doesn't contain the gene names.

ADD REPLY • link 7.9 years ago by mms140130 ▴ 60
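The UCSC SNP table carries genomic positions rather than gene names, so one possible way to use the txt download is to pull out the coordinates for the rs IDs of interest and then compare those coordinates with a gene table. The sketch below is hedged: it assumes the tab-separated column order bin, chrom, chromStart, chromEnd, name (rs ID), which should be checked against the table schema before relying on it, and the function name `snp_positions` is hypothetical.

```python
def snp_positions(snp_table_lines, wanted_ids, name_col=4):
    """Map each wanted rs ID to a list of (chrom, chromStart, chromEnd) tuples.

    Assumes columns: bin, chrom, chromStart, chromEnd, name, ...
    (name_col is the 0-based index of the rs ID column).
    """
    positions = {}
    for line in snp_table_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= name_col:
            continue  # skip short or empty lines
        rsid = fields[name_col]
        if rsid in wanted_ids:
            # An rs ID can map to more than one location, so keep a list.
            positions.setdefault(rsid, []).append(
                (fields[1], int(fields[2]), int(fields[3]))
            )
    return positions
```

Passing an open handle on the decompressed table (for example the `datasnp` file produced by the curl command earlier in the thread) together with a set of rs IDs streams the table once and keeps only the matching rows in memory.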