Hi!
I am wondering if someone can help me with code to fetch the summary info from NCBI or GeneCards. I assume maybe using wget?
For example, from a list of genes I want to get a document (so I can print it out) with something like this:
TAP2
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MDR/TAP subfamily. Members of the MDR/TAP subfamily are involved in multidrug resistance. This gene is located 7 kb telomeric to gene family member ABCB2. The protein encoded by this gene is involved in antigen presentation. This protein forms a heterodimer with ABCB2 in order to transport peptides from the cytoplasm to the endoplasmic reticulum. Mutations in this gene may be associated with ankylosing spondylitis, insulin-dependent diabetes mellitus, and celiac disease. Alternative splicing of this gene produces products which differ in peptide selectivity and level of restoration of surface expression of MHC class I molecules. [provided by RefSeq, Feb 2014]
IDUA
This gene encodes an enzyme that hydrolyzes the terminal alpha-L-iduronic acid residues of two glycosaminoglycans, dermatan sulfate and heparan sulfate. This hydrolysis is required for the lysosomal degradation of these glycosaminoglycans. Mutations in this gene that result in enzymatic deficiency lead to the autosomal recessive disease mucopolysaccharidosis type I (MPS I). [provided by RefSeq, Jul 2008]
thanks!
Thank you so much!!! I will try it tomorrow and see :)
Well, I tried yesterday with this code. I'm not a programmer... I created the .py file you gave me, and from the internet I added the modules Bio.py, Bio.Expasy.py and Bio._py3k. I hope I did it right.
I get this error message. What does it mean?
python gene_sumary.py 8.vcf
Traceback (most recent call last):
  File "gen.py", line 4, in <module>
    from Bio import Entrez
  File "/home/cri/Desktop/GET GEN/Bio.py", line 101
    from Bio._py3k import urlopen as _urlopen
    ^
IndentationError: unexpected indent
Could you post the changes you made to the script? The error seems easy to solve, just a tab which is incorrect. It even tells you on which line...
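One thing worth checking, as a guess rather than a confirmed diagnosis: the traceback shows Python loading a single file called Bio.py from the script's own folder, whereas Biopython is normally installed as a package (for example with `pip install biopython`). A hand-copied Bio.py in the working directory would take priority over an installed copy. The small check below uses only the standard library; `locate_module` is a made-up helper name for this illustration.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    origin = locate_module("Bio")
    if origin is None:
        print("Bio is not importable; Biopython may not be installed")
    else:
        # A path inside the script folder suggests a stray local copy.
        print("Bio would be loaded from:", origin)
```

If the printed path points into the script folder rather than into the Python installation, removing the copied files and installing Biopython properly is probably the simpler route.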
Hmm, I think I did something wrong, so I reinstalled Python and ran your script. Now, instead of all those error messages, I get:
I hadn't modified anything.
As a clarification, it's not my script, I just found it while googling. It looks a bit strange. Do you have your input in a list in a file? How would you like to use the script? I'll rewrite it.
Which python version do you use?
Oh, thanks, if you don't mind... I'm not a programmer, so this is pretty hard for me.
I thought I could just read the genes from my VCF file. I have just tried with a txt file containing only two genes, and I get the same error.
I have Python 1.67
I wrote it to just read from a file with a gene name on each line. I don't think writing this for a vcf file is a lot of fun, unless you really want to... I guess this should work. Execute script as
python getGeneSummary.py yourlist.txt
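For readers without the script at hand, here is a minimal sketch of the file handling described above: one gene name per line in, one printable entry per gene out. It is not the script from this thread. The function names (`read_gene_list`, `format_entry`, `write_report`) are hypothetical, and the NCBI lookup is deliberately left as a pluggable `lookup` function.

```python
import sys


def read_gene_list(path):
    """Return gene symbols from a text file, one per line, skipping blanks."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def format_entry(symbol, summary):
    """Format one gene: name line plus summary line, or a 'not found' line."""
    if not summary:
        return f"{symbol} !! No summary found\n"
    return f"{symbol}\n{summary}\n"


def write_report(symbols, lookup, out):
    """Write one entry per symbol; lookup maps a symbol to text or None."""
    for symbol in symbols:
        out.write(format_entry(symbol, lookup(symbol)))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Placeholder lookup that knows no genes; swap in a real NCBI query.
    write_report(read_gene_list(sys.argv[1]), lambda symbol: None, sys.stdout)
```

Redirecting the output (`python getGeneSummary.py yourlist.txt > summaries.txt`) gives a file that can be printed.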
Let me know if it doesn't work as it should or you would like a modification.
ohhh THANK YOU SOOOOOOOO MUUUUCH!!!!!!!!!! :))))))
I'm wondering if there isn't a better alternative to NCBI? Even on GeneCards I find more information. Many of the genes appear like this O_O
SLC6A5 !! No summary found
SLC16A2 !! No summary found
HRNR !! No summary found
......
Thanks
http://www.ncbi.nlm.nih.gov/gene/?term=SLC6A5%5Bsym%5D
http://www.ncbi.nlm.nih.gov/gene/?term=SLC16A2%5Bsym%5D if you need the human gene.
http://www.ncbi.nlm.nih.gov/gene/?term=HRNR%5Bsym%5D
Hmm, then why do I get that error when I use the code he gave me?
Only a few genes from my list appear...
Genomax, would you mind checking whether you get the same error, please?
For some genes (e.g. http://www.ncbi.nlm.nih.gov/gene/388697) there simply is no summary available.
As you can see in my code, the summary information is parsed from what it gets from NCBI. If something isn't properly formatted, it might give an error. I don't know if GeneCards has an API which I could access to pull the summary out. Although I must say that for the HRNR example GeneCards isn't very informative either! (http://www.genecards.org/cgi-bin/carddisp.pl?gene=HRNR)
I don't know yet why SLC genes don't work, and I will look into this. Tonight, or after the conference this week. Sorry for the inconvenience, I hope I'll find a way around this.
You don't need to be sorry at all, my god! I'm really happy and I appreciate your help. Maybe something is wrong with my PC and it doesn't run as it should?
Thank you so much! Just whenever you have the time ^^
Don't worry :p I'm happy to help and improve my scripting at the same time. Thanks for the feedback and challenge!
thank you!
hahahah if you want I can give you more challenges XD
Hi Christina, I made a first refinement and it took me just a few minutes (actually disappointingly easy). It's beyond me why this is the case, but without specifying the species apparently this gene was the top of the list with the entrez query: http://www.ncbi.nlm.nih.gov/gene/108519407
I limited to Homo sapiens (assuming that is what you're interested in) and it seems to work now.
If you have examples which don't work as they should, please let me know!
Code can be found below:
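As a rough sketch of the idea described above (search the gene database by symbol, limit the query to Homo sapiens, then read the summary of the top hit), something along these lines might work. This is an illustration built on Biopython's `Bio.Entrez`, not necessarily the exact script from this thread. The helper names `build_term` and `fetch_summary`, the email placeholder, and the assumed layout of the parsed esummary record are all assumptions.

```python
import sys


def build_term(symbol, organism="Homo sapiens"):
    """Build an Entrez Gene query for one symbol, limited to one organism."""
    return f'{symbol}[sym] AND "{organism}"[Organism]'


def fetch_summary(symbol, email, organism="Homo sapiens"):
    """Return the summary text for a gene symbol, or None if there is none."""
    # Imported here so build_term() stays usable without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address

    handle = Entrez.esearch(db="gene", term=build_term(symbol, organism))
    search = Entrez.read(handle)
    handle.close()
    if not search["IdList"]:
        return None

    # Take the top hit only; without the organism filter this may be
    # a gene from another species.
    handle = Entrez.esummary(db="gene", id=search["IdList"][0])
    record = Entrez.read(handle)
    handle.close()

    # Assumed layout of the parsed esummary record for db="gene".
    docs = record["DocumentSummarySet"]["DocumentSummary"]
    summary = str(docs[0].get("Summary", "")).strip() if docs else ""
    return summary or None


if __name__ == "__main__":
    for symbol in sys.argv[1:]:
        text = fetch_summary(symbol, email="you@example.org")  # placeholder
        print(f"{symbol}\n{text}" if text else f"{symbol} !! No summary found")
```

Genes whose NCBI record has no summary text come back as `None` and are reported as "No summary found".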
THANKS A LOT!
Works like a charm now!! This time some genes, like CPEB1, don't appear simply because there isn't a summary on NCBI. Good work!!
...but... there is always a "but" with me XD The script stops a bit later and I get this error message. Could it be because my text file has over 1000 genes? ^^"
Traceback (most recent call last):
  File "/home/cri/Desktop/biopython-1.68/getGeneSummary.py", line 15, in <module>
    result = Entrez.read(handle)
  File "/home/cri/Desktop/biopython-1.68/Bio/Entrez/__init__.py", line 450, in read
    record = handler.read(handle)
  File "/home/cri/Desktop/biopython-1.68/Bio/Entrez/Parser.py", line 233, in read
    self.parser.ParseFile(handle)
  File "/home/cri/Desktop/biopython-1.68/Bio/Entrez/Parser.py", line 390, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Invalid db name specified: gene
and THANK YOU SOOOOO MUCH!!!!!
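The thread does not settle what causes this error. Since "gene" is a valid database name and the run only fails after many genes, one possibility (an assumption, not a confirmed diagnosis) is an intermittent failure on NCBI's side during a long run. If so, retrying the failed lookup after a short pause may be enough. The sketch below shows a generic retry wrapper; `call_with_retry` and its parameters are hypothetical names, not part of Biopython.

```python
import time


def call_with_retry(func, *args, attempts=3, pause=1.0, **kwargs):
    """Call func, retrying after a pause if it raises RuntimeError or OSError."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except (RuntimeError, OSError):
            if attempt == attempts:
                raise  # give up and show the original error
            time.sleep(pause * attempt)  # wait a little longer each time
```

Because the RuntimeError in the traceback is raised while parsing the response, the wrapped function would need to cover both the request and the `Entrez.read` call, for example one function that looks up a single gene from start to finish.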