Question

qPCR: How, And With What Software, Do You Get All Transcripts Of One Gene?

0

14.2 years ago

Cheng Zhongshan ▴ 400

Dear all, I want to design primers for all transcripts of one gene, for example CD55 (GeneID: 1604). Unfortunately, I cannot find a simple way to parse the exon and intron positions of all its transcripts from NCBI with Perl. What software do you use, and how do you extract all the transcripts of CD55 with simple Perl code, or with another language such as Python or R? Thanks a lot!

perl transcript • 3.6k views

Ram · Answer 1 · 2011-02-25

I wouldn't use GenBank to retrieve the exon positions, as some records are poorly annotated.

You can use the UCSC MySQL database to get the genomic positions of the exons, and then retrieve the DNA sequences with fastacmd or UCSC/DAS-DNA:

    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -P 3306 -e 'select exonStarts,exonEnds from knownGene as K,kgXref as X where geneSymbol="CD55" and X.kgId=K.name'
    +----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
    | exonStarts                                                                                                     | exonEnds                                                                                                       |
    +----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207513735,207532890, | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207513853,207534309, | 
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207532890,           | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207534309,           | 
    | 207494816,207495726,207498966,207500096,207504452,207510037,207510673,207512741,207532890,                     | 207495210,207495912,207499066,207500182,207504641,207510163,207510754,207512762,207534309,                     | 
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207513735,           | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207514081,           | 
    | 207494816,207495726,207497903,207498966,207500096,207504452,207510037,207510673,207512741,207527351,207532890, | 207495210,207495912,207498095,207499066,207500182,207504641,207510163,207510754,207512762,207527444,207534309, | 
    +----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
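
Each row above is one transcript, with its exon starts and ends stored as comma-separated strings that end in a trailing comma. A minimal Python sketch of how those strings could be turned into exon and intron intervals follows. The function names `parse_exons` and `introns_from_exons` are made up for illustration, and the sketch assumes UCSC's usual convention of 0-based starts with half-open intervals.

```python
def parse_exons(exon_starts, exon_ends):
    """Turn UCSC exonStarts/exonEnds strings into a list of (start, end) pairs."""
    # The trailing comma yields an empty last field, so skip empty strings.
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds have different lengths")
    return list(zip(starts, ends))


def introns_from_exons(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    exons = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


# Third row of the query result above (the 9-exon transcript).
starts = ("207494816,207495726,207498966,207500096,207504452,"
          "207510037,207510673,207512741,207532890,")
ends = ("207495210,207495912,207499066,207500182,207504641,"
        "207510163,207510754,207512762,207534309,")

exons = parse_exons(starts, ends)
introns = introns_from_exons(exons)
```

With half-open coordinates, an intron simply runs from the end of one exon to the start of the next, so no off-by-one adjustment is needed until you convert to 1-based positions.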

score 2 · Answer 2 · 2011-02-25

14.2 years ago

Ryan W. ▴ 120

Funny, I was just working on something related to this today. I use the NCBI E-utilities (I know, awesome name, right?). Specifically, I request FASTA-formatted mRNA transcripts in my Java application through their "EFetch" utility at the following URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=YOUR_SEQUENCE_ID_HERE&rettype=fasta&retmode=text

The great thing about this is that it's language-agnostic, since you're just issuing HTTP requests and parsing responses. Once I make a request and get the data back, I just parse the sequence out of the FASTA-formatted response and I'm off to the races.
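
Since the question asked about Python, here is a rough sketch of the same request-and-parse idea in that language. It builds the EFetch URL shown above (using https, which is my choice) and pulls the sequences out of the FASTA text. The helper names are invented for this example, and `fetch_transcript` needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(seq_id):
    """Build the EFetch URL for one nucleotide record in FASTA format."""
    params = {"db": "nuccore", "id": seq_id, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return {header: sequence} for every record in a FASTA string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def fetch_transcript(seq_id):
    """Download one record from NCBI and parse it (requires network access)."""
    with urlopen(efetch_fasta_url(seq_id)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_transcript` with a transcript accession or GI should return a one-entry dictionary keyed by the FASTA header line.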

I'm not exactly sure how to query a list of all transcripts for a given gene, but I imagine you can use their "ESearch" utility to get that data. Instead, I just download their one big gene2accession file (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz), which lists the transcripts available for every gene.
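
A sketch of the gene2accession lookup in Python, for anyone who wants to try it. It assumes the file is tab-separated with the GeneID in the second column and the RNA accession in the fourth, with "-" marking a missing value; please check that against the README on the NCBI FTP site before relying on it. The function names are my own.

```python
import gzip


def transcripts_for_gene(lines, gene_id):
    """Collect the RNA accessions listed for one GeneID.

    Assumed column layout: tax_id, GeneID, status, RNA accession, ...
    """
    wanted = str(gene_id)
    accessions = set()
    for line in lines:
        if line.startswith("#"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3 and fields[1] == wanted and fields[3] != "-":
            accessions.add(fields[3])
    return sorted(accessions)


def transcripts_from_file(path, gene_id):
    """Run the lookup on a local copy of gene2accession.gz."""
    with gzip.open(path, "rt") as handle:
        return transcripts_for_gene(handle, gene_id)
```

For CD55 that would be `transcripts_from_file("gene2accession.gz", 1604)`. The file is large, so streaming it line by line as above avoids loading it all into memory.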

Documentation for NCBI E-Utilities: http://www.ncbi.nlm.nih.gov/books/NBK25497/

0

The ID Cheng used is a GeneID, not a GI, so you cannot pass it to NCBI EFetch to retrieve the sequence. Moreover, GenBank records can be poorly annotated and won't always contain the exon positions.

ADD REPLY • link 14.2 years ago by Pierre Lindenbaum 166k

0

Yeah, that's why I said to use the ESearch utility. You can use the GeneID as a parameter in a query to get a list of transcripts. As for the quality of the annotations, I'll defer to you.
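
For illustration, one way to go from a GeneID to its transcripts over HTTP, sketched in Python. Note that this swaps in ELink rather than ESearch, because as far as I know ELink is the E-utility that follows links from a Gene record to nucleotide records. The link name `gene_nuccore_refseqrna` and the XML layout are my understanding of the service and should be checked against the documentation; the function names are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_transcripts_url(gene_id):
    """Build an ELink URL asking for RefSeq RNA records linked to a GeneID."""
    params = {
        "dbfrom": "gene",
        "db": "nuccore",
        "id": gene_id,
        "linkname": "gene_nuccore_refseqrna",  # assumed link name
    }
    return ELINK + "?" + urlencode(params)


def linked_ids(xml_text):
    """Extract the linked record IDs from an ELink XML response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(".//LinkSetDb/Link/Id")]
```

The IDs that come back are nucleotide record identifiers, so each one can then be passed to EFetch to download the transcript sequence.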

ADD REPLY • link 14.2 years ago by Ryan W. ▴ 120

0

Yeah, the Perl module MultiPride actually uses ESearch to fetch the DNA sequence of a gene by GeneID and then extracts information such as exon and intron positions from the GenBank file. Unfortunately, the GenBank file only gives some of the exon and intron positions. That's why I want to change the MultiPride code so that it first gets all the transcripts of a gene and then designs the qPCR primers in a pipeline.

ADD REPLY • link 14.2 years ago by Cheng Zhongshan ▴ 400

score 0 · Answer 3 · 2011-02-25

Thanks! Your suggestions are impressive. What I actually want is to design qPCR primers that distinguish between all the transcripts. That is, I need the positions of every exon and intron in every transcript, so that I can work out the right primer pairs for the different transcripts.

I know the Perl module MultiPride can design qPCR primers for many genes in a pipeline. It uses Perl to parse NCBI gene information, including the gene's DNA location and the mRNA exon starts and ends. However, I find the Perl script difficult to maintain, especially when a gene has multiple transcripts, because NCBI lacks some of the exon and intron information for those transcripts. CD55, for example, has more than 6 transcripts, but only 3 are recorded in its GenBank file (GeneID: 1604, http://www.ncbi.nlm.nih.gov/nuccore/NC_000001?report=genbank&from=207494817&to=207534311 ). So how can I use Perl to get the exon and intron start and stop positions, on the corresponding DNA, for all the transcripts? Thanks again!
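
One possible starting point for the "which primers separate which transcripts" calculation, sketched in Python rather than Perl. Given the exon intervals of each transcript, it lists the exons that occur in exactly one transcript, which are natural candidates for transcript-specific primers. The function name and the toy coordinates are hypothetical.

```python
from collections import Counter


def unique_exons(transcripts):
    """Map each transcript name to the exons found in no other transcript.

    `transcripts` is a dict of name -> list of (start, end) exon intervals.
    """
    # Count in how many transcripts each exact (start, end) interval appears.
    counts = Counter(
        exon for exons in transcripts.values() for exon in set(exons)
    )
    return {
        name: sorted(exon for exon in set(exons) if counts[exon] == 1)
        for name, exons in transcripts.items()
    }


# Toy example with made-up coordinates: two transcripts sharing a first exon.
example = {
    "t1": [(1, 5), (10, 20)],
    "t2": [(1, 5), (30, 40)],
}
specific = unique_exons(example)
```

This only compares exact intervals. A transcript that differs from another purely by skipping an exon has no unique exon of its own, so it would need a primer spanning the new exon-exon junction instead.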