I have a list of genes, and I would like to calculate the total cDNA length for each. Is there a database or table where I can look this up? Is there a way to query UCSC?
Alternatively, I looked into Ensembl through BioMart, and I see that "cDNA coding start" and "cDNA coding end" are listed under Attributes. Should I sum them up to get the final cDNA length? How can I be sure that the number I reach is correct and equals the true cDNA length? Is there a database where I can double-check my result?
I am not an expert, but I guess that cDNA length might differ for every gene isoform. Following this logic, checking cDNA length for every transcript is more correct.
Basically, I want to calculate how much of the gene is coding and how much is non-coding. Based on your suggestion, I think I should calculate the cDNA length for every transcript of the gene and sum them up. Is that correct?
'cDNA coding start' and 'cDNA coding end' will give you the positions of the coding start and end within the cDNA, which is not what you need.
There isn't an easy way to get this via BioMart, I'm afraid. Are you familiar with our Perl APIs?
So you mean that even the sum of the coding lengths within the cDNA is not equal to the coding length of the gene?
Basically, I have a set of genes, and I would like to group them based on their coding region length.
Forget about cDNA, that's just not the right term.
You might be right. I used cDNA because it was the closest term in BioMart to what I am looking for. I am after the "coding region" of each gene.
Use 'CDS length' in BioMart - that's exactly what you need.
Should I take the longest one? I see there are different entries for each gene.
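One common convention, though not the only one, is to keep the transcript with the longest CDS for each gene. A minimal sketch of that selection, assuming the BioMart export has been loaded as (gene, transcript, CDS length) rows; the IDs and numbers below are made up for illustration:

```python
# Hypothetical rows, as they might look after a BioMart export:
# (gene ID, transcript ID, CDS length in bp). None marks a non-coding transcript.
rows = [
    ("GENE_A", "TX_A1", 900),
    ("GENE_A", "TX_A2", 1200),
    ("GENE_B", "TX_B1", 450),
    ("GENE_B", "TX_B2", None),
]

def longest_cds_per_gene(rows):
    """Return {gene: (transcript, cds_length)} keeping the longest CDS per gene."""
    best = {}
    for gene, transcript, cds_len in rows:
        if cds_len is None:
            continue  # skip transcripts without a coding sequence
        if gene not in best or cds_len > best[gene][1]:
            best[gene] = (transcript, cds_len)
    return best

result = longest_cds_per_gene(rows)
for gene, (transcript, cds_len) in sorted(result.items()):
    print(gene, transcript, cds_len)
```

Whether the longest CDS is the right representative depends on the question being asked; the same loop could keep a canonical transcript instead if that annotation is available.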
Are you looking for the length of the mature mRNA? This could be easily calculated for each transcript. In my understanding, cDNA is a laboratory construct (reverse-transcribed mRNA), which could be fragmented. The length of the coding sequence is not the same as the length of the mRNA, because the mRNA also includes the UTRs.
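To make the difference between mRNA length and CDS length concrete, here is a small sketch for a single transcript. It assumes exons are given as 1-based inclusive genomic intervals and that the coding region is described by its genomic start and end; the function name and the coordinates are hypothetical.

```python
def transcript_lengths(exons, cds_start, cds_end):
    """Return (mRNA length, CDS length, UTR length) for one transcript.

    exons: list of (start, end) genomic intervals, 1-based inclusive.
    cds_start, cds_end: genomic bounds of the coding region, 1-based inclusive.
    """
    # Mature mRNA length is the sum of the exon lengths (introns are spliced out).
    mrna_len = sum(end - start + 1 for start, end in exons)

    # CDS length is the part of each exon that overlaps the coding region.
    cds_len = 0
    for start, end in exons:
        lo = max(start, cds_start)
        hi = min(end, cds_end)
        if lo <= hi:
            cds_len += hi - lo + 1

    # Whatever is exonic but not coding is UTR.
    return mrna_len, cds_len, mrna_len - cds_len

# Made-up transcript: three 100 bp exons, coding region from 150 to 549.
exons = [(100, 199), (300, 399), (500, 599)]
mrna, cds, utr = transcript_lengths(exons, 150, 549)
print(mrna, cds, utr)
```

Here the exons add up to 300 bp of mRNA, of which 200 bp are coding and 100 bp are UTR.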