I'm looking to pull information from the data made available on the NCBI site. So far I've used the geneinfo and gene2accession datasets from ftp://ftp.ncbi.nih.gov/gene, so I have GeneIDs and the accession versions/GIs for the nucleotide, mRNA and protein sequences associated with each GeneID. I could get the actual sequences from gene2refseq, but is there any way to get just the lengths of the various transcripts?
I can't use Entrez; I need a copy of the raw data.
Thanks for the answer, Michael, but I can't use that because of my requirements. I'll be making far too many requests for too much data, so I need the actual dataset.
NCBI does allow a large number of queries as long as you don't exceed 3 per second, and it is recommended that you run them during US nighttime, so a couple of thousand should be no problem.
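If you do go the query route, staying under the 3-per-second limit is mostly a matter of spacing the calls out. Here is a minimal sketch of that idea, not an official NCBI client: `Throttle` and `run_throttled` are hypothetical names made up for illustration, and `action` stands in for whatever function actually performs a single query.

```python
import time


class Throttle:
    """Space out calls so that wait() returns at most max_per_second times per second."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive calls, in seconds.
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()  # re-read in case the sleep overshot
        self._next_allowed = now + self.min_interval


def run_throttled(items, action, throttle):
    """Call action(item) for each item, pausing between calls as the throttle requires."""
    results = []
    for item in items:
        throttle.wait()
        results.append(action(item))
    return results
```

You would call it as something like `run_throttled(accessions, fetch_one, Throttle(3.0))`, where `fetch_one` is your own function (an assumption here, not something NCBI provides). The clock and sleep functions are injectable only so the spacing logic can be checked without actually waiting.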