Hi All,
I'm trying to retrieve the nucleotide sequences of the CDSs for the complete set of RefSeq proteins. I've looked at the files at ftp://ftp.ncbi.nih.gov/refseq/release/complete/, but I can't find a file that contains both the CDS annotation and the original nucleotide (whole-genome) sequence the CDS came from.
I don't have a problem parsing GenBank files; it just seems odd that there isn't a single GenBank file with all the information I need.
I could add the whole-genome sequences to the GenBank files that have the CDS info myself, but it seems like I'm missing something obvious here.
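Once a CDS location string and the genome sequence are in hand, extracting the CDS nucleotides comes down to interpreting GenBank location syntax: coordinates are 1-based and inclusive, `join(...)` concatenates segments, and `complement(...)` takes the reverse complement. Here is a minimal Python sketch using a hypothetical `extract_location` helper. It assumes the location refers only to the local sequence, so remote references like `X12345.1:1..100`, `order(...)`, and `^` between-base sites are rejected, and it ignores the partial markers `<` and `>`.

```python
import re

# IUPAC-aware complement table, upper and lower case (ORIGIN sequence is usually lowercase).
_COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn",
                            "TGCAYRMKVBHDNtgcayrmkvbhdn")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def _split_top_level(s: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def extract_location(genome: str, location: str) -> str:
    """Return the subsequence of `genome` described by a local GenBank location string."""
    loc = "".join(location.split())  # drop whitespace/newlines from wrapped locations
    if loc.startswith("complement(") and loc.endswith(")"):
        return reverse_complement(extract_location(genome, loc[len("complement("):-1]))
    if loc.startswith("join(") and loc.endswith(")"):
        return "".join(extract_location(genome, part)
                       for part in _split_top_level(loc[len("join("):-1]))
    m = re.fullmatch(r"[<>]?(\d+)(?:\.\.[<>]?(\d+))?", loc)
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    start = int(m.group(1))
    end = int(m.group(2) or start)
    # GenBank coordinates are 1-based and inclusive.
    return genome[start - 1:end]


if __name__ == "__main__":
    genome = "AAATGCCCTTTGGGTAAA"
    print(extract_location(genome, "complement(join(3..5,9..11))"))  # AAACAT
```

If adding a dependency is acceptable, Biopython's `SeqFeature.extract()` does the same job with much fuller location support.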
Here's the general problem I'm trying to solve:
I have a protein with accession "CAA23625" from RefSeq, and I'd like the nucleotide sequence of its CDS. Ideally I'd like to do the parsing locally, without having to rely on hitting NCBI's server with an Entrez query.
Thanks,
Rohan
What is your question? Perhaps a specific example might help...