Hi,
I want to download the complete set of CDS for a genome from RefSeq and UniGene, but I wasn't able to get the data from the FTP site. Is it possible to do this using eutils or BioMart? If so, could anyone please guide me through the general steps? I don't need a script, as I am learning a scripting language myself and believe I can figure that part out.
Also, how can I retrieve the CDS from the complete cDNA set (FASTA format) of a genome?
Can you explain to us how you were not able to get the data from the FTP site? Were you seeing any error messages? It usually is possible to use eutils or biomart as most public databases use those services. What are the specific steps you are having trouble with?
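To make the eutils route concrete, here is a minimal sketch of the two requests involved: an ESearch call to find records and an EFetch call to pull their coding sequences. It only builds the URLs and does not contact NCBI. The function names are made up for illustration, and the `fasta_cds_na` return type and the example search term are what I understand the nuccore database to accept, so please check them against the E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nuccore", retmax=20):
    """Build an ESearch URL that looks up record IDs matching an Entrez query.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_cds_url(ids, db="nuccore"):
    """Build an EFetch URL asking for the annotated CDS of the given records.

    Assumption: rettype 'fasta_cds_na' returns CDS nucleotide FASTA for
    nuccore records; verify this in the EFetch documentation.
    """
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": "fasta_cds_na",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example query (an assumption, adjust to your needs): RefSeq mRNAs of maize.
    print(esearch_url("Zea mays[Organism] AND refseq[filter] AND biomol_mrna[PROP]"))
    # Placeholder IDs; substitute the IDs returned by the search.
    print(efetch_cds_url(["ACCESSION_1", "ACCESSION_2"]))
```

The general steps are therefore: search, collect the returned IDs, then fetch in batches. For a whole transcriptome the FTP files are usually the more practical source, since fetching many thousands of records through eutils is slow.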
I should have mentioned earlier that I am just starting to learn bioinformatics, and I apologize for my query being so brief.
Suppose I want to download the full-length cDNAs of Zea mays from RefSeq. I am at ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plant/ but, to understand the file naming pattern, I am trying to download this catalog file: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-catalog/RefSeq-release60.catalog.gz . I get a 'forbidden' error message every time I try to download it.
Now, coming to the second part of my query: from the RefSeq plant link above, I downloaded one file at random, 'plant.1.rna.fna.gz', and here's one sequence from this file:
Can you tell me how to identify the CDS in this sequence?
Similarly, where can I download the complete transcriptome of Zea mays from UniGene? I am at ftp://ftp.ncbi.nlm.nih.gov/repository/UniGene/Zea_mays/ and the info file says the species contains 146856 mRNAs, but which of the files listed there contains these mRNAs? As far as I know, UniGene maintains gene-oriented clusters, so each entry is a cluster. Do I have to look for mRNAs in the description of each cluster? And again, how do I identify the CDS once I have the transcriptome?
I know what I am asking is fairly basic, but as I said earlier, I am new to this field and any help would be appreciated. Also, I know there are plant-specific databases that can easily provide this information, but I want to do it using NCBI, and I am missing something very fundamental here.
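On identifying the CDS in an mRNA sequence when only the FASTA is available: the FASTA file itself carries no CDS coordinates, so one common approximation is to take the longest open reading frame (ATG to the first in-frame stop codon) on the given strand. The sketch below does only that. It is a heuristic, not the annotated CDS; the function name is made up for illustration, it assumes the standard genetic code's stop codons, and it ignores partial transcripts that lack a start or stop. The annotated coordinates live in the GenBank-format record of each RefSeq entry.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based and half-open, so seq[start:end] includes the
    stop codon. Returns None if no complete ORF is found.
    Hypothetical helper for illustration; this is a heuristic guess at the CDS.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        orf_start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == START:
                orf_start = i  # first ATG since the last stop opens an ORF
            elif orf_start is not None and codon in STOPS:
                end = i + 3
                if best is None or end - orf_start > best[1] - best[0]:
                    best = (orf_start, end)
                orf_start = None  # close the ORF and look for the next ATG
    return best


if __name__ == "__main__":
    # Toy sequence, made up for this example (not from any database).
    toy = "CCATGAAATTTGGGTAACC"
    coords = longest_orf(toy)
    if coords is not None:
        start, end = coords
        print(start, end, toy[start:end])
```

For real data, the longest ORF and the annotated CDS usually agree for full-length mRNAs but can differ, so treat the result as a candidate to check against the record's annotation.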