4.4 years ago
User6891
Hi,
I have a list of 500 human gene symbols. For each of them I want to find the NCBI transcript with the longest coding sequence. How can I do this?
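Take a look at the MANE project, a collaboration between NCBI and EMBL-EBI. It designates one representative transcript (MANE Select) per protein-coding gene, although that is not necessarily the one with the longest coding sequence.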
Can all of the genes be identified unambiguously by searching NCBI with these gene symbols? Additional filters such as " AND Homo sapiens[Primary Organism] AND refseq[filter]" may be necessary. If so, you can download the sequences of all transcripts for each gene and pick the longest one with the rentrez package in R: use entrez_search to get the IDs, then entrez_fetch with the rettype argument set to "fasta". Note that "fasta" returns full transcript sequences, so the longest record is the longest transcript, which is not necessarily the one with the longest coding sequence.
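The "find the longest one" step could look roughly like the sketch below. It is plain Python rather than rentrez, so it runs without network access or extra packages, and it only covers building the search term and picking the longest record from FASTA text that has already been downloaded. The function names are made up for illustration, and the [Gene Name] field tag is an assumption on my part; only the organism and RefSeq filters come from the answer above. To compare coding sequences rather than whole transcripts, efetch should also accept a CDS-only return type ("fasta_cds_na"), but check the E-utilities documentation before relying on that.

```python
def build_term(symbol):
    """Compose an Entrez search term for one gene symbol.

    The organism and RefSeq filters are the ones quoted in the answer;
    the [Gene Name] field tag is an assumption, not part of the thread.
    """
    return f"{symbol}[Gene Name] AND Homo sapiens[Primary Organism] AND refseq[filter]"


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_record(text):
    """Return the (header, sequence) pair with the longest sequence.

    Returns None for empty input; on ties the first record wins.
    """
    records = parse_fasta(text)
    if not records:
        return None
    return max(records, key=lambda rec: len(rec[1]))
```

You would call build_term once per gene symbol, run the search and fetch (with rentrez, Biopython, or anything else), and pass each gene's FASTA text to longest_record. If a search returns no hits or hits for several genes, the symbol was ambiguous and needs checking by hand.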