17 months ago
Hi! I want to use the NCBI API to retrieve the FASTA sequences for a list of lncRNAs that I'm interested in. Has anyone done this before? Is it possible? Please help me.
That doesn't help me much. Do you know if it is possible to do what I want?
What does not help you? You can use both tools above to retrieve sequences from the command line. Provide a few example accessions and I can show you how to download the sequences using these tools.
Sorry, I didn't want to be rude. Thank you for helping me. To be honest, it's the first time I have used an API. I would like to retrieve the sequences of, for example: LOC105370256 LOC105377806 LOC107984590 LOC107984591
I don't know if these can be used as accession names, but I would like to retrieve the sequences for a list of lncRNAs like this one. Thank you for your help, I appreciate it a lot!
For completeness, using the web interface of datasets: https://www.ncbi.nlm.nih.gov/datasets/gene/
Using EntrezDirect:
will get you the following variants. I am removing genomic/chromosome records to just get you ncRNA.
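For anyone who wants to script this kind of lookup against the NCBI API directly, the E-utilities HTTP endpoints (which EntrezDirect wraps) can be called from Python. This is only a minimal sketch under some assumptions, not the exact command used above: it assumes the ELink link name gene_nuccore_refseqrna is the right way to keep RNA records and skip genomic/chromosome ones, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(gene_id):
    """Build the ELink URL listing RefSeq RNA records for one Gene ID."""
    params = {
        "dbfrom": "gene",
        "db": "nuccore",
        "id": gene_id,
        # Assumption: this link name restricts results to RefSeq RNAs.
        "linkname": "gene_nuccore_refseqrna",
    }
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


def efetch_fasta_url(nuccore_ids):
    """Build the EFetch URL returning FASTA text for nuccore record IDs."""
    params = {
        "db": "nuccore",
        "id": ",".join(nuccore_ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def fetch_rna_fasta(gene_id):
    """Network call: return FASTA text for a gene's RNAs, or '' if none."""
    with urlopen(elink_url(gene_id)) as resp:
        root = ET.fromstring(resp.read())
    ids = [el.text for el in root.findall(".//LinkSetDb/Link/Id")]
    if not ids:
        # For example, a discontinued gene with no linked records.
        return ""
    with urlopen(efetch_fasta_url(ids)) as resp:
        return resp.read().decode()
```

Calling fetch_rna_fasta("105370256") would then return the FASTA text for that gene's transcripts, if any are linked. NCBI asks for modest request rates, so a short pause between genes is a good idea when looping over a long list.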
LOC designations are used for genes that do not have a final gene ID.
LOC105377806 and LOC107984590 do not appear to be valid IDs, so you will get an error with them.
Hi, following Genomax's suggestion, you can use NCBI Datasets. I used the IDs you posted to create a list.
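Creating that list can be scripted. A small sketch, assuming the usual NCBI convention that a LOC symbol is the prefix LOC followed by the numeric Gene ID; the file name gene_ids.txt and the helper names are hypothetical.

```python
import re


def loc_to_gene_id(symbol):
    """Strip the LOC prefix from a symbol such as LOC105370256."""
    match = re.fullmatch(r"LOC(\d+)", symbol.strip())
    if match is None:
        raise ValueError(f"not a LOC-style symbol: {symbol!r}")
    return match.group(1)


def write_id_list(symbols, path):
    """Write one numeric Gene ID per line and return the IDs."""
    gene_ids = [loc_to_gene_id(s) for s in symbols]
    with open(path, "w") as handle:
        handle.write("\n".join(gene_ids) + "\n")
    return gene_ids


if __name__ == "__main__":
    symbols = ["LOC105370256", "LOC105377806", "LOC107984590", "LOC107984591"]
    # gene_ids.txt is a hypothetical output file name.
    print(write_id_list(symbols, "gene_ids.txt"))
```

Symbols that do not follow the LOC pattern raise an error here instead of being passed through silently, so a mixed list of gene names would need separate handling.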
Then I used the NCBI Datasets CLI to download the FASTA sequences for those gene-ids.
By default, NCBI Datasets includes transcript and protein sequences (or in this case, only RNA sequences):
All requested transcript sequences will be in the file rna.fna.
You can also retrieve the gene sequences by using the flag --include gene.
One important point is that the LOC genes are part of an annotation that can be updated and result in a discontinuation of a gene. In this case, two of them (LOC105377806 and LOC107984590) are discontinued and we have no data for them.
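To run the download step from a script, something like the following could work. It is a sketch that assumes the datasets CLI is installed and on PATH, and that the archive keeps its usual layout with transcripts under ncbi_dataset/data/rna.fna; the zip file name and function names are made up for illustration.

```python
import subprocess
import zipfile


def build_datasets_command(gene_ids, zip_path="lncrna_genes.zip", include=("rna",)):
    """Return the argument list for `datasets download gene gene-id`."""
    return [
        "datasets", "download", "gene", "gene-id", *gene_ids,
        "--include", ",".join(include),
        "--filename", zip_path,
    ]


def read_member(zip_path, member="ncbi_dataset/data/rna.fna"):
    """Read one file from the downloaded archive as text."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member).decode()


def download_rna_fasta(gene_ids, zip_path="lncrna_genes.zip"):
    """Run the CLI (needs network access) and return the transcript FASTA."""
    subprocess.run(build_datasets_command(gene_ids, zip_path), check=True)
    return read_member(zip_path)


if __name__ == "__main__":
    # Only the two IDs from the thread that are not discontinued.
    print(build_datasets_command(["105370256", "107984591"]))
```

Passing include=("rna", "gene") to build_datasets_command adds the gene sequences as well, which should then appear in a gene.fna file next to rna.fna (again assuming the usual archive layout).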
I hope this helps! :)