6.1 years ago
dllopezr
Hi everyone!
I have a bunch of FASTA files of gene sequences that I downloaded from NCBI through the Entrez Direct tool. I am wondering if it is possible to obtain the protein and coding sequences of these genes using the accession, which is in this format: NZ_CP006694.1:1104181-1105143, where the numbers following the colon give the region of the sequence where the gene is located.
Can you help me with that?
Thank you so much
An R-based solution would be to use the Bioconductor package biomaRt; please see my post here. Since you already have the exact chromosomal position, you can easily convert this to sequences. You can find the appropriate filters (chromosome, start and end position) using listFilters(ensembl), and the attributes (protein / DNA sequence) using listAttributes(ensembl).
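Whichever tool you end up using, the accession string from the question first has to be split into the sequence ID, start and end, since those are the values that chromosome/start/end filters expect. Here is a minimal Python sketch of that step only. The names `Region` and `parse_region` are made up for illustration, and it assumes the simple `ACCESSION:START-END` layout shown in the question with 1-based, inclusive coordinates.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A sequence accession plus a coordinate range (hypothetical helper type)."""
    accession: str
    start: int
    end: int


# Assumed layout: ACCESSION:START-END, e.g. NZ_CP006694.1:1104181-1105143
_REGION_RE = re.compile(r"^(?P<acc>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Split 'ACCESSION:START-END' into its three parts."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in ACCESSION:START-END form: {text!r}")
    return Region(match["acc"], int(match["start"]), int(match["end"]))


region = parse_region("NZ_CP006694.1:1104181-1105143")
# Length in bases, assuming both coordinates are inclusive
length = region.end - region.start + 1
print(region.accession, region.start, region.end, length)
```

The three fields can then be passed on as the chromosome/accession, start and end values to whatever retrieval step you use.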