How to get a FASTA format file with the DNA sequences of all annotated genes?
3.2 years ago
Student ▴ 30

I am analysing Pyrococcus furiosus DNA sequencing data using the data published here in NCBI. When I click "Send to" > "Gene Features" > "FASTA format", I download a file containing the gene sequences of this organism, but I noticed that some sequences in it appear twice. Is there a way to get a file with all annotated genes and their DNA sequences, without duplicates and in a consistent order? NCBI (at the link above) reports 2,128 genes, so I would like a FASTA file with all 2,128 annotated genes and their DNA sequences. Is there another website, or another place in NCBI, where I can get this kind of file?

fasta sequence-analysis database dataset ncbi • 709 views
3.2 years ago
GenoMax 147k

Using Entrez Direct (EDirect):

$ esearch -db nuccore -query NC_003413.1 | efetch -format fasta_cds_na | grep ">" | wc -l
2060

$ esearch -db nuccore -query NC_003413.1 | efetch -format fasta_cds_na > gene.fa

You should also be able to get the file from the genome FTP folder for this organism: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/008/245/085/GCF_008245085.1_ASM824508v1/GCF_008245085.1_ASM824508v1_cds_from_genomic.fna.gz
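
If a downloaded FASTA file still contains repeated records, one possible way to drop them is to keep only the first record for each header ID. This is a minimal sketch on a toy file, not part of the commands above; the file names genes_demo.fa and genes_demo.dedup.fa and the gene IDs are made up for illustration, and it assumes that duplicates share the same first word on the header line.

```shell
# Toy FASTA with one repeated record (hypothetical IDs, not real P. furiosus genes).
cat > genes_demo.fa <<'EOF'
>gene1 first
ATGAAA
TTT
>gene2 second
ATGCCC
>gene1 first
ATGAAA
TTT
EOF

# On each header line, decide whether this ID (first word) is new;
# print the header and its sequence lines only for the first occurrence.
awk '/^>/ { keep = !seen[$1]++ } keep' genes_demo.fa > genes_demo.dedup.fa

# Count records before and after de-duplication.
grep -c '^>' genes_demo.fa
grep -c '^>' genes_demo.dedup.fa
```

The same awk line could be pointed at gene.fa in place of the toy file. It compares header IDs only, so two records with different IDs but identical sequences would both be kept.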

3.2 years ago
Mensur Dlakic ★ 28k

I see two Pyrococcus furiosus DSM 3638 RefSeq assemblies:

I think the files you want have cds_from_genomic.fna.gz in their names. They are gzip-compressed and need to be decompressed with gunzip before use.
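
A rough sketch of the decompress-and-check step, run on a tiny stand-in file so that it works without a download. The name demo_cds_from_genomic.fna.gz and its two records are made up for illustration; a real file from the FTP folder would take its place.

```shell
# Build a tiny gzip-compressed FASTA as a stand-in for a real download (hypothetical contents).
printf '>cds1\nATGAAATAA\n>cds2\nATGCCCTAA\n' | gzip > demo_cds_from_genomic.fna.gz

# Decompress to a new file; -c writes to stdout, so the .gz original is left in place.
gunzip -c demo_cds_from_genomic.fna.gz > demo_cds_from_genomic.fna

# Count the FASTA records (one header line per sequence).
grep -c '^>' demo_cds_from_genomic.fna
```

Counting the header lines this way gives a quick check of how many sequences the file holds before comparing it with the number reported on the NCBI page.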
