3.5 years ago
genomes_and_MGEs
Hey guys,
I have a list of nuccore IDs in a text file (let's call it file.txt), and I want to append the NCBI RefSeq assembly accession to each nuccore ID, like this:
GCF_000006765.1_NC_002516.2
I've tried the following command, but only the RefSeq assembly accession shows up in the output:
for file in $(cat file.txt) ; do esearch -db nuccore -query "$file" | elink -db assembly -target assembly | esummary | xtract -pattern DocumentSummary -element Caption,AssemblyAccession,BioSample >> GCFs_nucl_accessions.txt; done
Can you help me out? Thanks!
What if there was no assembly for a nucleotide sequence?
All the nucleotide IDs I have correspond to either the chromosome or plasmids from complete bacterial genomes, so I expect each ID will have a corresponding assembly accession.
OK, your query was fine. I think I fixed the shell code so it works as expected now.
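For anyone landing here later, below is a minimal sketch of one way to print each nuccore ID next to its assembly accession. It is not necessarily the fix referred to above. It assumes NCBI EDirect (esearch, elink, esummary, xtract) is installed and on PATH; lookup_assembly and annotate_ids are hypothetical helper names, and NA is an assumed placeholder for IDs with no linked assembly. The idea is to carry the input ID through the loop and print it yourself, rather than expecting it to appear in the assembly summary.

```shell
#!/usr/bin/env bash
# Sketch only: pair each nuccore accession with its linked assembly accession.
# Assumes NCBI EDirect (esearch, elink, esummary, xtract) is on PATH.

# Hypothetical helper: print the first assembly accession linked to one
# nuccore accession. esearch gets </dev/null so it cannot consume the
# ID list that the calling loop is reading from stdin.
lookup_assembly() {
    esearch -db nuccore -query "$1" </dev/null |
        elink -target assembly |
        esummary |
        xtract -pattern DocumentSummary -element AssemblyAccession |
        head -n 1
}

# Hypothetical helper: for every non-empty line in the input file, print
# "<assembly>_<nuccore>". "NA" (an assumed placeholder) is used when the
# lookup returns nothing.
annotate_ids() {
    local id gcf
    while IFS= read -r id; do
        [ -z "$id" ] && continue
        gcf=$(lookup_assembly "$id")
        printf '%s_%s\n' "${gcf:-NA}" "$id"
    done < "$1"
}

# Usage (makes network requests to NCBI):
#   annotate_ids file.txt > GCFs_nucl_accessions.txt
```

Reading the file with while read instead of for ... in $(cat file.txt) avoids word-splitting surprises, and writing the output once with a single redirect avoids appending to a stale file on reruns.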
This is the assembly database record for the ID included above. Are you looking to get the NC* ID based on the GCF ID? By the way, GCF IDs are RefSeq IDs; they are not nuccore IDs.
I have a list of NC* IDs, and want to append the corresponding GCF ID to each one.
Please post more than one example. It is always good to do this when you ask questions about IDs. You can simply do this to get the GCF ID:
Sure, here's the top 5 IDs