5.8 years ago
genomes_and_MGEs
Hey guys,
Another question: some of the outputs don't have the strain name. I guess the reason is that the organism name doesn't include that info, for example here: https://www.ncbi.nlm.nih.gov/assembly/GCF_003290365.1/. If I use
for f in GCF* ; do term=$(echo $f | cut -f1,2 -d'_') ; esearch -db assembly -q $term | esummary | xtract -pattern DocumentSummary -sep ' ' -element Organism,Strain,AssemblyAccession | sed 's/ /_/g' ; done > filenames.txt
The strain name doesn't appear in filenames.txt. Could you please let me know what I'm doing wrong?
Cheers
If I just run the example you posted, it works but does not print any strain info.
It looks like the strain number is in a different field (Sub_value), which you may need to include. You can try this and let us know if it works for other items on your list.
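A minimal sketch of the question's loop with the strain taken from Sub_value, assuming EDirect (esearch, esummary, xtract) is installed and that the strain is stored as Sub_value inside the Infraspecie block of the assembly DocumentSummary, as suggested above. The helper names name_for_accession and write_names are hypothetical, and the file naming pattern (accession = first two '_'-separated fields) is an assumption carried over from the question. It also uses the documented -query flag of esearch in place of -q.

```shell
#!/usr/bin/env bash
# Sketch: same loop as in the question, but asking xtract for Sub_value
# (assumed to hold the strain, under Biosource/InfraspeciesList/Infraspecie)
# instead of a top-level Strain element.
# Assumes EDirect (esearch, esummary, xtract) is installed and on PATH.

# Hypothetical helper: print "Organism_Strain_Accession" for one accession.
name_for_accession() {
  local acc="$1"
  esearch -db assembly -query "$acc" \
    | esummary \
    | xtract -pattern DocumentSummary -sep ' ' \
        -element Organism,Sub_value,AssemblyAccession \
    | sed 's/ /_/g'                      # spaces -> underscores for file names
}

# Hypothetical helper: loop over files named like
# GCF_003290365.1_<assembly>_genomic.fna.gz in the current directory.
write_names() {
  local f term
  for f in GCF*; do
    [ -e "$f" ] || continue              # no match: the glob stays literal, skip it
    term=$(echo "$f" | cut -f1,2 -d'_')  # e.g. GCF_003290365.1
    name_for_accession "$term"
  done
}
```

Run it from the directory holding the downloaded files as write_names > filenames.txt. Splitting the lookup into its own function makes it easy to test a single accession first, e.g. name_for_accession GCF_003290365.1.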
Edit: Re-reading your post, it seems that you are not getting an answer (strain name) at all. In that case you need to investigate
term=$(echo $f | cut -f1,2 -d'_')
to see what values you are getting for term. Put an echo $term in your loop to examine that variable (temporarily remove the esearch command, if needed).
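A minimal sketch of that check, assuming the downloaded files are named like GCF_003290365.1_<assembly>_genomic.fna.gz (the naming is an assumption). It never calls esearch; it only prints each file name next to the term that cut extracts from it, so you can see whether the accession comes out as expected. show_terms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Debugging sketch: show what cut extracts from each file name,
# without querying NCBI at all.
# Assumes the accession is the first two '_'-separated fields of the name.
show_terms() {
  local f term
  for f in GCF*; do
    # If nothing matches, the glob stays literal and the "file" doesn't exist.
    [ -e "$f" ] || { echo "no files match GCF*" >&2; return 1; }
    term=$(echo "$f" | cut -f1,2 -d'_')
    printf '%s\t->\t%s\n' "$f" "$term"   # file name, then the extracted term
  done
}
```

If a line shows something other than a bare accession such as GCF_003290365.1, the search term is the problem rather than the xtract fields.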