Using Entrez Direct (EDirect). The output is a flat, comma-separated list of alternating attribute names and values (key,value pairs).
$ esearch -db assembly -query GCF_023093935 | elink -target biosample | efetch -format xml | xtract -pattern BioSample -element accession -block Attributes -group Attribute -element Attribute@attribute_name Attribute | tr '\t' ','
strain,34Pae36,collected_by,LGMB,collection_date,2017-04,geo_loc_name,Colombia: Bogota,host,Homo sapiens,host_disease,Bacterial infectious disease,isolation_source,collection,lat_lon,4.70 N 74.10 W
$ esearch -db assembly -query GCF_026727755 | elink -target biosample | efetch -format xml | xtract -pattern BioSample -element accession -block Attributes -group Attribute -element Attribute@attribute_name Attribute | tr '\t' ','
strain,C4.2,isolation_source,rubber door sealing of a washing machine,collection_date,2019-10-22,geo_loc_name,Germany: Bielefeld,sample_type,pure culture,biomaterial_provider,Kaltschmidt Lab, Bielefeld University, Germany,collected_by,Ehsan Asghari, Christian Kaltschmidt,identified_by,Ehsan Asghari, Annika Kiel
For a list of multiple IDs (one per line in a file) you can use the epost solution below, but it may generate error lines for samples that have no information and/or if the queries are sent too quickly.
$ epost -db assembly -input id_file | elink -target biosample | efetch -format xml | xtract -pattern BioSample -element accession -block Attributes -group Attribute -element Attribute@attribute_name Attribute | tr '\t' ','
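If the epost route gives trouble, an alternative is to run the esearch pipeline once per ID with a short pause between queries. What follows is only a minimal sketch: it assumes EDirect is on your PATH and that id_file holds one assembly accession per line. The function names, the NA placeholder for samples without attributes, and the one-second default pause are illustrative choices, not anything prescribed by EDirect.

```shell
#!/usr/bin/env bash
# Sketch: query BioSample attributes one assembly accession at a time.
# Assumes the EDirect tools (esearch, elink, efetch, xtract) are on PATH.

# Print "accession<TAB>key,value,..." for a single assembly accession.
fetch_biosample_attrs() {
    local acc="$1" attrs
    # stdin is redirected from /dev/null so esearch cannot swallow the
    # ID list that the calling loop is reading from.
    attrs=$(esearch -db assembly -query "$acc" < /dev/null |
        elink -target biosample |
        efetch -format xml |
        xtract -pattern BioSample -element accession -block Attributes -group Attribute \
            -element Attribute@attribute_name Attribute |
        tr '\t' ',')
    # Assumption: samples with no attributes get an explicit "NA" marker.
    printf '%s\t%s\n' "$acc" "${attrs:-NA}"
}

# Loop over a file of accessions (one per line), pausing between queries.
fetch_all() {
    local id_file="$1" delay="${2:-1}" acc
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue   # skip blank lines
        fetch_biosample_attrs "$acc"
        sleep "$delay"              # arbitrary pause so queries are not sent too quickly
    done < "$id_file"
}
```

After sourcing or pasting the functions, something like `fetch_all id_file 1 > attributes.tsv` would write one line per accession, so missing samples stay visible instead of silently dropping out.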
Excellent, I went for the esearch solution since I expect information to be missing for most of the samples and don't want to deal with that mess afterwards. I just implemented a for loop over the query IDs; maybe not very elegant, but it helped me retrieve the data for multiple samples.
Thanks a lot!