8.6 years ago
benarnon
Hi. I have a folder with many subfolders, which contain many gbk (GenBank format) files. I would like to create a list of all the gbk files. For each file I want to retrieve its species, length and accession number.
Does anybody have an idea how to do this?
Thanks.
I'm sorry, but it doesn't work. Here is some of the output:
NC_002679 351911 bp
NC_002678 7036071 bp
NC_002682 208315 bp
NC_002663 2257487 bp
NC_004741 4599354 bp
/note="truncated at N-terminus by 125 bp deletion relative
/note="99 bp internal deletion relative to K-12"
NC_003385 106516 bp
/note="57 bp repeat 1"
/note="57 bp repeat 1"
/note="57 bp repeat 1"
/note="57 bp repeat 1"
NC_003384 218160 bp
NC_003198 4809037 bp
/note="22 bp inverted repeat"
/note="''19 bp repeat unit required for rci mediated
NC_002927 5339179 bp
/note="ortholog of Bordetella pertussis (BX470248) bplL"
/note="ortholog of Bordetella pertussis (BX470248) bplI"
/note="ortholog of Bordetella pertussis (BX470248) bplG"
/note="ortholog of Bordetella pertussis (BX470248) bplE"
/note="ortholog of Bordetella pertussis (BX470248) bplC"
/note="ortholog of Bordetella pertussis (BX470248) bplA"
/note="ortholog of Bordetella pertussis (BX470248) bph2"
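The file I downloaded contains only one occurrence of bp, the one beside the length, so the pattern matched nothing else there. Your files also have bp inside /note qualifiers, which is why those lines show up. You can modify the grep -E 'ORGANISM | bp' part of the above one-liner to get proper results. Now it's time for you to play with it.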
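One way to tighten the pattern, sketched below, is to match only lines that begin with the LOCUS, ACCESSION and ORGANISM keywords, so that bp inside /note qualifiers is never picked up. This is a rough sketch, not the original one-liner: the function name gbk_summary is made up for illustration, and it assumes the usual GenBank flat-file layout (length is the third field of the LOCUS line, ORGANISM is indented by exactly two spaces, and ACCESSION appears before ORGANISM in each record).

```shell
#!/bin/sh
# Sketch: print file, accession, length and organism for every .gbk file
# found under a directory (default: current directory).
# gbk_summary is a hypothetical helper name, not part of any existing tool.
gbk_summary() {
    find "${1:-.}" -type f -name '*.gbk' -exec awk '
        # LOCUS line starts at column 1; assumed: field 3 is the length in bp
        /^LOCUS/ { len = $3 }
        # ACCESSION line: field 2 is the primary accession number
        /^ACCESSION/ { acc = $2 }
        # ORGANISM line (two-space indent): the rest of the line is the species
        /^  ORGANISM/ {
            sub(/^  ORGANISM +/, "")
            printf "%s\t%s\t%s\t%s\n", FILENAME, acc, len, $0
        }
    ' {} +
}
```

Calling gbk_summary /path/to/folder > list.tsv (the path is a placeholder) should give one tab-separated line per record. Files with several records produce several lines, one per LOCUS/ORGANISM pair.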