Create a file with a summary of all gbk files in a folder and subfolders
8.6 years ago
benarnon ▴ 30

Hi. I have a folder with many subfolders that contain many gbk (GenBank format) files. I would like to create a list of all the gbk files. For each file I want to retrieve its species, length and accession number.

Does anybody have an idea how to do it?

Thanks.

8.6 years ago
venu 7.1k

Say you have a directory ALL_GBK with many subdirectories. You can do something like this (change the -name pattern to "*.gbk" if your files use that extension):

find /home/user/ALL_GBK/ -type f -name "*.gb"  | xargs -I % grep -E 'ORGANISM | bp' % | paste - - | sed 's/   */\t/g' | cut -f 2,3,9 > result.txt

I tested this with a single file copied into multiple (sub)directories, and the result is:

AB021961    1429 bp Mus musculus
AB021961    1429 bp Mus musculus
AB021961    1429 bp Mus musculus

I'm sorry, but it doesn't work. Here is some of the output:

NC_002679 351911 bp NC_002678 7036071 bp NC_002682 208315 bp NC_002663 2257487 bp NC_004741 4599354 bp /note="truncated at N-terminus by 125 bp deletion relative
/note="99 bp internal deletion relative to K-12"
NC_003385 106516 bp /note="57 bp repeat 1"
/note="57 bp repeat 1"
/note="57 bp repeat 1"
/note="57 bp repeat 1"
NC_003384 218160 bp NC_003198 4809037 bp /note="22 bp inverted repeat"
/note="''19 bp repeat unit required for rci mediated
NC_002927 5339179 bp /note="ortholog of Bordetella pertussis (BX470248) bplL"
/note="ortholog of Bordetella pertussis (BX470248) bplI"
/note="ortholog of Bordetella pertussis (BX470248) bplG"
/note="ortholog of Bordetella pertussis (BX470248) bplE"
/note="ortholog of Bordetella pertussis (BX470248) bplC"
/note="ortholog of Bordetella pertussis (BX470248) bplA"
/note="ortholog of Bordetella pertussis (BX470248) bph2"


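The file I downloaded contains "bp" only once, next to the length on the LOCUS line. Your files also have " bp" inside /note qualifiers, so those lines are matched as well. You can modify the grep -E 'ORGANISM | bp' part of the one-liner above to get proper results. Now it's time for you to play with it.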
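
One possible way to tighten the match, as a rough sketch rather than a tested solution for your data: anchor on the header keywords instead of the string " bp". The function name summarize_gbk is made up for this example. It assumes the standard GenBank flat-file layout, where each record has a LOCUS line (length in the third field), then an ACCESSION line, then an ORGANISM line indented by two spaces and fitting on one line. It also assumes the files end in .gbk.

```shell
# summarize_gbk DIR (hypothetical helper name)
# Prints accession, length and organism, tab-separated, for every record
# in every *.gbk file found under DIR, including subdirectories.
summarize_gbk() {
  find "$1" -type f -name '*.gbk' -exec awk '
    BEGIN { OFS = "\t" }
    # Header keywords start in column 1, so /note="... bp ..." lines never match.
    /^LOCUS/     { len = $3 }   # third field of the LOCUS line is the length
    /^ACCESSION/ { acc = $2 }   # first accession listed on the ACCESSION line
    /^  ORGANISM/ {
      sub(/^  ORGANISM +/, "")  # strip the keyword, keep the species name
      print acc, len " bp", $0
    }
  ' {} +
}
```

It could then be called as summarize_gbk /home/user/ALL_GBK/ > result.txt. If some organism names wrap onto a second line in your files, only the first line would be captured.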
