Split Database Of Mol2 Files Into Chunks Of N Files
13.5 years ago
Flow ★ 1.6k

I downloaded portions of the ZINC database in mol2 format. Each part contains 20,000 ligands, and I would like to split each part into chunks of 100 ligands. I know that with Open Babel one can do

babel -imol2 zinc_part.mol2 -omol2 a.mol2 -m

but then each chunk is always a single-ligand file. One could of course concatenate them afterwards, but this can get messy. I wonder if there are other babel options, or other approaches, for dividing a large mol2 file into chunks of N ligands each.

split database format • 8.5k views
13.5 years ago

One way could be to take this Perl script to split a file, and add this to the top:

$/ = '@<TRIPOS>MOLECULE';  # single quotes, so Perl does not try to interpolate the @

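This instructs Perl to treat each molecule record as one "line". Note that the separator then sits at the end of each record it reads rather than at the start, so you may have to fiddle around with the output to make sure each chunk is valid mol2.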
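
For comparison, the same record-based idea can be sketched in Python, grouping molecules into chunks directly instead of going through single-ligand files. This is only a minimal sketch, not a tested tool: the names `iter_molecules` and `split_mol2`, the `chunk_size`, `out_dir` and `prefix` parameters, and the `chunk_0001.mol2` output naming are all made up for illustration. It assumes every molecule starts with a line beginning with `@<TRIPOS>MOLECULE`, and any lines before the first header are kept with the first molecule.

```python
import os

HEADER = "@<TRIPOS>MOLECULE"


def iter_molecules(lines):
    """Yield each molecule as a list of lines, starting at its header line."""
    record = []
    seen_header = False
    for line in lines:
        if line.startswith(HEADER):
            if seen_header:
                yield record  # a new header closes the previous molecule
                record = []
            seen_header = True
        record.append(line)
    if seen_header:
        yield record  # the last molecule has no following header


def split_mol2(path, chunk_size=100, out_dir=".", prefix="chunk"):
    """Write files of up to chunk_size molecules each; return their paths."""
    written = []
    out = None
    with open(path) as handle:
        for index, record in enumerate(iter_molecules(handle)):
            if index % chunk_size == 0:
                # Start a new chunk file every chunk_size molecules.
                if out is not None:
                    out.close()
                number = len(written) + 1
                name = os.path.join(out_dir, f"{prefix}_{number:04d}.mol2")
                out = open(name, "w")
                written.append(name)
            out.writelines(record)
    if out is not None:
        out.close()
    return written
```

Under these assumptions, `split_mol2("zinc_part.mol2", chunk_size=100)` would turn a 20,000-ligand part into 200 chunk files. Reading line by line keeps memory use low, since only one molecule is held at a time.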

13.5 years ago

Tobias Kind gave 20 solutions on the Blue Obelisk eXchange website.

11.4 years ago
morency.lp • 0

I just came across this thread and wanted to share my Perl script, which splits multi-molecule ZINC mol2 files into single files named ZINCxxxxxxxx.mol2. Command line: perl separe_ligands.pl -f <filename>

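#!/usr/bin/perl
# Split a multi-molecule ZINC mol2 file into one file per ligand,
# each named after its ZINC ID (ZINCxxxxxxxx.mol2).
use strict;
use warnings;
use Cwd;

my $file    = "";
my $name    = "";        # ZINC ID of the molecule currently being read
my $mol2    = "";        # text of the molecule currently being read
my $workdir = getcwd;    # output directory; -w appends a subdirectory to it

for (my $i = 0; $i < @ARGV; $i++)
{
    if    ($ARGV[$i] eq "-f" && $i < $#ARGV) { $file = $ARGV[++$i]; }
    elsif ($ARGV[$i] eq "-w" && $i < $#ARGV) { $workdir .= "/".$ARGV[++$i]; }
}
die "Usage: perl separe_ligands.pl -f <filename> [-w <subdir>]\n" if $file eq "";

# Write one buffered molecule to <workdir>/<name>.mol2 (the directory must exist).
sub write_mol2
{
    my ($id, $text) = @_;
    return if $id eq "" || $text eq "";
    my $outfile = "$workdir/$id.mol2";
    open(my $out, ">", $outfile) or die "Cannot write $outfile: $!\n";
    print $out $text;
    close $out;
}

open(my $in, "<", $file) or die "Cannot open $file: $!\n";
while (my $line = <$in>)
{
    chomp($line);
    if ($line =~ m/\@<TRIPOS>MOLECULE/)
    {
        # A new record starts: flush the previous molecule, if any.
        write_mol2($name, $mol2);
        ($name, $mol2) = ("", "");
    }
    elsif ($name eq "" && $line =~ m/(ZINC\d+)/)
    {
        $name = $1;    # first ZINC ID after the record header
    }
    $mol2 .= $line."\n";
}
write_mol2($name, $mol2);    # the last molecule has no following header
close $in;
exit;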

For Unix-based operating systems (no guarantee that it works under Windows).
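
Since the script above writes one file per ligand, getting back to chunks of N ligands means concatenating those files in batches. A rough Python sketch of that step follows. The function name `concat_in_chunks`, its parameters, and the `chunk_0001.mol2` output naming are assumptions made for this example, and it assumes the single-ligand files match `ZINC*.mol2` in one directory.

```python
import glob
import os


def concat_in_chunks(in_dir, chunk_size=100, out_dir=".", prefix="chunk"):
    """Concatenate the ZINC*.mol2 files in in_dir, chunk_size files per output."""
    # Sort so that the grouping is reproducible between runs.
    files = sorted(glob.glob(os.path.join(in_dir, "ZINC*.mol2")))
    written = []
    for start in range(0, len(files), chunk_size):
        number = start // chunk_size + 1
        name = os.path.join(out_dir, f"{prefix}_{number:04d}.mol2")
        with open(name, "w") as out:
            for path in files[start:start + chunk_size]:
                with open(path) as handle:
                    out.write(handle.read())
        written.append(name)
    return written
```

Because the output names start with the prefix rather than `ZINC`, the chunk files are not picked up as input even when `out_dir` and `in_dir` are the same directory.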
