Extracting data from Biogrid mitab format
1
0
Entering edit mode
9.2 years ago
stackf01 ▴ 20

Hello there. I have downloaded bulk of PPI data from BioGRID in the current release section. It is in PSI - MITAB format. I have downloaded this d​ata to add in msql database. My question is how to I get this data which is in PSI -MITAB format in a proper format to integrate the data in MySQL database?

biogrid • 3.4k views
ADD COMMENT
1
Entering edit mode
9.2 years ago

Here is the perl code I use to put PPI into an undirected graph. It is only concerned with extracting protein IDs and Pubmed refs but you can easily modify it to get the rest.

  use Graph::Undirected;
  my $G = Graph::Undirected->new();
  my %notfound;
  open (FH,"<",$BIOGRID) or die "Can't open file $BIOGRID: $!\n";
  while (my $line=<FH>) {
    chomp($line);
    my @tmpary = split(/\t/,$line);
    my @P1s = split(/\|/,$tmpary[2]);
    my ($p1) = $P1s[0]=~/:(.+)/;
    my @P2s = split(/\|/,$tmpary[3]);
    my ($p2) = $P2s[0]=~/:(.+)/;
    next if (!$p1 || !$p2 || $notfound{$p1} || $notfound{$p2});
    my $EnsemblID1 = &get_EnsemblID($p1); # This is a function to convert to Ensembl IDs
    my $EnsemblID2 = &get_EnsemblID($p2);
    if ($EnsemblID1 && $EnsemblID2) {
      $G->add_edge($EnsemblID1,$EnsemblID2);
      if ($tmpary[8]) {
        my ($pmid) = $tmpary[8];
        $G->set_edge_attribute($EnsemblID1,$EnsemblID2,'pubmed',$pmid);
      }
    }
    else {
      if (!$EnsemblID1) {
        $notfound{$p1}++;
      }
      if (!$EnsemblID2) {
        $notfound{$p2}++;
      }
    }
  }
  close FH;
ADD COMMENT
0
Entering edit mode

Thanks for your answer :)

I don't have any experience in Perl. So I cannot understand much what you did there. Can you elaborate further ? :/

I have downloaded BioGRID data in the mitab format. How do I put all this data in a table of MySQL database ?​

ADD REPLY
0
Entering edit mode

The script processes the MITAB file line by line, splitting each line on tab characters into an array of cells. The official gene symbol of the interacting proteins are in cells with indices 2 and 3. These cells can potentially contain multiple alternate entries separated by '|' so their content is split then the gene symbol is extracted from the first entry with regular expression simply as all characters after ':'. Then I convert the gene symbols to Ensembl gene IDs (for a specific version of Ensembl) using a custom get_EnsemblID function (which internally uses the Ensembl perl API) and put the results as nodes of an undirected graph data structure. You can accomplish the same thing with your favorite programming language.
As for putting this into a MySQL database, it depends on the structure of the database. The basic SQL query to use is INSERT INTO ppi_table (ID1,ID2) VALUES($p1,$p2) which you would use with the perl DBI and DBD::mysql modules.

ADD REPLY
0
Entering edit mode

Do you have a similar script in php or java please ? ​

ADD REPLY
0
Entering edit mode

No. Sorry. But php's syntax is very similar to perl's (it's derived from perl) so you should be able to easily adapt perl code to php.

ADD REPLY
0
Entering edit mode

Another enlightenment !

The​ file that I have downloaded, is like a table structure? with rows and columns?

ADD REPLY
0
Entering edit mode

Yes. If you need all the data in it you could import it directly into MySQL. First create a table with as many columns as there are in the file (name them accordingly), then you can load the data from the mysql client with LOAD DATA INFILE 'PPI.mitab' INTO TABLE ppi. Check the MySQL manual for this. Also for fast retrieval, you need to enable indexing on the columns you want to query. However, the problem with this approach is that cells of the table will have multiple entries (separated by '|') which is not good database design and may interfere with efficient use of the index.

ADD REPLY
0
Entering edit mode

So basically you get the idea what I intend to make. My webpage will fetch this data from MySQL and pass it to JSON to be read in Cytoscape and display me the protein interaction network.

ADD REPLY

Login before adding your answer.

Traffic: 1477 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6