Question

Converting Gene Interaction File From Multiple To Three Columns To Be Imported Into Cytoscape

11.5 years ago

Diana ▴ 930

Hi,

I have a network file derived from gene expression data that lists interactions between genes. The first column is the source gene, the second column is a target, and the third column is the edge attribute for the interaction between the first and second columns. The fourth column is another target of the first column, the fifth column is the edge attribute for that interaction, and so on. I'm trying to use this file to generate a network in Cytoscape, but all sources have to be in the 1st column, all targets in the 2nd column, and any edge attributes after that. How can I convert this file into 3 columns, with the 1st column being the source, the 2nd the target, and the 3rd the edge attribute? Can it be done in R?

    A4GALT    ABI1    0.290467    ABL1    0.291354    ACTC1    0.290467    AKR1B10    0.322647    AMN1    0.290467
    AAGAB    AHCYL1    0.286272    ALG10    0.275442    ANKRD15    0.303029    CA12    0.303029    CDC42SE2    0.286272
    AARSD1    AARSD1    0.274792    ABCC4    0.27289        ACADL    0.349349    ACBD5    0.329398    ACSL4    0.335957
    AARSD1    AARSD1    0.274792    ACADL    0.274792    ACBD5    0.26986        ACSL4    0.291354    ACTR3B    0.26986

Edit: The output file should look like this. For example, the first line should become:

A4GALT    ABI1    0.290467
A4GALT    ABL1    0.291354
A4GALT    ACTC1    0.290467
A4GALT    AKR1B10    0.322647
A4GALT    AMN1    0.290467

Similarly, every line should be converted to this format, with all results written to a single text file.

code edit:

#!/usr/bin/env perl

open(INFILE, "<aracne_network_microarray.txt") or die ("couldn't open the file\n");

use v5.10;
use strict;
use warnings;

my @a =();
my @b =();
open(MYOUTFILE, ">aracne_network_output.txt"); 

while (<INFILE>) {
    chomp;
    s/^\t|^\s+|//;
    my ($source, $tar, $att, @f) = split;
    @a = join ("\t", $source, $tar, $att);
    while (my ($targ, $attr) = splice(@f, 0, 2)) {
        @b = join ("\t", $source, $targ, $attr); 
         print MYOUTFILE "@b"   }

}
close(MYOUTFILE)

Thanks a lot!

cytoscape r • 3.2k views

ADD COMMENT • link updated 10.9 years ago by Biostar 20 • written 11.5 years ago by Diana ▴ 930

I can write a script for you if you give me an example of the desired output.

ADD REPLY • link 11.5 years ago by Biomonika (Noolean) 3.2k

I've edited the post. Thanks a lot, Noolean!

ADD REPLY • link 11.5 years ago by Diana ▴ 930

score 1 · Answer 1 · 2013-07-03

11.5 years ago

SES 8.6k

You could do this in R, but I prefer to do these things in Perl because it is much more expressive and powerful for text processing. What I mean is that you can write out what you want to do in plain English:

#!/usr/bin/env perl

# pseudocode follows

read in a line of data
   split the line into the component fields
   say what you want (in this case, a source followed by a target, followed by the edge attribute)
   repeat the previous step for each target

# end

and already you have something that is nearly ready to execute. Below you can see the real code is not very different from the plain English version:

#!/usr/bin/env perl

use v5.10;
use strict;
use warnings;

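while (<DATA>) {
    chomp;
    s/^\s+//;    # strip leading whitespace (tabs or spaces)
    # the first three fields are the source, first target, and first attribute;
    # whatever is left is a list of further target/attribute pairs
    my ($source, $tar, $att, @f) = split;
    say join "\t", $source, $tar, $att;
    # take the remaining fields two at a time until none are left
    while (my ($targ, $attr) = splice(@f, 0, 2)) {
        say join "\t", $source, $targ, $attr;
    }
}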

__DATA__
    A4GALT    ABI1    0.290467    ABL1    0.291354    ACTC1    0.290467    AKR1B10    0.322647    AMN1    0.290467
    AAGAB    AHCYL1    0.286272    ALG10    0.275442    ANKRD15    0.303029    CA12    0.303029    CDC42SE2    0.286272
    AARSD1    AARSD1    0.274792    ABCC4    0.27289        ACADL    0.349349    ACBD5    0.329398    ACSL4    0.335957
    AARSD1    AARSD1    0.274792    ACADL    0.274792    ACBD5    0.26986        ACSL4    0.291354    ACTR3B    0.26986

Running this gives your source-target-attribute table:

$ perl biostar75832.pl
A4GALT    ABI1    0.290467
A4GALT    ABL1    0.291354
A4GALT    ACTC1    0.290467
A4GALT    AKR1B10    0.322647
A4GALT    AMN1    0.290467
AAGAB    AHCYL1    0.286272
AAGAB    ALG10    0.275442
AAGAB    ANKRD15    0.303029
AAGAB    CA12    0.303029
AAGAB    CDC42SE2    0.286272
AARSD1    AARSD1    0.274792
AARSD1    ABCC4    0.27289
AARSD1    ACADL    0.349349
AARSD1    ACBD5    0.329398
AARSD1    ACSL4    0.335957
AARSD1    AARSD1    0.274792
AARSD1    ACADL    0.274792
AARSD1    ACBD5    0.26986
AARSD1    ACSL4    0.291354
AARSD1    ACTR3B    0.26986

You can load this into Cytoscape by using Import->"as table" and then indicating which fields are the source, target, and edge attribute.

ADD COMMENT • link 11.5 years ago by SES 8.6k

Thanks a lot, Noolean. There's just one thing: I have a different number of interactions for each gene, so I cannot specify a fixed number of columns. Also, I have a huge file. There are 3113 lines like the ones above.

ADD REPLY • link 11.5 years ago by Diana ▴ 930

Hi, I'm not Noolean. A file of 3,000 lines would not really be considered "huge" by today's standards and it should only take around a second to do this (more or less). I edited the code to work with an arbitrary number of fields in this format, but please try to provide this kind of information in the future. It helps others spend less time on questions and you can get your answer faster! Hope this helps, let us know if you run into any issues.
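For readers who would rather not use Perl, here is a minimal Python sketch of the same "arbitrary number of fields" idea. It assumes whitespace-separated fields as in the sample data. The function name `row_to_edges` is made up for illustration and is not part of the script in the answer. Unlike the Perl loop, it raises an error when the fields after the source do not come in complete target/attribute pairs, which is my own addition.

```python
def row_to_edges(line):
    """Turn one wide row (source, then target/attribute pairs) into 3-column edges.

    Hypothetical helper for illustration; assumes whitespace-separated fields.
    """
    fields = line.split()  # split() with no arguments also drops leading whitespace
    if not fields:
        return []  # blank line: no edges
    source, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        # an odd number of trailing fields means a target without an attribute
        raise ValueError(f"unpaired target/attribute field for source {source!r}")
    # walk the remaining fields two at a time: (target, attribute)
    return [(source, rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]


if __name__ == "__main__":
    # two rows with different numbers of interactions
    short_row = "    A4GALT    ABI1    0.290467    ABL1    0.291354"
    long_row = "    AAGAB    AHCYL1    0.286272    ALG10    0.275442    ANKRD15    0.303029"
    for row in (short_row, long_row):
        for edge in row_to_edges(row):
            print("\t".join(edge))
```

Because the pairs are taken two at a time from whatever is left after the source, the number of interactions per gene never has to be specified.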

ADD REPLY • link 11.5 years ago by SES 8.6k

Hi SES, sorry, I didn't realize I was writing the wrong name. Thanks a lot. I'll try to give as much detail as I can next time.

ADD REPLY • link 11.5 years ago by Diana ▴ 930

Hi SES, I can't seem to write the right format into a text file. I'm not familiar with Perl at all. I've put the code, as I've modified it, in the original post. What am I doing wrong? Thanks a lot.

ADD REPLY • link 11.5 years ago by Diana ▴ 930

You changed the script in your post; that is why it won't work. I modified it below to take your input file as the first argument and the output file as the second argument.

#!/usr/bin/env perl

use v5.10;
use strict;
use warnings;

my $usage = "perl $0 infile outfile\n";
my $infile = shift or die $usage;
my $outfile = shift or die $usage;

open my $in, '<', $infile or die "\nERROR: Could not open file $infile: $!";
open my $out, '>', $outfile or die "\nERROR: Could not open file $outfile: $!";

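while (<$in>) {
    chomp;
    s/^\s+//;    # strip leading whitespace (tabs or spaces)
    # the first three fields are the source, first target, and first attribute;
    # whatever is left is a list of further target/attribute pairs
    my ($source, $tar, $att, @f) = split;
    say $out join "\t", $source, $tar, $att;
    # take the remaining fields two at a time until none are left
    while (my ($targ, $attr) = splice(@f, 0, 2)) {
        say $out join "\t", $source, $targ, $attr;
    }
}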
close $in;
close $out;
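
One way to sanity-check the converted file is sketched here in Python. It assumes both files are whitespace-separated as in the examples above. The idea is to count the target/attribute pairs in the wide input and compare that with the number of lines in the 3-column output. The function names and the file names in the usage comment are placeholders, not anything from the script above.

```python
def expected_edge_count(in_path):
    """Count target/attribute pairs in the wide input file."""
    total = 0
    with open(in_path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                # everything after the source comes in (target, attribute) pairs
                total += (len(fields) - 1) // 2
    return total


def actual_edge_count(out_path):
    """Count non-blank lines in the 3-column output file."""
    with open(out_path) as handle:
        return sum(1 for line in handle if line.strip())


def counts_match(in_path, out_path):
    """True when the output has one line per target/attribute pair in the input."""
    return expected_edge_count(in_path) == actual_edge_count(out_path)


# Example with placeholder file names:
# counts_match("infile.txt", "outfile.txt")
```

If the two counts differ, the most likely causes are a row with an unpaired trailing field or an output file that was not fully written.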