How To Turn A Two Column Data File Containing Pairs Into A Matrix With Counts
11.9 years ago
RandManP

Hi all, I intersected two data sets and now have two columns like:

m01      m01 
m01      m02
m01      m05
m01      m032
m01      m02
m01      m01
m02      m06
m02      m01
m02      m02
m02      m09
...     ...
m0500     m023
...

I would like to get a matrix with the number of times each pair occurs, like:

           m01    m02     m03     ......

 m01       25      45      98      .....
 m02       90      223     12      ......
  .
  .

Could you please help me do that?

Thank you very much

perl
11.9 years ago

You didn't say which language you are using. I guess since you tagged it with awk and perl, that's what you want?

Anyway, if you were using R, you could do it with a call to table:

## Maybe you have this in a tab delimited file already
> dat <- read.table('/path/to/2-column-file.txt')
## but I'll generate a random table that looks close enough to your data:
> set.seed(1)
> dat <- data.frame(x1=rep(c('m01', 'm02', 'm03'), each=10),
                    x2=sample(c('m01', 'm02', 'm03'), 30, replace=TRUE))
> head(dat)
   x1  x2
1 m01 m01
2 m01 m02
3 m01 m02
4 m01 m03
5 m01 m01
6 m01 m03

> table(dat$x1, dat$x2)
      m01 m02 m03
  m01   3   4   3
  m02   2   3   5
  m03   4   4   2

Thank you very much. R is also fine with me. Thanks again!

11.9 years ago

You won't get a matrix, but the counts will be the same (one line per distinct pair).

sort 2-col.txt | uniq -c
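One caveat: `sort | uniq -c` compares whole lines, so if the columns are separated by varying amounts of whitespace, or some lines carry trailing spaces (the first line of the example data may), identical pairs get counted separately. A minimal sketch that first rewrites each line as its first two fields; `count_pairs` is a hypothetical helper name, and the input is assumed to be whitespace-separated:

```shell
#!/bin/sh
# count_pairs: hypothetical helper. Rewrites every line as "col1 col2" with a
# single space, so differences in whitespace do not split identical pairs,
# then counts each distinct pair. Reads the files given as arguments, or stdin.
count_pairs() {
    awk 'NF >= 2 { print $1, $2 }' "$@" | sort | uniq -c
}

# usage: count_pairs 2-col.txt
```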

Or with awk.

awk -v OFS="\t" '{cnt[$1 OFS $2]++} END{for (key in cnt) {print key, cnt[key]} }' 2-col.txt

Or with perl.

perl -lane '$cnt{"$F[0]\t$F[1]"}++; END { foreach my $key (sort keys %cnt) { print "$key\t$cnt{$key}" } }' 2-col.txt
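If you do want the matrix layout from the question without leaving the shell, the pair counts can be pivoted inside awk. This is a minimal sketch, assuming whitespace-separated columns; `pairs_to_matrix` is a hypothetical helper name, and labels are sorted as plain strings (so m032 sorts before m05):

```shell
#!/bin/sh
# pairs_to_matrix: hypothetical helper name. Reads whitespace-separated pairs
# from the files given as arguments (or from stdin) and prints a tab-separated
# count matrix: one row per label in column 1, one column per label in column 2.
# Pairs that never occur are printed as 0.
pairs_to_matrix() {
    awk -v OFS='\t' '
    # Insertion sort of arr[1..n] by string order (POSIX awk has no sort function).
    function isort(arr, n,    i, j, v) {
        for (i = 2; i <= n; i++) {
            v = arr[i]
            j = i - 1
            while (j > 0 && arr[j] > v) {
                arr[j + 1] = arr[j]
                j--
            }
            arr[j + 1] = v
        }
    }

    # Count each (row, column) pair and remember the labels seen on each side.
    NF >= 2 {
        cnt[$1, $2]++
        rows[$1] = 1
        cols[$2] = 1
    }

    END {
        nr = 0
        for (r in rows) rowname[++nr] = r
        nc = 0
        for (c in cols) colname[++nc] = c
        isort(rowname, nr)
        isort(colname, nc)

        # Header: an empty corner cell followed by the column labels.
        line = ""
        for (j = 1; j <= nc; j++) line = line OFS colname[j]
        print line

        # One output row per column-1 label; missing pairs count as 0.
        for (i = 1; i <= nr; i++) {
            line = rowname[i]
            for (j = 1; j <= nc; j++) {
                key = rowname[i] SUBSEP colname[j]
                line = line OFS ((key in cnt) ? cnt[key] : 0)
            }
            print line
        }
    }
    ' "$@"
}

# usage: pairs_to_matrix 2-col.txt > matrix.tsv
```

Like R's `table`, this takes the row labels from column 1 and the column labels from column 2, so the matrix is only square when both columns contain the same set of labels.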

Works fine :) Thank you very much.
