Compare several columns of gene symbols of different lengths
10.1 years ago
Mo ▴ 920

I have several columns of gene symbols, and I would like to know which genes appear in more than one column.

For example, in the three columns below, the first gene (ADORA2B) appears in both the first and the second column.

Does anyone know how to check for this?

ADORA2B     ADORA2B     KLC1
AGPAT5      HOPX        LEPR
ASS1        IGFBP7      LTBP3
C1QBP       INHBA       MBNL2
C4orf19     ITGB5       MLLT11
CASP1       ITGBL1      NOTCH3
CASP1       ITGBL1      NPR3
CASP1                   NUAK1
CASP1                   OLR1
CCL20                   PDGFC
KBTBD11                 PLA2G16
KLF4
ME2
MPDU1
NAT1
PBK
PSMB10
PSMB8
PSMB9
5heikki 11k
comm -1 -2 <(cut -f1 yourFile | sort) <(cut -f2 yourFile | sort)

Would compare the first and second columns and print only values that were in both columns.

comm -1 -2 <(cut -f1 yourFile | sort) <(cut -f3 yourFile | sort)

Would compare the first and third columns and print only values that were in both columns.

etc.

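As far as I recall, by default sort orders lines using the current locale's collation rules, which should work fine for your data as long as comm runs in the same locale. Or maybe the row order already matters, i.e. you want to compare the values within each row? If so, you should probably use awk. For example: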

awk 'BEGIN {FS="\t"} {if ($1 == $2 && $2 == $3) print}' yourFile

Would only print lines where columns 1 2 and 3 have the same value


One more question: let's say my file, called data, is saved as xls or csv. Should I just go to the location of the file and type "data.xls" or "data.csv" instead of "yourFile"?


Yeah, or you can define the path in

<(cut -f1 yourFile | sort)

e.g.

<(cut -f1 /user/home/File.tsv | sort)

Keep in mind, though, that the above examples only work with tab-separated values (i.e. not Excel files). You can define a different separator, e.g.

cut -f1 -d ","

Would cut the first comma-separated field.


Thank you so much for your time. I saved the file from xls as a text file (delimited text.txt)

Then I used

comm -1 -2 <(cut -f1 /Users/mohammad/Desktop/test.txt | sort) <(cut -f2 /Users/mohammad/Desktop/test.txt | sort)

but nothing was printed. Do you know whether I am making a mistake?


What does the output of head test.txt look like? Perhaps there's a problem with the line-break format. The lines should end in LF (Unix style).
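
If the line breaks do turn out to be the problem, a rough way to fix them from the shell is sketched below. The idea: when a file uses bare CR line breaks, cut and sort see the whole file as a single line, so comm finds nothing in common. Running file test.txt usually tells you which case you have (it reports something like "with CR line terminators" or "with CRLF line terminators"). The function name to_lf and the output name test_lf.txt are just placeholders, and note that this sketch also drops completely empty lines.

```shell
# Hypothetical helper: print a text file with Unix (LF) line endings.
# tr turns every CR into LF, so CR-only files get real line breaks, while
# CRLF files get an extra empty line per row; awk then drops the empty lines.
to_lf() {
    tr '\r' '\n' < "$1" | awk 'length($0) > 0'
}

# Usage (assumes test.txt exists):
#   to_lf test.txt > test_lf.txt
```

After that, the comm commands above can be pointed at test_lf.txt instead of the original file.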


I found that something goes wrong when I save my xls file as a txt file. I used TextWrangler on Mac to save the two columns, separated by a space, as a txt file, and now it seems to work perfectly.

If you know any other way to convert the xls file to a format that works best with your commands, please do not hesitate to let me know.

Once again, thanks.


There must be several ways to do this.

Excel could also be used for this.


Thanks, but I don't want to use Excel. I would rather use scripting, e.g. Python or R.
