Question

Compare several columns of gene symbols of different sizes

2

10.7 years ago

Mo ▴ 920

I have several columns of gene symbols of different lengths, and I would like to know which genes appear in more than one column.

For example, in the three columns below, the first gene (ADORA2B) appears in both the first and the second column.

Does anyone know how to check for this?

ADORA2B     ADORA2B     KLC1
AGPAT5      HOPX        LEPR
ASS1        IGFBP7      LTBP3
C1QBP       INHBA       MBNL2
C4orf19     ITGB5       MLLT11
CASP1       ITGBL1      NOTCH3
CASP1       ITGBL1      NPR3
CASP1                   NUAK1
CASP1                   OLR1
CCL20                   PDGFC
KBTBD11                 PLA2G16
KLF4
ME2
MPDU1
NAT1
PBK
PSMB10
PSMB8
PSMB9

gene-symbols • 2.3k views

updated 3.5 years ago by Ram 45k • written 10.7 years ago by Mo ▴ 920

0

10.7 years ago

Bioinformatics_NewComer ▴ 330

There must be several ways to do this.

Excel could also be used for this.

updated 3.5 years ago by Ram 45k • written 10.7 years ago by Bioinformatics_NewComer ▴ 330

1

Thanks, but I don't want to use Excel; I would rather use a script, in Python or R.

updated 3.5 years ago by Ram 45k • written 10.7 years ago by Mo ▴ 920

Ram · Accepted Answer · 2014-12-05

2

10.7 years ago

5heikki 11k

comm -1 -2 <(cut -f1 yourFile | sort) <(cut -f2 yourFile | sort)

Would compare the first and second columns and print only values that were in both columns.

comm -1 -2 <(cut -f1 yourFile | sort) <(cut -f3 yourFile | sort)

Would compare the first and third columns and print only values that were in both columns.

etc.
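
One thing to watch with the comm approach: the table in the question has a repeated entry (CASP1), empty cells, and rows shorter than others. comm reports repeated values once per matching pair, and cut prints a line unchanged when it contains no delimiter, so a row holding only a first-column gene would also show up as a column 2 value. A minimal sketch that handles those cases might look like the following. The file name genes.tsv, the helper column_values, and the scratch directory are assumptions made for illustration, and the demo rows are a subset of the table in the question. It uses temporary files instead of process substitution so that it also runs in a plain POSIX shell.

```shell
#!/bin/sh
# Sketch only: file and function names are illustrative, not from the thread.
export LC_ALL=C                      # sort and comm must agree on one collation order

workdir=$(mktemp -d)                 # scratch directory for the demo files
data="$workdir/genes.tsv"

# Demo input: a few rows from the question's table, tab-separated.
# Note the repeated CASP1, the empty cell, and the short last row.
printf 'ADORA2B\tADORA2B\tKLC1\n'  >  "$data"
printf 'AGPAT5\tHOPX\tLEPR\n'      >> "$data"
printf 'CASP1\tITGBL1\tNOTCH3\n'   >> "$data"
printf 'CASP1\tITGBL1\tNPR3\n'     >> "$data"
printf 'CASP1\t\tNUAK1\n'          >> "$data"
printf 'KLF4\n'                    >> "$data"

# Print the sorted, de-duplicated, non-empty values of column $1 in file $2.
column_values() {
    awk -F'\t' -v col="$1" '$col != "" { print $col }' "$2" | sort -u
}

column_values 1 "$data" > "$workdir/col1.txt"
column_values 2 "$data" > "$workdir/col2.txt"
column_values 3 "$data" > "$workdir/col3.txt"

# Genes present in both column 1 and column 2.
in_1_and_2=$(comm -12 "$workdir/col1.txt" "$workdir/col2.txt")

# Genes present in all three columns: intersect that result with column 3.
in_all=$(comm -12 "$workdir/col1.txt" "$workdir/col2.txt" | comm -12 - "$workdir/col3.txt")

echo "columns 1 and 2: $in_1_and_2"
echo "all three:       $in_all"
```

With these demo rows, only ADORA2B is shared between columns 1 and 2, and no gene is present in all three.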

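By default, sort orders lines according to the current locale, and comm expects its input in that same order, so this should work fine for your data as long as both commands run in the same environment. Or maybe the row each gene sits in matters? If so, you should probably use awk. For example:

awk -F'\t' '$1 == $2 && $2 == $3' yourFile

Would only print lines where columns 1, 2, and 3 have the same value. (Setting FS inside the pattern, as in 'FS="\t" {...}', only takes effect from the second line onwards, hence the -F option.)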
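
If the goal is instead to see, for every gene, which columns it occurs in regardless of row position, a single awk pass can build that list. The following is a sketch rather than part of the original answer. It assumes tab-separated input, and the names genes.tsv and membership.txt are hypothetical. The demo rows are taken from the table in the question.

```shell
#!/bin/sh
# Sketch only: genes.tsv and membership.txt are illustrative names.
export LC_ALL=C
workdir=$(mktemp -d)
data="$workdir/genes.tsv"

# Demo input: a few rows from the question's table, tab-separated.
printf 'ADORA2B\tADORA2B\tKLC1\n' >  "$data"
printf 'AGPAT5\tHOPX\tLEPR\n'     >> "$data"
printf 'CASP1\tITGBL1\tNOTCH3\n'  >> "$data"
printf 'CASP1\t\tNUAK1\n'         >> "$data"

# For every non-empty cell, record the column number once per gene.
awk -F'\t' '
{
    for (i = 1; i <= NF; i++) {
        if ($i == "" || seen[$i, i]++) continue   # skip blanks and repeats
        cols[$i] = (cols[$i] == "" ? i : cols[$i] "," i)
    }
}
END {
    for (gene in cols) print gene "\t" cols[gene]
}' "$data" | sort > "$workdir/membership.txt"

# One line per gene: symbol, tab, comma-separated column numbers.
cat "$workdir/membership.txt"

# Genes found in more than one column have a comma in the second field.
shared=$(awk -F'\t' '$2 ~ /,/ { print $1 }' "$workdir/membership.txt")
echo "shared: $shared"
```

Column numbers are listed in the order in which they are first encountered while reading the file, so they are not necessarily ascending.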

updated 3.5 years ago by Ram 45k • written 10.7 years ago by 5heikki 11k

0

One more question: let's say my file, called data, is saved as xls or csv. Should I just go to the location of the file and type "data.xls" or "data.csv" instead of "yourFile"?

updated 3.5 years ago by Ram 45k • written 10.7 years ago by Mo ▴ 920

1

Yeah, or you can define the path in

<(cut -f1 yourFile | sort)

e.g.

<(cut -f1 /user/home/File.tsv | sort)

Keep in mind, though, that the above examples only work with tab-separated values (i.e. not Excel files). You can define a different separator, e.g.

cut -f1 -d ","

Would cut the first comma-separated field.
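
For a comma-separated export, the same comparison could look like the sketch below. The file name genes.csv is hypothetical, and plain cut only works when no field contains a quoted comma, which is normally the case for gene symbols.

```shell
#!/bin/sh
# Sketch only: genes.csv is an illustrative file name.
export LC_ALL=C                      # keep sort and comm on the same collation order
workdir=$(mktemp -d)
csv="$workdir/genes.csv"

# Demo input: two rows from the question's table, comma-separated.
printf 'ADORA2B,ADORA2B,KLC1\n' >  "$csv"
printf 'AGPAT5,HOPX,LEPR\n'     >> "$csv"

# Extract columns 1 and 2 with a comma delimiter, then sort for comm.
cut -d ',' -f1 "$csv" | sort -u > "$workdir/col1.txt"
cut -d ',' -f2 "$csv" | sort -u > "$workdir/col2.txt"

# Values present in both columns.
common=$(comm -12 "$workdir/col1.txt" "$workdir/col2.txt")
echo "$common"
```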

updated 3.5 years ago by Ram 45k • written 10.7 years ago by 5heikki 11k

1

Thank you so much for your time. I saved the file from xls as a text file (delimited text.txt)

Then I used

comm -1 -2 <(cut -f1 /Users/mohammad/Desktop/test.txt | sort) <(cut -f2 /Users/mohammad/Desktop/test.txt | sort)

but nothing happened, do you know whether I am making a mistake?

updated 3.5 years ago by Ram 45k • written 10.7 years ago by Mo ▴ 920

0

What does the output of head test.txt look like? Perhaps there's a problem with the line-break format. The line endings should be LF (Unix style).
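
One way to check the line endings is to dump the raw bytes and count the LF characters. Some spreadsheet exports on Mac use CR-only line endings, in which case Unix tools see the whole file as a single line and comm finds nothing to match. The sketch below builds a small demo file with CR endings (the file names are hypothetical) and converts it. Whether this was the actual cause here is an assumption.

```shell
#!/bin/sh
# Sketch only: the file names are illustrative.
workdir=$(mktemp -d)
cr_file="$workdir/test_cr.txt"
lf_file="$workdir/test_lf.txt"

# Demo input with CR-only line endings (no LF anywhere).
printf 'ADORA2B\tADORA2B\rAGPAT5\tHOPX\r' > "$cr_file"

# Inspect the raw bytes: \r marks CR, \n marks LF, \t marks a tab.
od -c "$cr_file"

# wc -l counts LF characters, so a CR-only file reports 0 lines.
lines_before=$(wc -l < "$cr_file" | tr -d '[:space:]')

# Convert CR to LF. For Windows-style CRLF files, use instead: tr -d '\r'
tr '\r' '\n' < "$cr_file" > "$lf_file"
lines_after=$(wc -l < "$lf_file" | tr -d '[:space:]')

echo "lines before: $lines_before, after: $lines_after"
```

After the conversion, cut, sort, and comm see one gene row per line.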

10.7 years ago by 5heikki 11k

0

I found that something goes wrong when I save my xls file as a txt file. I used "TextWrangler" on Mac to save the two columns with space as a txt file. Now it seems to work perfectly.

If you know any other way to convert the xls file to a format that your program reads best, please do not hesitate to let me know.

Once again thanks

updated 3.5 years ago by Ram 45k • written 10.7 years ago by Mo ▴ 920