Question

Compare two text files

0

7.7 years ago

Sam ▴ 150

Hello all, could you please help me with this? I have two tab-delimited files. I want to compare them to find the IDs common to both files, keeping only the rows whose 7th column in the second file reports one or all of these methods: "Luciferase reporter assay//qRT-PCR//Western blot". For instance:

1st file 
hsa-miR-654-5p
hsa-miR-182-5p

 2nd file

MIRT733442  hsa-miR-654-5p      Homo sapiens    RPS6KB1 6198    Homo sapiens    PAR-CLIP
MIRT733429  hsa-miR-654-5p      Homo sapiens    EPSTI1  94240   Homo sapiens    Luciferase reporter assay//qRT-PCR//Western blot

Output
MIRT733429  hsa-miR-654-5p      Homo sapiens    EPSTI1  94240   Homo sapiens    Luciferase reporter assay//qRT-PCR//Western blot

Thanks in advance

awk bash • 2.5k views

ADD COMMENT • link 7.7 years ago by Sam ▴ 150

2

in bash shell with grep:

 $ grep -w "blot$" test2.txt  | grep -f test1.txt

test1.txt is the 1st file and test2.txt is the 2nd file from the OP. This assumes that every line of interest ends with "Western blot" and that no other line ends with the word "blot".

with awk:

 $ awk 'FNR==NR {a[$1]; next} ($2 in a && /blot$/)' test1.txt test2.txt

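with join (join expects both inputs sorted on the join field; --nocheck-order only silences the check, and the join field is printed first):

 $ sed -n '/blot$/p' test2.txt | join -t $'\t' -1 1 -2 2 --nocheck-order test1.txt -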

ADD REPLY • link 7.7 years ago by cpad0112 21k

3

grep -f test1.txt test2.txt | grep "Luciferase reporter assay\|qRT-PCR\|Western blot"

ADD REPLY • link 7.7 years ago by lessismore ★ 1.4k
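A minimal sketch of the two-step grep above, under a few assumptions not stated in the thread: GNU grep, one ID per line in test1.txt, and a tab-delimited test2.txt. Here -F treats the IDs as fixed strings rather than regular expressions, -w accepts only whole-word matches, and -E lets the alternation be written without backslashes. The sample rows are the two from the question plus one hsa-miR-99a-5p row from the later comment, shortened to 7 columns.

```shell
set -eu
# Work in a scratch directory so the sample files do not clobber real data.
workdir=$(mktemp -d)
cd "$workdir"

# Sample inputs modeled on the question (assumed tab-delimited).
printf 'hsa-miR-654-5p\nhsa-miR-182-5p\n' > test1.txt
printf 'MIRT733442\thsa-miR-654-5p\tHomo sapiens\tRPS6KB1\t6198\tHomo sapiens\tPAR-CLIP\n' > test2.txt
printf 'MIRT733429\thsa-miR-654-5p\tHomo sapiens\tEPSTI1\t94240\tHomo sapiens\tLuciferase reporter assay//qRT-PCR//Western blot\n' >> test2.txt
printf 'MIRT027394\thsa-miR-99a-5p\tHomo sapiens\tAGO2\t27161\tHomo sapiens\tLuciferase reporter assay//qRT-PCR//Western blot\n' >> test2.txt

# Step 1: keep rows that contain one of the IDs as a whole word.
# Step 2: keep rows that mention at least one of the three methods.
grep -F -w -f test1.txt test2.txt \
  | grep -E 'Luciferase reporter assay|qRT-PCR|Western blot' > out.txt

cat out.txt
```

Note that this matches the IDs and the method names anywhere on the line, not in a specific column, so it is only as strict as the data allows.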

0

I didn't see the second requirement in the OP.

ADD REPLY • link 7.7 years ago by cpad0112 21k

0

Thanks lessismore, you are a real biostars :)

ADD REPLY • link 7.7 years ago by Sam ▴ 150

0

In R

# file1 and file2 are data frames holding the 1st and 2nd file (e.g. read with read.delim)
test <- merge(file1, file2, by.x="column_1stfile", by.y="column_2ndfile", all.x=TRUE)

ADD REPLY • link 7.7 years ago by lessismore ★ 1.4k

0

The 2nd file is huge, and with grep -f test1.txt test2.txt > out.put the output contains all the data of test2.txt, not just the data common to both files. How can I solve this problem? For instance, the output file contains hsa-miR-99, but it is not present in test1.txt.

Output:

MIRT027394,hsa-miR-99a-5p,Homo sapiens,AGO2,27161,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027394,hsa-miR-99a-5p,Homo sapiens,AGO2,27161,Homo sapiens,Luciferase reporter assay//qRT-PCR//Western blot,Functional MTI,24732044
MIRT027395,hsa-miR-99a-5p,Homo sapiens,MEF2D,4209,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027396,hsa-miR-99a-5p,Homo sapiens,SKI,6497,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027397,hsa-miR-99a-5p,Homo sapiens,COQ2,27235,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027398,hsa-miR-99a-5p,Homo sapiens,TRIB1,10221,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027398,hsa-miR-99a-5p,Homo sapiens,TRIB1,10221,Homo sapiens,PAR-CLIP,Functional MTI (Weak),26701625

ADD REPLY • link 7.7 years ago by Sam ▴ 150

0

I have also already tried this command:

grep "Luciferase reporter assay\|qRT-PCR\|Western blot" text2.txt | grep -f test.1.text > 3

but again the output file contains hsa-miR-99!

ADD REPLY • link 7.7 years ago by Sam ▴ 150
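The thread does not establish why grep -f returned every row. Two possible causes, both assumptions here, are an empty line in the ID file (with GNU grep an empty pattern matches every line) or Windows line endings. A sketch that sidesteps both by comparing column 2 exactly with awk is shown below. It assumes the second file is comma-separated, as the pasted output suggests. The first two sample rows are the question's rows rewritten with commas, and the empty line and carriage return in test1.txt are deliberate, hypothetical damage.

```shell
set -eu
# Work in a scratch directory so the sample files do not clobber real data.
workdir=$(mktemp -d)
cd "$workdir"

# ID list with a Windows line ending and an empty line (hypothetical damage).
printf 'hsa-miR-654-5p\r\n\nhsa-miR-182-5p\n' > test1.txt

# Comma-separated rows: two adapted from the question, two from the pasted output.
printf '%s\n' \
  'MIRT733442,hsa-miR-654-5p,Homo sapiens,RPS6KB1,6198,Homo sapiens,PAR-CLIP' \
  'MIRT733429,hsa-miR-654-5p,Homo sapiens,EPSTI1,94240,Homo sapiens,Luciferase reporter assay//qRT-PCR//Western blot' \
  'MIRT027394,hsa-miR-99a-5p,Homo sapiens,AGO2,27161,Homo sapiens,Sequencing,Functional MTI (Weak),20371350' \
  'MIRT027394,hsa-miR-99a-5p,Homo sapiens,AGO2,27161,Homo sapiens,Luciferase reporter assay//qRT-PCR//Western blot,Functional MTI,24732044' \
  > test2.txt

awk -F',' '
  { sub(/\r$/, "") }                          # strip a trailing carriage return
  FNR == NR { if ($0 != "") ids[$0]; next }   # 1st file: remember non-empty IDs
  ($2 in ids) && $7 ~ /Luciferase reporter assay|qRT-PCR|Western blot/   # 2nd file: exact ID, method in column 7
' test1.txt test2.txt > out.txt

cat out.txt
```

Because the ID is looked up as a whole field, hsa-miR-99a-5p rows are dropped unless that exact ID is listed in test1.txt. For a tab-delimited file, -F'\t' would replace -F','.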

score 0 · Answer 1 · 2017-11-01

0

7.7 years ago

Pierre Lindenbaum 166k

find common Id between 2 file

it's a job for comm https://linux.die.net/man/1/comm

comm -12 <(sort -u file1.txt) <(cut -f 2 file2.txt | sort -u)

ADD COMMENT • link 7.7 years ago by Pierre Lindenbaum 166k

0

Thanks for your comment, but I want the output to contain a row only if it has one or all of these strings: "Luciferase reporter assay//qRT-PCR//Western blot", NOT just the common IDs.
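One way the comm approach could be adapted to this requirement is to filter the second file on its 7th column before extracting the IDs. This is only a sketch, assuming tab-delimited input and the file names file1.txt and file2.txt used in the answer. It writes intermediate files instead of using process substitution so that it also runs in a plain POSIX shell.

```shell
set -eu
# Work in a scratch directory so the sample files do not clobber real data.
workdir=$(mktemp -d)
cd "$workdir"

# Sample inputs taken from the question (assumed tab-delimited).
printf 'hsa-miR-654-5p\nhsa-miR-182-5p\n' > file1.txt
printf 'MIRT733442\thsa-miR-654-5p\tHomo sapiens\tRPS6KB1\t6198\tHomo sapiens\tPAR-CLIP\n' > file2.txt
printf 'MIRT733429\thsa-miR-654-5p\tHomo sapiens\tEPSTI1\t94240\tHomo sapiens\tLuciferase reporter assay//qRT-PCR//Western blot\n' >> file2.txt

# IDs from the 1st file, sorted and de-duplicated as comm requires.
sort -u file1.txt > ids_file1.txt

# IDs from the 2nd file, restricted to rows whose 7th column names a wanted method.
awk -F'\t' '$7 ~ /Luciferase reporter assay|qRT-PCR|Western blot/ { print $2 }' file2.txt \
  | sort -u > ids_file2.txt

# Lines present in both sorted lists.
comm -12 ids_file1.txt ids_file2.txt > common_ids.txt

cat common_ids.txt
```

This yields the list of matching IDs only; the full rows would still have to be pulled from file2.txt in a further step.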