Compare two text files
7.2 years ago
Sam ▴ 150

Hello all, could you please help me with this? I have two tab-delimited files. I want to compare them to find the IDs common to both files, keeping only the rows of the second file whose column 7 contains one or all of these methods: “Luciferase reporter assay//qRT-PCR//Western blot”. For instance:

1st file 
hsa-miR-654-5p
hsa-miR-182-5p

 2nd file

MIRT733442  hsa-miR-654-5p      Homo sapiens    RPS6KB1 6198    Homo sapiens    PAR-CLIP
MIRT733429  hsa-miR-654-5p      Homo sapiens    EPSTI1  94240   Homo sapiens    Luciferase reporter assay//qRT-PCR//Western blot

Output
MIRT733429  hsa-miR-654-5p      Homo sapiens    EPSTI1  94240   Homo sapiens    Luciferase reporter assay//qRT-PCR//Western blot

Thanks in advance

awk bash • 2.2k views

In the bash shell with grep:

 $ grep -w "blot$" test2.txt | grep -f test1.txt

Here test1.txt is the 1st file and test2.txt is the 2nd file from the OP. This assumes that every line of interest ends with "Western blot" and that no other line contains the word "blot".

With awk:

 $ awk 'FNR==NR {a[$1]; next} ($2 in a) && /blot$/' test1.txt test2.txt

With join (both inputs must be sorted on the join field, and the tab delimiter is given explicitly because some fields contain spaces):

 $ sed -n '/blot$/p' test2.txt | sort -t $'\t' -k2,2 | join -t $'\t' -1 1 -2 2 <(sort test1.txt) -
grep -f test1.txt test2.txt | grep "Luciferase reporter assay\|qRT-PCR\|Western blot"
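
Since the question asks about column 7 specifically, the same idea can be sketched in awk with an exact match on the ID column rather than a substring match anywhere on the line. This is only a sketch: it assumes the files are truly tab-delimited, that the ID is in column 2 and the method in column 7 (as in the example rows), and it builds small demo copies of the two files in a temporary directory.

```shell
# Demo data: stand-ins for test1.txt and test2.txt from the question
workdir=$(mktemp -d)
printf 'hsa-miR-654-5p\nhsa-miR-182-5p\n' > "$workdir/test1.txt"
printf 'MIRT733442\thsa-miR-654-5p\tHomo sapiens\tRPS6KB1\t6198\tHomo sapiens\tPAR-CLIP\n' > "$workdir/test2.txt"
printf 'MIRT733429\thsa-miR-654-5p\tHomo sapiens\tEPSTI1\t94240\tHomo sapiens\tLuciferase reporter assay//qRT-PCR//Western blot\n' >> "$workdir/test2.txt"

# First file (FNR == NR): store every ID as an array key.
# Second file: print rows whose column 2 is a stored ID and whose
# column 7 names at least one of the three methods.
result=$(awk -F'\t' '
    FNR == NR { ids[$1]; next }
    ($2 in ids) && $7 ~ /Luciferase reporter assay|qRT-PCR|Western blot/
' "$workdir/test1.txt" "$workdir/test2.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Because the lookup is `$2 in ids`, an ID such as hsa-miR-99 cannot match hsa-miR-99a-5p by accident, which a plain `grep -f` can do.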

I didn't see the second requirement in the OP.

Thanks lessismore, you are a real biostars :)

In R:

# file1 and file2 are data frames read from the 1st and 2nd file
test <- merge(file1, file2, by.x = "column_1stfile", by.y = "column_2ndfile", all.x = TRUE)

The 2nd file is huge, and with grep -f test1.txt test2.txt > out.put the output contains all the data from test2.txt, not just the data common to both files. How can I solve this problem? For instance, the output file contains hsa-miR-99, but it is not present in test1.txt.

Output:

MIRT027394,hsa-miR-99a-5p,Homo sapiens,AGO2,27161,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027394,hsa-miR-99a-5p,Homo sapiens,AGO2,27161,Homo sapiens,Luciferase reporter assay//qRT-PCR//Western blot,Functional MTI,24732044
MIRT027395,hsa-miR-99a-5p,Homo sapiens,MEF2D,4209,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027396,hsa-miR-99a-5p,Homo sapiens,SKI,6497,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027397,hsa-miR-99a-5p,Homo sapiens,COQ2,27235,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027398,hsa-miR-99a-5p,Homo sapiens,TRIB1,10221,Homo sapiens,Sequencing,Functional MTI (Weak),20371350
MIRT027398,hsa-miR-99a-5p,Homo sapiens,TRIB1,10221,Homo sapiens,PAR-CLIP,Functional MTI (Weak),26701625

I also tried this command:

grep "Luciferase reporter assay\|qRT-PCR\|Western blot" text2.txt | grep -f test.1.text > 3

but the output file again contains hsa-miR-99!
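
The thread does not establish why `grep -f` returned every line. One possible cause (an assumption, not something confirmed here) is a blank line in the ID file: an empty pattern matches every line. Windows line endings (a trailing carriage return on each ID) would cause the opposite symptom, with IDs silently failing to match. A rough sketch that sidesteps both by cleaning the IDs inside awk and matching the ID column exactly, assuming a comma-separated table like the output shown above with no quoted commas inside fields; the file names and the second demo row's last two fields are made up for illustration:

```shell
workdir=$(mktemp -d)
# Hypothetical ID list containing a Windows line ending and a blank line
printf 'hsa-miR-654-5p\r\n\nhsa-miR-182-5p\n' > "$workdir/ids.txt"
# Two comma-separated rows; the "NA" fields in the second row are placeholders
printf '%s\n' \
  'MIRT027394,hsa-miR-99a-5p,Homo sapiens,AGO2,27161,Homo sapiens,Luciferase reporter assay//qRT-PCR//Western blot,Functional MTI,24732044' \
  'MIRT733429,hsa-miR-654-5p,Homo sapiens,EPSTI1,94240,Homo sapiens,Luciferase reporter assay//qRT-PCR//Western blot,NA,NA' \
  > "$workdir/table.csv"

# ID file: strip a trailing carriage return and skip empty lines
# before storing the ID. Table: exact match on column 2, method
# check on column 7.
fixed=$(awk -F',' '
    FNR == NR { sub(/\r$/, ""); if ($0 != "") ids[$0]; next }
    ($2 in ids) && $7 ~ /Luciferase reporter assay|qRT-PCR|Western blot/
' "$workdir/ids.txt" "$workdir/table.csv")

printf '%s\n' "$fixed"
rm -rf "$workdir"
```

Checking the ID file with `cat -A` (GNU) or `od -c` for stray `\r` characters and empty lines is a quick way to see whether this assumption applies.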

7.2 years ago

find common Id between 2 file

it's a job for comm https://linux.die.net/man/1/comm

comm -12 <(sort -u file1.txt) <(cut -f 2 file2.txt | sort -u)

Thanks for your comment, but I want output only if the line contains one or all of these strings, “Luciferase reporter assay//qRT-PCR//Western blot”, not just the common IDs.

Then it's join: follow genomax's link.

Here is the link to that thread: How to retrieve rows from OTU table
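
For reference, the join suggestion above might look roughly like the following. This is a sketch under stated assumptions, not the linked thread's solution: tab-delimited input, IDs in column 2 of the second file, and demo copies of the two files from the question written to a temporary directory.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')
printf 'hsa-miR-654-5p\nhsa-miR-182-5p\n' > "$workdir/test1.txt"
printf 'MIRT733442\thsa-miR-654-5p\tHomo sapiens\tRPS6KB1\t6198\tHomo sapiens\tPAR-CLIP\n' > "$workdir/test2.txt"
printf 'MIRT733429\thsa-miR-654-5p\tHomo sapiens\tEPSTI1\t94240\tHomo sapiens\tLuciferase reporter assay//qRT-PCR//Western blot\n' >> "$workdir/test2.txt"

# join needs both inputs sorted on the join field with the same collation
LC_ALL=C sort -u "$workdir/test1.txt" > "$workdir/ids.sorted"
LC_ALL=C sort -t "$tab" -k2,2 "$workdir/test2.txt" > "$workdir/table.sorted"

# Join column 1 of the ID list with column 2 of the table. join prints the
# join field first, so the ID moves to column 1 and the method stays in
# column 7; the awk step keeps rows naming at least one of the methods.
joined=$(LC_ALL=C join -t "$tab" -1 1 -2 2 "$workdir/ids.sorted" "$workdir/table.sorted" |
    awk -F'\t' '$7 ~ /Luciferase reporter assay|qRT-PCR|Western blot/')

printf '%s\n' "$joined"
rm -rf "$workdir"
```

Note that the output columns are reordered relative to the second file, since join always emits the join field first.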
