Compare and merge two txt files
8.3 years ago
fufuyou ▴ 110

Hi, I have two SNP files. I want to compare and merge them.

File1

chrA01  2024    G       A       10      11
chrA01  2026    T       A       10      11
chrA01  2289    A       T       3       3
chrA01  2314    C       T       3       3
chrA01  2323    T       C       3       3
chrA01  26034   A       G       4       5

file2

chrA01  2024    G       A       10      12
chrA01  2026    T       A       10      12
chrA01  2300    A       G       3       3
chrA01  26034   A       G       5       6

I want to get this result:

chrA01  2024    G       A       10      11     10      12
chrA01  2026    T       A       10      11     10      12
chrA01  26034   A       G       4       5      5       6

I have tried the following commands:

comm -12 <( sort BR.txt ) <( sort BS.txt )
awk -F'|' 'NR==FNR{c[$1$2$3$4]++;next};c[$1$2$3$4] > 0' BR.txt  BS.txt

kdiff3 BR.txt  BS.txt -m
kdiff3 BR.txt  BS.txt -o SNP.txt
awk 'NR==FNR{tgts[$1$2$3$4]; next} $1$2$3$4 in tgts' BR.txt  BS.txt > BRS.txt

join -1 1 -2 1 -3 1 -4 1 BR.txt  BS.txt | awk '{print $1}' > BRS.txt.

But none of them works well. Could you help me modify my commands?

Thanks, Fuyou

8.3 years ago

This takes just one command with csvtk, a cross-platform, efficient and practical CSV/TSV toolkit written in Go.

Assuming file1 and file2 are tab-delimited, use this:

csvtk -H -t join -f 1,2,3,4 file1 file2

Let me explain the command:

-H           global option, means no header line
-t           global option, means input files are tab-delimited/tabular
join         subcommand
-f 1,2,3,4   the key columns
file1 file2  input files; more than two are supported

The results are exactly what you want:

chrA01  2024    G       A       10      11      10      12
chrA01  2026    T       A       10      11      10      12
chrA01  26034   A       G       4       5       5       6

If the input files are space delimited, use csvtk space2tab for pre-processing.
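
If csvtk is not at hand, the same pre-processing step can be approximated with plain awk. This is only a stand-in sketch, not csvtk's own implementation; the temporary directory and the file names file1.txt and file1.tsv are made up for illustration.

```shell
# Hypothetical sample: one line from the question, delimited by runs of spaces.
tmp=$(mktemp -d)
printf 'chrA01  2024    G       A       10      11\n' > "$tmp/file1.txt"

# With OFS set to a tab, reassigning any field ($1 = $1) makes awk rebuild
# the line, so every run of blanks between fields becomes a single tab.
awk -v OFS='\t' '{ $1 = $1; print }' "$tmp/file1.txt" > "$tmp/file1.tsv"

cat "$tmp/file1.tsv"
```

This assumes that no field contains a space of its own, which holds for the SNP columns shown in the question.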

Thanks. It works well. Fuyou

8.3 years ago
bioguy24 ▴ 230

Maybe this awk:

awk 'NR==FNR{A[$2];next}$2 in A' file1 file2
chrA01  2024    G       A       10      12
chrA01  2026    T       A       10      12
chrA01  26034   A       G       5       6

I have tried this. It does not work well.
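
A likely reason it falls short: the one-liner matches on the position column only and prints just the file2 rows, so the counts from file1 never reach the output. A minimal sketch of a variant that keys on the first four columns and appends file2's counts to file1's, assuming both files are tab-delimited (the temporary directory and merged.txt are made-up names for illustration):

```shell
# Recreate the two sample files from the question as tab-delimited text.
tmp=$(mktemp -d)
printf 'chrA01\t2024\tG\tA\t10\t11\nchrA01\t2026\tT\tA\t10\t11\nchrA01\t2289\tA\tT\t3\t3\nchrA01\t2314\tC\tT\t3\t3\nchrA01\t2323\tT\tC\t3\t3\nchrA01\t26034\tA\tG\t4\t5\n' > "$tmp/file1"
printf 'chrA01\t2024\tG\tA\t10\t12\nchrA01\t2026\tT\tA\t10\t12\nchrA01\t2300\tA\tG\t3\t3\nchrA01\t26034\tA\tG\t5\t6\n' > "$tmp/file2"

# First file: store columns 5-6 under a key built from columns 1-4.
# Second file: for keys already seen, print the four key columns,
# then file1's counts, then file2's counts.
awk 'BEGIN { FS = OFS = "\t" }
     { key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 }
     NR == FNR { counts[key] = $5 OFS $6; next }
     key in counts { print $1, $2, $3, $4, counts[key], $5, $6 }' \
    "$tmp/file1" "$tmp/file2" > "$tmp/merged.txt"

cat "$tmp/merged.txt"
```

Rows come out in file2's order, and a key that occurs more than once in file1 keeps only its last pair of counts.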

8.3 years ago

Not tested:

join -t '\t' -1 1 -2 1 \
                  <(awk '{printf("%s_%s_%s_%s\t%s\n",$1,$2,$3,$4,$0);}' BR.txt | sort -t '\t' -k1,1 ) \
                  <(awk '{printf("%s_%s_%s_%s\t%s\n",$1,$2,$3,$4,$0);}' BS.txt | sort -t '\t' -k1,1 ) \
| cut -f 2-

Thanks. It does not work:

sort: multi-character tab '\\t'
sort: multi-character tab '\\t'

Fuyou


Of course: I can't type a literal tab on the screen. Replace '\t' with a real tab character, typed as Ctrl-V followed by Tab: http://stackoverflow.com/questions/10627989/how-do-i-insert-a-tab-character-in-iterm
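
For a script, where typing Ctrl-V is not practical, the tab can be held in a shell variable instead. The sketch below reworks the join pipeline above under that assumption: it uses intermediate files in place of process substitution so that it also runs in plain sh, and the .keyed file names and the temporary directory are made up for illustration. In bash, $'\t' would serve the same purpose as the variable.

```shell
# Recreate the two sample files from the question as tab-delimited text.
tmp=$(mktemp -d)
printf 'chrA01\t2024\tG\tA\t10\t11\nchrA01\t2026\tT\tA\t10\t11\nchrA01\t2289\tA\tT\t3\t3\nchrA01\t2314\tC\tT\t3\t3\nchrA01\t2323\tT\tC\t3\t3\nchrA01\t26034\tA\tG\t4\t5\n' > "$tmp/BR.txt"
printf 'chrA01\t2024\tG\tA\t10\t12\nchrA01\t2026\tT\tA\t10\t12\nchrA01\t2300\tA\tG\t3\t3\nchrA01\t26034\tA\tG\t5\t6\n' > "$tmp/BS.txt"

tab=$(printf '\t')   # a real tab character, not the two characters \ and t

# Prefix every line with a composite key (columns 1-4 joined by "_"),
# then sort on that key. LC_ALL=C keeps sort and join in the same order.
for f in BR BS; do
    awk '{ printf("%s_%s_%s_%s\t%s\n", $1, $2, $3, $4, $0) }' "$tmp/$f.txt" |
        LC_ALL=C sort -t "$tab" -k1,1 > "$tmp/$f.keyed"
done

# join prints the key, then the six columns of BR, then the six columns of BS.
# Keep BR's six columns (fields 2-7) and only BS's two counts (fields 12-13).
LC_ALL=C join -t "$tab" -1 1 -2 1 "$tmp/BR.keyed" "$tmp/BS.keyed" |
    cut -f 2-7,12-13 > "$tmp/BRS.txt"

cat "$tmp/BRS.txt"
```

The final cut differs from the original `cut -f 2-`, which would also keep the second file's copy of the four key columns and give twelve columns instead of the eight asked for.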
