I have two tab-separated values files, for example:
File1.txt
chr1 894573 rs13303010 GG
chr2 18674 rs10195681 CC
chr3 104972 rs990284 AA #<--- Unique Line
chr4 111487 rs17802159 AA
chr5 200868 rs4956994 GG
chr5 303686 rs6896163 AA #<--- Unique Line
chrX 331033 rs4606239 TT
chrY 2893277 i4000106 GG
chrY 2897433 rs9786543 GG
chrM 57 i3002191 TT
File2.txt
chr1 894573 rs13303010 GG
chr2 18674 rs10195681 AT
chr4 111487 rs17802159 AA
chr5 200868 rs4956994 CC
chrX 331033 rs4606239 TT
chrY 2893277 i4000106 GA
chrY 2897433 rs9786543 GG
chrM 57 i3002191 TA
Desired Output:
Output.txt
chr1 894573 rs13303010 GG
chr2 18674 rs10195681 AT
chr3 104972 rs990284 AA #<--Unique Line from File1.txt
chr4 111487 rs17802159 AA
chr5 200868 rs4956994 CC
chr5 303686 rs6896163 AA #<--Unique Line from File1.txt
chrX 331033 rs4606239 TT
chrY 2893277 i4000106 GA
chrY 2897433 rs9786543 GG
chrM 57 i3002191 TA
File1.txt has 10 entries in total, while File2.txt has 8. I want to compare the two files using columns 1 and 2 (column 3, the rsID, could also be used as the key).
If the first two columns match in both files, the corresponding line from File2.txt should be written to Output.txt.
If File1.txt has a unique combination (a column 1:column 2 pair that is not present in File2.txt), the corresponding line from File1.txt should be written to Output.txt.
I tried various awk and Perl combinations found online but couldn't get the correct result. Any suggestions would be helpful.
Thanks,
Amit
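A minimal Python sketch of the merge rule described above (this is not the awk solution mentioned later in the thread, which was never posted). It assumes whitespace-separated columns with the chromosome and position in columns 1 and 2, that Output.txt should follow File1.txt's row order, and that File2.txt rows whose key never appears in File1.txt can be dropped, since the example does not cover that case. The names `merge_genotypes`, `key_of` and `merge_genotypes.py` are hypothetical.

```python
import sys
from typing import Dict, Iterable, Iterator, Optional, Tuple


def key_of(line: str) -> Optional[Tuple[str, str]]:
    """Return the (chromosome, position) pair from columns 1 and 2."""
    fields = line.split()  # splits on tabs or spaces
    if len(fields) < 2:
        return None  # blank or malformed row
    return fields[0], fields[1]


def merge_genotypes(file1_lines: Iterable[str],
                    file2_lines: Iterable[str]) -> Iterator[str]:
    """Yield File1's rows in order, swapping in File2's row when the key matches."""
    # Index File2 once in a dict (assumption: it fits in memory);
    # if a key repeats in File2, the last occurrence wins.
    file2_by_key: Dict[Tuple[str, str], str] = {}
    for line in file2_lines:
        key = key_of(line)
        if key is not None:
            file2_by_key[key] = line.rstrip("\n")

    # Stream File1: one dict lookup per row, so the cost grows linearly.
    for line in file1_lines:
        key = key_of(line)
        if key is None:
            continue
        yield file2_by_key.get(key, line.rstrip("\n"))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python merge_genotypes.py File1.txt File2.txt > Output.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for row in merge_genotypes(f1, f2):
            print(row)
```

Only File2.txt is held in memory; File1.txt is read line by line. With ~1 million lines per file, the run time should therefore scale roughly linearly instead of rescanning one file for every line of the other.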
Hi Sean,
This works for the small test files, but on the large files (~1 million lines each) it runs indefinitely without producing any output.
Also, it only outputs the unique lines from File1.txt, whereas I also want the lines for matching positions from File2.txt in the output file.
Thanks, Amit
Since your files appear to contain genomic coordinates, you may want to convert them to a standard format such as BED and then use tools like bedtools or BEDOPS, which are designed to handle very large files efficiently.
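As a sketch of the BED conversion step, the Python snippet below assumes column 2 is a 1-based position (typical for SNP genotype exports, but not stated in the question). A single base at `pos` then becomes the 0-based, half-open BED interval `[pos - 1, pos)`. It writes BED4 with the rsID as the name column; the genotype is left out so that columns 5 and 6 (score and strand in BED) are not misinterpreted, and it can be joined back by rsID. The function name `to_bed` is hypothetical.

```python
import sys
from typing import Iterable, Iterator


def to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Convert 'chrom pos rsid genotype' rows to BED4 rows.

    Assumes column 2 is a 1-based position, so one base at `pos`
    becomes the 0-based, half-open interval [pos - 1, pos).
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        chrom, pos, rsid = fields[0], fields[1], fields[2]
        start = int(pos) - 1
        yield f"{chrom}\t{start}\t{pos}\t{rsid}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python to_bed.py File1.txt > File1.bed
    with open(sys.argv[1]) as handle:
        for row in to_bed(handle):
            print(row)
```

Check the bedtools or BEDOPS documentation for the sort order each tool expects before running set operations on the resulting files.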
Thanks, Sean, for the suggestions! I tried some awk combinations and was able to get the output.
Did you want to answer your own question, then, so that we can see what you came up with?