5.6 years ago
Kumar
I have two large tab-delimited files, each containing two columns: IDs and sequences. I want to generate an output file of the mismatched lines, i.e. the lines in each file that are not present in the other file.
Please see the example below:
File 1:
0 AAAAGTGTGTAAAGAAGGGTAAAAAAAAAAACCGGATGCGAGGCATCCGGT
1000004 TACCGGGGAGTCGCCTTTTGCAACAGCACGGCTCAG
1000001 TGGTCAGTTTATGGAACGTTACCGGGGAGTTACTTTTTGCAACAGCACGGCTCAGCGC
1000002 ACCGGGGCAACAGCACTGCGACCGCTAAAAAAG
1000003 ATCACCGGGGCAGGCATTCGCCAGCGCCAGTAGCTGG
File 2:
1000000 TTTTTACCGGGGAGTCGCCTTTTGCAACAGCGGACGGCTCAG
1000008 TACCGGGGAGTCGCCTTTTGCAACAGCACGGCTCAG
1000006 TGGTCAGTTTATGGAACGTTACCGGGGAGTTACTTTTTGCAACAGCACGGCTCAGCGC
1000005 ACCGGGGCAACAGCACTGCGACCGCTAAAAAAG
1000009 ATCACCGGGGCAGGCATTCGCCAGCGCCAGTAGCTGG
OUTPUT:
0 AAAAGTGTGTAAAGAAGGGTAAAAAAAAAAACCGGATGCGAGGCATCCGGT
1000000 TTTTTACCGGGGAGTCGCCTTTTGCAACAGCGGACGGCTCAG
I don't follow: the description of the problem states you want to find lines that are different between the files, correct? However, there are no lines in common between the two example files, so all lines should be included in the output. Or did I get something wrong?
I updated my query. I am looking to generate a file of the lines that differ between the two files.
Many possibilities with AWK or GREP; see here for the same example and solutions.
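For readers who prefer a script, here is a minimal Python sketch of one way to do this. It assumes, based on the example above, that two lines "match" when their sequence (second column) is identical, regardless of ID; the thread does not state this explicitly. The function names `read_rows` and `unmatched_rows` are made up for illustration, and the rows are the ones from the question.

```python
# Sample rows copied from the question (ID, then sequence).
FILE1 = """\
0 AAAAGTGTGTAAAGAAGGGTAAAAAAAAAAACCGGATGCGAGGCATCCGGT
1000004 TACCGGGGAGTCGCCTTTTGCAACAGCACGGCTCAG
1000001 TGGTCAGTTTATGGAACGTTACCGGGGAGTTACTTTTTGCAACAGCACGGCTCAGCGC
1000002 ACCGGGGCAACAGCACTGCGACCGCTAAAAAAG
1000003 ATCACCGGGGCAGGCATTCGCCAGCGCCAGTAGCTGG
"""

FILE2 = """\
1000000 TTTTTACCGGGGAGTCGCCTTTTGCAACAGCGGACGGCTCAG
1000008 TACCGGGGAGTCGCCTTTTGCAACAGCACGGCTCAG
1000006 TGGTCAGTTTATGGAACGTTACCGGGGAGTTACTTTTTGCAACAGCACGGCTCAGCGC
1000005 ACCGGGGCAACAGCACTGCGACCGCTAAAAAAG
1000009 ATCACCGGGGCAGGCATTCGCCAGCGCCAGTAGCTGG
"""


def read_rows(lines):
    """Parse 'ID<whitespace>sequence' lines into (id, seq) tuples, skipping blank lines."""
    rows = []
    for line in lines:
        parts = line.split()  # splits on tabs or spaces
        if len(parts) >= 2:
            rows.append((parts[0], parts[1]))
    return rows


def unmatched_rows(rows_a, rows_b):
    """Return rows from either input whose sequence is absent from the other input."""
    seqs_a = {seq for _, seq in rows_a}
    seqs_b = {seq for _, seq in rows_b}
    only_a = [row for row in rows_a if row[1] not in seqs_b]
    only_b = [row for row in rows_b if row[1] not in seqs_a]
    return only_a + only_b  # rows unique to the first input, then the second


result = unmatched_rows(read_rows(FILE1.splitlines()), read_rows(FILE2.splitlines()))
for seq_id, seq in result:
    print(f"{seq_id}\t{seq}")
```

On the sample data this prints the rows with IDs 0 and 1000000, which is the OUTPUT shown in the question. For real files, an open file handle can be passed to `read_rows` in place of the split string. Both sets of sequences are held in memory, so very large files may need a different approach.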