I have two files with several hundred entries each. File 1 contains 5-base sequences, and File 2 has more entries, with longer sequences. The first 5 bases of some sequences in File 2 match a sequence in File 1. I tried some grep and awk methods, but they did not work for a partial match like this. For example:
File 1:
ATGCC
TTGCA
GGAAC
........
........
File 2:
ATTTCGGGAAAATT
ATGCCTTAAGACCT
GGAACTAAGGGGA
............
............
Expected outcome:
ATGCCTTAAGACCT
GGAACTAAGGGGA
Any help is much appreciated! Thanks!
Shenwei, thanks for the reply. But I already tried that grep option before posting the topic. It didn't work.
It definitely will work, but you have to put ^ in front of each of the 5-letter sequences in File1.
If you don't want to use grep, then any program that separates sequences based on user-defined barcodes (flexbar, etc.) will do this for you.
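For reference, here is a minimal sketch of the ^ suggestion, using only the example lines from the question. The file names (file1.txt, file2.txt, patterns.txt, matches.txt) and the temporary working directory are assumptions made for illustration; substitute your own paths. The awk line is an alternative that compares the first 5 characters directly instead of building patterns.

```shell
# Work in a throwaway directory so nothing existing is overwritten (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the example data from the question (only the lines shown there).
printf '%s\n' ATGCC TTGCA GGAAC > file1.txt
printf '%s\n' ATTTCGGGAAAATT ATGCCTTAAGACCT GGAACTAAGGGGA > file2.txt

# Put ^ in front of every 5-base sequence so it only matches at line start.
sed 's/^/^/' file1.txt > patterns.txt

# -f reads one pattern per line from patterns.txt.
grep -f patterns.txt file2.txt > matches.txt
cat matches.txt

# Alternative without patterns: load File 1 into an array, then keep the
# File 2 lines whose first 5 characters are one of those keys.
awk_matches=$(awk 'NR==FNR {seen[$1]; next} substr($0, 1, 5) in seen' file1.txt file2.txt)
```

On the example data both approaches keep ATGCCTTAAGACCT and GGAACTAAGGGGA. One thing to check if grep returns every line: an empty line in File 1 turns into a bare ^ pattern, which matches everything.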