You could probably modify this script for your input sets:
#!/usr/bin/env python
input_one = '''
BIEC2-99962 HOR_233 G_G
BIEC2-9997 HOR_233 A_G
BIEC2-999748 HOR_233 C_C
BIEC2-999848 HOR_233 G_G
BIEC2-99989 HOR_233 A_A
'''.strip().split()
input_two = '''
BIEC2-9997 HOR_250 A_A
BIEC2-999748 HOR_250 C_C
BIEC2-99989 HOR_250 A_C
'''.strip().split()
input_three = '''
BIEC2-9997 HOR_615 A_G
BIEC2-999748 HOR_615 A_C
BIEC2-999848 HOR_615 A_G
BIEC2-99989 HOR_615 A_C
'''.strip().split()
one_list = input_one[::3]
one_dict = dict(zip(one_list, input_one[2::3]))
two_dict = dict(zip(input_two[::3], input_two[2::3]))
three_dict = dict(zip(input_three[::3], input_three[2::3]))
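# Look up each ID from the first input in all three dicts; 'NA' fills in missing IDs.
# print() with parentheses runs under both Python 2 and Python 3.
print('\n'.join([' '.join([k, one_dict[k], two_dict.get(k, 'NA'), three_dict.get(k, 'NA')]) for k in one_list]))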
The output looks like:
$ ./join_test.py
BIEC2-99962 G_G NA NA
BIEC2-9997 A_G A_A A_G
BIEC2-999748 C_C C_C A_C
BIEC2-999848 G_G NA A_G
BIEC2-99989 A_A A_C A_C
This seems to match your expected output.
If you want to understand how the script works, print each variable before the list comprehension, and then break the list comprehension down into smaller pieces.
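As a rough sketch of what "breaking it down" could look like, here is the same join written as an ordinary loop. The two-row sample data is only there to keep the example self-contained, and the names rows and fields are my own, not part of the script above.

```python
# Cut-down stand-ins for one_list, one_dict, two_dict and three_dict
one_list = ['BIEC2-99962', 'BIEC2-9997']
one_dict = {'BIEC2-99962': 'G_G', 'BIEC2-9997': 'A_G'}
two_dict = {'BIEC2-9997': 'A_A'}
three_dict = {'BIEC2-9997': 'A_G'}

rows = []
for k in one_list:
    # One output row: the ID, then its genotype in each input.
    # dict.get(k, 'NA') returns 'NA' when the ID is absent from that input.
    fields = [k, one_dict[k], two_dict.get(k, 'NA'), three_dict.get(k, 'NA')]
    rows.append(' '.join(fields))

print('\n'.join(rows))
```

Printing fields inside the loop shows exactly what each lookup returns for a given ID.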
Ultimately, you would replace the lists input_one, input_two and input_three with the results of reading in your input files with the open() and readlines() methods.
Remember to strip() and split() so that each element of the list is separated from the others, regardless of whether the delimiter is a space or a newline. Use print to investigate one of the sample input lists if this requirement isn't clear.
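Something along these lines might work for reading from files. It is only a sketch: load_genotypes and the file names one.txt and two.txt are made up for illustration, and it uses read() instead of readlines() so that the whole file can be stripped and split in one call. The temporary files just make the example runnable on its own; with real data you would pass your own paths.

```python
import os
import tempfile

def load_genotypes(path):
    """Hypothetical helper: return (ids, id -> genotype dict) for a
    whitespace-delimited file with three columns per record."""
    with open(path) as handle:
        # split() with no argument splits on spaces and newlines alike
        fields = handle.read().strip().split()
    ids = fields[::3]                      # every third item, starting at the ID
    return ids, dict(zip(ids, fields[2::3]))

# Write two small sample files so the sketch can run as-is
tmp_dir = tempfile.mkdtemp()
samples = {
    'one.txt': 'BIEC2-99962 HOR_233 G_G\nBIEC2-9997 HOR_233 A_G\n',
    'two.txt': 'BIEC2-9997 HOR_250 A_A\n',
}
paths = {}
for name, text in samples.items():
    paths[name] = os.path.join(tmp_dir, name)
    with open(paths[name], 'w') as handle:
        handle.write(text)

one_list, one_dict = load_genotypes(paths['one.txt'])
_, two_dict = load_genotypes(paths['two.txt'])

lines = [' '.join([k, one_dict[k], two_dict.get(k, 'NA')]) for k in one_list]
print('\n'.join(lines))
```

The order of the output follows the first file, as in the script above, so IDs that appear only in the later files are not reported.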
I'd second that awk is not really the ideal tool for this job, and I use it a great deal.
Do you want to do it ONLY in awk, or would any script be OK?
Hello sagi.polani!
It appears that your post has been cross-posted to another site: SEQanswers: http://seqanswers.com/forums/showthread.php?p=160574#post160574
This is typically not recommended as it runs the risk of annoying people in both communities.
And on SO... http://stackoverflow.com/questions/28601280/using-awk-to-perform-vlookup-like-command
While you can process multiple input files at once in awk, it's not something that comes highly recommended. I would really recommend writing a short Python or Perl script. That's almost always easier to debug.
Hi Ryan,
Preferably awk...
Thanks!
What have you tried? The awk method for doing this is terribly inefficient and overly complicated. If this isn't something that you absolutely have to do, then take the hint and don't use awk for the task.
I prefer one-liners, but I'm open to suggestions.
Thanks
A one-liner solution would use join rather than awk.
Yes, but join requires me to sort the files, which I want to avoid doing.
Ah, I had assumed that you'd already done that. What have you tried so far with awk and what isn't yet working?
I went through numerous threads that I found, but none of them really worked. I'm not a real pro at this...
Define "not worked". It's unlikely that any of us will want to bother writing an awk-based solution for this. The most help you're likely to get is advice on a script that you've already started writing but that isn't quite working.