Hello everyone,
I'm new to Linux and I have a problem.
I have a file, file1, like this:
3
6
7
9
12
and a file, file2, which is tab-delimited:
chr1 3052600 3052800 1 E3
chr1 3052800 3053000 2 E3
chr1 3059400 3059600 3 E3
chr1 3059600 3059800 4 E3
chr1 3059800 3060000 5 E3
chr1 3062600 3062800 6 E3
chr1 3101000 3101200 7 E3
chr1 3105000 3105200 8 E3
chr1 3105200 3105400 9 E3
chr1 3116800 3117000 10 E2
chr1 3117000 3117200 11 E2
chr1 3164800 3165000 12 E2
I want to extract the lines in file2 whose 4th column equals one of the numbers in file1, like below:
chr1 3059400 3059600 3 E3
chr1 3062600 3062800 6 E3
chr1 3101000 3101200 7 E3
chr1 3105200 3105400 9 E3
chr1 3164800 3165000 12 E2
I have spent several hours on this, including writing a very slow Python script, and I searched for a one-line solution, but I got nothing!
awk -v FS="\t" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' file1 file2
Thanks a lot for some suggestions!
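A minimal sketch of one possible fix, assuming file2 really is tab-separated and file1 holds one number per line with no trailing whitespace or Windows line endings. The awk command in the post tests the last field with its final character removed (so "E3" becomes "E"), which never matches a number from file1; testing the 4th column directly should select the wanted lines. The temporary directory and the `result` variable below are only there for illustration.

```shell
# Scratch directory so the example does not touch existing files (illustrative only).
tmpdir=$(mktemp -d)

# Recreate the sample inputs from the question.
printf '%s\n' 3 6 7 9 12 > "$tmpdir/file1"
# printf reuses its format for every group of four arguments, giving one tab-separated row each.
printf 'chr1\t%s\t%s\t%s\t%s\n' \
    3052600 3052800 1 E3 \
    3052800 3053000 2 E3 \
    3059400 3059600 3 E3 \
    3059600 3059800 4 E3 \
    3059800 3060000 5 E3 \
    3062600 3062800 6 E3 \
    3101000 3101200 7 E3 \
    3105000 3105200 8 E3 \
    3105200 3105400 9 E3 \
    3116800 3117000 10 E2 \
    3117000 3117200 11 E2 \
    3164800 3165000 12 E2 > "$tmpdir/file2"

# First file (NR==FNR): remember each number as an array key.
# Second file: print the line when its 4th column is one of those keys.
result=$(awk -F'\t' 'NR==FNR { rows[$1]; next } $4 in rows' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
```

On your own data the one-liner would be: awk -F'\t' 'NR==FNR { rows[$1]; next } $4 in rows' file1 file2. If it prints nothing, it may be worth checking file1 for stray spaces or carriage returns at the end of each line.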
Hello Hughie,
Please use appropriate tags. Your question is about formatting and awk. That should have been a tag when you created the question. When you add appropriate tags, users that follow the tag (usually experts interested in helping others in that subject matter) get notified of your question, and this means you stand a better chance at getting a relevant, useful response faster.
Thank you Wouter!
I have revised the tag.
The appropriate tags that Wouter mentioned are "formatting" and "awk", not the one out-of-place "formatting and awk" tag. Thank you!
Revised again!