Hello, I have a file containing the gene names of interest (24423 genes), and another file containing the lengths of all the genes (41306 genes). I want the lengths only for the genes in my list, but when I grep using
grep -wf file1 file2
or even
grep -Fwf file1 file2
I get some extra genes. My list may contain only the sense or only the anti-sense entry for a gene, whereas the reference file may contain both, and that is reflected in the output.
Is there a way to remove from the reference file (file2) all the lines that don't match a gene in my list?
Thank you.
P.S. The question is also on stackoverflow.com
Edit:

file1:
A1BG
A1BG-AS1
TSPAN6
MYB
MYB-AS1

file2:
A1BG 2941
A1BG-AS1 560
TSPAN6 7923
MYB-AS1 362
MYB-AS2 713
MYB-AS3 396

desired_output:
A1BG 2941
A1BG-AS1 560
TSPAN6 7923
MYB-AS1 362

But I always get MYB-AS2 and MYB-AS3.
And you'll soon get some negative votes on stackoverflow, because you don't show any sample of your files.
Hi, can you post an example of your file1, file2, and desired output?
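
For the sample file1 and file2 shown in the post, the extra lines most likely come from how grep -w defines a word: only letters, digits, and the underscore count as word characters, so the pattern MYB matches as a whole word inside "MYB-AS2 713" and "MYB-AS3 396". One way around this is to compare the whole first column instead of searching for words. The following is only a minimal sketch, assuming that the gene name is the first whitespace-separated field in both files and that neither file has stray Windows line endings. The scratch directory and the variable names (workdir, wanted, result) are made up for illustration.

```shell
# Recreate the sample files from the post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' A1BG A1BG-AS1 TSPAN6 MYB MYB-AS1 > file1
printf '%s\n' 'A1BG 2941' 'A1BG-AS1 560' 'TSPAN6 7923' \
              'MYB-AS1 362' 'MYB-AS2 713' 'MYB-AS3 396' > file2

# First pass (file1, where NR == FNR): store each gene name as an array key.
# Second pass (file2): print a line only if its first field is one of those keys.
result=$(awk 'NR == FNR { wanted[$1]; next } $1 in wanted' file1 file2)
printf '%s\n' "$result"
```

On the sample data this prints the four lines listed under desired_output. MYB-AS2 and MYB-AS3 are dropped because the first field has to equal a name from file1 exactly, and MYB produces no line because it is not present in file2. On the real files only the awk line is needed, for example awk 'NR == FNR { wanted[$1]; next } $1 in wanted' file1 file2 > lengths_of_interest.txt (the output file name is arbitrary).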