Hi, I have two files -
file_a
gene_name sample_1 sample_2 sample_3
gene_1 count_1 count_1 count_1
gene_2 count_2 count_2 count_2
gene_5 count_3 count_3 count_3
gene_6 count_4 count_4 count_4
file_b
gene_name start end gene_length
gene_1 start_1 end_1 length_1
gene_2 start_2 end_2 length_2
gene_3 start_3 end_3 length_3
gene_4 start_4 end_4 length_4
gene_5 start_5 end_5 length_5
gene_6 start_6 end_6 length_6
gene_7 start_7 end_7 length_7
I want to get the gene lengths for all the genes present in file_a.
I tried using grep, but I don't think there is a way to search file_b using only column 1 of file_a.
I also tried grep by taking only the first 2 columns, but it didn't work. Is there a simpler way?
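A minimal sketch of one way to do this lookup with awk, assuming both files are tab-separated, the gene name is in column 1, and gene_length is column 4 of file_b as in the example above. The scratch directory and the rows are only an illustration that reuses the placeholder values from the question.

```shell
# Recreate the example files from the question in a scratch directory (tab-separated).
workdir=$(mktemp -d)
{
  printf 'gene_name\tsample_1\tsample_2\tsample_3\n'
  printf 'gene_1\tcount_1\tcount_1\tcount_1\n'
  printf 'gene_2\tcount_2\tcount_2\tcount_2\n'
  printf 'gene_5\tcount_3\tcount_3\tcount_3\n'
  printf 'gene_6\tcount_4\tcount_4\tcount_4\n'
} > "$workdir/file_a"
{
  printf 'gene_name\tstart\tend\tgene_length\n'
  for i in 1 2 3 4 5 6 7; do
    printf 'gene_%s\tstart_%s\tend_%s\tlength_%s\n' "$i" "$i" "$i" "$i"
  done
} > "$workdir/file_b"

# First file (NR == FNR): remember gene_length keyed by gene name.
# Second file: print name and length for every file_a row whose name was seen in file_b.
lengths=$(awk -F'\t' '
  NR == FNR { len[$1] = $4; next }
  ($1 in len) { print $1 "\t" len[$1] }
' "$workdir/file_b" "$workdir/file_a")

printf '%s\n' "$lengths"
rm -rf "$workdir"
```

Because the match is an exact comparison on column 1, a name such as gene_1 cannot accidentally pick up a longer name that merely contains it. The header row is carried through as well, since gene_name appears in both files.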
Dear Charles,
Thanks for your answer. My data is tab-separated, but I shall try the suggestion and update you.
-edit-
I tried the suggestion, but
wc -l file_a
gives 24424, while
wc -l
of the output gives 23133 lines. file_b was downloaded from UCSC (known_gene, latest version). Could there be entries missing in it?
Thanks.
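One way to check whether entries are missing is to list the file_a names that never occur in column 1 of file_b. If each output line corresponds to one file_a row, the gap of 1291 lines (24424 minus 23133) would be names that are absent from file_b or spelled differently there. The sketch below assumes tab-separated files; gene_9 is a hypothetical name added only so that the example has something to report.

```shell
# Small illustrative files; gene_9 is hypothetical and deliberately absent from file_b.
workdir=$(mktemp -d)
{
  printf 'gene_name\tsample_1\tsample_2\tsample_3\n'
  printf 'gene_1\tcount_1\tcount_1\tcount_1\n'
  printf 'gene_2\tcount_2\tcount_2\tcount_2\n'
  printf 'gene_9\tcount_9\tcount_9\tcount_9\n'
} > "$workdir/file_a"
{
  printf 'gene_name\tstart\tend\tgene_length\n'
  printf 'gene_1\tstart_1\tend_1\tlength_1\n'
  printf 'gene_2\tstart_2\tend_2\tlength_2\n'
  printf 'gene_3\tstart_3\tend_3\tlength_3\n'
} > "$workdir/file_b"

# First file: record every name present in file_b.
# Second file: print the file_a names that were never recorded.
missing=$(awk -F'\t' '
  NR == FNR { known[$1]; next }
  !($1 in known) { print $1 }
' "$workdir/file_b" "$workdir/file_a")

printf '%s\n' "$missing"
rm -rf "$workdir"
```

Inspecting a few of the reported names by hand usually shows whether they are truly absent or just written with a different identifier type in the annotation file.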
You can also try the following command:
awk '{print $1 }' file_a | grep --file - file_b | awk '{print $1"\t"$NF}'
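One thing to watch with a grep-based lookup like this: every name from file_a is used as a pattern that can match anywhere in a line of file_b, so a short name also matches longer names that contain it, and the output can have more lines than file_a. The sketch below illustrates this with hypothetical names gene_10 and gene_11, and compares it with whole-word, fixed-string matching (the -w and -F options of grep). The file called names is an assumption made for the example.

```shell
# Illustrative files; gene_10 and gene_11 are hypothetical names that contain "gene_1".
workdir=$(mktemp -d)
{
  printf 'gene_name\tsample_1\tsample_2\tsample_3\n'
  printf 'gene_1\tcount_1\tcount_1\tcount_1\n'
  printf 'gene_2\tcount_2\tcount_2\tcount_2\n'
} > "$workdir/file_a"
{
  printf 'gene_name\tstart\tend\tgene_length\n'
  for i in 1 10 11 2; do
    printf 'gene_%s\tstart_%s\tend_%s\tlength_%s\n' "$i" "$i" "$i" "$i"
  done
} > "$workdir/file_b"

# Column 1 of file_a becomes the pattern list.
cut -f1 "$workdir/file_a" > "$workdir/names"

# Plain pattern matching: gene_1 also matches the gene_10 and gene_11 lines.
loose=$(grep -c -f "$workdir/names" "$workdir/file_b")

# Whole-word, fixed-string matching: only lines containing the exact name as a word.
strict=$(grep -c -w -F -f "$workdir/names" "$workdir/file_b")

printf 'loose=%s strict=%s\n' "$loose" "$strict"
rm -rf "$workdir"
```

Whole-word matching still looks at the entire line rather than only column 1, and names containing characters such as hyphens can defeat it, so comparing column 1 exactly (for example with awk) is the safer choice when the names are not plain words.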
Dear Suraj,
Thanks for the suggestion, but could you please explain what the command does, because
wc -l
gave me 36184, while
wc -l file_a
gives 24424.
Dear vinayjrao, can you please upload part of the file and share the link with me?
Do you see the same numbers when you follow the very good answer from cpad0112?