Hello, I received a file with GOrilla results, but I don't have the input data that was used to create it. This is the file:
GO Term Description P-value FDR q-value Enrichment N B n b Genes
GO:0001580 detection of chemical stimulus involved in sensory perception of bitter taste 4.40E-20 6.67E-16 10.14 20504 48 1053 25 [Tas2r105 - taste receptor, type 2, member 105, Tas2r119 - taste receptor, type 2, member 119, Tas2r136 - taste receptor, type 2, member 136, Tas2r122 - taste receptor, type 2, member 122, Tas2r117 - taste receptor, type 2, member 117, Tas2r123 - taste receptor, type 2, member 123, Tas2r115 - taste receptor, type 2, member 115, Tas2r129 - taste receptor, type 2, member 129, Tas2r130 - taste receptor, type 2, member 130, Tas2r125 - taste receptor, type 2, member 125, Tas2r121 - taste receptor, type 2, member 121, Tas2r124 - taste receptor, type 2, member 124, Tas2r120 - taste receptor, type 2, member 120, Tas2r113 - taste receptor, type 2, member 113, Tas2r114 - taste receptor, type 2, member 114, Tas2r109 - taste receptor, type 2, member 109, Tas2r110 - taste receptor, type 2, member 110, Tas2r106 - taste receptor, type 2, member 106, Tas2r107 - taste receptor, type 2, member 107, Tas2r102 - taste receptor, type 2, member 102, Tas2r116 - taste receptor, type 2, member 116, Tas2r104 - taste receptor, type 2, member 104, Tas2r103 - taste receptor, type 2, member 103, Tas2r131 - taste receptor, type 2, member 131, Tas2r140 - taste receptor, type 2, member 140]
I want to get a list of gene names only, extracted from the brackets in the 11th column. For example:
Tas2r105
Tas2r119
Tas2r117
...
I tried this code:
awk -F'[][]' '{print $2}' gorilla_master_nox.txt | grep -oP '(?<=,).*?(?=-)'
But I don't get the results I want. I would appreciate any help.
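The awk part of that command is fine: with -F'[][]', field $2 is the text inside the brackets. The problem is the grep pattern. The lookbehind (?<=,) needs a comma right before the match, so the first gene in the list (directly after "[") is never printed. Also, the descriptions themselves contain commas ("taste receptor, type 2, member 105"), so the lazy .*? starts at a comma inside a description and runs on to the next hyphen, producing lines such as " type 2, member 105, Tas2r119 ". A minimal sketch that anchors on the " - " separator instead is below. It assumes GNU grep built with PCRE support (-P; the \K escape drops the ", " already matched), that every entry has the form "Symbol - description", and that symbols contain no spaces or commas. The function name extract_genes is just an illustrative choice.

```shell
#!/usr/bin/env bash
# Print gene symbols from the bracketed gene list of a GOrilla results file.
# Assumptions: GNU grep with -P, entries look like "Symbol - description",
# and entries are separated by ", " as in the sample line above.
extract_genes() {
  # awk: split on "[" or "]" so that $2 is the text inside the brackets
  #      (lines without brackets, such as the header, give an empty $2)
  # grep: at the start of the line or right after ", ", take a token with
  #       no spaces/commas that is immediately followed by " - "
  awk -F'[][]' '{print $2}' "$1" | grep -oP '(?:^|, )\K[^ ,]+(?= - )'
}
```

Usage: extract_genes gorilla_master_nox.txt | sort -u gives one line per gene, with duplicates across GO terms removed. A description that itself contains ", Word - " would produce a false positive, so spot-check the output.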
Thank you
If the gene nomenclature pattern is fixed, you can use:
If you want to tighten the expression, you can use [A-Z][a-z]{2}[0-9][a-z][0-9]{3}. However, if you are not sure of the gene pattern, please use:
cut -f11 test.txt | grep -Po '[A-Z][a-z]{2}[0-9][a-z][0-9]{3}'
Thanks! The gene nomenclature pattern is not fixed, but the cut -f11 command didn't work.
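One possible reason, assuming the file is tab-separated: the header lists ten columns (GO Term, Description, P-value, FDR q-value, Enrichment, N, B, n, b, Genes), so "Genes" would be field 10, and cut -f11 would print nothing. The sketch below avoids hard-coding the index by locating the column by its header name, and it does not depend on the gene naming pattern: it splits the list on ", " and keeps only the pieces that contain " - ", since in the sample only the pieces that start a new entry do. Assumptions: tab-separated fields, a first line whose header cell is exactly "Genes", and entries of the form "Symbol - description". The function name genes_by_header is hypothetical.

```shell
#!/usr/bin/env bash
# Print unique gene symbols from the "Genes" column of a tab-separated
# GOrilla results file, finding that column by its header name.
# Assumptions: header on line 1, a cell named exactly "Genes",
# and entries like "Symbol - description" separated by ", ".
genes_by_header() {
  awk -F'\t' '
    NR == 1 {
      # remember the index of the column whose header is "Genes"
      for (i = 1; i <= NF; i++) if ($i == "Genes") col = i
      next
    }
    col {
      list = $col
      gsub(/\[|\]/, "", list)          # drop the surrounding brackets
      n = split(list, parts, ", ")     # descriptions also contain ", "
      for (j = 1; j <= n; j++) {
        # only pieces containing " - " begin with a gene symbol
        k = index(parts[j], " - ")
        if (k > 0) print substr(parts[j], 1, k - 1)
      }
    }
  ' "$1" | sort -u
}
```

Usage: genes_by_header gorilla_master_nox.txt > genes.txt. If nothing is printed, check that the file really uses tabs and that the header cell is spelled "Genes" with no trailing spaces or carriage return.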