Removing the decimal from Ensembl gene IDs in DESeq2 output
7.2 years ago
1769mkc ★ 1.2k

I want to remove the decimal part from the Ensembl gene IDs, since the decimal point makes it difficult to map the IDs to gene names.

gene                   "nH1.bam"    "nH2.bam"   "nH3.bam"              "nH4.bam"
"ENSG00000238164.4" -0.6534833425   -0.6404869759   -0.5898568965   -0.586357257
"ENSG00000049249.6" 1.0589150487    0.2235087421    0.5028436068    0.5201173416

I want my gene field to contain "ENSG00000049249" instead of "ENSG00000049249.6".

I tried awk '{gsub(/\..*$/,$1)}1', but it seems to mess up the data frame, and I'm not sure what I'm doing wrong.

Any help or suggestions would be highly appreciated.

R ensembl • 5.8k views

7.2 years ago
sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g' input.txt
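
For comparison, here is a small sketch of why the awk attempt in the question scrambles the rows, plus one possible awk fix. With no target argument, gsub works on the whole line, and /\..*$/ matches from the first dot to the end of the line, so everything after the gene ID is overwritten. The sample row is copied from the question; the variable names are illustrative only.

```shell
# One data row from the question, whitespace-separated.
line='"ENSG00000238164.4" -0.6534833425 -0.6404869759'

# Original attempt: gsub without a target edits $0 (the whole line), and the
# regex runs from the first "." to the end of the line, so the numeric
# columns are replaced by a copy of field 1.
broken=$(printf '%s\n' "$line" | awk '{gsub(/\..*$/,$1)}1')

# Possible fix: remove only the first ".digits" run, and only in field 1.
fixed=$(printf '%s\n' "$line" | awk '{sub(/\.[0-9]+/, "", $1)}1')

printf 'broken: %s\nfixed:  %s\n' "$broken" "$fixed"
```

Note that assigning to $1 makes awk rebuild the line using its output field separator (a single space by default), so for tab-separated output you would likely want to add -v OFS='\t'.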

Thank you very much.


Hi Pierre, your solution worked very well, but would you mind explaining the regular expression?

For example, I am not sure where the version number gets replaced with nothing. I understand that "\1" writes the captured match back to the output and that the trailing g makes the substitution global, but where exactly does the substitution happen?

Thanks.
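
A short sketch of how the expression can be read, using one ID from the question. The match covers both the stable ID and the version, but only the stable ID sits inside the \( \) group. The replacement \1 writes back that group alone, so the dot and version digits are dropped rather than replaced with a blank. The variable names here are illustrative only.

```shell
# Sample ID taken from the question, including its surrounding quotes.
id='"ENSG00000238164.4"'

# \(ENSG[0-9]*\)  group 1: "ENSG" followed by digits (the stable ID)
# \.[0-9]*        a literal dot plus the version digits, outside the group
# \1              replacement: only group 1 is written back
# g               repeat for every match on the line
stripped=$(printf '%s\n' "$id" | sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g')

printf '%s\n' "$stripped"   # prints "ENSG00000238164" (quotes included)
```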


How would you alter the command if there are two digits after the decimal point, e.g. ENSG00000000460.15? I am not able to remove the numbers after the decimal point in such cases using the above command.
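
As a quick check, the sed command from the answer should already cover this case, because [0-9]* matches any number of version digits. The sketch below runs it on the two-digit ID and on a row from the question; strip_version is a hypothetical helper name used only for this example.

```shell
# Hypothetical wrapper around the sed command from the answer.
strip_version() {
  sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g'
}

# Two-digit version from the comment above.
two_digit=$(printf '%s\n' 'ENSG00000000460.15' | strip_version)

# A data row: the numeric columns do not start with ENSG, so they are untouched.
row=$(printf '%s\n' '"ENSG00000049249.6" 1.0589150487 0.2235087421' | strip_version)

printf '%s\n%s\n' "$two_digit" "$row"
```

If the version still survives on a real file, one thing worth checking is whether the IDs actually start with ENSG; IDs with a different prefix, such as mouse ENSMUSG IDs, would not match this pattern.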
