Remove multiple gene IDs from BED file
19 months ago
anii • 0

Hi all,

I downloaded the BED file for an exome kit from the Agilent site. In S33266340_Covered.bed, the 4th column of each row holds several comma-separated IDs (gene name, RefSeq and Ensembl transcript IDs, and so on). I want to keep only the gene name. How can I remove every entry except the gene name from the 4th column?

This is the start of the BED file:

browser position chr1:14695-14814
track name="Covered" description="Agilent SureSelect DNA - SureSelectXT HS Human All Exon V8 - Genomic regions covered by probes" color=0,0,128 db=hg19
chr1    14694   14814   ref|WASH7P,ref|NR_024540,ens|ENST00000438504,ens|ENST00000538476,ens|ENST00000488147,ens|ENST00000541675,ens|ENST00000423562
chr1    14928   15048   ref|WASH7P,ref|NR_024540,ens|ENST00000538476,ens|ENST00000438504,ens|ENST00000488147,ens|ENST00000541675,ens|ENST00000423562
chr1    15752   15948   ref|WASH7P,ref|NR_024540,ens|ENST00000538476,ens|ENST00000438504,ens|ENST00000488147,ens|ENST00000541675,ens|ENST00000423562
chr1    16603   17068   ref|WASH7P,ref|NR_024540,ens|ENST00000438504,ens|ENST00000538476,ens|ENST00000488147,ens|ENST00000541675,ens|ENST00000423562
chr1    17235   17421   ref|WASH7P,ref|NR_024540,ens|ENST00000488147,ens|ENST00000438504,ens|ENST00000538476,ens|ENST00000541675,ens|ENST00000423562,miRNA|hsa-miR-6859-3p
transcript bed genes • 780 views
19 months ago
iraun 6.2k

Assuming the gene name is always the first comma-separated entry in the 4th column, you could use:

awk -F'\t' 'BEGIN { OFS = FS } /^chr/ { split($4, a, ","); print $1, $2, $3, a[1] }' file.bed
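
Note that the command above keeps the source prefix, so the 4th column comes out as ref|WASH7P rather than WASH7P. If you want the bare symbol only, a small variation could look like the sketch below. It assumes the file is tab-separated, that the first entry in column 4 is always the gene, and that the prefix ends at the first "|". The function name strip_gene_ids is made up for illustration.

```shell
# Hypothetical helper (name is an assumption, not part of any tool):
# keep chr/start/end and reduce column 4 to the bare gene symbol.
# Assumes tab-separated input where the first comma-separated entry
# of column 4 is the gene, e.g. "ref|WASH7P".
strip_gene_ids() {
    awk -F'\t' 'BEGIN { OFS = FS }
    /^chr/ {                           # skips the "browser" and "track" header lines
        split($4, a, ",")              # a[1] is the first entry, e.g. "ref|WASH7P"
        sub(/^[^|]*[|]/, "", a[1])     # drop everything up to and including the first "|"
        print $1, $2, $3, a[1]
    }' "$@"
}

# Usage: strip_gene_ids S33266340_Covered.bed > genes_only.bed
```

Rows whose first entry is not a gene (for example a region annotated only with a transcript ID) would still come through with that ID, so it is worth checking the output before relying on it.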

It works, thank you.
