GENCODE v.27 BED 12 file
7.0 years ago
samuel ▴ 260

Hi, does anyone know where to get a BED12-format file of the latest version of GENCODE? I tried this link, which didn't work: https://github.com/stevekm/reference-annotations

I need the BED12 format, so this command, which only produces six columns, doesn't help either:

cat gencode.v27.gtf | awk 'BEGIN{FS="\t";OFS="\t"}{split($9,a,";");print $1,$4-1,$5,a[1],".",$7}' | sed 's/gene_id //g' | tr -d '"' > file.bed

Many thanks.

genome next-gen sequence • 2.3k views

You can make a BED12 file by extending your command above. The catch is that the six extra fields carry no new information: they are constants or values derived from the BED6 columns. What are you planning to use this BED12 file for?

cat gencode.v27.gtf | awk 'BEGIN{FS="\t";OFS="\t"}{if($1 ~ /^ch/){split($9,a,";");print $1,$4-1,$5,a[1],"0",$7,$4-1,$4-1,"255,0,0",1,$5-$4+1,0}else{print $0}}' | sed 's/gene_id //g' | tr -d '"' > file.bed
6.9 years ago
michael.ante ★ 3.9k

Hi zoegward,

I once used the gtf2bed.pl program from ea-utils. As far as I remember, it writes one line per transcript to the resulting BED file. Each line describes the transcript's exon usage, but the transcript-gene association is lost.

A GTF file, on the other hand, encodes the gene - transcript - exon relationships hierarchically, which leads to a lot of redundant information.
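
To illustrate the one-line-per-transcript layout described above, here is a rough sketch in awk. It is not the gtf2bed.pl implementation. The function name gtf_exons_to_bed12 is made up for this example. It assumes that every exon record carries a transcript_id attribute and that each transcript_id occurs on one chromosome only. Because CDS records are not consulted, thickStart and thickEnd simply span the whole transcript, and itemRgb is a placeholder. Only the transcript ID goes into the name column, so the gene association is dropped, as noted above.

```shell
# Sketch: build one BED12 line per transcript from the exon records of a GTF.
# gtf_exons_to_bed12 is a hypothetical helper name, not part of any package.
gtf_exons_to_bed12() {
  # Keep exon records only, then sort by chromosome and start so that the
  # exons of each transcript reach the second awk in ascending order.
  awk -F'\t' '$0 !~ /^#/ && $3 == "exon"' "$@" |
  sort -t "$(printf '\t')" -k1,1 -k4,4n |
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Pull the transcript_id value out of the attribute column.
    if (!match($9, /transcript_id "[^"]+"/)) next
    id = substr($9, RSTART + 15, RLENGTH - 16)
    if (!(id in chrom)) {
      # First (leftmost) exon of this transcript: record its coordinates.
      order[++n] = id
      chrom[id] = $1; strand[id] = $7
      tstart[id] = $4 - 1; tend[id] = $5 + 0
    }
    if ($5 + 0 > tend[id]) tend[id] = $5 + 0
    count[id]++
    # blockSizes are exon lengths; blockStarts are relative to chromStart.
    sizes[id]  = sizes[id]  ($5 - $4 + 1) ","
    starts[id] = starts[id] ($4 - 1 - tstart[id]) ","
  }
  END {
    for (i = 1; i <= n; i++) {
      id = order[i]
      print chrom[id], tstart[id], tend[id], id, 0, strand[id],
            tstart[id], tend[id], "0,0,0", count[id], sizes[id], starts[id]
    }
  }'
}
```

It can be called on a file or fed from a pipe, for example `gtf_exons_to_bed12 gencode.v27.gtf > gencode.v27.bed12`. Sorting before the second awk keeps the script portable, since plain awk has no built-in array sort.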

Cheers,

Michael
