Convert CDS .fa to BED
7.6 years ago
bioinfo8 ▴ 230

Hi,

I have downloaded the CDS file for a genome in FASTA format from Ensembl (file name: organism.cds.all.fa.gz). However, I need the CDS in BED format, and I have not been able to find it on UCSC either. Any guidance would be appreciated.

Thanks!

ensembl bed fasta cds ucsc • 3.2k views
7.6 years ago
Sej Modha 5.3k

Instead of starting with a FASTA file, the easiest thing to do is to download the GTF file from Ensembl and convert it to BED. See: Converting gtf format to bed format


Thanks, but the GTF is available for the whole genome, not for the CDS only.


You can filter a gtf to only select records that are CDS.


Could you please give more detailed directions? I am new to the field. Thanks.


Sej Modha's approach is good. After downloading the GTF, if you are on a Linux system, you can grep the records that are CDS:

grep "CDS" input.gtf > output.gtf

Then convert the GTF to BED.
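A minimal sketch of the GTF-to-BED step, assuming the input already contains only CDS rows (such as `output.gtf` from the grep above). It writes six BED columns (chrom, start, end, name, score, strand). Using `transcript_id` as the BED name is an illustrative choice, and the file names `bed6_demo.gtf`/`bed6_demo.bed` and the two sample records are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: convert CDS-only GTF records into 6-column BED
# (chrom, start, end, name, score, strand).
# bed6_demo.gtf is a hypothetical stand-in for the filtered output.gtf;
# using transcript_id as the BED name is an illustrative choice.
set -eu

# Two hypothetical tab-separated CDS records, deliberately out of order.
printf '1\tensembl\tCDS\t400\t480\t.\t-\t2\tgene_id "G2"; transcript_id "T2";\n' > bed6_demo.gtf
printf '1\tensembl\tCDS\t120\t250\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n' >> bed6_demo.gtf

awk -F'\t' 'BEGIN { OFS = "\t" }
$3 == "CDS" {
    # Pull the value of transcript_id out of the attribute column (field 9).
    name = "."
    if (match($9, /transcript_id "[^"]+"/)) {
        # Skip the 15-character prefix and drop the closing quote.
        name = substr($9, RSTART + 15, RLENGTH - 16)
    }
    # GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start moves.
    print $1, $4 - 1, $5, name, ".", $7
}' bed6_demo.gtf | sort -k1,1 -k2,2n > bed6_demo.bed

cat bed6_demo.bed
```

The final `sort -k1,1 -k2,2n` orders intervals by chromosome, then numerically by start, which is a common expectation for downstream BED tools.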


You might want to filter on a specific column (I think it is column 3, the feature type) to make sure you subset the file properly.
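To illustrate why the column matters: the attribute column of an Ensembl GTF can contain strings such as `tag "CCDS"` (seen in human annotations, for example), and a plain `grep "CDS"` matches those too. The sketch below, built on a hypothetical three-record sample, compares that grep with an awk filter that tests column 3 exactly. The file names are placeholders.

```shell
#!/bin/sh
# Minimal sketch: keep only GTF records whose 3rd column (feature type) is "CDS".
# ccds_demo.gtf is a hypothetical sample; point the awk command at your own GTF.
set -eu

# Tiny tab-separated GTF: one header line, one exon, two CDS records.
# The exon line carries a CCDS tag, so a plain grep "CDS" also matches it.
printf '#!genome-build example\n' > ccds_demo.gtf
printf '1\tensembl\texon\t100\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; tag "CCDS";\n' >> ccds_demo.gtf
printf '1\tensembl\tCDS\t120\t250\t.\t+\t0\tgene_id "G1"; transcript_id "T1"; tag "CCDS";\n' >> ccds_demo.gtf
printf '1\tensembl\tCDS\t400\t480\t.\t+\t2\tgene_id "G1"; transcript_id "T1"; tag "CCDS";\n' >> ccds_demo.gtf

# Naive filter: matches "CDS" anywhere on the line, including inside "CCDS".
grep "CDS" ccds_demo.gtf > grep_hits.gtf

# Column-based filter: split on tabs only, because the attribute column contains spaces.
awk -F'\t' '$3 == "CDS"' ccds_demo.gtf > cds_only.gtf

echo "grep kept $(wc -l < grep_hits.gtf) lines; awk kept $(wc -l < cds_only.gtf) lines"
```

On this sample the grep keeps the exon line as well, while the awk filter keeps only the two CDS records; header lines starting with `#` are dropped automatically because their third field is empty.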


Yes, CDS is in column 3, and grep worked. :)


The Ensembl GTF uses 1-based coordinates while BED uses 0-based coordinates, so the following alone won't produce a correct BED file, will it?

awk 'BEGIN {OFS="\t"} {print $1,$4,$5}' org_cds.gtf  > org_cds_3cols.bed

Corrected version, subtracting 1 from column 4 to convert the 1-based start to a 0-based start (see: How To Convert Gencode Gtf Into Bed Format ?):

 awk 'BEGIN {OFS="\t"} {print $1,$4-1,$5}' org_cds.gtf  > org_cds_3cols.bed
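One way to sanity-check the corrected command: a GTF feature spans `end - start + 1` bases, while the matching BED interval spans `end - start` bases, so after the `$4-1` shift the total number of bases should be identical in both files. The sketch below runs the thread's three-column command on two hypothetical records; `shift_demo.gtf` and `shift_demo.bed` are placeholder names.

```shell
#!/bin/sh
# Minimal sketch: check that the 1-based -> 0-based shift preserves lengths.
# A GTF feature covers end - start + 1 bases; the matching BED interval
# covers end - start bases, so the per-file totals must agree.
# The two sample records are hypothetical.
set -eu

printf '1\tensembl\tCDS\t120\t250\t.\t+\t0\tgene_id "G1";\n' > shift_demo.gtf
printf '1\tensembl\tCDS\t400\t480\t.\t+\t2\tgene_id "G1";\n' >> shift_demo.gtf

# Same conversion as in the thread: subtract 1 from the start column only.
awk 'BEGIN {OFS="\t"} {print $1,$4-1,$5}' shift_demo.gtf > shift_demo.bed

gtf_bp=$(awk -F'\t' '{ s += $5 - $4 + 1 } END { print s }' shift_demo.gtf)
bed_bp=$(awk -F'\t' '{ s += $3 - $2 } END { print s }' shift_demo.bed)
echo "GTF bases: $gtf_bp, BED bases: $bed_bp"
```

With the unshifted `print $1,$4,$5` command instead, each BED interval would come out one base short, so the BED total would be lower by the number of records.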
