Question

How to convert a GTF file into a BED file with chromosome-aware format?

5 days ago

mauricio.1313 • 0

Hello BioStars Community,

I’m currently working with GTF files downloaded directly from NCBI RefSeq for well-studied species like dog (Canis lupus familiaris) and ferret (Mustela putorius furo). When I convert these GTF files to BED format, the scaffold names are not mapped to standard chromosome names (e.g., chr11). For ferret this is expected, because the complete chromosome-level information is not yet available.

Here’s an example of the GTF file I’m working with:

#!annotation-source NCBI Mustela putorius furo Annotation Release 102
NW_025421256.1   Gnomon   gene   5221   29828   .   +   .   gene_id "LOC123387000"; transcript_id ""; db_xref "GeneID:123387000"; gbkey "Gene"; gene "LOC123387000"; gene_biotype "lncRNA";

However, after converting it to a BED file using tools like gff2bed, I get this:

NC_020638.1 0   69  .   .   +   RefSeq  exon    .   gene_id "unassigned_gene_1"; transcript_id "unassigned_transcript_610"; product "tRNA-Phe"; transcript_biotype "tRNA"; exon_number "1";

whereas my expected output is something like this:

0   NM_001291928.1  chr1    -   134199214   134234856   134202950   134234733   2   134199214,134234662,    134203590,134234856,    0   Adora1  cmpl    cmpl    2,0,

Please note that the three code chunks are not necessarily related, so the IDs do not match the species I am asking about; they are only here to illustrate what I need.

Taking this together, I know it is possible to obtain BED files in the desired format, since some online tools such as the UCSC Table Browser provide them. For dog and ferret, however, the assembly version they provide is not the one I am working with, so that is not an option for me.

Does anyone know an accurate way to perform this task?

gtf refseq bed • 521 views

ADD COMMENT • link updated 1 day ago by Juke34 8.9k • written 5 days ago by mauricio.1313 • 0

Juke34 may be able to provide some insight.

ADD REPLY • link 5 days ago by GenoMax 148k

Looking forward to this! Thank you for the comment.

ADD REPLY • link 4 days ago by mauricio.1313 • 0

Have you seen this: https://agat.readthedocs.io/en/latest/gff_to_bed.html

ADD REPLY • link 4 days ago by GenoMax 148k

I tried with bedops; however, I don't think I tried PASA. I will check whether I can get the desired output.

ADD REPLY • link 4 days ago by mauricio.1313 • 0

Juke34, do you have any advice? Any comment is greatly appreciated.

ADD REPLY • link 1 day ago by mauricio.1313 • 0

When you convert GTF/GFF to BED there is no mapping involved: the sequence identifier from the first column of the GTF/GFF is reported unchanged in the first column of the BED file. You can see that in the mini review I wrote here https://agat.readthedocs.io/en/latest/gff_to_bed.html

So I don’t understand what your issue with the scaffold names is.
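If the goal is simply chr-style names in the first column, that renaming would have to be a separate step before or after the conversion. Below is a minimal Python sketch, assuming a tab-separated NCBI assembly report in which column 7 holds the RefSeq accession and column 10 the UCSC-style name (worth checking against the header of the file you actually download). The function names and the example rows are hypothetical and only illustrate the idea; sequences without a UCSC-style name keep their accession.

```python
def load_refseq_to_ucsc(report_lines):
    """Build a RefSeq accession -> UCSC-style name map from assembly report lines."""
    mapping = {}
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and empty lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # not a full record under the assumed layout
        refseq, ucsc = cols[6], cols[9]  # assumed columns: RefSeq-Accn, UCSC-style-name
        if refseq != "na" and ucsc != "na":
            mapping[refseq] = ucsc
    return mapping


def rename_first_column(lines, mapping):
    """Yield GTF/BED lines with the first column renamed when a mapping exists."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep GTF header lines untouched
            continue
        seqid, sep, rest = line.partition("\t")
        yield mapping.get(seqid, seqid) + sep + rest


# Made-up rows in the assumed report layout (not real accessions).
report = [
    "# Sequence-Name\tSequence-Role\t(remaining header columns)",
    "\t".join(["1", "assembled-molecule", "1", "Chromosome", "CM_EXAMPLE.1",
               "=", "NC_EXAMPLE.1", "Primary Assembly", "1000", "chr1"]),
    "\t".join(["scaf1", "unplaced-scaffold", "na", "na", "XX_EXAMPLE.1",
               "=", "NW_EXAMPLE.1", "Primary Assembly", "500", "na"]),
]
bed = ["NC_EXAMPLE.1\t0\t69\t.\t.\t+", "NW_EXAMPLE.1\t10\t20\t.\t.\t-"]

mapping = load_refseq_to_ucsc(report)
renamed = list(rename_first_column(bed, mapping))
print("\n".join(renamed))
```

Renaming the GTF before conversion keeps every downstream file consistent, but the names then have to match whatever genome FASTA is used alongside the annotation.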

Anyway, if you read the information at the link provided, you can see that the conversion using bedops (gff2bed) is quite particular: only the first 6 columns are as expected for a BED file.
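To make concrete what those first six columns are, here is a minimal Python sketch that turns one GTF line into a BED6 line. It is not how gff2bed or AGAT are implemented; the function name is hypothetical, and using gene_id as the BED name and "0" for a missing score are assumptions made for the example. The sequence identifier is carried over unchanged and only the start coordinate shifts, since GTF is 1-based inclusive and BED is 0-based half-open.

```python
def gtf_line_to_bed6(line):
    """Convert one tab-separated GTF feature line to a BED6 line."""
    fields = line.rstrip("\n").split("\t")
    seqid, _source, feature, start, end, score, strand = fields[:7]
    attributes = fields[8] if len(fields) > 8 else ""

    # Assumption: use gene_id as the BED name, else fall back to the feature type.
    name = feature
    for attr in attributes.split(";"):
        attr = attr.strip()
        if attr.startswith("gene_id "):
            name = attr.split(" ", 1)[1].strip('"')
            break

    bed_start = int(start) - 1                 # 1-based inclusive -> 0-based start
    bed_score = "0" if score == "." else score  # assumption: "0" when no score is given
    return "\t".join([seqid, str(bed_start), end, name, bed_score, strand])


# The gene line from the question, rebuilt with explicit tabs.
gtf = "\t".join([
    "NW_025421256.1", "Gnomon", "gene", "5221", "29828", ".", "+", ".",
    'gene_id "LOC123387000"; transcript_id ""; gene_biotype "lncRNA";',
])
print(gtf_line_to_bed6(gtf))
```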

If you want to stick to the correct BED output you should probably prefer AGAT.

ADD REPLY • link 1 day ago by Juke34 8.9k

Hi,

I don't see the problem here. If the ferret genome is not full-fledged, then use it as is. Your 'expected output' is not a BED file but output from the UCSC Table Browser. The result you show in the middle is a correct BED file. For more information on the BED file format, look e.g. here.
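For reference, the 'expected output' line has the layout of UCSC's extended genePred tables (such as refGene) with a leading bin column. The Python sketch below, with hypothetical function names, labels the fields under that assumed column order and shows how such a record could be rearranged into BED12. The values come from the example in the question, with score and name2 as separate fields; writing "0" for itemRgb is an assumption.

```python
GENEPRED_EXT_FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_genepred_ext(line):
    """Label the tab-separated fields of one genePred-extended line."""
    return dict(zip(GENEPRED_EXT_FIELDS, line.rstrip("\n").split("\t")))


def genepred_to_bed12(rec):
    """Rearrange a parsed genePred record into a BED12 line."""
    tx_start = int(rec["txStart"])
    starts = [int(x) for x in rec["exonStarts"].rstrip(",").split(",")]
    ends = [int(x) for x in rec["exonEnds"].rstrip(",").split(",")]
    block_sizes = ",".join(str(e - s) for s, e in zip(starts, ends))
    block_starts = ",".join(str(s - tx_start) for s in starts)  # relative to txStart
    return "\t".join([
        rec["chrom"], rec["txStart"], rec["txEnd"], rec["name"], rec["score"],
        rec["strand"], rec["cdsStart"], rec["cdsEnd"], "0",  # "0" = itemRgb placeholder
        rec["exonCount"], block_sizes, block_starts,
    ])


line = "\t".join([
    "0", "NM_001291928.1", "chr1", "-", "134199214", "134234856",
    "134202950", "134234733", "2", "134199214,134234662,",
    "134203590,134234856,", "0", "Adora1", "cmpl", "cmpl", "2,0,",
])
rec = parse_genepred_ext(line)
bed12 = genepred_to_bed12(rec)
print(bed12)
```

Both layouts are already 0-based half-open, so no coordinate shift is needed here; only the column order and the exon encoding differ.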

ADD REPLY • link 5 days ago by michael.ante ★ 3.9k

Thank you for your quick reply. You are right about the ferret: I should expect that output, because no chromosome assignment is available while the genome is not full-fledged. For dog, however, I expected chromosome names when using a traditional tool to generate a BED file from a GTF file, but I cannot get the desired output. My expected output is probably a pseudo-BED file; the following link shows it in more detail: TOGA-bed-output. Any comments are welcome! Thank you again.

ADD REPLY • link 4 days ago by mauricio.1313 • 0