How to convert a GTF file into a BED file in a chromosome-aware format?
5 days ago

Hello BioStars Community,

I’m currently working with GTF files downloaded directly from NCBI RefSeq for well-studied species such as dog (Canis lupus familiaris) and ferret (Mustela putorius furo). When I convert these GTF files to BED format, the scaffold names are not mapped to standard chromosome names (e.g., chr11). For ferret this is expected, because the complete chromosome information is not yet available.

Here’s an example of the GTF file I’m working with:

#!annotation-source NCBI Mustela putorius furo Annotation Release 102
NW_025421256.1   Gnomon   gene   5221   29828   .   +   .   gene_id "LOC123387000"; transcript_id ""; db_xref "GeneID:123387000"; gbkey "Gene"; gene "LOC123387000"; gene_biotype "lncRNA";

However, after converting it to a BED file with a tool like gff2bed, I get this:

NC_020638.1 0   69  .   .   +   RefSeq  exon    .   gene_id "unassigned_gene_1"; transcript_id "unassigned_transcript_610"; product "tRNA-Phe"; transcript_biotype "tRNA"; exon_number "1";

My expected output is something like this:

0   NM_001291928.1  chr1    -   134199214   134234856   134202950   134234733   2   134199214,134234662,    134203590,134234856,    0   Adora1  cmpl    cmpl    2,0,

Please note that the three snippets above are not necessarily related, so the IDs do not match the species I am asking about; they are only there to illustrate what I need.

Taken together, I know it is possible to obtain BED files in the desired format, since some online tools such as the UCSC Table Browser provide them. For dog and ferret, however, the version they provide is not the one I am working with, so that is not an option for me.

Does anyone know of an accurate way to do this?

gtf refseq bed • 521 views
Juke34 may be able to provide some insight.

Looking forward to this! Thank you for the comment.

I tried bedops, but I don't think I have tried PASA. I will check whether it gives the desired output.

Juke34, do you have any advice? Any comment is greatly appreciated.

When you convert GTF/GFF to BED, no mapping is involved: the sequence identifier from the first column of the GTF/GFF must be reported in the first column of the BED file. You can see this in the mini review I wrote here: https://agat.readthedocs.io/en/latest/gff_to_bed.html

So I don’t understand what your issue with the scaffold names is.

Anyway, if you read the information at the link above, you will see that the conversion done by bedops (gff2bed) is rather particular: only the first 6 columns are what you would expect in a BED file.

If you want to stick to standard BED output, you should probably prefer AGAT.
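
To illustrate the point that nothing is remapped during conversion, here is a minimal Python sketch of what a GTF-to-BED6 step does to a single feature line. The function name `gtf_line_to_bed6` is hypothetical, and taking `gene_id` as the BED name column is an assumption; this is not how AGAT or bedops are implemented. The sequence identifier is copied unchanged, and only the start coordinate shifts from 1-based to 0-based.

```python
def gtf_line_to_bed6(line):
    """Convert one GTF feature line to a BED6 record (list of strings).

    Hypothetical helper for illustration only.
    """
    fields = line.rstrip("\n").split("\t")
    seqid, _source, _feature, start, end, score, strand = fields[:7]
    attributes = fields[8]

    # Assumption: use gene_id as the BED name. Splitting on ";" is naive
    # and would break on attribute values that themselves contain ";".
    name = "."
    for attr in attributes.split(";"):
        attr = attr.strip()
        if attr.startswith("gene_id "):
            name = attr.split(" ", 1)[1].strip('"')
            break

    # GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start is decremented. The seqid is passed through as is.
    return [seqid, str(int(start) - 1), end, name, score, strand]


example = (
    "NW_025421256.1\tGnomon\tgene\t5221\t29828\t.\t+\t.\t"
    'gene_id "LOC123387000"; transcript_id "";'
)
print("\t".join(gtf_line_to_bed6(example)))
```

Whatever accession is in column 1 of the GTF (here `NW_025421256.1`) is what ends up in column 1 of the BED; any renaming to `chr`-style names has to be a separate step.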

Hi,

I don't see the problem here. If the ferret genome is not full-fledged, use it as is. Your 'expected output' is not a BED file but output from UCSC's Table Browser. The result you show in the middle is a correct BED file. For more information on the BED file format, look e.g. here.
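
For reference, the 'expected output' line has the shape of a UCSC gene-prediction table row (genePred extended with a leading bin column, as used for the refGene table). The sketch below labels its fields in Python, assuming that column order; the names `GENEPRED_EXT_FIELDS` and `parse_genepred_ext` are hypothetical, and the example treats the score `0` and the gene name `Adora1` as two separate fields.

```python
# Assumed column order: UCSC genePred extended, preceded by a bin column.
GENEPRED_EXT_FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_genepred_ext(line):
    """Parse one whitespace-separated row into a dict (illustrative only)."""
    values = line.split()
    if len(values) != len(GENEPRED_EXT_FIELDS):
        raise ValueError(
            f"expected {len(GENEPRED_EXT_FIELDS)} fields, got {len(values)}"
        )
    record = dict(zip(GENEPRED_EXT_FIELDS, values))

    # Scalar coordinates become integers.
    for key in ("txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        record[key] = int(record[key])

    # Comma-terminated lists become lists of integers.
    for key in ("exonStarts", "exonEnds", "exonFrames"):
        record[key] = [int(x) for x in record[key].split(",") if x]
    return record


row = (
    "0\tNM_001291928.1\tchr1\t-\t134199214\t134234856\t134202950\t134234733\t2\t"
    "134199214,134234662,\t134203590,134234856,\t0\tAdora1\tcmpl\tcmpl\t2,0,"
)
record = parse_genepred_ext(row)
print(record["chrom"], record["name2"], record["exonStarts"])
```

Unlike BED, this layout puts the transcript name before the chromosome and stores exon starts and ends as absolute coordinates, which is why a GTF-to-BED converter will not produce it.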

Thank you for your quick reply. You are right about the ferret: I should expect that output, because no chromosome numbers are available while the assembly is not full-fledged. For dog, however, I expected chromosome names when using the traditional tools to generate BED files from a GTF file, but I cannot get the desired output. My expected output is probably better described as a pseudo-BED file; the following link shows it in more detail: TOGA-bed-output. Any comments are welcome! Thank you again.
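
One possible route for the dog case is to rename the first column after conversion, using an accession-to-name table. The Python sketch below assumes the table follows the layout of an NCBI assembly report (tab-separated, `#` comment lines, RefSeq accession in the 7th column and UCSC-style name in the 10th); the column positions are exposed as parameters in case a given report differs. The function names and the accessions in the example are made up for illustration. Sequences without a UCSC-style name (`na`) keep their accession, which matches the ferret scaffold situation.

```python
def load_name_map(report_lines, accn_col=6, ucsc_col=9):
    """Build {RefSeq accession: UCSC-style name} from assembly-report lines.

    Assumption: 0-based column 6 is RefSeq-Accn and column 9 is
    UCSC-style-name; check the header of the report you actually use.
    """
    mapping = {}
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= max(accn_col, ucsc_col):
            continue
        accn, ucsc = cols[accn_col], cols[ucsc_col]
        if accn != "na" and ucsc != "na":
            mapping[accn] = ucsc
    return mapping


def rename_first_column(lines, mapping):
    """Yield BED/GTF lines with column 1 renamed where a mapping exists."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        seqid, sep, rest = line.partition("\t")
        yield mapping.get(seqid, seqid) + sep + rest


# Made-up report rows and BED lines, for illustration only.
report = [
    "# Sequence-Name\tSequence-Role\tAssigned-Molecule\tAssigned-Molecule-Location/Type\t"
    "GenBank-Accn\tRelationship\tRefSeq-Accn\tAssembly-Unit\tSequence-Length\tUCSC-style-name",
    "1\tassembled-molecule\t1\tChromosome\tCM_EXAMPLE1.1\t=\tNC_EXAMPLE1.1\tPrimary Assembly\t1000\tchr1",
    "scaf1\tunplaced-scaffold\tna\tna\tJA_EXAMPLE2.1\t=\tNW_EXAMPLE2.1\tPrimary Assembly\t500\tna",
]
bed = ["NC_EXAMPLE1.1\t0\t69\t.\t.\t+", "NW_EXAMPLE2.1\t10\t20\t.\t.\t-"]

renamed = list(rename_first_column(bed, load_name_map(report)))
print("\n".join(renamed))
```

The same renaming can be applied to the GTF before conversion instead, since only the first column is touched.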
