3 months ago
blumina.r
Hello,
I want to use the annotation tool TOGA for my research, and one of the inputs it needs is a .bed file with the annotation of a reference genome, and all its CDSs must be divisible by 3. I already have the .bed file, which I obtained by converting a gff file to bed12 using AGAT.
How can I check if all CDSs in my file are divisible by 3? And if some of them are not, how could I fix that?
These are the first 5 lines of my .bed file:
ContigUN 14393 67988 SAGMID_R016814 0.02 + 14393 67988 255,0,0 8 136,120,87,87,87,53,192,750 0,3911,11188,13427,27578,33791,42499,52845
ContigUN 20826 91148 SAGMID_R016815 0 - 20826 91148 255,0,0 2 138,480 0,69842
ContigUN 768722 773351 SAGMID_R017391 10.78 + 768722 773351 255,0,0 2 414,477 0,4152
ContigUN 808514 825537 SAGMID_R017392 15.50 - 808514 825537 255,0,0 2 626,682 0,16341
ContigUN 1279233 1293463 SAGMID_R019034 0.02 + 1279233 1293463 255,0,0 6 102,84,154,165,63,110 0,415,7381,10596,12350,14120
I found this smart solution, but it is for FASTA files. Does anyone know how I could do the same for my .bed file so that it can be used as input for TOGA?
Thank you in advance.
It might be easier to check the gff file before you convert it.
Thank you for your answer! And how could I check all the CDS lengths in the GFF file? The first lines look like this:
Basically, for each CDS, sum the lengths of its exons and check whether the total is divisible by three. Are you comfortable in any programming language?
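If you would rather check the BED12 file directly, the same idea can be sketched in Python. This is only a minimal sketch, assuming a standard whitespace-separated BED12 file in which columns 7 and 8 (thickStart, thickEnd) mark the coding region and columns 11 and 12 hold the block sizes and relative starts. The function names `cds_length` and `non_divisible` are made up for illustration. It only reports problem records and does not try to fix them.

```python
def cds_length(bed12_line):
    """Return (name, CDS length) for one BED12 line.

    The CDS length is the part of each block (exon) that overlaps
    the thickStart..thickEnd interval, summed over all blocks.
    """
    f = bed12_line.split()
    chrom_start = int(f[1])
    name = f[3]
    thick_start, thick_end = int(f[6]), int(f[7])
    # Trailing commas are allowed in BED12, so skip empty items.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]

    total = 0
    for size, rel_start in zip(sizes, starts):
        block_start = chrom_start + rel_start
        block_end = block_start + size
        overlap = min(block_end, thick_end) - max(block_start, thick_start)
        total += max(0, overlap)
    return name, total


def non_divisible(lines):
    """Return [(name, length)] for records whose CDS length is not a multiple of 3."""
    bad = []
    for line in lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        name, length = cds_length(line)
        if length % 3 != 0:
            bad.append((name, length))
    return bad


if __name__ == "__main__":
    # Two records copied from the question.
    example = [
        "ContigUN 14393 67988 SAGMID_R016814 0.02 + 14393 67988 255,0,0 8 "
        "136,120,87,87,87,53,192,750 0,3911,11188,13427,27578,33791,42499,52845",
        "ContigUN 20826 91148 SAGMID_R016815 0 - 20826 91148 255,0,0 2 138,480 0,69842",
    ]
    for name, length in non_divisible(example):
        print(name, length, "is not divisible by 3")
```

To run it on a whole file, pass an open file handle instead of the list, for example `non_divisible(open("annotation.bed"))`, where the file name is a placeholder. For the first record in the question the blocks sum to 1512, which is divisible by 3, so nothing is reported for it.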
Not yet, but I just asked ChatGPT and I think I have what I needed: a bash script that does just that, and corrects the lines whose length is not divisible by three:
I am testing it at the moment. Thank you very much!
This is an overly simplistic approach, which is typical of AI-generated code. Yes, it can check whether the length of each line is divisible by 3, but that is not the point. In the GFF file, a CDS is usually split across several lines, so you first have to merge all the CDS stretches that belong to the same transcript into one combined length and then test that total.
However, the case may be further complicated by the phase. So I would try checking with this script: https://agat.readthedocs.io/en/latest/tools/agat_sp_fix_cds_phases.html
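The merging step, summing all CDS lines of a transcript before testing divisibility, could look roughly like the Python sketch below. It assumes a tab-separated GFF3 file in which each CDS line carries a `Parent=` attribute naming its transcript. The function name `cds_lengths_by_parent` and the example records are invented for illustration. It ignores the phase column entirely, so it is not a substitute for the AGAT script.

```python
from collections import defaultdict


def cds_lengths_by_parent(gff_lines):
    """Sum the lengths of all CDS features per Parent ID.

    GFF3 coordinates are 1-based and inclusive, so one feature
    spans end - start + 1 bases.
    """
    totals = defaultdict(int)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        start, end = int(cols[3]), int(cols[4])
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            item.strip().split("=", 1)
            for item in cols[8].split(";")
            if "=" in item
        )
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                totals[parent] += end - start + 1
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical records, not taken from the poster's file.
    example = [
        "chr1\tsrc\tCDS\t101\t200\t.\t+\t0\tID=cds1;Parent=tx1",
        "chr1\tsrc\tCDS\t301\t350\t.\t+\t2\tID=cds1;Parent=tx1",
        "chr1\tsrc\tCDS\t501\t600\t.\t+\t0\tID=cds2;Parent=tx2",
    ]
    for parent, length in cds_lengths_by_parent(example).items():
        status = "OK" if length % 3 == 0 else "NOT divisible by 3"
        print(parent, length, status)
```

Transcripts flagged this way are candidates for closer inspection. Whether they are genuinely broken or simply partial models with a non-zero starting phase is something the phase-aware AGAT tool is better placed to decide.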