BioinfoBee, 17 months ago
Hello, I'm curious how to sort a GFF3 file by chromosome while keeping each parent feature (gene) together with its child features (mRNA, CDS and exon), in their original order.
Input example:
Chr6 EVM gene 212579245 212580018 . + . ID=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631
Chr6 EVM mRNA 212579245 212580018 . + . ID=evm.model.Chr6.3631;Parent=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631
Chr6 EVM exon 212579245 212580018 . + . ID=evm.model.Chr6.3631.exon1;Parent=evm.model.Chr6.3631
Chr6 EVM CDS 212579245 212580018 . + 0 ID=cds.evm.model.Chr6.3631;Parent=evm.model.Chr6.3631
Chr5 EVM gene 240103107 240104618 . + . ID=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135
Chr5 EVM mRNA 240103107 240104618 . + . ID=evm.model.Chr5.3135;Parent=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135
Chr5 EVM exon 240103107 240104618 . + . ID=evm.model.Chr5.3135.exon1;Parent=evm.model.Chr5.3135
Chr5 EVM CDS 240103107 240104618 . + 0 ID=cds.evm.model.Chr5.3135;Parent=evm.model.Chr5.3135
Chr3 EVM gene 3535391 3537315 . - . ID=evm.TU.Chr3.57;Name=EVM prediction Chr3.57
Chr3 EVM mRNA 3535391 3537315 . - . ID=evm.model.Chr3.57;Parent=evm.TU.Chr3.57;Name=EVM prediction Chr3.57
Chr3 EVM exon 3535391 3535825 . - . ID=evm.model.Chr3.57.exon3;Parent=evm.model.Chr3.57
Chr3 EVM exon 3535934 3536077 . - . ID=evm.model.Chr3.57.exon2;Parent=evm.model.Chr3.57
Chr3 EVM exon 3536230 3537315 . - . ID=evm.model.Chr3.57.exon1;Parent=evm.model.Chr3.57
Chr3 EVM CDS 3535391 3535825 . - 0 ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3 EVM CDS 3535934 3536077 . - 0 ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3 EVM CDS 3536230 3537315 . - 0 ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Expected output example:
Chr3 EVM gene 3535391 3537315 . - . ID=evm.TU.Chr3.57;Name=EVM prediction Chr3.57
Chr3 EVM mRNA 3535391 3537315 . - . ID=evm.model.Chr3.57;Parent=evm.TU.Chr3.57;Name=EVM prediction Chr3.57
Chr3 EVM exon 3535391 3535825 . - . ID=evm.model.Chr3.57.exon3;Parent=evm.model.Chr3.57
Chr3 EVM exon 3535934 3536077 . - . ID=evm.model.Chr3.57.exon2;Parent=evm.model.Chr3.57
Chr3 EVM exon 3536230 3537315 . - . ID=evm.model.Chr3.57.exon1;Parent=evm.model.Chr3.57
Chr3 EVM CDS 3535391 3535825 . - 0 ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3 EVM CDS 3535934 3536077 . - 0 ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr3 EVM CDS 3536230 3537315 . - 0 ID=cds.evm.model.Chr3.57;Parent=evm.model.Chr3.57
Chr5 EVM gene 240103107 240104618 . + . ID=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135
Chr5 EVM mRNA 240103107 240104618 . + . ID=evm.model.Chr5.3135;Parent=evm.TU.Chr5.3135;Name=EVM prediction Chr5.3135
Chr5 EVM exon 240103107 240104618 . + . ID=evm.model.Chr5.3135.exon1;Parent=evm.model.Chr5.3135
Chr5 EVM CDS 240103107 240104618 . + 0 ID=cds.evm.model.Chr5.3135;Parent=evm.model.Chr5.3135
Chr6 EVM gene 212579245 212580018 . + . ID=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631
Chr6 EVM mRNA 212579245 212580018 . + . ID=evm.model.Chr6.3631;Parent=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631
Chr6 EVM exon 212579245 212580018 . + . ID=evm.model.Chr6.3631.exon1;Parent=evm.model.Chr6.3631
Chr6 EVM CDS 212579245 212580018 . + 0 ID=cds.evm.model.Chr6.3631;Parent=evm.model.Chr6.3631
Regards, B
Uh? What about the sort command?
It does sort, but it can't keep the child features (mRNA, CDS, exon) in the proper order. For example, I can sort with sort -k1,1 -k4,4n -k5,5n input.gff3 > output_sorted.gff3, but the output looks something like this:
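To see why a coordinate sort breaks the gene/mRNA/exon/CDS layout, here is a rough Python emulation of that sort key applied to the Chr3 gene from the question (attributes shortened, only the first exon and CDS kept). It is an approximation, not a reimplementation of GNU sort: Python's sort is stable, whereas GNU sort breaks exact ties by comparing whole lines, so the order among tied lines may differ. The main effect holds either way: the first exon and CDS end at 3535825, before the gene and mRNA end at 3537315, so they move above their own parents.

```python
# Hypothetical demo: the Chr3 gene from the question (attributes shortened),
# sorted on (chromosome, start, end), roughly like sort -k1,1 -k4,4n -k5,5n.
chr3_gene = [
    "Chr3\tEVM\tgene\t3535391\t3537315\t.\t-\t.\tID=evm.TU.Chr3.57",
    "Chr3\tEVM\tmRNA\t3535391\t3537315\t.\t-\t.\tID=evm.model.Chr3.57",
    "Chr3\tEVM\texon\t3535391\t3535825\t.\t-\t.\tID=evm.model.Chr3.57.exon3",
    "Chr3\tEVM\tCDS\t3535391\t3535825\t.\t-\t0\tID=cds.evm.model.Chr3.57",
]


def coord_key(line):
    """Key on column 1 (seqid) and columns 4-5 (start, end) as integers."""
    cols = line.split("\t")
    return (cols[0], int(cols[3]), int(cols[4]))


coord_sorted = sorted(chr3_gene, key=coord_key)
feature_types = [line.split("\t")[2] for line in coord_sorted]
print(feature_types)  # ['exon', 'CDS', 'gene', 'mRNA']
```

The shorter exon and CDS features sort ahead of the gene and mRNA that contain them, which is exactly the hierarchy break described above.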
You're sorting by chromosome AND coordinates. Try sorting on the chromosome column only: sort -k1,1
Using sort -k1,1 doesn't keep the child features of each gene in their original order in the output.
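What's needed here is a stable sort on the chromosome column: lines with the same chromosome keep their input order, so each gene stays directly followed by its children. Below is a minimal Python sketch under the assumption that every gene block is already contiguous in the input, as in the example above. The name stable_sort_by_chrom is made up for illustration; chromosome names are compared as plain strings (so Chr10 would sort before Chr2), and lines starting with '#' are simply moved to the top in their original order.

```python
# Sketch: sort GFF3 lines on column 1 (seqid) only, using a stable sort, so
# lines on the same chromosome keep their input order. Assumes each gene's
# mRNA/exon/CDS lines already follow it contiguously, as in the question.


def stable_sort_by_chrom(lines):
    """Return '#' lines first (input order), then features stably sorted by seqid."""
    lines = [l.rstrip("\n") for l in lines if l.strip()]
    headers = [l for l in lines if l.startswith("#")]
    features = [l for l in lines if not l.startswith("#")]
    # sorted() is stable: equal keys (same chromosome) keep their relative order.
    return headers + sorted(features, key=lambda l: l.split("\t", 1)[0])
```

With GNU coreutils, sort -s -k1,1 input.gff3 should have the same effect on the feature lines: -s (--stable) disables the last-resort comparison of whole lines, and it is that whole-line tie-break that reorders the child features when sorting with -k1,1 alone.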
Those are your options with the sort utility: you can either keep the existing order or re-order by coordinate. If you're looking for a more robust way to handle these ordering issues, try gff3sort. You could use --precise to sort entries in the recommended order. If that works, I'll add this as an answer and you can accept it.
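If installing a dedicated tool isn't an option, a parent-aware sort can also be sketched in Python. This is not how gff3sort works internally; the helpers parse_attrs and sort_gff3_by_gene are hypothetical names. The sketch assumes tab-separated columns, that every child reaches its gene through Parent links, and that there is no ##FASTA section. It groups each line under its top-level ancestor, leaves the order inside each group untouched, and sorts the groups by chromosome name (plain string order) and then by the smallest start coordinate in the group.

```python
# Hypothetical helper, NOT gff3sort's algorithm: keep each gene together with
# all of its descendants (mRNA, exon, CDS, ...) and sort whole gene blocks.
# Assumes tab-separated GFF3 and that children point to parents via Parent=.


def parse_attrs(col9):
    """Parse column 9 ("key=value;key=value") into a dict; no URL-unescaping."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def sort_gff3_by_gene(lines):
    """Return '#' lines first, then gene blocks sorted by (seqid, smallest start).

    Lines inside a block keep their input order, so gene -> mRNA -> exon/CDS
    stays exactly as it was.
    """
    headers, features = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            (headers if line.startswith("#") else features).append(line)

    # Pass 1: remember the (first) Parent of every feature that has an ID.
    parent_of = {}
    for line in features:
        attrs = parse_attrs(line.split("\t")[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of.setdefault(attrs["ID"], attrs["Parent"].split(",")[0])

    def top_ancestor(feature_id):
        """Follow Parent links upward; the seen-set guards against cycles."""
        seen = set()
        while feature_id in parent_of and feature_id not in seen:
            seen.add(feature_id)
            feature_id = parent_of[feature_id]
        return feature_id

    # Pass 2: bucket each line under its top-level ancestor, in input order.
    blocks = {}
    for n, line in enumerate(features):
        attrs = parse_attrs(line.split("\t")[8])
        if "Parent" in attrs:
            key = top_ancestor(attrs["Parent"].split(",")[0])
        elif "ID" in attrs:
            key = top_ancestor(attrs["ID"])
        else:
            key = ("no-id", n)  # a line with neither ID nor Parent stands alone
        blocks.setdefault(key, []).append(line)

    def block_key(block):
        cols = [line.split("\t") for line in block]
        # Plain string order on seqid (Chr10 < Chr2), then smallest start.
        return (cols[0][0], min(int(c[3]) for c in cols))

    # sorted() is stable, so ties keep the order in which blocks first appeared.
    ordered = sorted(blocks.values(), key=block_key)
    return headers + [line for block in ordered for line in block]
```

A possible usage: open input.gff3, pass the file handle to sort_gff3_by_gene, and write the returned lines to output_sorted.gff3 joined with newlines. Because '###' directives are moved to the top along with the header, they lose their meaning in the output; that is a simplification of this sketch.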