Hi guys,
While editing a gff3 file (genome annotation) with a custom script, I need to split the file by strand, obtaining a "forward.gff" and a "reverse.gff". After editing, the downstream analysis requires these two files to be merged back into a single one.
I noticed that the gff3 file is sorted by chromosome number (1st column) and by the start position (4th column) of each gene, with the associated records (mRNA, exon, etc.) following it. So to merge these two files and sort the result, I can't just run a sort command on the 1st and 4th fields, because start coordinates are independent between strands.
So here's the question: if I just merge the two files without caring about start coordinates, how much will that affect downstream analysis (read alignment and gene counting)? Does the sort order of a gff file really matter?
Additional question: if someone knows a fast way to sort the merged file back into the order of the original file, please share it.
The AGAT toolkit contains many GFF-related tools. Check whether one of them does what you need. @Juke (the author) participates on Biostars and will likely notice this question too.
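If you would rather not depend on an external tool, here is a minimal Python sketch of one way to merge the two files while keeping each gene together with its child records. It is not what AGAT does internally. It assumes that in both input files every top-level feature (a record with no Parent attribute, typically a gene) is directly followed by its children, as described in the question. The function names (`natural_key`, `group_blocks`, `merge_gff3`) are made up for this example, and comment/directive lines are dropped and replaced by a single version header.

```python
import re


def natural_key(seqid):
    # Split into text and integer chunks so that "2" sorts before "10".
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", seqid)]


def group_blocks(lines):
    # Group records into blocks: a top-level feature plus the child
    # records that follow it. Assumption: children come right after
    # their top-level feature in the input.
    blocks = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        has_parent = any(a.startswith("Parent=") for a in cols[8].split(";"))
        if not has_parent or current is None:
            current = [line]
            blocks.append(current)
        else:
            current.append(line)
    return blocks


def block_key(block):
    # Sort a block by the seqid and start of its top-level feature.
    cols = block[0].split("\t")
    return (natural_key(cols[0]), int(cols[3]))


def merge_gff3(*inputs):
    # Each input is any iterable of GFF3 lines (an open file works).
    blocks = []
    for lines in inputs:
        blocks.extend(group_blocks(lines))
    blocks.sort(key=block_key)
    return ["##gff-version 3"] + [line for block in blocks for line in block]
```

To use it, open "forward.gff" and "reverse.gff", pass both file objects to `merge_gff3`, and write the returned lines to the output file, one per line. Sorting whole blocks rather than individual lines is the point here: a plain sort on columns 1 and 4 would interleave the records of overlapping genes.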
^^ you were right, here I am