Hi,
I have long reads from transcript sequencing (one amplicon sequenced with PacBio) that I mapped to a gene locus using minimap2. I would like to extract the transcript structures (in GTF format?) together with their abundance. For example, the alignment in the screenshot below shows two kinds of transcripts, one of them with an intron retention. I would like to get the structure of these transcripts and the number of reads supporting each one.
From a GFF file, can I remove duplicates using AGAT? And can I get the count of each item?
`agat_convert_minimap2_bam2gff.pl` will not remove the duplicates, but the other scripts (those with `_sp_` in their name) remove duplicates automatically when parsing the file. If you want a closer look at the removed duplicates, you can run `agat_convert_sp_gxf2gxf.pl`, which will generate a log file.
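For the counting part of the question, one option is to group the reads by their intron chain yourself, so that every read with the same set of splice junctions falls into one bucket and the bucket size is the abundance. Below is a minimal Python sketch, assuming a GFF3 (for example the one produced by `agat_convert_minimap2_bam2gff.pl`) with one `exon` line per aligned block and a `Parent=` attribute naming the read. The exact layout of AGAT's output is an assumption here, so check it against your file. `intron_chains` and `count_structures` are hypothetical helper names, not part of AGAT, and the toy records at the bottom are made up for illustration.

```python
from collections import Counter, defaultdict


def intron_chains(gff_lines):
    """Return {parent_id: (seqid, strand, introns)} built from GFF3 exon lines.

    Assumes each read is represented by exon features carrying Parent=<read id>.
    """
    exons = defaultdict(list)
    location = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon features define the structure
        attrs = dict(
            f.strip().split("=", 1) for f in cols[8].split(";") if "=" in f
        )
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))
                location[parent] = (cols[0], cols[6])  # seqid, strand
    chains = {}
    for parent, blocks in exons.items():
        blocks.sort()
        # An intron runs from the base after one exon to the base before the next.
        introns = tuple(
            (a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])
        )
        chains[parent] = location[parent] + (introns,)
    return chains


def count_structures(gff_lines):
    """Count how many reads share each (seqid, strand, intron chain)."""
    return Counter(intron_chains(gff_lines).values())


if __name__ == "__main__":
    # Toy input: two spliced reads and one read retaining the intron.
    toy = [
        "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=read1",
        "chr1\tsrc\texon\t100\t300\t.\t+\t.\tParent=read1",
        "chr1\tsrc\texon\t500\t900\t.\t+\t.\tParent=read1",
        "chr1\tsrc\texon\t100\t300\t.\t+\t.\tParent=read2",
        "chr1\tsrc\texon\t500\t900\t.\t+\t.\tParent=read2",
        "chr1\tsrc\texon\t100\t900\t.\t+\t.\tParent=read3",
    ]
    for (seqid, strand, introns), n in count_structures(toy).most_common():
        print(seqid, strand, introns, n)
```

On a real file you would pass an open file handle instead, e.g. `count_structures(open("reads.gff"))` with your own path. Grouping on the intron chain rather than on full exon coordinates ignores read-end variation; note, however, that exact matching is strict, so a junction that minimap2 places a few bases off will create a separate group. Rounding or clustering junction coordinates within a small window is one way to relax that.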