Question

How to extract de novo transcripts from an RNA long-read alignment?

0

3.8 years ago

sacha ★ 2.4k

Hi,

I have long reads from transcript sequencing (one amplicon sequenced with PacBio) that I mapped to a gene locus using minimap2. I would like to extract the transcript structures (in GTF format?) along with their abundance. For example, in the screenshot below, you can see an alignment showing two kinds of transcripts, one of them with intron retention. I would like to get the structure of each of those transcripts and its abundance.

PacBio alignment

RNA pacbio • 1.1k views

updated 3.8 years ago by Juke34 9.0k • written 3.8 years ago by sacha ★ 2.4k

score 0 · Answer 1 · 2021-04-09

0

3.8 years ago

Juke34 9.0k

If your minimap2 output is in BAM format, you can use agat_convert_minimap2_bam2gff.pl from AGAT to convert the alignments to GFF, and then extract the sequences with agat_sp_extract_sequences.pl. You can also convert the GFF to GTF with agat_convert_sp_gff2gtf.pl.
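
As a rough sketch of how those three scripts could be chained, the snippet below only builds and prints the command lines. The file names (reads.bam, locus.fa, the "transcripts" prefix) and the helper agat_commands are hypothetical, and the option letters are written from memory of the AGAT help pages, so please check each script's --help before running anything.

```python
import shlex


def agat_commands(bam, genome_fasta, prefix="transcripts"):
    """Build the three AGAT command lines (hypothetical helper, nothing is executed)."""
    gff = f"{prefix}.gff"
    return [
        # 1. minimap2 BAM -> GFF (option names assumed; verify with --help)
        ["agat_convert_minimap2_bam2gff.pl", "-i", bam, "-o", gff],
        # 2. extract sequences for the GFF features from the reference FASTA
        ["agat_sp_extract_sequences.pl", "-g", gff, "-f", genome_fasta,
         "-o", f"{prefix}.fa"],
        # 3. GFF -> GTF
        ["agat_convert_sp_gff2gtf.pl", "--gff", gff, "-o", f"{prefix}.gtf"],
    ]


# Print the commands so they can be reviewed before being run by hand.
for cmd in agat_commands("reads.bam", "locus.fa"):
    print(shlex.join(cmd))
```

To run them from Python instead of printing, each list can be passed to subprocess.run(cmd, check=True). Note that agat_sp_extract_sequences.pl selects features by type, so its type option may need to be adjusted to the feature types present in the converted file.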

0

From a GFF file, can I remove duplicates using AGAT? And get the count of each item?

3.8 years ago by sacha ★ 2.4k

1

agat_convert_minimap2_bam2gff.pl will not remove duplicates, but the other scripts (those with _sp_ in their name) remove duplicates automatically when parsing the file. If you need a closer look at the removed duplicates, you can run agat_convert_sp_gxf2gxf.pl, which generates a log file.
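
For the counting part, here is a minimal Python sketch that is independent of AGAT. It assumes a GFF3 file in which each aligned block is one line and the blocks of one read share an attribute value; the defaults feature_type="exon" and group_key="Parent" are assumptions, so adapt them to whatever the converted file actually contains. Reads are grouped by their intron chain (the splice junctions) rather than by exact block coordinates, because long-read ends tend to vary; that is a design choice of this sketch, not something AGAT does.

```python
from collections import Counter, defaultdict


def count_structures(gff_lines, feature_type="exon", group_key="Parent"):
    """Count alignments that share the same intron chain.

    feature_type and group_key are assumptions about the GFF layout.
    Returns a Counter keyed by (seqid, strand, introns), where introns is a
    tuple of (start, end) pairs, 1-based and inclusive.
    """
    blocks = defaultdict(list)  # (seqid, strand, group id) -> [(start, end), ...]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Parse the GFF3 attribute column (key=value pairs separated by ';').
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        group = attrs.get(group_key)
        if group is None:
            continue
        blocks[(cols[0], cols[6], group)].append((int(cols[3]), int(cols[4])))

    counts = Counter()
    for (seqid, strand, _group), spans in blocks.items():
        spans.sort()
        # An intron runs from the base after one block to the base before the next.
        introns = tuple(
            (left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(spans, spans[1:])
        )
        counts[(seqid, strand, introns)] += 1
    return counts
```

Passing an open file handle as gff_lines and iterating over the result's most_common() lists each structure with the number of reads supporting it. A read that retains an intron simply has one junction fewer, so it falls into a separate group; unspliced reads all share the empty intron chain.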

3.8 years ago by Juke34 9.0k