Question

Issues with T2T in pigeon prepare from isoseq

6 months ago

SethJ • 0

I want to run some data through the isoseq pipeline using the T2T genome.

I downloaded the genome and annotation from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/. Specifically, I downloaded these two files:

GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf
GCF_009914755.1_T2T-CHM13v2.0_genomic.fna

I tried running pigeon prepare using the command

pigeon prepare GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf GCF_009914755.1_T2T-CHM13v2.0_genomic.fna

It gave me the error

>| 20240918 15:52:07.087 | FATAL | pigeon prepare ERROR: GFF/GTF file error, improperly formatted record
  reason : empty record ID
  record : NC_060925.1  BestRefSeq  gene    7506    138480  .   -   .   gene_id "LOC127239154"; transcript_id ""; db_xref "GeneID:127239154"; description "uncharacterized LOC127239154"; gbkey "Gene"; gene "LOC127239154"; gene_biotype "lncRNA"; partial "true";
See format documentation at https://isoseq.how

I thought the issue might be that the gene name is stored in an attribute called gene rather than gene_name, so I tried renaming that attribute with awk:

awk '{gsub(/; gene "/, "; gene_name \""); print}' GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf > GCF_009914755.1_T2T-CHM13v2.0_genomic_edited.gtf

I ran pigeon prepare on the edited file and got essentially the same error (the only difference being that gene was now gene_name in the reported record). I don't know what else is wrong with the format, or what the error means by "record ID". Has anyone seen this before?
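One thing that stands out in the reported record is the empty transcript_id "" attribute on the gene line. Whether that is what pigeon means by "empty record ID" is only a guess, but it is easy to check how common such records are. Below is a minimal sketch, assuming a tab-delimited GTF with the attributes in column 9; the function names count_empty_transcript_ids and strip_empty_transcript_ids are made up for illustration, and it is untested whether pigeon prepare accepts the stripped file.

```shell
# Count GTF records whose attribute column (column 9) has transcript_id "".
count_empty_transcript_ids() {
    awk -F'\t' '$9 ~ /transcript_id ""/ { n++ } END { print n + 0 }' "$1"
}

# Write the GTF to stdout with the empty transcript_id "" attribute removed.
# All other attributes and all records are left as they are.
strip_empty_transcript_ids() {
    sed 's/ transcript_id "";//' "$1"
}

# Example usage (file names taken from the post above):
#   count_empty_transcript_ids GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf
#   strip_empty_transcript_ids GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf > stripped.gtf
```

If the count is non-zero and pigeon prepare still rejects the stripped file, the empty attribute is probably not the only problem with the format.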

long-read isoseq rna-seq • 535 views

updated 1 day ago by Ram • written 6 months ago by SethJ

Got the same error using T2T. I will wait to see if there's a quick fix.

4 days ago by anbayega