Issues with T2T in pigeon prepare from isoseq
6 months ago
SethJ • 0

I want to run some data through the isoseq pipeline using the T2T genome.

I downloaded the genome files from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/. Specifically, I downloaded:

  • GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf
  • GCF_009914755.1_T2T-CHM13v2.0_genomic.fna

I tried running pigeon prepare using the command

pigeon prepare GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf GCF_009914755.1_T2T-CHM13v2.0_genomic.fna

It gave me the error

>| 20240918 15:52:07.087 | FATAL | pigeon prepare ERROR: GFF/GTF file error, improperly formatted record
  reason : empty record ID
  record : NC_060925.1  BestRefSeq  gene    7506    138480  .   -   .   gene_id "LOC127239154"; transcript_id ""; db_xref "GeneID:127239154"; description "uncharacterized LOC127239154"; gbkey "Gene"; gene "LOC127239154"; gene_biotype "lncRNA"; partial "true";
See format documentation at https://isoseq.how

I thought the issue might be that the gene name is stored under the attribute gene rather than gene_name, so I tried renaming it with awk:

awk '{gsub(/"; gene "/,"\"; gene_name \""); print}' GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf > GCF_009914755.1_T2T-CHM13v2.0_genomic_edited.gtf

I ran pigeon prepare on the edited file and got essentially the same error (the only difference being that gene was changed to gene_name in the printed record). I don't know what else is wrong with the format, or what the error means by "record ID". Has anyone seen this before?
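One thing that stands out in the printed record is the attribute transcript_id "", which is present but empty. It may be that "empty record ID" refers to this; that is only a guess from the error text, not something confirmed by the pigeon documentation. Below is a minimal sketch for checking how many records carry an empty transcript_id and for writing a copy of the GTF with that empty attribute removed. The function names and the output file name are made up for illustration.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the number of lines that contain an empty transcript_id attribute.
count_empty_transcript_ids() {
    awk '/transcript_id "";/ { n++ } END { print n + 0 }' "$1"
}

# Write a copy of the GTF ($1) to $2 with the empty transcript_id attribute
# removed. The records themselves are kept; only the attribute is dropped.
strip_empty_transcript_ids() {
    sed 's/transcript_id ""; *//' "$1" > "$2"
}

# Example usage (assumes the file names from the post; the output name is made up):
# count_empty_transcript_ids GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf
# strip_empty_transcript_ids GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf \
#     GCF_009914755.1_T2T-CHM13v2.0_genomic_noempty.gtf
```

If the count is non-zero, rerunning pigeon prepare on the stripped copy would show whether the empty attribute is what triggers the error. Whether pigeon then accepts gene records without any transcript_id is untested here.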

long-read isoseq rna-seq • 535 views

Got the same error using T2T. I will wait to see if there's a quick fix.
