Issues with T2T in pigeon prepare from isoseq
9 weeks ago
SethJ • 0

I want to run some data through the isoseq pipeline using the T2T genome.

I downloaded the reference files from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/. Specifically, I downloaded:

  • GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf
  • GCF_009914755.1_T2T-CHM13v2.0_genomic.fna

I tried running pigeon prepare using the command

pigeon prepare GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf GCF_009914755.1_T2T-CHM13v2.0_genomic.fna

It gave me the error

| 20240918 15:52:07.087 | FATAL | pigeon prepare ERROR: GFF/GTF file error, improperly formatted record
reason : empty record ID
record : NC_060925.1 BestRefSeq gene 7506 138480 . - . gene_id "LOC127239154"; transcript_id ""; db_xref "GeneID:127239154"; description "uncharacterized LOC127239154"; gbkey "Gene"; gene "LOC127239154"; gene_biotype "lncRNA"; partial "true";
See format documentation at https://isoseq.how
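
To make the record easier to read, its attribute column can be pulled apart with a few lines of Python. This is only a sketch: the regular expression assumes the usual GTF `key "value";` layout, and the helper names `parse_attributes` and `empty_keys` are made up for illustration and are not part of pigeon. It lists the attributes whose value is an empty string. Whether an empty transcript_id is what pigeon means by "empty record ID" is a guess on my part.

```python
import re

# Matches GTF-style attributes of the form: key "value";
# (assumes values are double-quoted and contain no embedded quotes)
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(attr_field):
    """Return a list of (key, value) pairs from a GTF attribute column."""
    return ATTR_RE.findall(attr_field)


def empty_keys(attr_field):
    """Return the attribute keys whose value is an empty string."""
    return [key for key, value in parse_attributes(attr_field) if value == ""]


# Attribute column copied from the record in the error message above.
attrs = (
    'gene_id "LOC127239154"; transcript_id ""; db_xref "GeneID:127239154"; '
    'description "uncharacterized LOC127239154"; gbkey "Gene"; '
    'gene "LOC127239154"; gene_biotype "lncRNA"; partial "true";'
)

print(empty_keys(attrs))  # prints ['transcript_id']
```

In this record gene_id is populated and transcript_id is the only empty attribute.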

I thought the issue might be that the gene name attribute is called gene rather than gene_name, so I tried renaming it with awk:

awk '{gsub(/"; gene "/,"; gene_name "); print}' GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf > GCF_009914755.1_T2T-CHM13v2.0_genomic_edited.gtf

I ran pigeon prepare on the edited file and got essentially the same error; the only difference was that the gene attribute in the reported record now read gene_name. I don't know what else is wrong with the format, or what the error means by "record ID". Has anyone seen this before?
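
To check whether this is a single odd record or a pattern across the file, a sketch like the following could tally the records that carry an empty transcript_id, grouped by feature type (column 3). It assumes a tab-separated GTF with the attributes in column 9. The function name `count_empty_transcript_ids` and the sample lines are hypothetical and are not taken from the real file.

```python
from collections import Counter


def count_empty_transcript_ids(lines):
    """Count records whose attribute column contains transcript_id "",
    keyed by feature type (GTF column 3)."""
    counts = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        if 'transcript_id "";' in fields[8]:
            counts[fields[2]] += 1
    return counts


# Made-up sample lines that only mimic the layout of the NCBI file.
sample = [
    "#gtf-version 2.2\n",
    'chrA\tsrc\tgene\t1\t100\t.\t-\t.\tgene_id "G1"; transcript_id ""; gene "G1";\n',
    'chrA\tsrc\ttranscript\t1\t100\t.\t-\t.\tgene_id "G1"; transcript_id "T1"; gene "G1";\n',
    'chrA\tsrc\texon\t1\t100\t.\t-\t.\tgene_id "G1"; transcript_id "T1"; gene "G1";\n',
]

print(count_empty_transcript_ids(sample))  # prints Counter({'gene': 1})
```

To run it on the real annotation, pass an open file handle instead of the sample list, for example `with open("GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf") as fh: print(count_empty_transcript_ids(fh))`. If only gene records show up, that would suggest the problem is tied to how gene lines are written rather than to the gene versus gene_name attribute.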

isoseq long rna-seq read • 179 views