Error from cgat gtf2gtf --method=genes-to-unique-chunks
8 months ago
PK ▴ 130

Hi,

I installed the cgat tool via mamba. I want to create unique chunks at the gene level using

--method=genes-to-unique-chunks

so I copied the command from another Biostars post:

cgat gtf2gtf --method=genes-to-unique-chunks -I /reference_annotation/GRCh38.86.gtf > output.gtf

But I'm getting this error:

    self.transcript_id = other.transcript_id
  File "pysam/libctabixproxies.pyx", line 638, in pysam.libctabixproxies.GTFProxy.__getattr__
KeyError: 'transcript_id'

I supplied the entire GTF file without any filtering. Do I have to filter the GTF before supplying it to cgat?

RNA-Seq GTF cgat
8 months ago

Yes, cgat assumes that the input GTF is valid according to the following specification:

http://mblab.wustl.edu/GTF22.html

This means that transcript_id is a mandatory attribute.
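To see which rows of a GTF break this requirement, a quick check along the following lines may help. It is a minimal sketch, assuming a standard tab-separated GTF where column 3 is the feature type and column 9 holds the attributes; the two sample rows (gene G1, transcript T1) are made up for illustration, and on real data you would point `gtf` at your own file (e.g. /reference_annotation/GRCh38.86.gtf) instead of the temporary file.

```shell
#!/usr/bin/env bash
# Sketch: list which GTF feature types (column 3) lack a transcript_id
# in the attribute column (column 9). The two rows below are a made-up
# sample; point "gtf" at your real annotation instead.
set -euo pipefail

gtf=$(mktemp)
# printf reuses its format for each group of 9 arguments, giving one row per group.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 ensembl gene 100 200 . + . 'gene_id "G1";' \
  1 ensembl exon 100 150 . + . 'gene_id "G1"; transcript_id "T1";' > "$gtf"

# Skip '#' header lines, keep rows with no transcript_id, count per feature type.
missing=$(awk -F'\t' '!/^#/ && $9 !~ /transcript_id/ { print $3 }' "$gtf" | sort | uniq -c)
echo "$missing"

rm -f "$gtf"
```

Each output line is a count followed by a feature type; any feature type listed there would trigger the KeyError above when cgat reads it.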

However, modern Ensembl GTFs contain gene lines that do not have a transcript_id. The easiest way to filter these out is either:

awk '$3!="gene" '  /reference_annotation/GRCh38.86.gtf  | cgat gtf2gtf --method=genes-to-unique-chunks -L output.log > output.gtf

or

grep "transcript_id" /reference_annotation/GRCh38.86.gtf  | cgat gtf2gtf --method=genes-to-unique-chunks -L output.log > output.gtf
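The two filters differ in what they key on: the awk version drops rows by feature type, while the grep version keeps only rows that mention transcript_id. A sanity check like the one below, a sketch on a made-up three-line sample (header, gene row, exon row), applies the awk filter from this answer and then counts any non-header rows that still lack transcript_id. If that count is non-zero for a given GTF, some other feature type is also missing the attribute, and the grep variant is the more conservative choice.

```shell
#!/usr/bin/env bash
# Sketch: apply the answer's feature-type filter, then confirm that no
# non-header row without transcript_id survives. Sample rows are made up.
set -euo pipefail

gtf=$(mktemp); filtered=$(mktemp)
{
  printf '#!genome-build GRCh38\n'
  printf '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n'
  printf '1\tensembl\texon\t100\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
} > "$gtf"

# Same filter as in the answer: drop rows whose feature type is "gene".
awk '$3!="gene"' "$gtf" > "$filtered"

# Non-header rows still missing transcript_id (should be 0), and rows kept.
leftover=$(awk -F'\t' '!/^#/ && $9 !~ /transcript_id/' "$filtered" | wc -l | tr -d ' ')
kept=$(awk '!/^#/' "$filtered" | wc -l | tr -d ' ')
echo "kept=$kept leftover=$leftover"

rm -f "$gtf" "$filtered"
```

Note that the awk filter passes '#' header lines through unchanged, whereas the grep filter drops them because they do not contain transcript_id.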

Thanks for your reply. It's working now
