Question

RNA-SeQC v1.1.8 transcript_id attribute was not found

5.8 years ago

shintzen ▴ 30

Hi, I am running RNA-SeQC on some files with an Ensembl annotation. I know the Ensembl format is "malformed" as far as this tool is concerned, but I have used the tool's arguments to supply a different type of ID (-ttype "gene_biotype") and I am still not making progress.

java -jar RNA-SeQC_v1.1.8.jar -ttype "gene_biotype"  -r rna-seq/hg38_nochr.fa -t rna-seq/P001/Homo_sapiens.GRCh38.84.gtf -o RNASEQC_out -s rna-seq/P001/RNASeQC_file_P001.txt
RNA-SeQC v1.1.8.1 07/11/14
Creating rRNA Interval List based on given GTF annotations
Retriving contig names from reference
         contig names in reference: 455
Loading GTF for Read Counting
The required transcript_id attribute was not found on line 1    havana  gene    11869   14409   .       +       .       gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2";

I know all my files use the same chromosome names. BAM header:

samtools view -H rna-seq/P001/Sample.HISAT2-2.1.0.aligned.sorted.bam
@HD     VN:1.0  SO:coordinate
@SQ     SN:1    LN:248956422
@SQ     SN:10   LN:133797422
@SQ     SN:11   LN:135086622
@SQ     SN:12   LN:133275309
@SQ     SN:13   LN:114364328

GTF:

 grep -w "rRNA"  rna-seq/Homo_sapiens.GRCh38.84.gtf | head
1       ensembl gene    9437669 9437778 .       -       .       gene_id "ENSG00000252956"; gene_version "1"; gene_name "RNA5SP40"; gene_source "ensembl"; gene_biotype "rRNA";
1       ensembl transcript      9437669 9437778 .       -       .       gene_id "ENSG00000252956"; gene_version "1"; transcript_id "ENST00000517147"; transcript_version "1"; gene_name "RNA5SP40"; gene_source "ensembl"; gene_biotype "rRNA"; transcript_name "RNA5SP40-201"; transcript_source "ensembl"; transcript_biotype "rRNA"; tag "basic"; transcript_support_level "NA";

FASTA:

 head rna-seq/hg38_nochr.fa
>1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

I have checked my text file format as well:

Sample ID       Bam File        Notes
1_Sample_S1   /home/shintzen/rna-seq/P001/1_Sample.HISAT2-2.1.0.aligned.sorted.bam   1
2_Sample2_S2   /home/shintzen/rna-seq/P001/2_Sample2.HISAT2-2.1.0.aligned.sorted.bam   2

Am I missing something?

RNA-Seq QC • 2.2k views

updated 5.0 years ago by anoo ▴ 10 • written 5.8 years ago by shintzen ▴ 30

Hello, did you solve the error described above? I am facing the same error with RNA-SeQC.

Any help is appreciated.

5.0 years ago by anoo ▴ 10

I have never used RNA-SeQC, but judging by the reported error ("The required transcript_id attribute was not found on line 1 ...."), the tool expects a transcript_id attribute on every record, and the first line of your input .gtf file is a gene record (probably the master gene record), which has none.
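
One way to confirm this before changing anything is to count which feature types lack a transcript_id. The sketch below assumes a tab-delimited GTF with the feature type in column 3 and the attributes in column 9; the function name count_missing_transcript_id is made up for illustration and is not part of RNA-SeQC or any other tool.

```shell
# Count, per feature type (column 3), the records whose attribute
# column (column 9) contains no transcript_id.
# Assumption: tab-delimited GTF; lines starting with "#" are headers.
count_missing_transcript_id() {
    awk -F'\t' '!/^#/ && $9 !~ /transcript_id "/ { n[$3]++ }
                END { for (f in n) print f "\t" n[f] }' "$1"
}
```

If it is called on the GTF and only "gene" shows up in the output, the gene records are the only ones the tool would reject for this reason.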

I think excluding such gene records may let the tool run. Alternatively, there is probably a transcripts.gtf file available that you can use instead of gene.gtf.

5.0 years ago by Nitin Narwade ★ 1.6k

Ensembl GTF files include "gene" entries that carry no transcript_id attribute, which this tool requires on every record. This is a violation of the GTF standard. You can remove every line that has "gene" in the third column (the feature type).
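
A minimal sketch of that filter, assuming the GTF is tab-delimited and that header lines start with "#". The function name drop_gene_records is hypothetical, chosen only for this example.

```shell
# Print the GTF without its "gene" records.
# Header/comment lines (starting with "#") are kept unchanged;
# every other line is kept only if column 3 is not exactly "gene".
drop_gene_records() {
    awk -F'\t' '/^#/ || $3 != "gene"' "$1"
}
```

It could be used as drop_gene_records rna-seq/P001/Homo_sapiens.GRCh38.84.gtf > filtered.gtf (the output name filtered.gtf is a placeholder), after which the filtered file is passed to -t in place of the original. Comparing on the whole third column, rather than grepping for the word "gene", avoids dropping transcript and exon lines, which all contain gene_id.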

5.0 years ago by michael.ante ★ 3.9k