5.8 years ago
shintzen
Hi, I am running RNA-SeQC on some files with an Ensembl annotation. I know the Ensembl GTF format is considered "malformed" by this tool, but I have followed the arguments meant to provide a different type of ID, and I am still not making progress.
java -jar RNA-SeQC_v1.1.8.jar -ttype "gene_biotype" -r rna-seq/hg38_nochr.fa -t rna-seq/P001/Homo_sapiens.GRCh38.84.gtf -o RNASEQC_out -s rna-seq/P001/RNASeQC_file_P001.txt
RNA-SeQC v1.1.8.1 07/11/14
Creating rRNA Interval List based on given GTF annotations
Retriving contig names from reference
contig names in reference: 455
Loading GTF for Read Counting
The required transcript_id attribute was not found on line 1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2";
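The error points at line 1, which is a `gene` record. One way to see which records lack the attribute is to count them by feature type (column 3). This is a minimal sketch, assuming a tab-delimited GTF whose header lines start with `#`; the function name `count_missing_transcript_id` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count GTF records, grouped by feature type (column 3),
# whose attribute column (column 9) has no transcript_id attribute.
count_missing_transcript_id() {
  local gtf="$1"
  awk -F'\t' '
    /^#/ { next }                          # skip header/comment lines
    $9 !~ /transcript_id "/ { n[$3]++ }    # record lacks transcript_id
    END { for (t in n) print t, n[t] }     # "<feature type> <count>"
  ' "$gtf" | sort
}
```

Run it as `count_missing_transcript_id rna-seq/P001/Homo_sapiens.GRCh38.84.gtf`; each output line is a feature type followed by the number of its records without a transcript_id.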
I know all my files use the same chromosome names. BAMs:
samtools view -H rna-seq/P001/Sample.HISAT2-2.1.0.aligned.sorted.bam
@HD VN:1.0 SO:coordinate
@SQ SN:1 LN:248956422
@SQ SN:10 LN:133797422
@SQ SN:11 LN:135086622
@SQ SN:12 LN:133275309
@SQ SN:13 LN:114364328
GTF:
grep -w "rRNA" rna-seq/Homo_sapiens.GRCh38.84.gtf | head
1 ensembl gene 9437669 9437778 . - . gene_id "ENSG00000252956"; gene_version "1"; gene_name "RNA5SP40"; gene_source "ensembl"; gene_biotype "rRNA";
1 ensembl transcript 9437669 9437778 . - . gene_id "ENSG00000252956"; gene_version "1"; transcript_id "ENST00000517147"; transcript_version "1"; gene_name "RNA5SP40"; gene_source "ensembl"; gene_biotype "rRNA"; transcript_name "RNA5SP40-201"; transcript_source "ensembl"; transcript_biotype "rRNA"; tag "basic"; transcript_support_level "NA";
FASTA:
head rna-seq/hg38_nochr.fa
>1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
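Looking at the first few lines of each file only shows a handful of contigs; the full sets of names can be compared instead. A minimal sketch, assuming `samtools` is on the PATH; the function names `bam_contigs`, `fasta_contigs` and `gtf_contigs` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the sorted, distinct contig names found in
# each input so that the lists can be compared with comm(1).

bam_contigs() {    # SN: values of the @SQ lines in the BAM header (needs samtools)
  samtools view -H "$1" | awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }' | sort -u
}

fasta_contigs() {  # first word of each FASTA header line, without the ">"
  awk '/^>/ { print substr($1, 2) }' "$1" | sort -u
}

gtf_contigs() {    # column 1 of every non-comment GTF line
  awk -F'\t' '!/^#/ { print $1 }' "$1" | sort -u
}
```

For example, `comm -13 <(fasta_contigs rna-seq/hg38_nochr.fa) <(gtf_contigs rna-seq/P001/Homo_sapiens.GRCh38.84.gtf)` prints the GTF contigs that are absent from the reference; empty output means every GTF contig is present. The same pattern works with `bam_contigs` on either side.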
I have also checked the format of my sample file:
Sample ID Bam File Notes
1_Sample_S1 /home/shintzen/rna-seq/P001/1_Sample.HISAT2-2.1.0.aligned.sorted.bam 1
2_Sample2_S2 /home/shintzen/rna-seq/P001/2_Sample2.HISAT2-2.1.0.aligned.sorted.bam 2
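A pasted table does not show whether its columns are separated by tabs or by spaces, so it can help to check the file itself. This is a minimal sketch, assuming the tab-delimited three-column layout shown above (Sample ID, Bam File, Notes) with one header line; `check_sample_file` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical checker for the RNA-SeQC sample file. Assumes a header line
# followed by rows of exactly three TAB-separated columns:
# Sample ID, Bam File, Notes.
check_sample_file() {
  local file="$1" status=0 id bam notes
  # Every line, header included, must split into exactly 3 tab-separated fields.
  awk -F'\t' 'NF != 3 { printf "line %d: %d tab-separated field(s)\n", NR, NF; bad = 1 }
              END { exit (bad ? 1 : 0) }' "$file" || status=1
  # Every BAM listed after the header must exist and be readable.
  while IFS=$'\t' read -r id bam notes; do
    [ -z "$id" ] && continue              # ignore blank lines
    if [ ! -r "$bam" ]; then
      echo "sample $id: cannot read '$bam'"
      status=1
    fi
  done < <(tail -n +2 "$file")
  return "$status"
}
```

Running `check_sample_file rna-seq/P001/RNASeQC_file_P001.txt` prints nothing and returns 0 when every row has three tab-separated fields and points to a readable BAM.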
Am I missing something?
Hello, did you solve the error above? I am facing the same error with RNA-SeQC.
Any help is appreciated.
Though I have never used RNA-SeQC, judging by the reported error
The required transcript_id attribute was not found on line 1 ....
it seems the tool expects a transcript_id attribute on every record, but the first line of your input.gtf file is a gene record (probably the master gene record), which has none. Excluding such gene records may let you run the tool. Alternatively, there must be a transcripts.gtf file available that you can use instead of gene.gtf.
The Ensembl GTF files provide a "gene" entry without a transcript_id, which violates the GTF standard. You can remove every line that has "gene" in column 3.
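Removing the gene lines can be done with a one-line awk filter. This is a minimal sketch, assuming a tab-delimited GTF; it keeps the `#!` header lines and every non-gene feature (transcript, exon, CDS, ...) unchanged. The function name `drop_gene_records` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the GTF to stdout without the "gene" records
# (column 3 == "gene"), which carry no transcript_id. Comment/header lines
# starting with "#" and all other features are printed unchanged.
drop_gene_records() {
  awk -F'\t' '/^#/ || $3 != "gene"' "$1"
}
```

For example, `drop_gene_records rna-seq/P001/Homo_sapiens.GRCh38.84.gtf > rna-seq/P001/Homo_sapiens.GRCh38.84.no_genes.gtf` (the output name is only illustrative), then pass the filtered file to `-t` in place of the original GTF, leaving the other RNA-SeQC arguments as they were.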