RNASeqc doesn't like any of my GTF files. I have no problem running the example dataset. Any idea why? When I run this:
rnaseqc.v2.3.4.linux gencode.v29.compressed-noGene.gtf NuGen_mRNA_10_ERCC2_2_md.bam OUTPUT
I get this result:
There were either no genes or no exons in the GTF
0 genes parsed
334017 exons parsed
I have a freshly downloaded gencode gtf file which was treated with gtx-pipeline
python3 gtex-pipeline/gene_model/collapse_annotation.py Homo_sapiens.GRCh38.97.gtf Homo_sapiens.GRCh38.97_compressed.gtf
and to remove genes and just keep exons:
awk -F'\t' '($3 != "gene")' gencode.v29.compressed.gtf > gencode.v29.compressed-noGene.gtf
And 334017 exons were confirmed:
cat gencode.v29.compressed-noGene.gtf | awk -F'\t' '{print $3}' | grep -c exon
334017
So why can't it run?
The bam file processes fine with the older version of RNA seq:
java -jar RNA-SeQC_v1.1.8.jar -n 10000 -s "NuGen_mRNA_10_ERCC1_2|R3e_MAP_READS/NuGen_mRNA_10_ERCC1_2_md.bam|Desc" -t Homo_sapiens.GRCh38.83_filtered_transcript_id.gtf -r Homo_sapiens.GRCh38.dna.primary_assembly.fa -o OUT
But I can't get either the Ensemble GTF or gencode GTF to work with the newer version of RNASeqc. So I'm using gencode GTF and now running into this issue. At least the software doesn't crash. Anyone with RNASeqc experience have advice on how I can get this to run?
When I try to compress an Ensemble format:
python3 gtex-pipeline/gene_model/collapse_annotation.py Homo_sapiens.GRCh38.83_filtered_transcript_id.gtf Homo_sapiens.GRCh38.83_collapsed.gtf
I get this error:
Traceback (most recent call last): File "SOFTWARE/gtex-pipeline/gene_model/collapse_annotation.py", line 249, in <module>
annotation = Annotation(args.transcript_gtf) File "SOFTWARE/gtex-pipeline/gene_model/collapse_annotation.py", line 76, in __init__
t = Transcript(attributes.pop('transcript_id'), attributes.pop('transcript_name'), attributes.pop('transcript_type'), g, start_pos, end_pos) KeyError: 'transcript_type'
Are the chromosomal identifiers matching in all files that you are using?
I think so. I've ran the bam file against both the original gencode gtf file and a modified version where I converted the "chr2" to just "2". Basically just removing all of the "chr" from the 1st column. I still have issues.
Legend for the text below:
Column 1 = Bam file (samtools view | awk -F'\t' '{print $3}')
Column 2 = modified GTF file ( awk -F'\t' '{print $1}')
Column 3 = original GTF file ( awk -F'\t' '{print $1}')
=====================================================
Easy to read screenshot
Text version below
I think I solved the situation.
Still investigating but at least I'm getting output now. Will post an update later.
Hi, can you please explain how you solved the situation?