weird error while running rmats on fastq files using hg38 genome
1
0
Entering edit mode
2.9 years ago
Sara ▴ 260

I am trying to run rMATS (an alternative splicing tool) with FASTQ files as input, using the following command:

rmats.py --s1 /files/s1.txt --s2 /files/s2.txt --gtf /files/rmats_analysis/gencode.v39.annotation.gtf --bi /files/STAR/hg38/ -t paired --readLength 50 --nthread 4 --od /files --tmp /files/

The GTF file I am using for the analysis is:

gencode.v39.annotation.gtf

and the genome (FASTA file) I used is:

gencode.v39.transcripts.fa

So I used the same GENCODE release (v39) for both the genome and the GTF file, but I am getting this error:

Jan 06 00:40:19 ..... started STAR run
Jan 06 00:40:19 ..... loading genome
Jan 06 00:42:02 ..... processing annotations GTF

Fatal INPUT FILE error, no valid exon lines in the GTF file: /files/gencode.v39.annotation.gtf
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.

Jan 06 00:42:18 ...... FATAL ERROR, exiting
Traceback (most recent call last):
  File "/usr/local/bin/rmats.py", line 595, in <module>
    main()
  File "/usr/local/bin/rmats.py", line 558, in main
    args = get_args()
  File "/usr/local/bin/rmats.py", line 203, in get_args
    args.b1, args.b2 = doSTARMapping(args)
  File "/usr/local/bin/rmats.py", line 81, in doSTARMapping
    raise Exception()
Exception

Since the genome and the GTF file are the same version and both come from GENCODE, could you please let me know how to fix this issue? I checked both files, and in both the chromosome names start with chr.

2.9 years ago
jv ★ 1.8k

It looks like you downloaded the transcript sequence FASTA file instead of the genome sequence FASTA file. The sequence names in the transcript FASTA are transcript identifiers, not the chromosome names used in the gene annotation GFF3 or GTF files, so none of the exon lines in the GTF match a sequence in your STAR index.
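One way to confirm this kind of mismatch is to compare the sequence names in the FASTA headers with the first column of the GTF exon lines. Below is a minimal Python sketch of that check. The function names are my own (hypothetical), and the example records are short illustrative stand-ins in the style of GENCODE files, not copies of your data. It assumes the name is the first whitespace-delimited token of each FASTA header.

```python
import io


def fasta_names(handle):
    """Collect sequence names: the first whitespace-delimited token of each '>' header."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_exon_seqnames(handle):
    """Collect the seqname (column 1) of every GTF line whose feature (column 3) is 'exon'."""
    names = set()
    for line in handle:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] == "exon":
            names.add(fields[0])
    return names


def shared_seqnames(fasta_handle, gtf_handle):
    """Names present both in the FASTA and on GTF exon lines (empty set means a mismatch)."""
    return fasta_names(fasta_handle) & gtf_exon_seqnames(gtf_handle)


# Illustrative in-memory records (assumed formats, not real file contents).
GTF_TEXT = (
    "##description: example annotation\n"
    'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972.5";\n'
)
TRANSCRIPT_FASTA = ">ENST00000456328.2|ENSG00000223972.5|\nACGT\n"
GENOME_FASTA = ">chr1 1\nACGT\n"

transcript_overlap = shared_seqnames(io.StringIO(TRANSCRIPT_FASTA), io.StringIO(GTF_TEXT))
genome_overlap = shared_seqnames(io.StringIO(GENOME_FASTA), io.StringIO(GTF_TEXT))

print(transcript_overlap)  # set()
print(genome_overlap)      # {'chr1'}
```

To run it on real files, pass open file handles for the FASTA that the index was built from and for the GTF instead of the `io.StringIO` objects. An empty result would be consistent with the "no valid exon lines" error.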

Instead, use either the "Genome sequence, primary assembly" or the "Genome sequence (GRCh38.p13)" FASTA file from GENCODE as your reference sequence, and rebuild the STAR index that you pass to --bi from it.
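For reference, rebuilding the index from the genome FASTA could look roughly like the sketch below. It only assembles and prints a STAR `genomeGenerate` command and does not run anything. The helper name `star_index_command` is hypothetical, the FASTA location `/files/GRCh38.primary_assembly.genome.fa` is an assumed path for the uncompressed GENCODE primary assembly, and `--sjdbOverhang` is set to read length minus 1 following the usual STAR recommendation. The index directory, GTF path, read length and thread count are taken from the command in the question.

```python
import shlex


def star_index_command(genome_dir, genome_fasta, gtf, read_length, threads=4):
    """Assemble (but do not run) a STAR genomeGenerate command as an argument list."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fasta,  # genome FASTA, not the transcripts FASTA
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(read_length - 1),  # read length minus 1
        "--runThreadN", str(threads),
    ]


cmd = star_index_command(
    genome_dir="/files/STAR/hg38/",  # index directory passed to --bi in the question
    genome_fasta="/files/GRCh38.primary_assembly.genome.fa",  # assumed location
    gtf="/files/rmats_analysis/gencode.v39.annotation.gtf",
    read_length=50,
    threads=4,
)

# shlex.join needs Python 3.8+; it quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Once the index has been regenerated, the original rmats.py command should be able to reuse the same --bi directory unchanged.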
