Hello,
I am a bioinformatics novice and am trying to learn how to analyze my RNA-seq reads. I came across a program called SPARTA (Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis), which is designed for novices like myself. The program is Python-based and runs from the Windows command prompt.
The first process in SPARTA's workflow appears to be quality control using Trimmomatic - and this is where my issues arise. When I run SPARTA, my sequencing read files are processed appropriately and it looks like Trimmomatic runs successfully. Then I am left with the following error message:
C:\Users\Ryan\Desktop\RNAseq_Data\2019-08-28_21\QC\trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz ILLUMINACLIP:\Users\Ryan\Desktop\SPARTA_Windows-master\QC_analysis\Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 2574357 Surviving: 2553622 (99.19%) Dropped: 20735 (0.81%)
TrimmomaticSE: Completed successfully
Traceback (most recent call last):
File "SPARTA.py", line 155, in <module>
qc.trimmomatic(rawdatapath, subfolderpath, options)
File "C:\Users\Ryan\Desktop\SPARTA_Windows-master\qc_analysis.py", line 103, in trimmomatic
extension = file.split(".")[1]
IndexError: list index out of range
I'm confused about what the 'IndexError' is referring to, and I'm unsure how to resolve this issue.
Any feedback would be greatly appreciated. Thank you!
-Ryan
I have never used it, and I work in a Linux environment. Still, from the failing line I suppose this program only accepts fastq or fq files as input: it splits the filename on the `.` character and tries to identify the extension. In your case, splitting `trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz` on the `.` character gives `[trimmedGSF1906-Delta-hns-3_S66_R1_001] [fastq] [gz]`, so at the last index it gets `gz`, which is neither `fastq` nor `fq`. I suggest you first extract your fastq files using gunzip or any other extractor available on Windows, and then run the pipeline on those fastq files. Good luck.
It's `file.split(".")[1]`, not `file.split(".")[-1]`, so it's looking at the second part, not the last part. Splitting by `.` is not a great way though, so your point might still be applicable.
Yes, you are absolutely right, I didn't read it carefully. But I am still wondering why this gives a list index out of range error, because if I run this line in a normal console it runs fine.
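To make the point above concrete, here is a minimal sketch of what that expression does. The gzipped filename is the one from the log; `some_folder` is a made-up name, and `second_field` is a helper written only for this illustration, not part of SPARTA. The expression can only raise `IndexError` when the name contains no `.` at all, so one possibility (an assumption, since the traceback does not show what `file` held) is that the script met an entry without an extension.

```python
def second_field(filename):
    """Return filename.split(".")[1], the expression from line 103 of the traceback."""
    return filename.split(".")[1]


# Filename taken from the log in the question.
gz_name = "trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"
print(gz_name.split("."))     # three parts: the stem, "fastq" and "gz"
print(second_field(gz_name))  # the second part, "fastq"

# Hypothetical entry with no "." in its name (not taken from the thread).
no_dot_name = "some_folder"
try:
    second_field(no_dot_name)
    raised = False
except IndexError:
    # split(".") returned a one-element list, so index 1 does not exist.
    raised = True
print(raised)
```

So the `.fastq.gz` name on its own should not trigger the error, whether it is gzipped or not.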
Probably because the `file` parameter is different in their script.
I'm looking through their scripts for the `file` parameter but have not found it yet.
What is the exact command line you're using for `sparta.py`? I'd also recommend using the `--verbose` option if you're not using it already.
To execute sparta.py? Following the tutorial, I execute the program using `python SPARTA.py`
And no options? What OS are you on?
Can you please give us all the details so your question is reproducible?
Also, I added `verbose` but it did not add any additional text to the error messages.
Can you show screenshots from the start up to this error?
That fastq name is bog standard Illumina naming. Gzipping fastq files is bog standard, it would have to be extremely stupid software to be designed to handle fastqs and not be able to handle gzipped ones.
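For comparison, this is roughly how a script could accept both plain and gzipped FASTQ names without indexing into a split. It is only a sketch using the standard library's `pathlib`; `is_fastq` and `FASTQ_SUFFIXES` are names made up here, and this is not how SPARTA itself decides.

```python
from pathlib import Path

# Extensions treated as FASTQ in this sketch (an assumption, not SPARTA's list).
FASTQ_SUFFIXES = {".fastq", ".fq"}


def is_fastq(filename):
    """Return True for names ending in .fastq or .fq, optionally followed by .gz."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Ignore a trailing .gz so compressed files are classified by what is inside.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    return bool(suffixes) and suffixes[-1] in FASTQ_SUFFIXES


print(is_fastq("trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"))
print(is_fastq("some_folder"))
```

A name without any extension simply returns `False` here instead of raising an exception.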
Thank you for the feedback! I have tried both ways, zipped and unzipped, to no avail :(
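Since the error appears with both zipped and unzipped files, it may be worth checking whether the data folder holds anything whose name has no `.` in it, such as a subfolder or a file without an extension. The sketch below lists such entries. It assumes the script loops over every entry in the folder, which the traceback does not confirm; `entries_without_dot` is a hypothetical helper, and the demo runs on a throwaway folder rather than on real data.

```python
import os
import tempfile


def entries_without_dot(folder):
    """Return the sorted names in `folder` that contain no '.' character."""
    return sorted(name for name in os.listdir(folder) if "." not in name)


# Demo on a temporary folder containing one FASTQ file and one subfolder.
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "sample_R1.fastq.gz"), "w").close()
    os.mkdir(os.path.join(demo, "old_results"))
    found = entries_without_dot(demo)

print(found)
```

To try it on real data, call `entries_without_dot` with the path of the folder you give SPARTA as input. Any name it returns would make `file.split(".")[1]` fail.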