SPARTA (RNA-seq) workflow issue
5.3 years ago
rchapari • 0

Hello,

I am a bioinformatics novice trying to learn how to analyze my RNA-seq reads. I came across a program called SPARTA (Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis), which is designed for novices like myself. The program is Python-based and runs from the Command Prompt in Windows.

The first process in SPARTA's workflow appears to be quality control using Trimmomatic - and this is where my issues arise. When I run SPARTA, my sequencing read files are processed appropriately and it looks like Trimmomatic runs successfully. Then I am left with the following error message:

C:\Users\Ryan\Desktop\RNAseq_Data\2019-08-28_21\QC\trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz ILLUMINACLIP:\Users\Ryan\Desktop\SPARTA_Windows-master\QC_analysis\Trimmomatic-0.33/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Reads: 2574357 Surviving: 2553622 (99.19%) Dropped: 20735 (0.81%)
TrimmomaticSE: Completed successfully
Traceback (most recent call last):
   File "SPARTA.py", line 155, in <module>
      qc.trimmomatic(rawdatapath, subfolderpath, options)
   File "C:\Users\Ryan\Desktop\SPARTA_Windows-master\qc_analysis.py", line 103, in trimmomatic
      extension = file.split(".")[1]
IndexError: list index out of range

I'm confused about what the 'IndexError' refers to, and I'm unsure how to resolve it.

Any feedback would be greatly appreciated. Thank you!

-Ryan

RNA-Seq SPARTA Python • 1.7k views

I have never used it, and I work on Linux. Still, judging from the failing line, I suppose this program accepts only fastq or fq files: it splits the filename on the . character and tries to identify the extension. In your case, splitting trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz on . gives [trimmedGSF1906-Delta-hns-3_S66_R1_001] [fastq] [gz]. The last element is therefore gz, which is neither fastq nor fq. I suggest you first decompress your fastq files using gunzip or any other extractor available on Windows, and then run the pipeline on the resulting fastq files. Good luck.
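If you would rather stay in Python than install a separate extractor, the standard library can do the decompression. A minimal sketch, assuming Python 3; the function name gunzip_file and the demo file are made up for illustration and are not part of SPARTA:

```python
import gzip
import os
import shutil
import tempfile

def gunzip_file(gz_path):
    """Decompress gz_path next to itself and return the new path."""
    out_path = gz_path[:-3] if gz_path.endswith(".gz") else gz_path + ".out"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path

# Demo on a throwaway folder with a tiny made-up read
demo_dir = tempfile.mkdtemp()
gz_path = os.path.join(demo_dir, "example.fastq.gz")
record = b"@read1\nACGT\n+\nIIII\n"
with gzip.open(gz_path, "wb") as handle:
    handle.write(record)

fastq_path = gunzip_file(gz_path)
with open(fastq_path, "rb") as handle:
    restored = handle.read()
shutil.rmtree(demo_dir)
```

The original .gz file is left in place, so you can try the pipeline on either version.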

It's file.split(".")[1], not file.split(".")[-1], so it reads the second element, not the last one. Splitting on . is a fragile way to find an extension though, so your point might still apply.
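To make the difference concrete, here is a small sketch (Python 3, standard library only) comparing the two index choices on the file name from the question, next to pathlib, which is one sturdier way to read extensions. The variable names are illustrative and not taken from SPARTA.

```python
from pathlib import PurePath

name = "trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"

parts = name.split(".")
second = parts[1]    # the element that qc_analysis.py line 103 reads
last = parts[-1]     # the element the earlier comment assumed

# pathlib keeps every suffix, so ".fastq.gz" can be recognised as a whole
suffixes = PurePath(name).suffixes

print(second, last, suffixes)
```

Neither index fails on this name, which suggests the string reaching line 103 is something else.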

Yes, you are absolutely right, I didn't read it carefully. But I am still wondering why this gives a list index out of range error, because the same line runs normally in a plain Python console:

>>> file = r"C:\Users\Ryan\Desktop\RNAseq_Data\2019-08-28_21\QC\trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz"
>>> extension = file.split(".")[1]
>>> extension
'fastq'

Probably because the file parameter is different in their script.
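For illustration: split(".") followed by [1] only raises IndexError when the string contains no dot at all, so whatever reaches that line in SPARTA is presumably not the .fastq.gz name. A minimal sketch, where second_field is a stand-in for the failing expression and the second candidate name is invented:

```python
def second_field(name):
    """Mimic the expression from the traceback: name.split(".")[1]."""
    return name.split(".")[1]

candidates = [
    "trimmedGSF1906-Delta-hns-3_S66_R1_001.fastq.gz",  # from the question
    "some_folder_or_file_without_dot",                  # hypothetical
]

results = {}
for name in candidates:
    try:
        results[name] = second_field(name)
    except IndexError as err:
        # A name with no "." splits into a one-element list, so [1] is missing
        results[name] = "IndexError: " + str(err)

for name, outcome in results.items():
    print(name, "->", outcome)
```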

I'm looking through their scripts for the file parameter but have not found it yet.

What is the exact command line you're using for sparta.py? I'd also recommend using the --verbose option if you're not using it already.

To execute sparta.py? Following the tutorial, I run the program with: python SPARTA.py

And no options? What OS are you on?

Can you please give us all the details so your question is reproducible?

Also, I added the verbose option, but it did not add any text to the error messages.

Can you show screenshots of everything from startup through to this error?

That fastq name is bog-standard Illumina naming, and gzipping fastq files is just as standard. Software designed to handle fastqs would have to be extremely poorly written not to handle gzipped ones.

Thank you for the feedback! I have tried both ways, zipped and unzipped, to no avail :(

5.3 years ago

It looks like SPARTA is trying to be clever and assumes that every file it sees in the target directory is a fastq. Is that the case here? To me it looks like it successfully trims the first file it sees, but the second file is not a fastq, and that halts the program.

It also looks like SPARTA adds "trimmed" to the names of the Trimmomatic output, yet you are giving it those files as input?
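If that is what is happening, a guard along these lines would avoid the crash. This is only a sketch of the idea, not SPARTA's actual code: the function name find_raw_fastqs, the list of accepted suffixes, and the "trimmed" prefix check are assumptions based on the output in the question, and the demo file names are made up.

```python
import os
import tempfile

FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")

def find_raw_fastqs(folder):
    """Return FASTQ file names in folder, skipping non-FASTQ entries
    and files that already carry the 'trimmed' prefix."""
    keep = []
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if not os.path.isfile(path):
            continue  # subfolders have no extension to parse
        if entry.startswith("trimmed"):
            continue  # looks like earlier Trimmomatic output
        if entry.lower().endswith(FASTQ_SUFFIXES):
            keep.append(entry)
    return keep

# Demonstration on a throwaway folder
with tempfile.TemporaryDirectory() as demo:
    for fname in ["sampleA.fastq.gz", "trimmedsampleA.fastq.gz", "README", "notes.txt"]:
        open(os.path.join(demo, fname), "w").close()
    os.mkdir(os.path.join(demo, "QC"))
    found = find_raw_fastqs(demo)

print(found)
```

In practice, the quickest check is to look at the input folder and move out anything that is not a raw fastq, including a name without a dot such as a subfolder.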
