Running STAR on FASTQ files generated from an RNA-seq experiment
13 months ago
achanda • 0

Hi, I am new to bioinformatics, and especially to working on the command line. I am trying to run STAR alignment on pairs of fastq.gz files from several samples generated as part of an RNA-seq experiment. My goal is to perform splice variant analysis on the output. I am submitting the following Slurm job:

#!/bin/bash
#SBATCH --job-name=STAR_alignment
#SBATCH --output=star_alignment_%j.out
#SBATCH --error=star_alignment_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=125G
#SBATCH --time=7:59:59

GENOME_DIR=~/indices_directory
DATA_DIR=~/Arthur_RNAseq

cd $DATA_DIR

for folder in $(ls -d */); do
    cd $folder
    SAMPLE=$(basename $folder)

    # Create a new output directory for STAR results
    OUTPUT_DIR=$DATA_DIR/$SAMPLE/STAR_output
    mkdir -p $OUTPUT_DIR

    STAR --genomeDir $GENOME_DIR \
         --readFilesIn ${SAMPLE}_merged_R1.fastq.gz ${SAMPLE}_merged_R2.fastq.gz \
         --runThreadN 32 \
         --outFileNamePrefix $OUTPUT_DIR/${SAMPLE}_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard

    cd $DATA_DIR
done

However, after running for a few minutes, the process is killed and in the .err file I am getting the following error:

ReadAlignChunk_processChunks.cpp:202:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >

I have double-checked the files: every read ID starts with @ and the files follow the correct format. My only concern is that the read length is 50 instead of the 100 recommended for splicing analysis.

Does anybody have any clues or ideas on how to get around this problem?

STAR RNAseq fastq files • 791 views

To debug what is being fed to STAR, simply echo the file name inside the loop:

echo ${SAMPLE}_merged_R1.fastq.gz

You may also be missing:

--readFilesCommand pigz -dc \

If you do not have pigz installed, use gzip instead.
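
Going one step beyond the echo, a rough sketch like the one below shows what STAR reads with and without a decompression command. The function names are made up for this example and are not part of STAR; it assumes gzip and head are available on the cluster.

```shell
#!/bin/bash
# Hypothetical helpers (not part of STAR) for inspecting a FASTQ file.

# First byte of the file exactly as stored on disk.
# For a gzip-compressed file this is not "@".
first_byte_raw() {
    head -c 1 "$1"
}

# First line after decompression; for valid FASTQ it starts with "@".
first_line_decompressed() {
    gzip -dc "$1" | head -n 1
}

# Report whether a file exists and looks like gzip-compressed FASTQ.
check_fastq_gz() {
    if [ ! -f "$1" ]; then
        echo "missing: $1"
        return 1
    fi
    case "$(first_line_decompressed "$1")" in
        @*) echo "ok: $1" ;;
        *)  echo "unexpected first line: $1"; return 1 ;;
    esac
}
```

Inside the loop, something like check_fastq_gz "${SAMPLE}_merged_R1.fastq.gz" before the STAR call would flag a missing or malformed file. Viewing the file with zcat or less makes it look fine, but without --readFilesCommand STAR reads the compressed bytes, which is one plausible reason for the "should start with @ or >" error.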

13 months ago
Trivas ★ 1.8k

If you are passing fq.gz files, you need to tell STAR to decompress them with the --readFilesCommand zcat flag. There are a couple of options for the decompression command (check the manual).
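
As a minimal sketch, the STAR call from the question with that flag added could look like this. run_star and DRY_RUN are names invented for the example; GENOME_DIR, the file naming, and all other options are copied from the question as posted.

```shell
#!/bin/bash
# Sketch of the STAR call from the question with --readFilesCommand added.
# run_star and DRY_RUN are hypothetical names used only for this example.
# GENOME_DIR and the *_merged_R1/R2.fastq.gz naming come from the question.

run_star() {
    sample=$1
    out_dir=$2
    # With DRY_RUN set to a non-empty value, the command is printed
    # instead of executed, which is handy for checking the arguments.
    ${DRY_RUN:+echo} STAR --genomeDir "$GENOME_DIR" \
        --readFilesIn "${sample}_merged_R1.fastq.gz" "${sample}_merged_R2.fastq.gz" \
        --readFilesCommand zcat \
        --runThreadN 32 \
        --outFileNamePrefix "$out_dir/${sample}_" \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --outSAMattributes Standard
}
```

In the loop from the question, the STAR block would then become run_star "$SAMPLE" "$OUTPUT_DIR". Setting DRY_RUN=1 first prints the full command for one sample so the paths can be checked before submitting the job.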
