Question

Running STAR on fastq files generated from an RNA-seq experiment

18 months ago

achanda • 0

Hi, I am new to bioinformatics, and especially to the command line. I am trying to run STAR alignment on pairs of fastq.gz files from several samples generated in an RNA-seq experiment. My goal is to perform splice variant analysis on the output. I am submitting the following Slurm job:

#!/bin/bash
#SBATCH --job-name=STAR_alignment
#SBATCH --output=star_alignment_%j.out
#SBATCH --error=star_alignment_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=125G
#SBATCH --time=7:59:59

GENOME_DIR=~/indices_directory
DATA_DIR=~/Arthur_RNAseq

cd $DATA_DIR

for folder in $(ls -d */); do
    cd $folder
    SAMPLE=$(basename $folder)

    # Create a new output directory for STAR results
    OUTPUT_DIR=$DATA_DIR/$SAMPLE/STAR_output
    mkdir -p $OUTPUT_DIR

    STAR --genomeDir $GENOME_DIR \
         --readFilesIn ${SAMPLE}_merged_R1.fastq.gz ${SAMPLE}_merged_R2.fastq.gz \
         --runThreadN 32 \
         --outFileNamePrefix $OUTPUT_DIR/${SAMPLE}_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard

    cd $DATA_DIR
done

However, after running for a few minutes, the process is killed and in the .err file I am getting the following error:

ReadAlignChunk_processChunks.cpp:202:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >

I have double-checked the files, and all of them start with @ and follow the correct format. My only concern is that the read length is 50 instead of the recommended 100 for splicing analysis.

Does anybody have any clues or ideas on how to get around this problem?

STAR RNAseq fastq files • 1.3k views


To debug what is being fed to STAR, simply print the file name inside the loop:

echo ${SAMPLE}_merged_R1.fastq.gz

Also, you may be missing:

--readFilesCommand pigz -dc \

If you do not have pigz installed use gzip instead.
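The check above can be taken one step further. The sketch below assumes the inputs are gzip-compressed and confirms that each file exists and that its first decompressed character is @. The helper name check_fastq_gz is hypothetical and not part of STAR. A file that passes this check but still triggers the "read ID should start with @ or >" error is most likely being read by STAR as raw compressed bytes.

```shell
#!/bin/bash
# check_fastq_gz: hypothetical helper (not part of STAR).
# Prints OK, BAD, or MISSING followed by the path that was checked.
check_fastq_gz() {
    fq=$1
    if [ ! -f "$fq" ]; then
        echo "MISSING $fq"
        return 1
    fi
    # Decompress to stdout and keep only the first character.
    first=$(gzip -dc "$fq" 2>/dev/null | head -c 1)
    if [ "$first" = "@" ]; then
        echo "OK $fq"
    else
        echo "BAD $fq"
        return 1
    fi
}
```

Inside the loop from the question it could be called as check_fastq_gz "${SAMPLE}_merged_R1.fastq.gz" before STAR is started.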

Reply • 18 months ago by Darked89 4.7k

score 3 · Answer 1 · 2023-11-27

18 months ago

Trivas ★ 1.9k

If you are passing fq.gz files, you need to tell STAR to decompress them with the --readFilesCommand zcat flag. There are a couple of options for the decompression command (check the manual).
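As a rough sketch, the loop from the question could look like this with the flag added. The STAR options are the ones from the question plus --readFilesCommand zcat. The run_star function and the DRY_RUN switch are hypothetical additions for illustration: with DRY_RUN=1 (the default here) the command is only printed, so the paths can be inspected before anything is run. Full paths are used instead of cd, which is a design choice rather than a requirement.

```shell
#!/bin/bash
# Sketch only: directories are the ones from the question and can be
# overridden from the environment. DRY_RUN is a hypothetical switch.
GENOME_DIR=${GENOME_DIR:-$HOME/indices_directory}
DATA_DIR=${DATA_DIR:-$HOME/Arthur_RNAseq}
DRY_RUN=${DRY_RUN:-1}

# run_star: hypothetical wrapper that builds the STAR command for one sample.
run_star() {
    sample=$1
    sample_dir=$DATA_DIR/$sample
    output_dir=$sample_dir/STAR_output

    # Store the full command in the positional parameters.
    set -- STAR --genomeDir "$GENOME_DIR" \
        --readFilesIn "$sample_dir/${sample}_merged_R1.fastq.gz" "$sample_dir/${sample}_merged_R2.fastq.gz" \
        --readFilesCommand zcat \
        --runThreadN 32 \
        --outFileNamePrefix "$output_dir/${sample}_" \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --outSAMattributes Standard

    if [ "$DRY_RUN" = 1 ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        mkdir -p "$output_dir"
        "$@"
    fi
}

# One STAR run per sample directory; skip anything that is not a directory.
for folder in "$DATA_DIR"/*/; do
    [ -d "$folder" ] || continue
    run_star "$(basename "$folder")"
done
```

Once the printed commands look right, setting DRY_RUN=0 in the Slurm script would run the alignments. If zcat is not available on the cluster, gzip -dc or pigz -dc should work in its place, quoted as a single argument.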
