How to run STAR with multiple files
4.2 years ago
Kumar ▴ 170

Hi, I have a total of 197 paired-end (PE) samples, each with an R1 and an R2 file. I am trying to run the STAR aligner on all of them in one go with the following command, but something seems to be wrong with the script. Any recommendations? Thanks.

for i in $(ls raw_data); do echo /DataAnalysis/STAR-2.7.5a/bin/Linux_x86_64/./STAR --genomeDir 
/DataAnalysis/test-star/SAindex \
--readFilesIn raw_data/${i}_R1.fastq,raw_data/${i}_R2.fastq \
--runThreadN 8 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts; done
alignment rna-seq SRAT-aligner • 5.2k views

There should be a space between the two file names, not a comma.
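To illustrate the point about separators: as far as I know, STAR's --readFilesIn takes mate 1 and mate 2 separated by a space, while commas are reserved for listing several files of the same mate. A minimal sketch, using echo so nothing is actually run; the lane names L1 and L2 are hypothetical and only there to show the comma case.

```shell
# One paired-end sample: mate 1 and mate 2 are separated by a space.
paired="raw_data/270_R1.fastq raw_data/270_R2.fastq"

# Several files for one sample (hypothetical lanes L1 and L2): commas join
# files of the same mate, a space still separates mate 1 from mate 2.
lanes="raw_data/270_L1_R1.fastq,raw_data/270_L2_R1.fastq raw_data/270_L1_R2.fastq,raw_data/270_L2_R2.fastq"

# Left unquoted on purpose so each variable splits into two arguments.
echo STAR --readFilesIn $paired
echo STAR --readFilesIn $lanes
```

Writing R1,R2 with a comma therefore tells STAR that both files are mate 1, which is not what a paired-end run needs.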


I tried that, but it is still not working.


Don't forget to define your read groups (--outSAMattrRGline) if you're doing multi-sample alignment.

4.2 years ago
h.mon 35k

A couple of remarks:

  1. Using ls to feed a loop or an array is not a good idea; it is better to use globbing or find (yes, I know the irony: I have advocated using ls in exactly this manner).
  2. As already noted by rpolicastro, STAR expects the input file names to be separated by a space.
  3. The output of ls raw_data includes both the R1 and the R2 files, so the file names you build from it will be wrong. They will be something like

    raw_data/file01_R1.fastq_R1.fastq,raw_data/file01_R1.fastq_R2.fastq raw_data/file01_R2.fastq_R1.fastq,raw_data/file01_R2.fastq_R2.fastq raw_data/file02_R1.fastq_R1.fastq,raw_data/file02_R1.fastq_R2.fastq

    and so on.

  4. You have an echo in front of your STAR command, so nothing will be run; the command will only be printed to the screen. This is useful for troubleshooting the command, not for running it.

  5. You are missing the --genomeDir argument preceding the index.

Once you fix these issues, try again, and if something goes wrong, please post the error message as well, because "it seems something wrong with this script" is not informative at all.
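The remarks above can be sketched as a loop that globs on the R1 files only and derives the sample name and the R2 file from each match. This is a minimal sketch under assumptions, not a tested pipeline: it assumes files are named <sample>_R1.fastq and <sample>_R2.fastq, that STAR is on the PATH (override with the hypothetical STAR_BIN variable), and that the index lives in the directory from the question. The function star_commands is made up for illustration and only prints the commands.

```shell
# Assumptions (not from the thread): STAR is on PATH and the genome index
# is in the directory used in the question. Override via the environment.
STAR_BIN=${STAR_BIN:-STAR}
GENOME_DIR=${GENOME_DIR:-/DataAnalysis/test-star/SAindex}

# Print one STAR command per sample found in directory $1 (dry run).
star_commands() {
    for r1 in "$1"/*_R1.fastq; do
        [ -e "$r1" ] || continue             # the glob matched nothing
        sample=$(basename "$r1" _R1.fastq)   # 270_R1.fastq -> 270
        r2="$1/${sample}_R2.fastq"
        if [ ! -f "$r2" ]; then
            echo "skipping $sample: missing $r2" >&2
            continue
        fi
        # echo keeps this a dry run; remove it to actually launch STAR.
        echo "$STAR_BIN" --genomeDir "$GENOME_DIR" \
            --readFilesIn "$r1" "$r2" \
            --runThreadN 8 --outFileNamePrefix "aligned/$sample." \
            --outSAMtype BAM SortedByCoordinate \
            --quantMode GeneCounts
    done
}

star_commands raw_data
```

Check the printed commands first, then remove the echo to run the alignments. Globbing on *_R1.fastq means each sample is visited once, and the explicit check for the R2 file catches incomplete pairs before STAR is started.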


I have improved the script. However, it is still showing the following error.

for i in $(raw_data/270_R1.fastq,raw_data/270_R2.fastq raw_data/272_R1.fastq,raw_data/272_R2.fastq 
raw_data/274_R1.fastq,raw_data/274_R2.fastq raw_data/278C_R1.fastq,raw_data/278C_R2.fastq 
raw_data/284C_R1.fastq,raw_data/284C_R2.fastq); 
do 
/DataAnalysis/STAR-2.7.5a/bin/Linux_x86_64/./STAR --genomeDir /DataAnalysis/Manoj-data/test-star/SAindex \
--readFilesIn raw_data/${i}_R1.fastq,raw_data/${i}_R2.fastq \
--runThreadN 8 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts; done

error:

./star.sh: line 7: raw_data/270_R1.fastq,raw_data/270_R2.fastq: No such file or directory

Two people have already mentioned above that:

STAR expects the input file names separated by a space

yet you are still using a comma in between file names.

--readFilesIn raw_data/${i}_R1.fastq,raw_data/${i}_R2.fastq

I tried both with and without a comma, but it shows the same error. I also tried putting ls in front of the file list, which gives the following error:

ls: cannot access raw_data/270_R1.fastq,raw_data/270_R2.fastq: No such file or directory
ls: cannot access raw_data/272_R1.fastq,raw_data/272_R2.fastq: No such file or directory
ls: cannot access raw_data/274_R1.fastq,raw_data/274_R2.fastq: No such file or directory
ls: cannot access raw_data/278C_R1.fastq,raw_data/278C_R2.fastq: No such file or directory
ls: cannot access raw_data/284C_R1.fastq,raw_data/284C_R2.fastq: No such file or directory

I did not understand what file01_R1.fastq_R1.fastq is. Could you please clarify that?
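A small demonstration may help here. The sketch below recreates the situation with two empty placeholder files in a temporary directory (file01 is just the example name from the answer above) and shows how using each directory entry as if it were a sample name doubles the suffix, and how stripping the suffix first avoids it.

```shell
# Recreate the situation with empty placeholder files.
demo=$(mktemp -d)
touch "$demo/file01_R1.fastq" "$demo/file01_R2.fastq"

# What the original loop does: every directory entry is treated as a
# sample name, although it already ends in _R1.fastq or _R2.fastq.
wrong=$(for f in "$demo"/*; do
    i=$(basename "$f")
    echo "raw_data/${i}_R1.fastq"
done)
echo "$wrong"
# prints:
# raw_data/file01_R1.fastq_R1.fastq
# raw_data/file01_R2.fastq_R1.fastq

# Stripping the suffix first recovers the sample name.
right=$(for f in "$demo"/*_R1.fastq; do
    i=$(basename "$f" _R1.fastq)
    echo "raw_data/${i}_R1.fastq"
done)
echo "$right"
# prints:
# raw_data/file01_R1.fastq

rm -rf "$demo"
```

The doubled names do not exist on disk, which is why STAR cannot open them.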

2.1 years ago
DareDevil ★ 4.3k

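You should loop over the sample names only, without the raw_data/ directory or the _R1.fastq suffix, because the loop body adds both. Also drop the $( ) around the list: that is command substitution, so the shell tries to run the first file name as a command, which is what produced the "No such file or directory" error. Separate the names with spaces, not commas.

for i in 270 272 274 278C 284C; do
/DataAnalysis/STAR-2.7.5a/bin/Linux_x86_64/./STAR --genomeDir /DataAnalysis/Manoj-data/test-star/SAindex \
    --readFilesIn raw_data/${i}_R1.fastq raw_data/${i}_R2.fastq \
    --runThreadN 8 --outFileNamePrefix aligned/$i. \
    --outSAMtype BAM SortedByCoordinate \
    --quantMode GeneCounts; done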