Question

STAR Mapping multiple files

4.5 years ago

ipalmisa ▴ 10

Hi, I am new to bioinformatics analysis and I would like to double-check that I am doing things right. Thanks in advance for the help.

First question: I am using STAR to map my fastq files. It is paired-end RNA-seq and I have multiple runs for each sample. I am using the following command:

STAR --runThreadN 10 \
--genomeDir $RDS/projects/sequ/live/Genecode/mouseindex \
--readFilesIn $RDS/projects/Sample_SDG10/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz,S*_R1_003.fastq.gz,S*_R1_004.fastq.gz,S*_R1_005.fastq.gz,S*_R1_006.fastq.gz,S*_R1_007.fastq.gz,S*_R1_008.fastq.gz $RDS/projects/Sample_SDG10/S*_R2_001.fastq.gz,S*_R2_002.fastq.gz,S*_R2_003.fastq.gz,S*_R2_004.fastq.gz,S*_R2_005.fastq.gz,S*_R2_006.fastq.gz,S*_R2_007.fastq.gz,S*_R2_008.fastq.gz \
--readFilesCommand gunzip -c \
--outFileNamePrefix $RDS/projects/sequ/live/mapped/Genecode/EEtest

But I get this error message:

gzip: S*_R1_002.fastq.gz: No such file or directory
gzip: S*_R1_003.fastq.gz: No such file or directory
gzip: S*_R1_004.fastq.gz: No such file or directory
gzip: S*_R1_005.fastq.gz: No such file or directory
gzip: S*_R1_006.fastq.gz: No such file or directory
gzip: S*_R1_007.fastq.gz: No such file or directory
gzip: S*_R1_008.fastq.gz: No such file or directory
gzip: S*_R2_002.fastq.gz: No such file or directory
gzip: S*_R2_003.fastq.gz: No such file or directory
gzip: S*_R2_004.fastq.gz: No such file or directory
gzip: S*_R2_005.fastq.gz: No such file or directory
gzip: S*_R2_006.fastq.gz: No such file or directory
gzip: S*_R2_007.fastq.gz: No such file or directory
gzip: S*_R2_008.fastq.gz: No such file or directory

The files are in the specified path. What am I doing wrong?

Second question: if I have only one R1 and one R2 file in the directory, can I use * to avoid writing out the file names? Would STAR be able to match them, given that they are the only R1 and R2 files in the folder?

STAR --runThreadN 10 \
--genomeDir $RDS/projects/sequ/live/Genecode/mouseindex \
--readFilesIn $RDS/projects/*_R1_*.fastq.gz $RDS/projects/*_R2_*.fastq.gz \
--readFilesCommand gunzip -c \
--outFileNamePrefix $RDS/projects/sequ/live/mapped/Genecode/EEtest

Thanks, Ilaria

alignment • 1.6k views

ADD COMMENT • link updated 4.5 years ago by swbarnes2 14k • written 4.5 years ago by ipalmisa ▴ 10

A small educational note: I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

ADD REPLY • link 4.5 years ago by lieven.sterck 15k

Oh, thank you! I didn't know that!

ADD REPLY • link 4.5 years ago by ipalmisa ▴ 10

It cannot work with '*' expansion if the tool expects a list of comma-separated files.

Try

echo $RDS/projects/Sample_SDG10/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz,S*_R1_003.fastq.gz,S*_R1_004.fastq.gz,S*_R1_005.fastq.gz,S*_R1_006.fastq.gz,S*_R1_007.fastq.gz,S*_R1_008.fastq.gz $RDS/projects/Sample_SDG10/S*_R2_001.fastq.gz,S*_R2_002.fastq.gz,S*_R2_003.fastq.gz,S*_R2_004.fastq.gz,S*_R2_005.fastq.gz,S*_R2_006.fastq.gz,S*_R2_007.fastq.gz,S*_R2_008.fastq.gz

to see what is happening....
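
A minimal sketch of what that `echo` shows, using a throwaway directory and made-up file names (the `demo` variable and the `S1_...` names are assumptions, not the real data). The shell sees the whole comma-joined string as a single word. No file has a name matching that word, so with default shell settings the pattern is passed through unexpanded, and only the first entry carries the directory prefix. That would fit with the gzip errors naming only files 002 to 008.

```shell
# Hypothetical demo directory and file names; not the poster's real data.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/S1_R1_001.fastq.gz" "$demo/S1_R1_002.fastq.gz"

# One word containing a comma: no single file name matches it, so (with
# default shell settings) the pattern is passed through literally.
joined=$(echo $demo/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz)

# A glob on its own is expanded by the shell to the matching path.
single=$(echo $demo/S*_R1_001.fastq.gz)

echo "joined: $joined"
echo "single: $single"
```

The second `echo` also bears on the single-file case: when a pattern stands alone and matches exactly one file, the program receives the expanded path, not the `*`.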

ADD REPLY • link 4.5 years ago by Pierre Lindenbaum 164k

Thank you, I have tried echo... you are right. Does this mean that I need to write out the file names each time? I have around 200 files with different names... isn't there a shortcut, please?

ADD REPLY • link 4.5 years ago by ipalmisa ▴ 10

score 0 · Answer 1 · 2020-05-15

4.5 years ago

Pierre Lindenbaum 164k

I have around 200 files with different names... isn't there a shortcut, please?

   --readFilesIn $(ls $RDS/projects/*_R1_*.fastq.gz | paste -sd, -) $(ls $RDS/projects/*_R2_*.fastq.gz | paste -sd, -)
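
A possible way to build and check the two lists before calling STAR, sketched on a throwaway directory with made-up file names (`demo`, `r1_list`, `r2_list` and the `SDG10_L006_...` names are assumptions for illustration). It checks that R1 and R2 have the same number of files and joins each list with commas, without a trailing comma. It prints the option instead of running STAR.

```shell
# Hypothetical demo directory and file names; substitute the real FASTQ folder.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
for i in 001 002 003; do
    touch "$demo/SDG10_L006_R1_$i.fastq.gz" "$demo/SDG10_L006_R2_$i.fastq.gz"
done

# Count the matches for each mate; the two lists must have the same length.
set -- "$demo"/*_R1_*.fastq.gz; n1=$#
set -- "$demo"/*_R2_*.fastq.gz; n2=$#
if [ "$n1" -ne "$n2" ]; then
    echo "R1/R2 file counts differ: $n1 vs $n2" >&2
    exit 1
fi

# Globs expand in sorted order; paste joins the lines with commas and
# leaves no trailing comma.
r1_list=$(printf '%s\n' "$demo"/*_R1_*.fastq.gz | paste -sd, -)
r2_list=$(printf '%s\n' "$demo"/*_R2_*.fastq.gz | paste -sd, -)

# Print the option instead of running STAR, so the lists can be inspected.
echo "--readFilesIn $r1_list $r2_list"
```

Because both patterns differ only in `R1` versus `R2`, the sorted order keeps each R1 file in the same position as its R2 mate, provided the file names are otherwise identical.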

ADD COMMENT • link 4.5 years ago by Pierre Lindenbaum 164k

Thanks a lot! I will do it this way!

ADD REPLY • link 4.5 years ago by ipalmisa ▴ 10

score 0 · Answer 2 · 2020-05-15

4.5 years ago

swbarnes2 14k

You might also make your life easier by catting all the relevant files together first, so you can give STAR just one R1 and one R2 file.
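
A small sketch of the concatenation, assuming gzipped inputs and using made-up file names and reads in a throwaway directory (`demo` and the `SDG10_R1_...` names are hypothetical). gzip files can be concatenated directly, and the result decompresses to the inputs joined end to end. The R1 and R2 files need to be concatenated in the same order so the mates stay paired.

```shell
# Hypothetical demo files standing in for two runs of the same sample.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$demo/SDG10_R1_001.fastq.gz"
printf '@read2\nTGCA\n+\nIIII\n' | gzip -c > "$demo/SDG10_R1_002.fastq.gz"

# gzip streams can be concatenated as-is; the glob expands in sorted order.
# Use the same file order when building the R2 file.
cat "$demo"/SDG10_R1_0*.fastq.gz > "$demo/SDG10_R1_merged.fastq.gz"

# Sanity check: two 4-line records should give 8 lines.
lines=$(gunzip -c "$demo/SDG10_R1_merged.fastq.gz" | wc -l | tr -d ' ')
echo "merged lines: $lines"
```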

Also, are you completely sure that you want to combine your fastqs the way you have? Usually, different numbers after the S mean totally different samples.

ADD COMMENT • link 4.5 years ago by swbarnes2 14k

Hi, thanks. Yes, I knew about the cat option, but I thought this way would be easier. It's the same sample (SDG10) run more than once, so I have SDG10_GAGTGG_L006_R1_001.fastq, SDG10_GAGTGG_L006_R2_001.fastq, SDG10_GAGTGG_L006_R3_001.fastq and so on. Thanks again.

ADD REPLY • link 4.5 years ago by ipalmisa ▴ 10