7.2 years ago
ta_awwad
Hi everybody, I am aligning 36 paired-end samples with STAR. To make the task a little easier, I wrote a bash loop that aligns them all with the same command. Here is my loop:
for i in $(ls raw_data); do STAR --genomeDir index.150 \
--readFilesIn raw_data/$i\_1.fq.gz,raw_data/$i\_2.fq.gz \
--runThreadN 20 --outFileNamePrefix aligned/$i. \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--sjdbGTFfile GRCm38.90.gtf \
--readFilesCommand zcat ; done
But something seems to be wrong: the alignment ran overnight and still had not finished.
Any recommendations?
Thanks very much.
For 36 samples, you could speed up by loading the index into memory, and unloading when finished mapping:
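A minimal sketch of that load, align, unload pattern, reusing the paths from the question. The function name `align_all`, the `RUN` switch and the `--limitBAMsortRAM` value are illustrative assumptions, not part of the original answer. As far as I know, a shared-memory index cannot be combined with `--sjdbGTFfile` at the mapping step, so this assumes the index was generated with the annotation; sorted BAM output with a shared index also needs `--limitBAMsortRAM` set explicitly.

```shell
#!/usr/bin/env bash
# Sketch: load the STAR index into shared memory once, align every sample
# against it, then release the memory.
# Assumptions (not from the thread): reads are named <sample>_1.fq.gz and
# <sample>_2.fq.gz, and the index was built with the GTF annotation.

GENOME_DIR=${GENOME_DIR:-index.150}
RUN=${RUN:-}    # set RUN=echo to print the commands instead of running them

align_all() {
    # 1. Load the index into shared memory, then exit.
    $RUN STAR --genomeDir "$GENOME_DIR" --genomeLoad LoadAndExit

    # 2. Align each sample; LoadAndKeep reuses the index that is already loaded.
    for sample in "$@"; do
        $RUN STAR --genomeDir "$GENOME_DIR" --genomeLoad LoadAndKeep \
            --readFilesIn "raw_data/${sample}_1.fq.gz" "raw_data/${sample}_2.fq.gz" \
            --readFilesCommand zcat --runThreadN 20 \
            --outFileNamePrefix "aligned/${sample}." \
            --outSAMtype BAM SortedByCoordinate \
            --limitBAMsortRAM 10000000000 \
            --quantMode GeneCounts    # 10 GB sort limit is an arbitrary example value
    done

    # 3. Remove the index from shared memory.
    $RUN STAR --genomeDir "$GENOME_DIR" --genomeLoad Remove
}
```

Calling `RUN=echo align_all sampleA sampleB` (hypothetical sample names) prints the four commands without running STAR, which is a cheap way to check the paths first.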
Thank you all for this priceless info.
Hi h.mon,
Could you tell me what the purpose of index.150 is here? Can we just type the location of the genome after --genomeDir?
Yes. In the example given, index.150 is the name of the index from the original question. Replace it with yours.
If you load the genome before the for loop using STAR --genomeLoad LoadAndExit --genomeDir genomeDIR, do you still need to specify the --genomeDir parameter in the loop? I tried leaving it out, and STAR failed to run. Then I tried specifying the genome directory in the loop (even though the genome is loaded before the for loop), and it looks like each iteration of the loop is still loading the genome.
Can someone explain how to properly load the genome for multiple samples so that the loop is not iteratively loading it, please?
First you load the genome using --genomeDir $GENOMEDIR --genomeLoad LoadAndExit. For your alignment(s) you need --genomeDir $GENOMEDIR --genomeLoad LoadAndKeep.
To summarize:
Is this correct?
When looping, test if your code is valid by adding an echo statement to see what the command is going to be:
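A sketch of such a dry run, under the assumption that raw_data holds files such as sampleA_1.fq.gz (a hypothetical name). The helper name `preview_commands` is made up for illustration; the point is only that `echo` in front of the command prints it instead of executing it.

```shell
# Dry run: print the STAR command for every entry in a directory.
# The loop variable is built the same way as in the question, from the
# directory listing, so each entry is a full file name.
preview_commands() {
    local dir=$1 path i
    for path in "$dir"/*; do
        i=${path##*/}    # same names that `ls` would list
        echo STAR --genomeDir index.150 \
            --readFilesIn "$dir/${i}_1.fq.gz" "$dir/${i}_2.fq.gz"
    done
}

# Example: preview_commands raw_data
```

If the printed paths look like sampleA_1.fq.gz_1.fq.gz, the loop variable already contains the suffix and the input files will not be found.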
My guess is that the files raw_data/$i_1.fq.gz don't exist, because you create $i simply from the contents of raw_data, so $i is already a complete file name.
Thanks very much, WouterDeCoster, for your reply. I ran your code and got this:
You are right, the file names come out different.
Any suggestions to correct this?
Thanks very much.
Can you show a few examples of filenames of the fq.gz files?
You could try something like:
I shortened $i to the sample prefix and kept only unique values, since every sample would otherwise be in there twice.
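The code from this reply did not survive in the thread, so here is a minimal sketch of the same idea, assuming the files are named <sample>_1.fq.gz and <sample>_2.fq.gz. The helper name `sample_names` is hypothetical. The two mates are separated by a space here, since the STAR manual describes a comma-separated list as multiple files for the same mate.

```shell
# Print each unique sample prefix found in a directory of
# <sample>_1.fq.gz / <sample>_2.fq.gz files (assumed naming scheme).
sample_names() {
    local dir=$1 path name
    for path in "$dir"/*_[12].fq.gz; do
        [ -e "$path" ] || continue      # nothing matched: the pattern stays literal
        name=${path##*/}                # strip the directory
        echo "${name%_[12].fq.gz}"      # strip _1.fq.gz or _2.fq.gz
    done | sort -u                      # each sample is listed twice, keep one
}

# Dry run: remove "echo" once the printed commands look right.
sample_names raw_data | while IFS= read -r i; do
    echo STAR --genomeDir index.150 \
        --readFilesIn "raw_data/${i}_1.fq.gz" "raw_data/${i}_2.fq.gz" \
        --readFilesCommand zcat --runThreadN 20 \
        --outFileNamePrefix "aligned/${i}." \
        --outSAMtype BAM SortedByCoordinate \
        --quantMode GeneCounts --sjdbGTFfile GRCm38.90.gtf
done
```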
Thanks very much, it is running now, but I am not sure how long it will take. I will let you know if everything runs fine.
It looks like it is stuck: no progress for 30 minutes. Is that normal?
You can have a look with (h)top to see if it's still working. Also, check if it's producing output files.
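For the output-file check, a small sketch: STAR writes a Log.progress.out file under the --outFileNamePrefix given for the run, and its last lines show how far mapping has got. The function name `check_progress` is hypothetical, and the prefix aligned/sampleA. in the usage line is only an example.

```shell
# Show the output files for one sample and the tail of its progress log.
# Argument: the value passed to --outFileNamePrefix for that sample.
check_progress() {
    local prefix=$1
    ls -lh "${prefix}"* 2>/dev/null || true    # any output files so far?
    if [ -f "${prefix}Log.progress.out" ]; then
        tail -n 3 "${prefix}Log.progress.out"  # most recent progress lines
    else
        echo "no progress log yet for ${prefix}"
    fi
}

# Example: check_progress aligned/sampleA.
```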
Hi! I know it's an old post, but I'm hoping for some technical help.
I used your code above, with minor changes. But in the --readFilesIn part, no matter what I do, the files that get used are renamed. My file names are:
etc. Only the S[number] and the R1 and R2 in the names change.
So I changed the code like this:
My problem, as mentioned above, is that in the --readFilesIn step the files STAR loads are named:
This is for the first cycle. What could be the problem? Also I used this code for different files. In that case the file names were 6754726_1.fastq.gz, 6878764_2.fastq.gz etc. (for that I used your original code!)
I think the problem was that STAR doesn't accept compressed files.
It does accept them, but you need to specify --readFilesCommand zcat.
I did, and it did not work.
Works just fine for me, use it all the time.
"It did not work" doesn't help us know what went wrong. What is the error message? STAR does accept gz compressed files.
It is just stuck: no error message, no progress.
Try gunzip instead. It works with that.
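One way to check this outside STAR is to run the decompression command by hand on one file. The STAR manual lists both `zcat` and `gunzip -c` as --readFilesCommand values for .gz files; on some systems (macOS, for example) `zcat` may not handle .gz files, in which case `--readFilesCommand gunzip -c` is the usual substitute. The helper name `first_record` and the file name in the usage lines are hypothetical.

```shell
# Print the first FASTQ record (4 lines) of a file using the given
# decompression command, to confirm the command works before handing
# it to STAR via --readFilesCommand.
first_record() {
    local file=$1
    shift
    "$@" "$file" | head -n 4
}

# Examples:
#   first_record raw_data/sampleA_1.fq.gz zcat
#   first_record raw_data/sampleA_1.fq.gz gunzip -c
```

If one command prints a read and the other prints nothing or an error, pass the working one to --readFilesCommand.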