Dear all, would you please help me modify my bash loop to align my FASTQ samples? I have paired-end RNA-seq files such as:
EGG12-Clean_ACTGAT_S33_L004_R1_001.fastq
EGG12-Clean_ACTGAT_S33_L004_R2_001.fastq
EGG14-Clean_GAGTGG_S34_L004_R1_001.fastq
EGG14-Clean_GAGTGG_S34_L004_R2_001.fastq
... I have tried:
#!/bin/bash
export RNA_HOME=~/workspace/rnaseq
cd $RNA_HOME
export RNA_DATA_DIR=$RNA_HOME/data
cd $RNA_DATA_DIR
export RNA_REF_INDEX=$RNA_REFS_DIR/amel_OGSv3.2
for i in $(ls *.fastq | rev | cut -c 22- | rev | uniq)
do
hisat2 -p 8 \
--rg-id=${i} \
--rg SM:${i}\
--rg PL:ILLUMINA \
-x $RNA_REF_INDEX \
--dta --rna-strandness RF \
-1 $RNA_DATA_DIR/${i}_*_R1_001.fastq \
-2 $RNA_DATA_DIR/${i}_*_R2_001.fastq \
-S ./${i}.sam
done;
Unfortunately, it cannot find the paired-end files, so the alignment does not run. Any help would be appreciated.
Looks like you are entering $RNA_DATA_DIR twice. In addition, $RNA_REFS_DIR is not defined.
$RNA_DATA_DIR/${i}_*_R1_001.fastq is also weird. Why do you use a wildcard here? I think your code is a bit messed up. You should probably test your loop first with something simpler. If that doesn't work, then you know you have issues with your loop.
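A dry run of the kind suggested above could look like the following sketch. It reuses the prefix pipeline from the question and only prints what the loop would see, without calling hisat2. The function names sample_prefixes and dry_run are made up for illustration, and the sketch assumes it is run from the directory that holds the .fastq files.

```shell
#!/bin/bash
# Same prefix extraction as in the question: drop the last 21 characters
# of each file name read from stdin, then collapse adjacent duplicates.
sample_prefixes() {
    rev | cut -c 22- | rev | uniq
}

# Print each prefix in brackets (so stray underscores or spaces are visible)
# and list the files that the R1/R2 globs from the question resolve to.
dry_run() {
    local i
    for i in $(ls *.fastq | sample_prefixes); do
        echo "prefix: [$i]"
        # Left unquoted on purpose so the shell expands the wildcard.
        ls ${i}_*_R1_001.fastq ${i}_*_R2_001.fastq
    done
}

dry_run
```

If ls reports no match for a prefix here, the glob built from that prefix is the problem rather than hisat2.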
Thanks but it works fine and it is exactly what I need
No it was only to let you know about the direction.
I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.
One issue you have is that your files are .fastq files but your code is trying to search for files that end in .fa. Change your for loop so that it lists *.fastq instead of *.fa.
I don't think this is the only issue, but try running it with that change and see what happens.
Thanks, I did run it but still it is not working
Can you update your original post with the modified code you are using, and any error message you are getting? Also, can you give the ls or tree output of the directory you are running the script in?
ls: cannot access '*.fastq': No such file or directory
this is what I get.
Please use ADD COMMENT or ADD REPLY to respond to previous reactions; that way this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.
DO YOU UNDERSTAND?
Do not use quotes with shell globs (aka *); the command needs to be: ls *.fastq
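The difference is easy to reproduce in a scratch directory. The sketch below uses made-up file names under a temporary directory. Note that, in bash's default configuration, an unquoted glob that matches nothing is also passed to ls literally, so the same "cannot access" error can also mean the script is running in a directory that contains no .fastq files.

```shell
#!/bin/bash
# Scratch directory with two hypothetical FASTQ file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/a_R1_001.fastq" "$demo_dir/a_R2_001.fastq"

# Unquoted glob: the shell expands it to the two file names before ls runs.
unquoted=$(ls "$demo_dir"/*.fastq | wc -l)

# Quoted glob: ls receives the literal name '*.fastq', which does not exist,
# so it prints an error (discarded here) and lists nothing.
quoted=$(ls "$demo_dir/*.fastq" 2>/dev/null | wc -l)

echo "unquoted matches: $unquoted, quoted matches: $quoted"
rm -r "$demo_dir"
```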
@OP: Your $i is EGG12-Clean_ACTGAT_ and when you reconstruct with $RNA_DATA_DIR/${i}_*_R1_001.fastq, your sample is EGG12-Clean_ACTGAT__*_R1_001.fastq (with an extra _) instead of EGG12-Clean_ACTGAT_*_R1_001.fastq. Remove the _ after } (something like: ${i}*_R1). Try to print $i (echo $i for both reads) after renaming. Change fa to fastq as mentioned above. Also try to use bash string manipulation.
thanks @ram