Hello all,
I have been trying to write a .sh script that automates running HISAT2 on a batch of paired-end data. Since these are massive read files and physical memory on my computer is limited, the bash script pulls the read files off an external hard drive two at a time, puts them into a folder on my desktop, runs programs on them (sending the output back to a freshly made folder on the external hard drive), then deletes the two read files from my computer before moving on to the next pair.
The script worked quite nicely in the testing phase, where I ran a simple regular-expression Perl script on the read files to check my program's functionality.
However, I now have to add a line calling HISAT2, and I find that I'm not sure how to do that! Since the names of the files change on every pass, I was wondering whether I could use variable names or wildcard characters in the HISAT2 command. My few faltering attempts have produced error messages from HISAT2, specifically about "extra parameters" being detected.
I'd be very grateful for any help you could provide. I have included my script below, and I am painfully aware of how clunky and inefficient it probably is. However, it has the incredible virtue of being code I actually understand (and as the rankest of amateurs when it comes to bash, that REALLY matters more to me than efficiency), so I am primarily concerned with the HISAT2 issue. That said, I would of course take to heart any advice on how to tackle such a problem next time!
#!/bin/bash
prev_dir=/Volumes/My\ Passport/systemPipeR_tests_arabidopsis/data/
new_dir=/Users/mylapple/desktop/test_reads_folder_b
count=0 # keeps track of how many .fastq files have been moved
cd "$prev_dir"
for i in `cat targeted_files.txt`
do
sed -i '' 's/\r$//' "$i"
#Making a folder in the directory on our external HD
DIR="${i%%_*}"
mkdir -p "$prev_dir/$DIR"
#Copying 2 mated .fastq files to desktop
cd "$prev_dir"
cp "$i" "$new_dir"
(( count++ ))
cd "$new_dir"
#With both mate pair files in desktop folder
if [ $count -eq 2 ]
then
echo "$DIR"" Forward and Reverse reads both in temporary folder. Processing..."
count=0 #re-zero count
sleep 2 # pause to allow user to visually check the desktop folder contents
#Loop over all the read files in desktop folder
for f in ./*.fastq;
do
# A simple regular expression substitution script, used while writing/testing/troubleshooting script
perl /Users/mylapple/desktop/test_reads_folder_perl/regexSwapFQ.pl "$f" > "$prev_dir"/"$DIR"/"${f%.*}_trimmed.fastq"
done
#What I would LIKE to do, is use the two read files as part of HISAT2 run
#Ideally, output from HISAT2 would be sent to the new folder on ext HD
#Here's the problem....
# Can I use wildcard characters/variable names in the following line for the 2 read files? For the output dir?
# hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S "seqrunID"
#Now remove the two fastq files from desktop folder
rm ./*
fi
cd "$prev_dir"
done
Again, my main concern is with this bit right here:
# Can I use wildcard characters/variable names in the following line for the 2 read files? For the output dir?
# hisat2 -x genome_snp_tran -1 seqrunID_1.fastq -2 seqrunID_2.fastq -S "seqrunID"
try:
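Variable names work fine in the HISAT2 line as long as every expansion is double-quoted. A likely cause of the "extra parameters" message is that an unquoted path containing a space (such as "My Passport") is split into two words, or that a wildcard matches more than one file, so HISAT2 receives arguments it does not expect. Below is a minimal sketch, not a tested pipeline. It assumes the mates are named <ID>_1.fastq and <ID>_2.fastq as in the question, that the genome_snp_tran index can be found from the working directory, and that the output should be called <ID>.sam (the .sam suffix is my addition). The function name run_hisat2 and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/bash
# Sketch: assemble the HISAT2 call for one read pair from variables.
# DRY_RUN=1 (the default here) only prints the arguments, one per line,
# so a path that was split on a space would show up as two lines.
DRY_RUN="${DRY_RUN:-1}"

run_hisat2() {
    id="$1"          # run ID, e.g. the value of $DIR in the script
    reads_dir="$2"   # folder holding the two mate files
    out_dir="$3"     # folder that should receive the SAM file

    # Quoting each expansion keeps a path with spaces as ONE argument.
    set -- hisat2 -x genome_snp_tran \
        -1 "$reads_dir/${id}_1.fastq" \
        -2 "$reads_dir/${id}_2.fastq" \
        -S "$out_dir/${id}.sam"

    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$@"   # show what would be run
    else
        "$@"                 # actually run HISAT2
    fi
}

# Example call using the folders from the question
run_hisat2 "seqrunID" \
    "/Users/mylapple/desktop/test_reads_folder_b" \
    "/Volumes/My Passport/systemPipeR_tests_arabidopsis/data/seqrunID"
```

Inside the loop, the equivalent call would be something like run_hisat2 "$DIR" "$new_dir" "$prev_dir/$DIR", placed just before the rm line. Once the dry run prints exactly nine lines with the paths intact, setting DRY_RUN=0 runs the real alignment.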