Hello,
I need to run the below commands in my university cluster to be able to do SortMeRNA for my fastq.gz files from RNA-seq experiment. I have 18 paired-end fastq.gz files naming like 30786524-PBG_NAM_18_R1.fastq.gz
& 30786524-PBG_NAM_18_R2.fastq.gz
R1 shows forward read and R2 is reverse read.
All of my fastq.gz files have different names but they are common in R1.fastq.gz and R2.fastq.gz
I should save the below chunk of code in nano:
#!/bin/bash
READ_FW="$1"
READ_RV="$2"
FILEBASE=$(basename "${READ_FW/_1.fq.gz/}")
echo "Uncompressing FASTQ data of $FILEBASE"
gunzip "$READ_FW" "$READ_RV"
READ_FW="${READ_FW%.gz}"
READ_RV="${READ_RV%.gz}"
echo "Merging pairs of $FILEBASE"
merge-paired-reads.sh "$READ_FW" "$READ_RV" "${FILEBASE}_interleaved.fq"
echo "Running SortMeRNA for $FILEBASE"
sortmerna --ref $SORTMERNA_DB --reads "${FILEBASE}_interleaved.fq" --aligned \
"${FILEBASE}-rRNA-hits" --other "${FILEBASE}-sortmerna" --log -a 16 \
-v --paired_in --fastx
echo "Unmerging SortMeRNA filtered pairs for $FILEBASE"
unmerge-paired-reads.sh "${FILEBASE}-sortmerna.fq" \
"${FILEBASE}-sortmerna_1.fq" "${FILEBASE}-sortmerna_2.fq"
echo "Doing cleanup for $FILEBASE"
gzip "$READ_FW" "$READ_RV" "${FILEBASE}-sortmerna_1.fq" \
"${FILEBASE}-sortmerna_2.fq" "${FILEBASE}-rRNA-hits.fq"
rm "${FILEBASE}_interleaved.fq" "${FILEBASE}-sortmerna.fq"
Then, I should use a loop to execute this script:
mkdir ~/sortmerna
cd ~/sortmerna
find ../raw -name "*.fq.gz" | sort | head -n 32 | while read READ_FW
do
read READ_RV
bash ../runSortMeRNA.sh $READ_FW $READ_RV
done
As I am new in programming, may I know what is FILEBASE
? Should I specify a name or a path for it?
Also, should I keep ${READ_FW/_1.fq.gz/}
, READ_FW="$1"
and READ_RV="$2"
exactly like this or adapt them based on my data?
What about find ../raw -name "*.fq.gz"
and bash ../runSortMeRNA.sh
? Which parts of the codes I should adapt?
I would highly appreciate your help.
Best wishes,
Farah
Can you explain the role of
--paired_in
and--paired_out
? I am still confused about these two options after reading the usage. Thanks.