4.3 years ago
MAPK
I am trying to download SRA data and create paired-end FASTQ files per read group. Can someone please share how to do this? I would really appreciate a shell script.
I tried the following, which only splits the FASTQ output per read group (RG); I also need each read group split into separate FQ1 and FQ2 files.
SRR="SRR1350739"
IFS=$'\n'
RGLINES=($(sam-dump --ngc XXXX.ngc ./${SRR} | sed -n '/^[^@]/!p;//q' | grep ^@RG))
args=(tee)
for RGLINE in "${RGLINES[@]}"; do
    unset IFS
    RG=(${RGLINE})
    args+=(\>\(grep -A3 --no-group-separator \"\\.${RG[1]#ID:}/[12]$\" \| gzip \> "./${SRR}.${RG[1]#ID:}.fastq-dump.split.defline.z.tee.fq.gz"\))
done
echo "Splitting ${SRR} into ${#RGLINES[@]} ReadGroups"
fastq-dump --ngc XXXX.ngc --split-e --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z "${SRR}" | eval ${args[@]}
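For illustration, here is a minimal sketch of one way to get separate FQ1 and FQ2 files per read group from the same stream. It is not a tested solution for this dataset. It assumes every defline has the form @ACC.SPOT.RG/RI (as set by the --defline-seq format above), that read group names contain no ".", and that records are strict 4-line FASTQ. The function name split_by_rg_and_read and the output naming are hypothetical.

```shell
#!/usr/bin/env bash
# Split an interleaved FASTQ stream (stdin) into one file per read group
# and read index: <prefix>.<RG>_<RI>.fq
# Assumptions: deflines look like "@ACC.SPOT.RG/RI", read group names
# contain no ".", and every record is exactly 4 lines.
split_by_rg_and_read() {
    local prefix="$1"   # hypothetical output prefix, e.g. the SRR accession
    awk -v prefix="$prefix" '
        NR % 4 == 1 {
            # Defline: the text after the last "." is "RG/RI".
            n = split($0, parts, ".")
            split(parts[n], tail, "/")
            out = prefix "." tail[1] "_" tail[2] ".fq"
        }
        # Write all 4 lines of the record to the file chosen by its defline.
        { print > out }
    '
}
```

It would replace the tee/grep stage, for example: fastq-dump --ngc XXXX.ngc --split-e --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z "${SRR}" | split_by_rg_and_read "${SRR}", followed by gzip on the resulting files. awk keeps one open file per read group and read index, which should be fine for a handful of read groups but may hit open-file limits with very many.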