Question

SRA to paired fastq per read group

4.4 years ago

MAPK ★ 2.1k

I am trying to download SRA data and create paired-end FASTQ files per read group. Can someone please share how I can get this done? I would really appreciate it if you could share a shell script to do this.

I tried the following, which only splits the FASTQ by read group (RG). I also need each RG split into FQ1 and FQ2 (read 1 and read 2).

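SRR="SRR1350739"
# Read the SAM header from sam-dump and keep only the @RG lines
# (sed prints header lines and quits at the first non-@ line)
IFS=$'\n'
RGLINES=($(sam-dump --ngc XXXX.ngc ./${SRR} | sed -n '/^[^@]/!p;//q' | grep ^@RG))
# Build a "tee" command with one output per read group: each output greps the
# records whose header ends in ".<RG ID>/1" or ".<RG ID>/2" and gzips them,
# so both mates of a read group land in the same file
args=(tee)
for RGLINE in ${RGLINES[@]}; do
  unset IFS
  RG=(${RGLINE})  # RG[1] is the "ID:<read group>" field
  args+=(\>\(grep -A3 --no-group-separator \"\\.${RG[1]#ID:}/[12]$\" \| gzip \> "./${SRR}.${RG[1]#ID:}.fastq-dump.split.defline.z.tee.fq.gz"\))
done

echo "Splitting ${SRR} into ${#RGLINES[@]} ReadGroups"
# Stream reads to stdout (-Z) with the spot group ($sg) and read index ($ri)
# in the defline, then fan them out to the per-RG files
fastq-dump --ngc XXXX.ngc --split-e --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z "${SRR}" | eval ${args[@]}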
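A single-pass alternative to the tee/grep fan-out, sketched under some assumptions: the input is the 4-line FASTQ stream from the fastq-dump command above, so every header looks like `@<accession>.<spot>.<RG ID>/<1|2>`, and the accession and spot number contain no dots (the RG ID may). The function name `split_by_rg_and_mate` is hypothetical. It takes the read group and mate straight from each header and writes `<prefix>.<RG ID>_1.fq.gz` and `<prefix>.<RG ID>_2.fq.gz`, so the sam-dump header lookup is not needed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split an interleaved 4-line FASTQ stream whose headers
# look like @<accession>.<spot>.<RG ID>/<1|2> into <prefix>.<RG ID>_<1|2>.fq.gz
split_by_rg_and_mate() {
  local prefix="$1"
  awk -v prefix="$prefix" '
    NR % 4 == 1 {
      h = $0
      # The mate number is the digit after the final "/"
      if (!match(h, /\/[12]$/)) {
        print "unexpected FASTQ header: " $0 > "/dev/stderr"
        exit 1
      }
      mate = substr(h, RSTART + 1, 1)
      h = substr(h, 1, RSTART - 1)
      # Drop "@<accession>.<spot>." so only the RG ID is left
      sub(/^@[^.]*\.[0-9]+\./, "", h)
      cmd = "gzip -c > \"" prefix "." h "_" mate ".fq.gz\""
      cmds[cmd] = 1
    }
    # Send all four lines of the record to the file chosen at its header
    { print | cmd }
    # Close every gzip pipe so the files are complete when awk returns
    END { for (c in cmds) close(c) }
  '
}

# Usage with the fastq-dump command from the question:
# fastq-dump --ngc XXXX.ngc --split-e --defline-seq '@$ac.$si.$sg/$ri' \
#   --defline-qual '+' -Z "${SRR}" | split_by_rg_and_mate "${SRR}"
```

Because `--split-e` sends unpaired reads to the same stream, a spot without a mate may end up in the `_1` file; comparing record counts between the `_1` and `_2` files of each read group is a quick check that the pairs line up.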

Tags: sra, NGS

Answer 1 · 2020-09-15

4.4 years ago

GenoMax 149k

Use bamtofastq from biobambam2. It can split the data into RG-specific files, one set per read group with read 1 and read 2 written to separate files.
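A sketch of how that could look for an SRA run, assuming `sam-dump` and `bamtofastq` are on `PATH` and that each SAM record from `sam-dump` carries an `RG:Z` tag (bamtofastq groups reads by that tag). The function name `sra_to_fastq_per_rg`, the output directory, and the file suffixes are illustrative choices; the bamtofastq parameters follow its `key=value` style, with `outputperreadgroup=1` enabling the per-RG split.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper; arguments: accession, .ngc file, output directory.
# Assumes every SAM record from sam-dump has an RG:Z tag.
sra_to_fastq_per_rg() {
  local srr="$1" ngc="$2" outdir="$3"
  mkdir -p "$outdir"
  # --unaligned makes sam-dump emit unmapped reads as well as aligned ones;
  # bamtofastq reads SAM on stdin and collates mates back into pairs
  sam-dump --ngc "$ngc" --unaligned "$srr" |
    bamtofastq inputformat=sam collate=1 \
      exclude=QCFAIL,SECONDARY,SUPPLEMENTARY \
      gz=1 outputdir="$outdir" outputperreadgroup=1 \
      outputperreadgroupsuffixF=_1.fq.gz \
      outputperreadgroupsuffixF2=_2.fq.gz \
      outputperreadgroupsuffixO=_o1.fq.gz \
      outputperreadgroupsuffixO2=_o2.fq.gz \
      outputperreadgroupsuffixS=_s.fq.gz
}

# Usage (not run here):
# sra_to_fastq_per_rg SRR1350739 XXXX.ngc ./fastq_per_rg
```

The `_1`/`_2` files hold properly paired reads per read group; the `O`/`O2` and `S` suffixes collect orphaned mates and single-end reads so they do not break the pairing of the main files.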
