Help with writing a loop for metaspades assembler (beginner)
2.9 years ago
116439372 • 0

I have a large number of samples that I need to assemble using metaSPAdes. They are named like this:

CUH001T1_unclassified_unpaired_R1 and CUH001T1_unclassified_unpaired_R2
CUH002T2_unclassified_unpaired_R1 and CUH002T2_unclassified_unpaired_R2
..
..

The command I'm using for this is:

spades.py --meta --pe1-1 CUH002T2_unclassified_paired_R1.fastq --pe1-2 CUH002T2_unclassified_paired_R2.fastq -t 20 -m 400 -o ../metaspades/CUH002T2

Does anyone know how to write a loop for this? I'm sure it's relatively easy, but I'm very new to bioinformatics and can't figure it out.

metaspades • 2.1k views
$ find . -type f -name "*_unclassified_paired_R1*" -exec basename {} \; | while read line; do echo spades.py --meta --pe1-1  /home/user/$line --pe1-2 /home/user/${line/_R1/_R2} -t 20 -m 400 -o ../metaspades/${line%%_*} ;done

Replace /home/user with the appropriate directory path. The echo makes this a dry run that only prints the commands; remove it once the printed commands look right.
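For anyone unsure what the two ${...} expressions in that one-liner do, here is a minimal sketch, assuming bash (the `${var/old/new}` form is not available in plain POSIX sh). The filename is the one from the question; the variable names `mate` and `sample` are only for illustration.

```shell
# Example filename taken from the question.
line="CUH002T2_unclassified_paired_R1.fastq"

# ${line/_R1/_R2}: replace the first "_R1" with "_R2" to get the mate file.
mate="${line/_R1/_R2}"

# ${line%%_*}: remove the longest suffix that starts with "_",
# leaving only the text before the first underscore (the sample ID).
sample="${line%%_*}"

echo "$mate"
echo "$sample"
```

Note that the substitution only touches the first "_R1" it finds, so a sample ID that itself contains "_R1" would need a different approach.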


This worked for me, thank you!

MYPATH="/path/to/fastq/data"; for FLE in $(ls ${MYPATH}/*.fastq | sed 's/_R[12].*$//' | sort | uniq); do echo spades.py --meta --pe1-1 ${FLE}_R1.fastq --pe1-2 ${FLE}_R2.fastq -t 20 -m 400 -o ../metaspades/$(basename ${FLE}); done

Try this. Point MYPATH to wherever your data is stored (e.g., /home/myname/metagenomics), then run the entire command from the terminal. It should print one spades.py <blah> line to the console per sample. If everything looks fine, remove the echo and run it again.
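If you would rather not edit the command to add and remove the echo, one option is a small wrapper that either prints or executes. This is only a sketch: the `run` function and the `DRY_RUN` variable are names made up for this example, not part of SPAdes.

```shell
# Assumption: DRY_RUN=1 (the default here) prints commands, DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# With DRY_RUN=1 this only prints the spades.py command line.
run spades.py --meta --pe1-1 CUH002T2_unclassified_paired_R1.fastq --pe1-2 CUH002T2_unclassified_paired_R2.fastq -t 20 -m 400 -o ../metaspades/CUH002T2
```

Saved as a script, something like `DRY_RUN=0 bash yourScript.sh` would then run the assemblies for real.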

I'm not sure looping through your datasets is the most efficient way, though. Are you working on a local workstation of some sort or a cluster?

2.9 years ago
Shred ★ 1.6k

Use basename to get filename without path, then cut by underscore.

dir=/path/to/fastq/data   # set this to the directory holding your FASTQ files
for sample in "$dir"/*_unclassified_paired_R1.fastq ; do
sample_name=$(basename "$sample" | cut -d'_' -f1)   # no spaces around '=' in a bash assignment
spades.py --meta --pe1-1 "$dir/${sample_name}_unclassified_paired_R1.fastq" \
--pe1-2 "$dir/${sample_name}_unclassified_paired_R2.fastq" -t 20 -m 400 -o "../metaspades/${sample_name}"
done
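
With many samples it can help to confirm that every R1 file has its R2 mate before launching anything. Below is a rough sketch of the same basename-and-cut idea with that check added. The helper names `sample_id` and `mate_of` are invented for this example, and it assumes the files end in _R1.fastq and _R2.fastq. It only prints the commands.

```shell
# Hypothetical helper: the sample ID is everything before the first underscore.
sample_id() {
    basename "$1" | cut -d'_' -f1
}

# Hypothetical helper: print the R2 path for an R1 path; fail if that file is missing.
mate_of() {
    mate="${1%_R1.fastq}_R2.fastq"
    [ -f "$mate" ] && echo "$mate"
}

dir="${dir:-.}"   # assumption: FASTQ directory, defaulting to the current one
for r1 in "$dir"/*_R1.fastq; do
    [ -e "$r1" ] || continue   # the glob matched nothing
    if r2=$(mate_of "$r1"); then
        # Dry run: prints the command instead of executing it.
        echo spades.py --meta --pe1-1 "$r1" --pe1-2 "$r2" -t 20 -m 400 -o "../metaspades/$(sample_id "$r1")"
    else
        echo "skipping $r1: no matching R2 file" >&2
    fi
done
```

Once the printed lines look right, dropping the echo in front of spades.py would run the assemblies.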
2.9 years ago

I use something like this for a spades loop. Start with: bash yourScript.sh

Note I begin with trimmed FASTQs, as you should too.

#!/bin/bash

# Start spades assemblies

for i in *R1.trm.fastq
do
        echo "Input file 1: $i"
        fastq=$i
        # derive R2 from R1
        fastq2="${fastq/R1/R2}"

        # Run script
        spades.py -o "$fastq.spades" -t 18 -m 250 --meta -1 "$fastq" -2 "$fastq2"
done