Question

Map Entire Directory of Paired-End Reads at Once

3.4 years ago

Simone ▴ 10

Is there a way to map an entire directory of reads at once, or would I have to write a script specific to my directory structure and data? I'm using BWA-MEM to map 49 sets of paired-end reads and have been going one by one, using this line to pipe the output to samtools:

srun -c 8 --mem 15000 -t 2:00:00 bwa mem -t 8 /projects/tollis_lab/crocs/data/ref_genome/AllMis/GCF_000281125.3_ASM28112v4_genomic.fna /projects/tollis_lab/crocs/analysis/reads_to_map/HC02-USNM42150_1P.fastq.gz /projects/tollis_lab/crocs/analysis/reads_to_map/HC02-USNM42150_2P.fastq.gz | samtools view -hb | samtools sort -l 5 -o /projects/tollis_lab/crocs/analysis/bams_P_AllMis/HC02-USNM42150_C.sp_AllMis_sp.bam

I'm still a newbie so not sure if mapping individually is necessary or there's a much faster way to do it, but I'm mapping to 3 different reference genomes so if there is a faster way I would love to know.

bwa bowtie genomics assembly reads • 1.1k views

ADD COMMENT • link updated 3.4 years ago by h.mon 35k • written 3.4 years ago by Simone ▴ 10

There are plenty of threads on Biostars about how to process a set of files. Two examples: "bash for loop with two type of files" and "How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?"

It looks like you are using a cluster, so you can submit a separate job for each iteration of the loop and let the jobs run in parallel.
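As a rough sketch of that loop-plus-jobs idea (not a tested pipeline): the function below walks a directory, pairs each `*_1P.fastq.gz` file with its `*_2P.fastq.gz` mate, and prints one `sbatch` command per sample. The function name `submit_mapping_jobs`, the `DRY_RUN` switch, and the simplified `<sample>.bam` output name are assumptions made for illustration; the resource flags and file suffixes are copied from the command in the question. It pipes straight into `samtools sort`, which accepts SAM on standard input, so the separate `samtools view` step is left out.

```shell
#!/usr/bin/env bash
# Sketch only: one Slurm job per read pair.
# submit_mapping_jobs and DRY_RUN are hypothetical names, not part of bwa, samtools or Slurm.

# Usage: submit_mapping_jobs READS_DIR REFERENCE_FASTA OUT_DIR
# DRY_RUN=1 (the default) only prints the sbatch commands; DRY_RUN=0 submits them.
submit_mapping_jobs() {
    local reads_dir=$1 ref=$2 out_dir=$3
    local r1 r2 sample cmd

    for r1 in "$reads_dir"/*_1P.fastq.gz; do
        [ -e "$r1" ] || continue                 # the glob matched nothing
        r2=${r1%_1P.fastq.gz}_2P.fastq.gz        # expected mate file
        sample=$(basename "$r1" _1P.fastq.gz)    # e.g. HC02-USNM42150

        if [ ! -e "$r2" ]; then
            echo "skipping $sample: mate file not found" >&2
            continue
        fi

        # Same mapping step as in the question; output name simplified to <sample>.bam (assumption).
        cmd="bwa mem -t 8 '$ref' '$r1' '$r2' | samtools sort -l 5 -o '$out_dir/$sample.bam' -"

        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "sbatch -c 8 --mem 15000 -t 2:00:00 --wrap \"$cmd\""
        else
            sbatch -c 8 --mem 15000 -t 2:00:00 --wrap "$cmd"
        fi
    done
}

# Example call (commented out), using the paths from the question:
# DRY_RUN=0 submit_mapping_jobs \
#     /projects/tollis_lab/crocs/analysis/reads_to_map \
#     /projects/tollis_lab/crocs/data/ref_genome/AllMis/GCF_000281125.3_ASM28112v4_genomic.fna \
#     /projects/tollis_lab/crocs/analysis/bams_P_AllMis
```

Running it first with the default `DRY_RUN=1` lets you check the generated commands before anything is submitted. Because the whole pipeline is wrapped in the job, both `bwa mem` and `samtools sort` run on the compute node. For several reference genomes, the same function could be called once per reference with a different output directory.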

ADD REPLY • link 3.4 years ago by GenoMax 148k

As GenoMax pointed out, there are many posts dealing with how to loop over files to perform repeated tasks. In addition, you may consider a Snakemake or Nextflow pipeline, especially if you will be repeating this kind of analysis on a regular basis.

I'm still a newbie so not sure if mapping individually is necessary or there's a much faster way to do it, but I'm mapping to 3 different reference genomes so if there is a faster way I would love to know.

You will need to elaborate on what you are doing and what your goals are. Why are you mapping to 3 different genomes? How is this related to genome assembly (you tagged your post "assembly")?

ADD REPLY • link 3.4 years ago by h.mon 35k