3.3 years ago
Simone
Is there a way to map an entire directory of reads at once? Or would I have to write a script specific to my directory structure and data? I'm using BWA MEM to map 49 sets of paired-end reads and have been going one by one, using this line to pipe to samtools:
srun -c 8 --mem 15000 -t 2:00:00 bwa mem -t 8 /projects/tollis_lab/crocs/data/ref_genome/AllMis/GCF_000281125.3_ASM28112v4_genomic.fna /projects/tollis_lab/crocs/analysis/reads_to_map/HC02-USNM42150_1P.fastq.gz /projects/tollis_lab/crocs/analysis/reads_to_map/HC02-USNM42150_2P.fastq.gz | samtools view -hb | samtools sort -l 5 -o /projects/tollis_lab/crocs/analysis/bams_P_AllMis/HC02-USNM42150_C.sp_AllMis_sp.bam
I'm still a newbie, so I'm not sure whether mapping each sample individually is necessary or whether there's a much faster way to do it. I'm mapping to 3 different reference genomes, so if there is a faster way I would love to know.
There are plenty of threads on Biostars about how to process a set of files. Two examples: "bash for loop with two type of files" and "How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?"
It looks like you are using a cluster, so you can submit a separate job for each iteration of the loop and let them run in parallel.
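As a rough sketch of that loop idea (not a tested pipeline), something like the following could work. It assumes every sample has a `_1P.fastq.gz` / `_2P.fastq.gz` pair as in the question, that paths contain no spaces, and that `sbatch --wrap` is acceptable on your cluster. The function names, the `DRY_RUN` switch and the `<sample>.bam` output naming are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: one bwa mem | samtools job per read pair found in a directory.
# The _1P/_2P suffixes and resource settings come from the question;
# function names, DRY_RUN and the <sample>.bam naming are assumptions.

# Print the sample prefix of a forward-read file,
# e.g. /path/HC02-USNM42150_1P.fastq.gz -> HC02-USNM42150
sample_name() {
    local base
    base=$(basename "$1")
    printf '%s\n' "${base%_1P.fastq.gz}"
}

# Print the mapping pipeline for one sample (assumes paths without spaces).
build_cmd() {
    local ref=$1 reads_dir=$2 out_dir=$3 sample=$4
    printf 'bwa mem -t 8 %s %s/%s_1P.fastq.gz %s/%s_2P.fastq.gz | samtools view -hb | samtools sort -l 5 -o %s/%s.bam\n' \
        "$ref" "$reads_dir" "$sample" "$reads_dir" "$sample" "$out_dir" "$sample"
}

# Loop over all forward-read files; print each command (default) or submit it.
map_all() {
    local ref=$1 reads_dir=$2 out_dir=$3 r1 sample cmd
    for r1 in "$reads_dir"/*_1P.fastq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        sample=$(sample_name "$r1")
        if [ ! -e "$reads_dir/${sample}_2P.fastq.gz" ]; then
            echo "skipping $sample: no _2P mate found" >&2
            continue
        fi
        cmd=$(build_cmd "$ref" "$reads_dir" "$out_dir" "$sample")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"                     # dry run: only show the command
        else
            # One independent cluster job per sample; the whole pipe runs inside it.
            sbatch -c 8 --mem 15000 -t 2:00:00 --wrap "$cmd"
        fi
    done
}
```

After sourcing the file, `map_all ref.fna reads_dir out_dir` prints the commands so you can check them first. Running it with `DRY_RUN=0` submits them. For several reference genomes you could call `map_all` once per reference, each with its own output directory.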
As GenoMax pointed out, there are many posts dealing with how to loop over files to perform repeated tasks. In addition, you may want to consider a Snakemake or Nextflow pipeline, especially if you will be repeating this kind of analysis on a regular basis.
You will need to elaborate on what you are doing and what your goals are. Why are you mapping to 3 different genomes? How is this related to genome assembly (as you tagged your post assembly)?