I am doing a genome assembly for three samples with Megahit. I want Megahit to save the output files in the respective sample folders, which are stored in the variable PART.
R1_FILES=($(find ${PART} -name "*_R1_merged.fq.gz"))
R2_FILES=($(find ${PART} -name "*_R2_merged.fq.gz"))
megahit -1 "${R1_FILES[0]}" -2 "${R2_FILES[0]}" -o $PART --out-prefix $PART -m 60e9 -t 8
This does not work, because Megahit wants to overwrite the input folder, which would delete the files in it, while I just want to save the output in the same directory. According to the documentation, Megahit should just create a new folder named "megahit_out" in each of the three sample folders. If I specify a new output folder it works, but then I cannot tell which contig files belong to which sample, and the following coding steps get very complicated, because I want to keep working with the PART variable if possible.
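A minimal sketch of one way around this, assuming each sample folder contains exactly one *_R1_merged.fq.gz and one *_R2_merged.fq.gz file. As far as I know, MEGAHIT expects -o to be a directory that does not exist yet and creates it itself, which is why -o $PART (an existing folder) is rejected; passing a full path such as $PART to --out-prefix also gives an odd file name. The sketch keeps PART as the sample folder, points -o at a new subfolder "$PART/megahit_out", and uses only the folder's base name as --out-prefix, so the contigs should end up as something like sample1/megahit_out/sample1.contigs.fa. The function name assemble_sample and the MEGAHIT_BIN override (MEGAHIT_BIN=echo gives a dry run) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: assemble ONE sample while keeping PART as the sample folder.
# Assumptions: each sample folder holds exactly one *_R1_merged.fq.gz and one
# *_R2_merged.fq.gz; assemble_sample and MEGAHIT_BIN are hypothetical names
# (MEGAHIT_BIN=echo turns the call into a dry run that only prints the command).
set -euo pipefail

assemble_sample() {
    local part="${1%/}"                    # sample folder, trailing "/" removed
    local sample
    sample="$(basename "$part")"           # e.g. "sample1", used as contig prefix
    local r1=("$part"/*_R1_merged.fq.gz)
    local r2=("$part"/*_R2_merged.fq.gz)
    if [[ ! -e "${r1[0]}" || ! -e "${r2[0]}" ]]; then
        echo "R1/R2 files not found in $part" >&2
        return 1
    fi
    # -o is a NEW subfolder inside the sample folder; MEGAHIT creates it itself
    # and should not be pointed at an existing directory such as $part.
    "${MEGAHIT_BIN:-megahit}" -1 "${r1[0]}" -2 "${r2[0]}" \
        -o "$part/megahit_out" --out-prefix "$sample" \
        -m 60e9 -t 8
}
```

Called as assemble_sample "$PART" with PART set to one sample folder at a time, the following steps can keep using PART and look for that sample's contigs under "$PART/megahit_out".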
Hmm... are you sure find will output the R1 and the R2 files in the same order? https://www.baeldung.com/linux/find-default-sorting-order
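On the ordering question: find does not promise any particular output order, so two separate find calls are not guaranteed to list the R1 and R2 files in matching order. A more robust sketch, assuming the R2 file name differs from the R1 name only in "_R1_merged.fq.gz" versus "_R2_merged.fq.gz", derives each mate from its R1 name and sorts the R1 list (sort -z is supported by GNU coreutils sort). The helper name pair_reads is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list R1/R2 pairs by deriving each R2 name from its R1 name, instead
# of trusting two separate find calls to return files in the same order.
# Assumes mates differ only in "_R1_merged.fq.gz" vs "_R2_merged.fq.gz".
# pair_reads is a hypothetical helper name.

pair_reads() {
    local dir="$1" r1 r2
    # -print0 / read -d '' keep file names with spaces intact;
    # sort -z makes the order deterministic.
    while IFS= read -r -d '' r1; do
        r2="${r1%_R1_merged.fq.gz}_R2_merged.fq.gz"
        if [[ -e "$r2" ]]; then
            printf '%s\t%s\n' "$r1" "$r2"   # one tab-separated pair per line
        else
            echo "no R2 mate found for $r1" >&2
            return 1
        fi
    done < <(find "$dir" -name '*_R1_merged.fq.gz' -print0 | sort -z)
}
```

Each output line holds one R1 file and its own R2 mate separated by a tab, so a loop that reads both columns at once cannot mix up files from different samples.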
How about just using
and then something like
?
First of all, thanks for your help! I am a bit frustrated right now: somehow the program does not pick up the correct directories when I use my variable (PART).
The file "metagenomics.run" just lists my three samples, like this:
When I run the script now, I get the following error message:
FileNotFoundError: [Errno 2] No such file or directory: '/working/directory/sample1/\n/working/directory/sample2/\n/working/directory/sample3/'
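The error message shows that PART contained all three paths separated by newline characters (\n), so the whole content of metagenomics.run was used as one single directory name. The script itself is not shown, so the cause is an assumption; a typical way to end up there is an assignment like PART=$(cat metagenomics.run). A sketch that instead reads metagenomics.run one line at a time, assuming one sample folder per line, and runs a given command once per folder; the function name for_each_sample is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run a command once per sample folder listed in a file,
# assuming one folder path per line. for_each_sample is a hypothetical name.

for_each_sample() {
    local list_file="$1"
    shift
    local PART
    # Read from file descriptor 3 so that the called command (e.g. megahit)
    # cannot accidentally consume the list through its standard input.
    # "|| [[ -n ... ]]" also handles a last line without a trailing newline.
    while IFS= read -r -u 3 PART || [[ -n "$PART" ]]; do
        PART="${PART%$'\r'}"            # drop a trailing CR from Windows-edited files
        [[ -z "$PART" ]] && continue    # skip blank lines
        "$@" "$PART"                    # PART now holds exactly ONE folder
    done 3< "$list_file"
}
```

For example, for_each_sample metagenomics.run echo simply prints each folder on its own line; replacing echo with the command or function that runs megahit processes the samples one after another, each with its own PART.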
At the top of the script, add:
(https://gist.github.com/mohanpedala/1e2ff5661761d3abd0385e8223e16425) and re-run.
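The exact line from this answer was not preserved here. The linked gist explains the set -e, -u, -x and -o pipefail options; a minimal sketch of such a header, with -x left as an opt-in comment, could look like the following. With -u, an unset or misspelled variable stops the script instead of silently expanding to an empty string, and -x prints every command with its arguments expanded, which shows exactly what PART contains when megahit is called.

```bash
#!/usr/bin/env bash
# Sketch of a "strict mode" header (options as explained in the linked gist):
#   -e           stop the script as soon as a command exits with an error
#   -u           treat expanding an unset variable (e.g. a mistyped PART) as an error
#   -o pipefail  a pipeline fails if any command in it fails, not only the last one
set -euo pipefail
# Optional: uncomment to print every command with its arguments expanded,
# which shows exactly which value of PART is passed to megahit.
# set -x
```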
Furthermore, this should really be written as a workflow in Snakemake or Nextflow.