Hi, I am using the Arima Genomics pipeline for Hi-C data to scaffold my assembled contigs, and I ran the command below on Slurm. My input files, abc_f1.filter.bam and abc_f2.filter.bam, are 42 GB and 43 GB, respectively. My assembled contigs file (abc_assembly.hic.p_ctg.fasta) is 1.65 GB. After the combine and quality-filter command had run for 2 hours, the output file (abc_combine2.bam) is 72 MB. I have also shared the tail of the log file, and it shows no errors. Since there is no error, can I assume my output file is fine for the next step?
#!/bin/bash
#
#SBATCH --job-name=combine
#SBATCH --output=combine.%j.out
#SBATCH --partition=batch
#SBATCH --cpus-per-task=32
#SBATCH --time=25:00:00
#SBATCH --mem=200G
MAPQ_FILTER=10
module load samtools/1.16.1 perl/5.38.0/intel2022.3
export PATH=~path/arima_yahs/arima_pipeline/:$PATH
two_read_bam_combiner.pl abc_f1.filter.bam abc_f2.filter.bam samtools $MAPQ_FILTER | samtools view -bS -t abc_assembly.hic.p_ctg.fasta - | samtools sort -@ 32 -o abc_combine2.bam
Here is the tail of the log file:
413000000
414000000
415000000
416000000
417000000
418000000
419000000
420000000
421000000
422000000
[bam_sort_core] merging from 0 files and 32 in-memory blocks...
I would be grateful for any help.
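One way to sanity-check the output before moving on is to confirm that the BAM is structurally intact and to compare its record count with the inputs. This is a minimal sketch, assuming samtools is on the PATH; check_bam is a hypothetical helper name and is not part of the Arima pipeline.

```shell
#!/bin/bash
# check_bam is a hypothetical helper, not part of the Arima pipeline.
# It verifies that a BAM file is structurally intact and prints its record count.
check_bam() {
    local bam="$1"
    # quickcheck exits non-zero if the header or the EOF block is missing or damaged
    if ! samtools quickcheck "$bam"; then
        echo "$bam: FAILED quickcheck (truncated or corrupt)"
        return 1
    fi
    # -c counts records instead of printing them
    echo "$bam: OK, $(samtools view -c "$bam") records"
}

# Usage (file names taken from the post above):
#   check_bam abc_f1.filter.bam
#   check_bam abc_f2.filter.bam
#   check_bam abc_combine2.bam
```

A combined file that passes the check but holds only a small fraction of the input records would suggest that most pairs were dropped by filtering, not that the job was cut short.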
Since MAPQ_FILTER is set to 10, one guess is that it is somehow excluding/filtering a large number of alignments.
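That guess can be tested directly by counting how many alignments in each input meet the threshold. This is a rough sketch, assuming samtools is on the PATH; mapq_report is a hypothetical helper name. It relies on samtools view with -c (count records) and -q (minimum MAPQ).

```shell
#!/bin/bash
# mapq_report is a hypothetical helper, not part of the Arima pipeline.
# It prints how many alignments in a BAM file have MAPQ >= the given threshold.
mapq_report() {
    local bam="$1"
    local min_mapq="${2:-10}"   # defaults to 10, the value used in the job script
    local total kept pct
    total=$(samtools view -c "$bam")                 # all records
    kept=$(samtools view -c -q "$min_mapq" "$bam")   # records with MAPQ >= threshold
    pct=$(awk -v t="$total" -v k="$kept" \
        'BEGIN { if (t > 0) printf "%.2f", 100 * k / t; else printf "NA" }')
    echo "$bam: $kept of $total alignments (${pct}%) have MAPQ >= $min_mapq"
}

# Usage (file names taken from the post above):
#   mapq_report abc_f1.filter.bam 10
#   mapq_report abc_f2.filter.bam 10
```

If the combiner keeps a pair only when both mates pass the threshold (an assumption about the Perl script, not something confirmed in this thread), these per-file percentages would be an upper bound on what reaches the combined file.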
What do you suggest here? Should I remove the MAPQ_FILTER=10 parameter to get all alignments?
I don't know what else the Perl script is doing, but that would be worth a try.
Here is the two_read_bam_combiner.pl script.
Did you try the script without the filter? Did that restore the alignments in the final file?
I have not tried this option yet. I will try it and share the results if it works.