Hi everyone,
I have WGS data for 5 human samples from the Illumina HiSeq 2500 platform, paired-end (PE) sequencing with a read length of 101 bp. The average size of one sample's .sam file is 260 GB, and I ran the following command to convert .sam to .bam format:
/usr/local/samtools-1.2/samtools view -b ETH002102.bwa.sam > ETH002102.bwa.sam.bam
Is there any way to speed up this conversion? It normally takes approx. 18-20 hours.
My 2nd question is about sorting the .bam file. For that purpose I ran the following command:
/usr/local/samtools-1.2/samtools sort -@ 14 -m 5000000000 -T /tmp/ETH002102.bwa.sam.sort -o ETH002102.bwa.sam.sort.bam ETH002102.bwa.sam.bam
I also tried -m 5G, but I am not able to get the output. The command runs for a few minutes and is then automatically killed by the cluster. I am not sure where the error is, so it would be great if somebody could help me with the correct command to run on a cluster using multiple cores of a node (e.g. here I gave 14 as I had 16), i.e. in parallel mode. Since I am new to this area and doing such work for the first time, any help will be highly appreciated.
Thanks and Regards,
Ravi
Hi, the -m option sets the maximum memory per thread. With 14 threads at 5G each, the sort can request up to about 70 GB in total, so you are probably running out of memory, and that is why the job gets killed. Try rerunning with fewer threads and the default or a smaller -m value.
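To make the arithmetic concrete, here is a rough sketch of how one might budget memory before submitting the sort. The node memory (32 GB), the 3/4 safety margin and the thread count of 8 are assumptions for illustration only, so replace them with what your cluster actually gives you. The script only prints a candidate command (same options as in your post, just with smaller values); it does not run samtools itself.

```shell
#!/usr/bin/env bash
# Memory asked for by the original command: threads x per-thread memory (-m).
threads=14
mem_per_thread_gb=5
total_gb=$(( threads * mem_per_thread_gb ))
echo "Original request: about ${total_gb} GB"

# ASSUMPTION: hypothetical node with 32 GB of RAM; check your own cluster limits.
node_mem_gb=32
# ASSUMPTION: use fewer threads and keep about 3/4 of the RAM for the sort buffers.
sort_threads=8
per_thread_mb=$(( node_mem_gb * 1024 * 3 / 4 / sort_threads ))
echo "Suggested per-thread memory: ${per_thread_mb}M"

# Print a candidate command instead of running it, so it can be reviewed first.
echo "samtools sort -@ ${sort_threads} -m ${per_thread_mb}M" \
     "-T /tmp/ETH002102.bwa.sam.sort" \
     "-o ETH002102.bwa.sam.sort.bam ETH002102.bwa.sam.bam"
```

If the job is still killed, lower sort_threads or the margin further. Also make sure the memory you request from the scheduler is at least threads x -m, with some headroom on top.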