I am trying to align a genome to the GRCh38 reference genome found here (no alts, with decoys) using the following command:
bwa mem -t 24 GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna R1.fastq.gz R2.fastq.gz | samtools sort -@24 -o FinalBAM.bam
All the files (including the bwa index files for the FASTA) are in the home directory on a Google Cloud VM (24 cores, 156 GB memory, 500 GB SSD) running Ubuntu 16.04. This exact same process succeeded in aligning to GRCh37 using the hs37d5.fa file. However, when I run the command above I get the following back:
Usage: samtools sort [options] <in.bam> <out.prefix>
Options: -n(....other options listed etc.)
[M::bwa_idx_load_from_disk] read 0 ALT contigs
And then it boots me back to the command line immediately with no other feedback. What am I doing wrong?
This is a problem with the samtools syntax. Your version is quite ancient; see John Marshall's answer below and consider upgrading to the most recent version. If you want to stick with this version, you probably need to set a prefix for the output file. I am not sure whether -o already exists in this version. Check the options by typing samtools sort alone.
First you need to decompress your .fastq files and index the reference, then try the command below:
The '-' in samtools view tells it to read from stdin.
I hope it works for you.
Best Regards AM
Hey, I doubt this is the reason: if no index were present, bwa would throw an error and would not start loading the index, which it does, as indicated by [M::bwa_idx_load_from_disk] read 0 ALT contigs. bwa will also accept compressed files, FYI. Will move this to a comment. It comes down to improper use of samtools sort: the installed version is quite old, and the syntax used would only work on a more recent version.
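To make the difference between the two syntaxes concrete, here is a minimal sketch that looks at the usage line samtools prints and shows a pipeline that should match it. The function names sort_syntax and print_pipeline are made up for illustration. The legacy form assumes a samtools release whose usage reads "<in.bam> <out.prefix>" (as in the question), where sort expects BAM input and appends .bam to the prefix; the modern form assumes samtools 1.3 or later. The script only prints the commands, it does not run bwa or samtools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of samtools or bwa).

# Classify the "samtools sort" syntax from its usage text.
# Older releases print: samtools sort [options] <in.bam> <out.prefix>
sort_syntax() {
    case "$1" in
        *"<out.prefix>"*) echo legacy ;;
        *)                echo modern ;;
    esac
}

# Print (do not execute) a pipeline matching the detected syntax.
print_pipeline() {
    ref=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna
    aln="bwa mem -t 24 $ref R1.fastq.gz R2.fastq.gz"
    if [ "$(sort_syntax "$1")" = legacy ]; then
        # Old syntax: convert SAM to BAM first, then sort to an output
        # prefix; samtools writes FinalBAM.bam.
        echo "$aln | samtools view -bS - | samtools sort - FinalBAM"
    else
        # New syntax (assumed samtools >= 1.3): sort reads SAM from stdin
        # and -o names the output file.
        echo "$aln | samtools sort -@ 24 -o FinalBAM.bam -"
    fi
}

# Usage line copied from the question above.
print_pipeline 'Usage: samtools sort [options] <in.bam> <out.prefix>'
```

With the usage line from the question, this prints the legacy pipeline. Upgrading samtools, as suggested above, is still the simpler fix, since the original command then works as written.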