I recently used samtools in order to downsample some bam files I have from a sequencing run as such:
Samtools view -h -s 0.5 file.bam > ds_file.bam
I was then going to go through a normalization protocol I have to average the reads to reads per million, which first requires me to remove mitochondrial reads, which I normally do with idxstats:
Samtools index ds_file.bam
Samtools idxstats ds_file.bam | cut -f 1 | grep -v MT | xargs samtools view -b ds_file.bam > dsfile_rmMT.bam
However, when running idxstats with these downsampled files, I am getting an error message saying that the index has failed to load. I checked the downsampled bam file and it is definitely still sorted by coordinates. Does anyone know why I am failing to generate a useable index?
so when you run
Samtools index ds_file.bam
you get the error? or when you run idxstats right afterwards?Typically these sorts of errors come from either left-over indexes lying around. Furthermore, chaining commands together like this
is generally considered a bad idea as you combine the process of command generation and command execution into a single operation, making debugging it much harder. Best to just use your cut/grep/xargs to echo out the samtools command rather than actually run it. Then that removes multiple sources of potential error from the question.
I get the error when I run idxstats. Previously I was getting an error when running samtools index, but that has mysteriously stopped occurring.
I'm fairly certain there is something wrong with the bam file or the indexing, and not chaining commands, as I have previously run that exact command with other bed files and gotten results just fine.
right - but this time you don't get results just fine.
I'm not trying to say the chaining of commands was the problem (although it definitely could have been) i'm just saying that in general it is, like, fundamentally leaving you open to more errors to do it that way. Also it's a security risk.