Hi,
I plan to use bamCoverage from Deeptools to get bw files, but it looks like the thread is dead as it never finishes (no errors). I have hundred of thousands of short unlocalized/random contigs in bam files and looks like this can be the reason bamCoverage fails (https://github.com/deeptools/deepTools/issues/662).
Using samtools, I removed all the reads except those aligned to chromosomes 1:22 (samtools view -o out.bam in.bam
seq 1 22 | sed 's/^/chr/') but of course the tags related to the removed contigs are still available at bam headers and so the problem with bamCoverage persists. Is it reasonable to just naively modify the bam headers (eg samtools reheader) so that only headers related to main chromosomes are retained?
I tried
samtools reheader newheader.txt out.bam > newbam.bam
but the resulting files looks corrupted.
Probably thats the case. What x.bam refers to in your example? Here is what I did: Retained only reads aligned to chromosomes 1:22:
samtools view -o out.bam in.bam seq 1 22 | sed 's/^/chr/'
Took the header:
samtools view -H out.bam > header.sam
The header.sam still has the '@'SQ line for all the removed contigs. So, manually copy pasted the '@'SQ lines related to chr 1:22 and '@'PQ lines (newheader.sam) and used it for samtools reheader:
Based on your response, the manual modification of header file could be reason. In your example, how x.bam doesnt include the headers from the removed contigs?
Your approach looks fine to me. Maybe just paste the before and after headers (without all the
@SQ
lines to see if someone sees a mistake. The x.bam is the old_bam.bam, I renamed it.How do you know the new bam is corrupt ? Just run samtools index on it to see.
Exactly! I know its corrupted as I cant run samtools index:
This link might be helpful ...
https://github.com/samtools/samtools/issues/1118