Hi all,
I have a problem reheadering a BAM file that I subset beforehand. I mapped against a merged human and mouse genome (since it is a patient-derived xenograft sample) and want to remove residual mouse reads. I did this with samtools (version 1.8) by simply subsetting for the human chromosomes.
samtools view -b mybam.bam chrM chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > no-mouse.bam
I want to remove the mouse chromosomes from the header by removing the respective lines from the header with sed (there are also some ERCCs etc. in the reference, which is why I have to remove so many lines) and reheadering the bam afterwards.
samtools view -H no-mouse.bam | sed '27,253d' | samtools reheader - no-mouse.bam > no-mouse.rehead.bam
I end up with a truncated BAM file. If I look at, e.g., the end of the file with
samtools view no-mouse.rehead.bam | tail
I get
[main_samview] truncated file.
Following steps (e.g. AddOrReplaceReadGroups) give errors as well.
I've seen some posts on similar topics but didn't find any helpful solution unfortunately.
Can someone help me with that? Thanks a lot!
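Regarding the header-editing step above: deleting lines by position with sed '27,253d' only works if the line numbers are counted exactly right. A minimal sketch of selecting @SQ lines by name instead is shown here. It assumes the human contigs are named chr1 to chr22, chrX, chrY and chrM, as in the subsetting command, and the helper name keep_human_header is hypothetical. It only makes the header edit less fragile and does not by itself explain the truncation.

```shell
# Hypothetical helper: reads a SAM header on stdin, keeps every non-@SQ line,
# and keeps an @SQ line only if its SN tag names a human chromosome
# (assumed naming: chr1..chr22, chrX, chrY, chrM).
keep_human_header() {
  awk -F'\t' '
    $1 != "@SQ" { print; next }
    {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$/) { print; next }
    }
  '
}

# Usage (assumes samtools is on PATH and no-mouse.bam exists):
# samtools view -H no-mouse.bam | keep_human_header > human-header.sam
```

The relative order of the kept @SQ lines is unchanged, so the result can be compared line by line with the original header.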
Do you have a reference that this is good practice and reliable? I could imagine that because of homologies between the species, you'll get plenty of multimappers.
You can probably find a reference in the allele-specific literature, though we also use this strategy in the WGBS world.
In general, if you know your sample is a composite of a couple of organisms, then concatenating their reference genomes minimizes the bias.
Ok, never heard of this strategy. Thanks!
Here are some other things that didn't do the job either:
I get the following error when I start any Picard tools (e.g. ReorderSam, AddOrReplaceReadGroups), maybe this might help to find the problem:
The numbers vary in different files, I also got '185' and others.
Hope somebody can help me! Thanks in advance!
Are you sure that your header order of chromosomes really is the same order as no-mouse.bam?
That's a good hint, thanks! If I look at the bam file the first reads I get are from chr1, whereas the header starts with chrM. But I don't really know how to fix this. I'll try ReorderSam on the subsetted file with the old header, maybe that'll work (reordering the subsetted file with the new header doesn't).
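To compare the two orders directly rather than by eye, a small sketch like the following may help. It lists the @SQ names in header order, followed by the reference names in the order they first appear among the reads. The helper name sq_order_report is hypothetical, and it expects SAM text with the header included.

```shell
# Hypothetical helper: reads SAM text (header plus alignments) on stdin.
# Prints "header<TAB>name" for each @SQ line in header order, then
# "reads<TAB>name" for each reference name at its first appearance in the reads.
sq_order_report() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) print "header\t" substr($i, 4)
      next
    }
    /^@/ { next }                       # skip the remaining header lines
    $3 != "*" && !($3 in seen) {        # column 3 is RNAME; "*" means unmapped
      seen[$3] = 1
      print "reads\t" $3
    }
  '
}

# Usage (assumes samtools is on PATH; this scans the whole file):
# samtools view -h no-mouse.bam | sq_order_report
```

If the "reads" names come out in a different relative order than the "header" names, the file is not sorted in the order the header declares.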
See my cat solution in the comment below; it should work without needing to use ReorderSam.