8.3 years ago
vegard nygaard
Hi, I am looking for a tool that takes a BAM file of aligned reads plus the reference genome as input and outputs a BAM file in which every variant (non-reference base call) is replaced with the reference base, while the alignment itself is kept.
I need this in order to de-sensitize BAM files so that I am allowed to distribute them more freely, typically in troubleshooting situations where the alignment is more important than the variants.
I was not able to find such a tool, or such an option in the tools I am familiar with, and while writing this I realize it might be trickier than I thought: what to do with indels?
Feedback appreciated.
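For illustration, here is a minimal sketch of the per-read operation described above, written in plain Python so that it does not depend on any BAM library. The function name `revert_to_reference` and its parameters are hypothetical, not taken from an existing tool. It assumes the read's CIGAR string, its 0-based start position, and the reference sequence of the contig are already available as strings. How to treat inserted and soft-clipped bases is an open choice; this sketch masks them with 'N' because they have no reference counterpart.

```python
import re

# Standard SAM CIGAR operations: a length followed by one operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def revert_to_reference(read_seq, cigar, ref_seq, ref_start):
    """Return a copy of read_seq with every aligned base set to the reference base.

    ref_start is the 0-based position in ref_seq of the first aligned base.
    Assumption of this sketch: inserted (I) and soft-clipped (S) bases are
    masked with 'N', since there is no reference base to copy for them.
    """
    out = []
    ref_pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: copy the reference, which hides any mismatch.
            out.append(ref_seq[ref_pos:ref_pos + n])
            ref_pos += n
        elif op in "IS":
            # Bases present in the read only: mask them.
            out.append("N" * n)
        elif op in "DN":
            # Deletion or skipped region: consumes reference, emits nothing.
            ref_pos += n
        # H (hard clip) and P (padding) consume neither read nor reference.
    new_seq = "".join(out)
    if len(new_seq) != len(read_seq):
        raise ValueError("CIGAR, read length and reference do not agree")
    return new_seq


if __name__ == "__main__":
    # A read with a soft clip, one mismatch and one insertion.
    print(revert_to_reference("TTCTTAAG", "2S3M1I2M", "GGCGTAGACC", 2))
```

Note that the CIGAR string is left untouched, so the positions and lengths of indels remain visible even though the inserted bases are masked. The MD and NM tags, if present, also encode mismatches and would have to be dropped or recomputed before the file could be considered de-sensitized.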
It sounds similar to "Create a dummy bam file from a bed coordinates and ref fasta.", where the BED coordinates can be obtained from the existing BAM file.
It is not only about indels: what will you do with the base quality scores when you replace bases with the reference? Especially when you encounter an 'N' in a read in your BAM file.
After mapping, the base quality doesn't really matter anymore, assuming you are only going to use this edited BAM file for differential expression analysis...