Hello everyone,
I'm currently working on a ChIP-seq dataset in which I will analyze chromatin marks on transposons and genes in a fungus. Unfortunately, my data are contaminated with reads from a closely related species. The two genomes are so similar that removing the contamination based on alignment quality is very unlikely to work: the differences mostly consist of one or a few nucleotides. With that in mind, we realized that finding these positions could be treated as a search for single nucleotide polymorphisms (SNPs). The problem is that, while there are many good tools for finding SNPs, I cannot find one that removes the reads containing them from my BAM file. Does anyone know of a tool that could do this? Or is there another way to tackle this problem?
Thanks in advance for any help. This has been nagging me for a while.
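To make the idea concrete, here is a minimal sketch in plain Python of the per-read check described above. It is only an illustration under assumptions: the function names and the `sites` mapping (1-based reference position to contaminant base) are hypothetical, the read is given by its SAM POS, CIGAR and SEQ fields, and only single-base differences are handled.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(pos, cigar, seq):
    """Yield (reference_position, read_base) for each aligned base.

    pos is the 1-based leftmost reference position (SAM POS field).
    """
    ref = pos  # current reference coordinate
    i = 0      # current index into the read sequence
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes both reference and read
            for k in range(n):
                yield ref + k, seq[i + k]
            ref += n
            i += n
        elif op in "IS":     # insertion / soft clip: consumes read only
            i += n
        elif op in "DN":     # deletion / skip: consumes reference only
            ref += n
        # H and P consume neither reference nor read


def carries_contaminant_allele(pos, cigar, seq, sites):
    """Return True if the read shows the contaminant base at any known site.

    sites is a hypothetical dict: 1-based reference position -> base.
    """
    return any(sites.get(p) == b for p, b in aligned_bases(pos, cigar, seq))
```

A read for which `carries_contaminant_allele` returns True would be the one to drop; a real filter would also have to look at the chromosome name and probably at the read's mate.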
Hi Pierre,
Thank you for providing the script. I installed it and ran the test command, which works fine, but with my own data I am running into an issue:
It looks like something is wrong with my input. I'll describe how I got it, and let me know if there's anything I should do differently.
The BAM files were obtained by aligning with BWA-MEM and removing duplicates with Picard MarkDuplicates. They are sorted and indexed. The VCF files were generated with bcftools mpileup for specific regions of interest, then bgzipped and indexed with tabix. No other parameters were changed.
how _strange_ ... please, what is the output of
and
please.
Please, use https://github.com/lindenb/jvarkit/issues for other questions.
Ah no! Got it! I forgot to test whether the read was unmapped, and all the reads are mapped in my test file. Give me a few minutes...
... and done! I fixed the bug. Can you please update the code and tell me if it works?
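For readers wondering what that kind of bug looks like, here is a small sketch in Python. It is not the actual jvarkit code, and the function names are hypothetical. It relies only on the SAM flag bit 0x4, which marks a read as unmapped. Letting unmapped reads pass through unchanged is an assumption made for the example.

```python
UNMAPPED = 0x4  # SAM flag bit: the read is unmapped


def is_unmapped(flag):
    """Return True if the SAM flag marks the read as unmapped."""
    return bool(flag & UNMAPPED)


def keep_read(flag, overlaps_variant):
    """Decide whether a read stays in the output.

    overlaps_variant is a hypothetical zero-argument callable that inspects
    the alignment; it is only called for mapped reads, because an unmapped
    read has no usable position or CIGAR.
    """
    if is_unmapped(flag):
        return True  # assumption: unmapped reads are passed through untouched
    return not overlaps_variant()
```

The point is the order of the checks: the unmapped test comes first, so the alignment is never inspected for a read that has none.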
It works now. To test the result, I quickly generated a new VCF file from the resulting BAM file. It still contains indels, but they are rare and should not affect the results too much. All SNPs are gone. Thank you very much for the tool.
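For anyone who wants to repeat this kind of check, here is a rough sketch in plain Python that counts SNP and indel records in uncompressed VCF text. The function names are hypothetical, and the classification is deliberately simple: symbolic alleles such as `<*>` are ignored, and anything that is neither a single-base substitution nor a length change is counted as "other".

```python
from collections import Counter


def classify(ref, alt):
    """Classify one VCF record from its REF and ALT columns."""
    # Ignore missing, overlapping-deletion and symbolic alleles (e.g. <*>).
    alleles = [a for a in alt.split(",")
               if a not in (".", "*") and not a.startswith("<")]
    if not alleles:
        return "ref"
    if len(ref) == 1 and all(len(a) == 1 for a in alleles):
        return "snp"
    if any(len(a) != len(ref) for a in alleles):
        return "indel"
    return "other"


def count_variants(lines):
    """Count record types over an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        counts[classify(fields[3], fields[4])] += 1  # REF, ALT columns
    return counts
```

After the filter has done its job, the "snp" count should be zero while a few "indel" records may remain, as described above.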