I have exome data from primary tumors and from derived xenografts and I need to compare them. For every variant in the primary tumor, I want to know if it's present in the xenograft and if its frequecy has changed. In addition, I also need to find out if any new variant is present in the xenograft. For this comparison, I can only take into account the regions that have good coverage in the primary tumor and in the xenograft.
So far I've mapped the data (prefiltering murine reads in the xenograft), filtered out the low coverage regions and called variants with samtools in both the primary tumor and the xenograft.
samtools mpileup -u -f genome.fa sample.bam | bcftools view -vcg - > sample.var.raw.vcf
ith this I get the two lists of variants and I can compare them. However, a certain variant can be called, for example, in the xenograft but not in the primary tumor despite being present in 1 or 2 reads.
Is there a way to call ALL the variants in an exome, even the false ones (I want ZERO false negatives even at the cost of thousands of false positives)? I suppose I can parse the mpileup output... but I'd rather prefer not having to do it.
Any idea on a better way to compare the exomes?
Thanks!
Why don't you just `
samtools mpileup -u -f genome.fa xenograft.bam tumour.bam | bcftools view -vcg - > combined.var.raw.vcf`
? I suspect that'd solve most of your problems.I'm trying it right now, but could you explain me how this should solve my problems? How could I use it to identify the variants present in the tumor and not in the xenograft and the variant in the xenograft with not even a single read supporting them in the tumor?
It won't do "not even a single read", but it'll utilize both samples in determining where there are variants, so a variant present in the tumour sample will increase the likelihood of a variant being called in the same position in the xenograft sample even with little evidence. Anyway, I know that you say that you want 0 false negatives, but the results from that wouldn't be very useful, thus my comment. If you really really do want 0 false negatives, then you will indeed need to just parse the mpileup output.
Thanks for your help. As you said, I still find variants with some support in the xenograft not being called. I'll try to play a bit more with the filtering and if necessary I'll parse the mpileup output.