I have two samples, WT (wild type) and A6 (CRISPR-mutated). Both were Illumina-sequenced with a targeted panel covering 125 genes. I want to find the variants in A6 relative to WT (that is, treating WT as the reference). I did the following steps:
step1: Aligned the WT FASTQ files to hg19 using BWA-mem
step2: Called variants in WT using GATK-haplotype caller.
step3: put the list of called variants in hg19 to generated the consensus sequence (used GATK -FastaAlternateReferenceMaker) (I have checked for heterogeneous biallelic mutations, they were very few, so I did not do anything different for them, and let the tool decide which one to select.)
step4: using the consensus sequence as reference sequence, aligned both the samples A6 & WT (used WT to check if there are still some variants being called) and called variants (used BWA & GATK)
Results: Many variants are still called in WT, and there is a large overlap between the variants called in WT and A6.
Can anyone suggest a modification to this strategy, or a different approach?
Thanks
Would it not be easier to align both samples to hg19 and then just compare variants?
Thanks for looking into this, Ram. I have already done that (I have more samples to compare), but it gets messy when I compare 3 samples, especially when I transfer the variants from WT to the other samples to make WT the reference (i.e. without any variants). Maybe I should have used set-theory operations more smartly.
Could you elaborate on this procedure please? I was talking about something that'd exclude all WT variants from consideration and pick only variants unique to the CRISPR-mutated sample under study. Perhaps the tumor-normal approach used in cancer NGS can help?
My approach: variants that overlap can be discarded, but variants exclusive to WT should be transferred to the samples being compared. The tumor-normal strategy is a good idea; I will try that. Thanks, Ram.
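The set operations mentioned above can be sketched roughly as follows. This is only an illustration, assuming both samples were called against the same hg19 reference and that variants can be keyed on (chrom, pos, ref, alt). The function names `variant_keys` and `partition_variants` and the example records are hypothetical, not taken from any tool in this thread.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    Multi-allelic records are split so that each ALT allele gets its own key.
    """
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def partition_variants(wt_keys, a6_keys):
    """Split variants into shared, WT-only and A6-only sets."""
    return {
        "shared": wt_keys & a6_keys,    # present in both, not a difference
        "wt_only": wt_keys - a6_keys,   # WT differs from hg19, A6 does not
        "a6_only": a6_keys - wt_keys,   # candidate CRISPR-induced changes
    }


# Made-up example records (in practice, read the lines from the two VCF files).
wt_example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.",
]
a6_example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr2\t300\t.\tG\tGA\t50\tPASS\t.",
]

parts = partition_variants(variant_keys(wt_example), variant_keys(a6_example))
# Sites where A6 and WT differ: the symmetric difference of the two sets.
n_differences = len(parts["wt_only"]) + len(parts["a6_only"])
print(n_differences)
```

Counting `wt_only` together with `a6_only` covers the "variants exclusive to WT" case without building a new reference. Note that this compares alleles only, not zygosity, and indels may need normalization before the keys are comparable.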
That's straightforward, you don't need to build a new reference genome.
Sorry, what? You're comparing WT to CRISPR-mutated samples and transferring variants not found in those mutated samples to them? Why is that?
I am comparing A6 with WT and trying to find out how much A6 differs from WT. The variants exclusive to WT, which are not present in A6, can also be counted as differences. Hence I am transferring the variants (just the numbers).
But exactly this can be done, as Ram said before, by doing alignment and variant calling for each sample and then counting the positions where the genotype differs between A6 and WT.
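A minimal sketch of that genotype comparison, assuming single-sample VCFs called against the same reference, each with a GT field. The helper names `site_genotypes` and `differing_sites` and the example records are hypothetical. A site missing from one VCF is reported as `None`; that may mean homozygous reference or simply no coverage, so such sites deserve a closer look.

```python
import re


def site_genotypes(vcf_lines):
    """Map (chrom, pos) -> sorted tuple of called allele bases (single-sample VCF)."""
    calls = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alleles = [ref] + fields[4].split(",")
        fmt_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[fmt_keys.index("GT")]
        indices = re.split(r"[/|]", gt)  # handles unphased "/" and phased "|"
        if "." in indices:
            continue  # no-call, nothing to compare
        # Store bases rather than allele indices, so that the two files
        # remain comparable even if their ALT lists differ.
        calls[(chrom, pos)] = tuple(sorted(alleles[int(i)] for i in indices))
    return calls


def differing_sites(wt_calls, a6_calls):
    """Return the sites whose genotype differs between WT and A6."""
    diffs = {}
    for site in sorted(set(wt_calls) | set(a6_calls)):
        wt_gt = wt_calls.get(site)  # None if the site is absent from the WT VCF
        a6_gt = a6_calls.get(site)  # None if the site is absent from the A6 VCF
        if wt_gt != a6_gt:
            diffs[site] = (wt_gt, a6_gt)
    return diffs


# Made-up example records (in practice, read the lines from the two VCF files).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
wt_records = [
    header,
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:28",
]
a6_records = [
    header,
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:31",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:27",
    "chr2\t300\t.\tG\tGA\t50\tPASS\t.\tGT:DP\t0/1:25",
]

genotype_diffs = differing_sites(site_genotypes(wt_records), site_genotypes(a6_records))
for site, (wt_gt, a6_gt) in genotype_diffs.items():
    print(site, wt_gt, a6_gt)
```

Unlike a plain presence/absence comparison, this also flags sites where both samples carry the variant but with different zygosity (heterozygous in WT, homozygous in A6, for example).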
Thanks fin, I will redo the comparison.
Please do not use all caps, it is rude.
Corrected (and my apologies).