(Warning: newbie question.)
The problem: we have a reference strain (diploid) and three experimental strains derived from it. We have the FASTA sequence of the reference, with both homologs for each chromosome. We also sequenced the three experimental strains and have the reads in FASTQ format. We need to know how each experimental strain differs from the reference strain: what genetic changes, if any, did it undergo?
As I see it, the straightforward way would be to get a list of all SNPs in the reference strain (as a VCF file), then perform SNP calls on each strain, then filter out the reference SNPs from the strain SNPs. I figured out how to do the SNP calls on the strains and I found tools that can filter a VCF against another VCF, but I don't know how to perform the first step - getting a list of SNPs from the reference FASTA. How can this be done? Alternatively, what are other standard ways of discovering mutations?
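To make the filtering step concrete, here is a minimal sketch of subtracting one VCF from another in plain Python. It assumes both files use the same coordinates and allele representation, and it matches records on (CHROM, POS, REF, ALT) only. The function names and the example records are made up for illustration; a dedicated VCF tool would be the safer choice on real data.

```python
def read_variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the data lines of a VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic records list ALTs comma-separated
            keys.add((chrom, int(pos), ref, allele))
    return keys


def subtract_variants(strain_lines, reference_lines):
    """Return strain data lines that have at least one ALT allele absent from the reference set."""
    known = read_variant_keys(reference_lines)
    kept = []
    for line in strain_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if any((chrom, int(pos), ref, a) not in known for a in alt.split(",")):
            kept.append(line.rstrip("\n"))
    return kept


# Made-up records, for illustration only.
reference_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
]
strain_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t60\tPASS\t.",
    "chr1\t250\t.\tC\tT\t45\tPASS\t.",
]

novel = subtract_variants(strain_vcf, reference_vcf)
print(novel)
```

Because this sketch ignores genotypes, it would miss a site that is heterozygous in the reference strain and homozygous in a derived strain.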
(There was also a suggestion of comparing the VCFs of the three strains against each other, but that sounds inelegant. What if we only had one derived strain?)
Thanks in advance!
If you have your reference strain as FASTA, why don't you use it as the actual "reference" in the analysis? For example, you can align the FASTQ reads from each experimental strain to it and then call SNPs on that alignment. I don't know which programs you used, but I think this is always possible, and it is the most reasonable approach.
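As a rough sketch of what "align, then call" could look like, the Python snippet below only builds and prints the shell commands; it does not run anything. The choice of bwa, samtools and bcftools is an assumption (no tools are named in this thread), and the file names and the `build_commands` helper are hypothetical. It also does not settle whether the reference FASTA should contain one or both homologs.

```python
import shlex


def build_commands(reference_fasta, strain_fastq, prefix):
    """Return shell commands for index -> align -> sort -> call, as strings."""
    ref = shlex.quote(reference_fasta)
    fq = shlex.quote(strain_fastq)
    bam = shlex.quote(prefix + ".sorted.bam")
    vcf = shlex.quote(prefix + ".vcf")
    return [
        # Build the aligner index for the reference FASTA.
        f"bwa index {ref}",
        # Align the strain reads and write a coordinate-sorted BAM.
        f"bwa mem {ref} {fq} | samtools sort -o {bam} -",
        # Index the sorted BAM.
        f"samtools index {bam}",
        # Pile up reads against the reference and call variant sites only.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Ov -o {vcf}",
    ]


# Hypothetical file names; repeat once per experimental strain.
for command in build_commands("reference.fa", "strain1.fastq", "strain1"):
    print(command)
```

Running the printed commands once per strain would give one VCF per experimental strain, each relative to the same reference.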
I'm a little confused as to how this would work. When I do a SNP call, I do it against a haplotype, not the full diploid reference. Since there's no special meaning to chromosomes in FASTA (or is there?), wouldn't the homologs be counted as different chromosomes, which would jumble the whole thing?
I see...
Sorry, I didn't think about that.
As for the mapping step: having both versions of each chromosome should not be a problem. If a read carries a variant that is present in only one of the two homologs, it will map to that homolog alone, and no mismatch will be observed there. But I'm not an expert on SNP calling, so I will let others comment on that step.
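A toy illustration of that mapping argument, in Python: it places a read at a fixed position on each homolog and counts mismatches. The sequences, names and helper functions are invented for the example, and a real aligner does far more than this (it searches for the position and scores gaps, among other things).

```python
def mismatches(read, target, start):
    """Count mismatching bases when `read` is placed at `start` on `target`."""
    window = target[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read does not fit at this position")
    return sum(1 for a, b in zip(read, window) if a != b)


def best_homologs(read, homologs, start):
    """Return the homolog name(s) with the fewest mismatches, plus all scores."""
    scores = {name: mismatches(read, seq, start) for name, seq in homologs.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best), scores


# Two made-up homologs that differ only at index 4 (A vs T).
homologs = {
    "chr1_A": "ACGTACGTAC",
    "chr1_B": "ACGTTCGTAC",
}

# A read that covers the variant site and carries the chr1_B allele.
variant_hits, variant_scores = best_homologs("GTTCG", homologs, start=2)
print(variant_hits, variant_scores)

# A read from a stretch where both homologs are identical.
shared_hits, shared_scores = best_homologs("CGTAC", homologs, start=5)
print(shared_hits, shared_scores)
```

The first read fits chr1_B with no mismatches and chr1_A with one, which is the case described above. The second read fits both copies equally well; many aligners report such reads with a low mapping quality, which may matter for the SNP-calling step.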