Hello, my lab works on variety identification in plants. We typically have over 200 varieties, which we barcode for a genotyping-by-sequencing run. This is essentially a RADseq method, i.e. a reduced-representation approach in which we do not sequence the whole genome. When a plant species is quite rare and has no reference genome, we have tried running Stacks in de novo mode, but in our experience the de novo RADseq approach is a total no-go. Is there a "hybrid approach" in which we do WGS or long-read sequencing on just one variety and use that as a "pseudo-reference" to align the stacks from the rest of the varieties, and then call SNPs between the stacks, independent of the pseudo-reference?
Thanks
It sounds reasonable to me to do WGS on one variety and build a full assembly, then call variants in the rest by aligning their reads to that assembly (which at that point I would call a "reference" rather than a "pseudo-reference"). Plant genome assembly can be difficult, particularly at high ploidy, so don't expect a great assembly... but I would still consider that the best approach.
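To make the align-then-call workflow above concrete, here is a minimal sketch that only builds the shell commands, without running them. The tool choice (bwa, samtools, bcftools) is my assumption, not something you have to use; the file names, sample names and the `build_pipeline` helper are hypothetical, and single-end reads are assumed.

```python
import shlex


def build_pipeline(reference, samples, threads=4, out_vcf="variants.vcf.gz"):
    """Return shell commands that align each sample to the assembly and
    jointly call variants. Nothing is executed here.

    reference: path to the assembly FASTA (hypothetical name in the example)
    samples:   dict mapping sample name -> single-end FASTQ path (assumption)
    """
    ref = shlex.quote(reference)
    cmds = [f"bwa index {ref}"]  # index the assembly once
    bams = []
    for name, fastq in samples.items():
        bam = f"{name}.sorted.bam"
        # Read group so each variety is a separate sample in the VCF
        read_group = shlex.quote(f"@RG\\tID:{name}\\tSM:{name}")
        cmds.append(
            f"bwa mem -t {threads} -R {read_group} {ref} {shlex.quote(fastq)}"
            f" | samtools sort -o {shlex.quote(bam)} -"
        )
        cmds.append(f"samtools index {shlex.quote(bam)}")
        bams.append(shlex.quote(bam))
    # Joint call across all varieties against the single-variety assembly
    cmds.append(
        f"bcftools mpileup -Ou -f {ref} {' '.join(bams)}"
        f" | bcftools call -mv -Oz -o {shlex.quote(out_vcf)}"
    )
    return cmds


# Hypothetical inputs, for illustration only
commands = build_pipeline(
    "assembly.fa",
    {"varA": "varA.fastq.gz", "varB": "varB.fastq.gz"},
)
for command in commands:
    print(command)
```

If you would rather stay within Stacks, its reference-based mode takes the same kind of sorted BAM files as input, so the alignment steps would carry over.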
Once you have an assembly, you can also make it more general by building a consensus from the alignments of all libraries. This has the advantage of reducing the size of your VCF files, since minor alleles in your assembly will be replaced by the major alleles (in the sequenced regions).
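A toy illustration of the consensus idea, as a sketch only: SNPs only, no indels, and the `observed` structure (0-based position mapped to the bases seen across varieties) is a hypothetical stand-in for real pileup data. It swaps in the major allele wherever the assembly carries a minor one, then counts how many sample calls still differ from the reference.

```python
from collections import Counter


def major_allele_consensus(reference, observed):
    """Replace reference bases with the major allele seen across samples.

    reference: assembly sequence as a string
    observed:  dict of 0-based position -> list of bases called across
               samples (hypothetical toy input)
    Ties keep the reference base; positions without data are unchanged.
    """
    consensus = list(reference)
    for pos, bases in observed.items():
        counts = Counter(bases)
        if not counts:
            continue
        top = max(counts.values())
        if counts[reference[pos]] == top:
            continue  # reference base is already (one of) the major alleles
        # Deterministic choice if several non-reference alleles tie
        consensus[pos] = min(b for b, n in counts.items() if n == top)
    return "".join(consensus)


def count_nonref_calls(reference, observed):
    """Number of sample calls that differ from the reference base."""
    return sum(
        base != reference[pos]
        for pos, bases in observed.items()
        for base in bases
    )


reference = "ACGTAC"
observed = {1: ["T", "T", "T", "C"], 4: ["A", "A", "G"]}

consensus = major_allele_consensus(reference, observed)
print(consensus)
print(count_nonref_calls(reference, observed), "non-reference calls before")
print(count_nonref_calls(consensus, observed), "non-reference calls after")
```

On real data you would do this from a VCF rather than by hand: for example, bcftools consensus can apply variants from a VCF to a FASTA, and you would first filter to sites where the alternate allele is the majority.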