What are the recommended tools for combined genome variation analysis of short-read sequencing data from several strains of a given organism?
From what I have seen, people have traditionally invested most of their effort in deep sequencing and assembling one specific strain (or, if possible, a single individual) to obtain a reference genome assembly, and then used the remaining money for additional sequencing to assess variability in the other important strains.
The only exception from a few years ago was the Sanger sequencing of several strains of Drosophila simulans, all at low coverage, which were pooled and used to define the simulans reference genome.
If one instead does the same amount of sequencing for each strain in a group, without an existing reference genome, what would be the best tools to assess genomic variability across the group?
EDIT: For example, in this paper on a cattle pathogen, the authors resequenced 10 strains of a species that already had a reference genome. They did a very sound variation analysis by mapping the 10 resequenced strains to the reference and comparing the results. My question is: what tools would someone use if the 10 sequenced strains were from a species without a reference genome?
I have the feeling that you would need to compile a reference first.
Is what you are asking about similar to metagenomics? Except that, instead of collecting samples directly from the environment, you are talking about sequencing strains grown on culture media in the lab, without a reference genome, and comparing them?