Hi
I'm writing a piece of software to simulate structural variations in a genome. So far, I have written a first version with a simple set of features:
- accept a FASTA file as input,
- write the new genome into a new FASTA file and the variations into a VCF file,
- simulate SNPs,
- simulate indels,
- the frequencies of SNPs, insertions and deletions are configurable.
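For readers who want something concrete, here is a minimal sketch of the indel part of such a feature list. It is not the actual implementation: the names `simulate_indels`, `vcf_line`, `ins_rate`, `del_rate` and `max_len` are made up for illustration. The one detail worth noting is that VCF represents an indel together with the base that precedes it, so each record is anchored on that base.

```python
import random

def simulate_indels(seq, ins_rate, del_rate, rng, max_len=5):
    """Walk the reference once and return (new_sequence, records).

    Each record is (pos_1based, ref, alt), anchored on the base that
    precedes the inserted or deleted bases, as VCF expects for indels.
    """
    out = []
    records = []
    i = 0
    n = len(seq)
    while i < n:
        base = seq[i]
        out.append(base)
        r = rng.random()
        if r < ins_rate:
            # Insertion: random bases are added right after the anchor base.
            ins = "".join(rng.choice("ACGT") for _ in range(rng.randint(1, max_len)))
            out.append(ins)
            records.append((i + 1, base, base + ins))
            i += 1
        elif r < ins_rate + del_rate and i + 1 < n:
            # Deletion: drop up to max_len bases following the anchor base.
            length = min(rng.randint(1, max_len), n - i - 1)
            deleted = seq[i + 1:i + 1 + length]
            records.append((i + 1, base + deleted, base))
            i += 1 + length
        else:
            i += 1
    return "".join(out), records

def vcf_line(chrom, pos, ref, alt):
    """Format one minimal VCF data line (ID, QUAL, FILTER, INFO left as '.')."""
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."])
```

Passing a seeded `random.Random(...)` as `rng` makes a simulation run reproducible, which helps when comparing the output FASTA against the VCF.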
Before going ahead with new features, I would like to hear your feedback and recommendations.
One possible improvement (to make the simulation more realistic) would be to control the frequency of variants depending on the region of the chromosome, for instance to generate more variants in non-coding regions.
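A rough sketch of what region-dependent rates could look like, assuming regions are given as hypothetical `(start, end, rate)` tuples with 0-based, half-open coordinates and that any position outside them falls back to `default_rate`. The function names are illustrative only.

```python
import random

def rate_at(pos, regions, default_rate):
    """Return the SNP rate for a 0-based position.

    regions is a list of (start, end, rate) tuples, half-open [start, end).
    """
    for start, end, rate in regions:
        if start <= pos < end:
            return rate
    return default_rate

def simulate_snps(seq, regions, default_rate, rng):
    """Return (mutated_sequence, variants) using a per-region SNP rate.

    variants is a list of (pos_0based, ref, alt).
    """
    out = list(seq)
    variants = []
    for pos, ref in enumerate(seq):
        if ref not in "ACGT":
            continue  # skip N and other ambiguous bases
        if rng.random() < rate_at(pos, regions, default_rate):
            alt = rng.choice([b for b in "ACGT" if b != ref])
            out[pos] = alt
            variants.append((pos, ref, alt))
    return "".join(out), variants
```

The linear scan in `rate_at` is fine for a handful of regions; for a whole-genome annotation a sorted interval lookup would be the better choice.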
Thanks in advance for any input!
Thanks for your answer. To know whether my REF base is in an intron or an exon, I plan to use BED files generated with the UCSC Table Browser. Do you think it is a good idea?
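In case it helps, the BED lookup could be sketched as below. This assumes a BED file of exons with at least the three standard columns (chrom, start, end); the helper names `load_bed` and `in_regions` are made up. One thing to watch: BED coordinates are 0-based and half-open, while VCF POS is 1-based, so a VCF position has to be decremented by one before the lookup.

```python
import bisect
from collections import defaultdict

def load_bed(lines):
    """Parse BED lines into {chrom: sorted, merged list of (start, end)}."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or touching intervals are merged into one.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged

def in_regions(merged, chrom, pos):
    """True if the 0-based position lies inside one of the intervals."""
    intervals = merged.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect.bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]
```

With an open file this would be used as `exons = load_bed(handle)` followed by `in_regions(exons, chrom, vcf_pos - 1)`.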
BTW Pierre, in order to generate synonymous SNPs I have to read groups of 3 nucleotides from the genome (to see whether a base change will be synonymous or not) AND also detect frame-shifts. Is there anything else I should take care of if I decide to implement simulation of sSNPs?
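To make the codon check concrete, here is a small sketch using the standard genetic code. It assumes the coding sequence has already been spliced (introns removed), is in frame from its first base, and is written on the coding strand; for genes on the minus strand the genomic sequence would need to be reverse-complemented first. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

def is_synonymous(cds, offset, alt):
    """True if replacing cds[offset] with alt leaves the amino acid unchanged.

    cds must be in frame (codon 1 starts at index 0) and offset must fall
    inside a complete codon.
    """
    codon_start = offset - offset % 3
    codon = cds[codon_start:codon_start + 3]
    if len(codon) < 3:
        raise ValueError("offset falls in an incomplete trailing codon")
    within = offset - codon_start
    mutated = codon[:within] + alt + codon[within + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]

def is_frameshift(ref, alt):
    """True if an indel changes the coding length by a non-multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0
```

For example, in `"ATGCTTTAA"` changing the last base of `CTT` to `C` gives `CTC`, which is still leucine, so `is_synonymous("ATGCTTTAA", 5, "C")` is true.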
You could use the human codon frequencies?