Hi, I need to generate a custom reference genome (i.e. a reference sequence) for an S. cerevisiae strain I use in the lab, which has some polymorphisms. I've done the alignments (bowtie, -v 0, i.e. no mismatches allowed) using the genome from SGD (Saccharomyces Genome Database) as my reference and, because of the polymorphisms, many reads are discarded when they shouldn't be.
So far, I have obtained a .bedgraph file that shows me the coverage in each region. However, I think the coverage would be higher with a custom reference genome.
Any ideas on how to create my custom reference genome? Is it necessary to call polymorphisms first? If so, how could this be done?
Thanks!!
Are you referring to SNPs? Also, bowtie v.1.x performs ungapped alignments, which could be one reason why you are not getting good alignments. Try replacing it with bwa mem or a similar aligner. You also have the option of doing a reference-guided or de novo assembly, which should help account for the presence of SNPs.
Thank you very much! I have tried what you suggested and now I have a de novo assembly. Could you please explain how to continue to get a FASTA file that could be used as a reference genome? I have just started with bioinformatics and still don't really understand the whole process.
Your de novo assembly should be a FASTA file that can be used as a reference genome. However, whether you want to do that will depend on what analysis you want to do and on the quality of the assembly.
Alternatively, you may want to use a high-quality S. cerevisiae assembly from another strain closer to yours. Yue et al. 2017 has several PacBio assemblies of diverse clades.
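As a rough first look at assembly quality, something like the following could help. It is only a minimal Python sketch, not a substitute for a dedicated assembly-evaluation tool: it reads a FASTA file and reports the number of contigs, the total length and the N50. The function names (read_fasta_lengths, n50) are made up for this illustration. For S. cerevisiae you would expect a total length somewhere around 12 Mb, so a much smaller total or a very large number of short contigs would suggest the assembly is too fragmented to serve as a reference.

```python
def read_fasta_lengths(path):
    """Return {sequence_name: length} for every record in a FASTA file.

    The name is the header text up to the first whitespace, which is also
    how most aligners and indexers name sequences.
    """
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarize(path):
    """Return (number of contigs, total bases, N50) for a FASTA file."""
    lengths = list(read_fasta_lengths(path).values())
    return len(lengths), sum(lengths), n50(lengths)
```

Calling summarize("assembly.fasta") (a hypothetical file name) returns the three numbers as a tuple, which you can compare against what you expect for your strain.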
In case you chose the alignment-to-a-reference route using bwa, you should have an alignment file. You can use the instructions here to call a consensus reference: Generating consensus sequence from bam file
Thank you very much! The post you suggested is exactly what I need. However, I'm having trouble indexing my reference genome: when running mpileup I get a message like this: "[fai_fetch_seq] The sequence "CHRII" not found". Any idea what could be happening?
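One possible cause, offered as a guess rather than a diagnosis: this kind of message usually appears when a sequence name in the alignment header (here "CHRII") does not exist in the FASTA file given to mpileup. That can happen if the FASTA differs from the one used for the alignment, or if the names differ in case or style. The Python sketch below compares the two sets of names. It assumes you have saved the alignment header to a text file, for example with samtools view -H; the file names and function names are hypothetical.

```python
def fasta_names(path):
    """Sequence names in a FASTA file (header text up to the first whitespace)."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                names.append(parts[0] if parts else "")
    return names


def sam_header_names(path):
    """Reference names from the SN: fields of @SQ lines in a SAM header text file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        names.append(field[3:])
    return names


def missing_from_fasta(header_path, fasta_path):
    """Names the alignment refers to that are absent from the FASTA file."""
    available = set(fasta_names(fasta_path))
    return [n for n in sam_header_names(header_path) if n not in available]
```

If missing_from_fasta("header.sam", "reference.fasta") returns a non-empty list, the names do not match. In that case the simplest fix is usually to pass mpileup the exact FASTA file that was used for the alignment.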