I recently started working with SHORE and SHOREmap (mainly because my collaborators requested it) to pinpoint EMS mutations in an Arabidopsis line they created.
While most of the protocol runs successfully, I'm unsure which scoring matrix to use in the shore consensus step. There seem to be two available: scoring_matrix_hom.txt and scoring_matrix_het.txt. Although I have searched the internet and the SHORE manuals and papers extensively, I cannot find any decent explanation of when to use which one. The SHORE manual mostly uses the _hom one, while the SHOREmap manual uses the _het one (in the SHORE part of the protocol).
This concerns a backcross analysis, but any information about the outcross approach would also be appreciated.
I would be grateful if anyone could share some insights on this.
The paper Plant Genetic Archaeology: Whole-Genome Sequencing Reveals the Pedigree of a Classical Trisomic Line has the following quote:
"Then, scoring_matrix_het.txt must be optimized to call heterozygous positions."

If your reference genome is the backcross strain, use scoring_matrix_het.txt; otherwise, use scoring_matrix_hom.txt. (Or maybe use both, and select the heterozygous variants from the scoring_matrix_het.txt run and the homozygous variants from the scoring_matrix_hom.txt run?)

Thanks for the insights, h.mon, very useful!
I was under the impression that, since my genome is homozygous, I should go for the _hom one, but you seem to suggest otherwise (which likely makes more sense). Could you elaborate a little on how you came to that conclusion?

Ah, so it's the SNPs that need to be homozygous, and it has nothing to do with the state of the genome?
Would all this be different for an outcross analysis? (Sorry for all the additional questions, I'm just trying to get a complete picture.)
Sadly, I can't... I've only studied the SHOREmap manuals for a while and haven't used it yet.
I think the choice between het and hom depends on the level of differences expected between the lineages and the reference genome. But, as you said, there isn't much information about this in the SHOREmap docs, and my inference comes from the paper quoted above alone. If your reference genome is also the backcross (wild-type) strain, you would expect to see only heterozygous variants relative to the reference genome, so scoring_matrix_het.txt would be recommended.

EMS mutations, on the other hand, are expected to be novel relative to the reference genome, so if your crossing scheme aims to bring them to the homozygous state, you should use scoring_matrix_hom.txt.

In which other scenarios would one expect mainly homozygous variants relative to the reference genome? I don't have much experience with these crosses.
For future reference:
I found this "paper" quite useful (though it does not give a definite answer to my original question here): SHOREmap_v3.0
Unfortunately, it is also not 100% accurate :/