6.9 years ago
Phismil
Dear colleagues, I have a large set of random shotgun sequencing data from a few organisms that have no reference genome. I would be grateful for your expert suggestions if you know of a pipeline that aligns all reads against all others to find "stacks" of aligned homologous reads. The reads were generated randomly, without restriction-site anchors, which makes typical RAD-seq pipelines complicated to use. Many thanks in advance.
Is this whole-genome data? Since you tagged the post with snp, is that what you are ultimately trying to do? Have you thought about reducing redundancy before doing an MSA?

Hi there, thanks for the reply. Yes, the ultimate goal is identifying SNPs. Sorry for the silly question, but I really don't know much about reducing redundancy.
Identical sequences don't contribute information to an MSA, so if a pure MSA were your aim you could remove redundancy (in terms of sequence identity) from your data. Since you are interested in SNP calling, that may not be advisable. Someone else may have more specific suggestions for you.
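As a rough illustration of what "removing redundancy in terms of sequence identity" means, here is a minimal sketch, assuming the reads are already parsed from FASTA/FASTQ into plain sequence strings. Instead of simply discarding duplicates, it collapses exact copies into one entry plus a count, so the copy number is not lost if you later decide it matters for SNP calling. The function names `canonical` and `collapse_identical_reads` and the `merge_strands` option are hypothetical, not part of any existing tool; only exact matches are merged, so reads that differ by a single base (the ones you care about for SNPs) stay separate.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Complement table for DNA; N (ambiguous base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq: str) -> str:
    """Return the lexicographically smaller of a sequence and its reverse complement."""
    rc = seq.translate(COMPLEMENT)[::-1]
    return min(seq, rc)


def collapse_identical_reads(
    reads: Iterable[str], merge_strands: bool = False
) -> list[tuple[str, int]]:
    """Collapse exact duplicate reads into (sequence, count) pairs.

    Sequences are upper-cased before comparison and blank lines are skipped.
    If merge_strands is True, a read and its exact reverse complement are
    counted as the same sequence (hypothetical option, since shotgun reads
    can come from either strand).
    """
    cleaned = (r.strip().upper() for r in reads if r.strip())
    if merge_strands:
        cleaned = (canonical(r) for r in cleaned)
    counts = Counter(cleaned)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return counts.most_common()


if __name__ == "__main__":
    reads = ["ACGTACGT", "acgtacgt", "ACGTTCGT", "ACGTACGT"]
    for seq, n in collapse_identical_reads(reads):
        print(seq, n)
```

Keeping a count per unique sequence, rather than a bare set, is the design choice here: the collapsed file is smaller for an MSA, but you can still see how many reads supported each variant.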