6.8 years ago
Kritika
Hi, I have WGS data for 6 samples. I have been asked to call SNPs among the samples instead of comparing them with a reference genome. I want to find out how similar the samples are. Usually we call SNPs by comparing against a reference (if available), so here the comparison has to be made among the samples themselves. Do I need to go for de novo assembly? If yes, what would the next steps be?
Also, I have 11 GB RAM and 1 TB of disk space, so how much time will be required for 30X WGS data of a 2.0 GB genome?
Clarify with the person asking you to do things what they mean by their query, what the ultimate aim is, etc. Gain a better understanding of why you're doing what you do.
It sounds like de novo assembly, but try to find out what exactly they want and, more importantly, why.
Hi @Ram, actually the study is a WGS SNP analysis, but they are interested in only particular chromosomes. What they are asking for is a sample-by-sample comparison instead of a sample-by-reference comparison, so any SNPs they get should be relative to a sample, not to the reference. For example, I would treat sample 1 as the reference and sample 2 as the query, and then compare the SNPs obtained from this with the other samples.
The closest I can think of is either tumor-normal analysis or de-novo/transmission analysis, both of which are analyses performed on top of calling SNVs. Using one sample as the ref is not advisable, as sequencing errors can cause a lot of noise. Ref sequences are reference for a reason - they have been well validated.
Also, for restricting the analysis to particular chromosomes: use an interval list.
That was my question - why?
Actually, this is a rice WGS SNP analysis. My client wants to compare the sequences of sample 1 and sample 4 directly (not mapped to the reference), take those SNPs, and compare them with samples 2, 3, 5 and 6, which have been mapped to the reference.
You're telling me the what, I'm asking you about the why. Have you discussed with your client why they want to do this (their ultimate aim, as I mentioned earlier)? Unless this is common practice in plant bioinformatics, I do not think this approach makes sense.
They want to see the differences among the samples, i.e. how much variation there is among them.
And why can you not get to that by comparing each to the standard reference sequence?
They want both a comparison with the reference and a sample vs sample comparison. The reference-based SNP comparison for all samples has already been given to them, but now they want a sample vs sample comparison.
Explore using bcftools. The closest analysis that you can do, that would make sense here, is variants(X) - variants(A), where X and A are two samples, and variants(X) and variants(A) are the sets of all variants found in X and A respectively.
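To make the variants(X) - variants(A) idea concrete, here is a minimal Python sketch of that set difference. It assumes each sample already has its own VCF called against the standard reference; the helper name `read_variant_keys` and the toy records are made up for illustration and are not real rice data. In practice the set-operation subcommand of bcftools (`bcftools isec`) would be the tool to look at for this; the sketch only shows the logic.

```python
# Sketch only: variants(X) - variants(A) as a set difference.
# A variant is identified here by (CHROM, POS, REF, ALT); genotype
# (het vs hom) and per-site coverage are deliberately ignored.

def read_variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-formatted text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # split multi-allelic records
            if alt != ".":
                keys.add((chrom, pos, ref, alt))
    return keys

# Hypothetical toy records, both called against the same reference.
HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
SAMPLE_X = HEADER + [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.",
    "chr2\t300\t.\tG\tA,C\t50\tPASS\t.",
]
SAMPLE_A = HEADER + [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.",
    "chr2\t400\t.\tT\tC\t50\tPASS\t.",
]

x_keys = read_variant_keys(SAMPLE_X)
a_keys = read_variant_keys(SAMPLE_A)

only_in_x = sorted(x_keys - a_keys)  # variants(X) - variants(A)
only_in_a = sorted(a_keys - x_keys)  # variants(A) - variants(X)
shared = sorted(x_keys & a_keys)     # variants found in both samples

print("only in X:", only_in_x)
print("only in A:", only_in_a)
print("shared:", shared)
```

One caveat with this kind of comparison: a variant missing from one sample may simply be a site with no coverage there, so a real analysis would need to account for callable regions before treating an absence as a true difference.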
If I were you, I'd talk to them and tell them how their request doesn't make sense, as comparing to something you built just reinforces errors and introduces biases.
They believe that this sample could differ from the standard reference.
a. Do not add an answer unless you're answering your original question.
b. They are free to believe what they want to, but unless they are bioinformatics experts as well as clients, they cannot tell you both what to do _and_ how to do it. Every sample analyzed ever differs from the standard reference, so comparing samples = comparing the way the samples differ from the reference, not aligning one sample from your dataset to another. That makes absolutely no sense.
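As a rough illustration of "comparing the way the samples differ from the reference", the sketch below scores every pair of samples by how much their reference-based variant sets overlap. The Jaccard index is my own choice of similarity measure, not something suggested in this thread, and the sample names and variants are hypothetical.

```python
# Sketch only: pairwise sample similarity from reference-based variant sets.
# The Jaccard index (shared variants / all variants seen in either sample)
# is an assumed choice of measure; other distances would work as well.
from itertools import combinations

def jaccard(a, b):
    """Return |a & b| / |a | b|, treating two empty sets as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_similarity(variant_sets):
    """Map each unordered pair of sample names to its Jaccard similarity."""
    return {
        (s, t): jaccard(variant_sets[s], variant_sets[t])
        for s, t in combinations(sorted(variant_sets), 2)
    }

# Hypothetical variant sets keyed by (chrom, pos, ref, alt), each called
# against the same standard reference.
variant_sets = {
    "sample1": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 300, "G", "A")},
    "sample2": {("chr1", 100, "A", "G"), ("chr2", 300, "G", "A")},
    "sample3": {("chr1", 100, "A", "G"), ("chr2", 400, "T", "C")},
}

similarity = pairwise_similarity(variant_sets)
for (s, t), score in similarity.items():
    print(f"{s} vs {t}: {score:.2f}")
```

Because every sample is described relative to the same reference, the pairwise scores are directly comparable, which is not the case when one noisy sample is used as the reference for another.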