Hi,
I'm a new comer for samtools.
Here's the question: we got the raw data (reads) from several people. We are only interested in several genes, say, one of them is FTO. So, we want to assembly a person's FTO gene, not its whole genome. The final goal is to find the SNP of this person on this gene.
By using BWA, we did the mapping the reads to the RefSeq FTO gene, and got a large file called "Sequence.fq.sam", which is 23.7GB. To my understanding, all the aligned reads to this FTO gene should not be so huge.
Anyway, I think the next job is to use samtools.
- Could you give me some instruction about how to use samtools to get the assembly FTO gene of this person?
- How to call the SNP for the assembly FTO gene?
Thank you very much!
Best regards,
clyumath
To get the consensus sequence of a gene (or its SNPs given a reference sequence) you can use a combination of samtools, bcftools, vcfutils (or any other software cited here.
but if your final goal is to get a list of SNPs for more than one gene, my suggestion is to use all of the reads at once.