I have got Illumina whole genome sequence data from a set of samples in BAM format and I have also got genotype data on 90 snps from a Sequenom. What I would like to be able to do is to quickly compare the sequenom data with the sequence data.
As a first step I would like to get the genotype of the positions of the 90 SNPs in the sequence data. Has anyone got an idea of the best approach to do this.
I have been looking at using a combination of samtools mpileup with bcftools but I am not sure if this is appropriate. Problems I'm having:
- It seems to require a reference sequence
- Using a -r in mpileup is very fast but providing the same data in the bed file is very slow.
- I'm not sure that I'm using the correct options as I'm not getting what I would expect.
Command line:
samtools mpileup -u -r 10:106039185-106039185 ../sample_BAMs/ComplexMen/PD7445a.bam | /software/CGP/bin/bcftools view -g -
Output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PD7445a
10 106039185 . N A 222 . DP=75;VDB=0.0404;AF1=1;AC1=2;DP4=0,0,43,32;MQ=60;FQ=-253 GT:PL:GQ 1/1:255,226,0:995
The test bed file:
10 106039185 106039185
Sorry if this is a dumb question, I'm new to this type of stuff.
So in the example would it be AA because it is 1/1? How would it look if you had CT or something like that?
Yes, and because the DP4 shows all the reads are for the alterante letter, and there are none for the reference letter.