Hello,
I have mapped Illumina reads from two strains of a small eukaryote (say S1 and S2) plus and two reference genomes: G1 (closest to S1 and S2) and sister species G2. It seems that some chromosomes/chromosome parts in almost identical S1 and S2 are "borrowed" from G2, whereas other parts are more G1-like.
Using GATK (or VarScan) I am able call SNPs in S1 and S2 genomic data. Since G1 and G2 are quite close, I want to get G2 SNPs after performing G2 to G1 alignment.
I have MAF file which I can convert to SAM/BAM, but so far did not managed to get SNPs using VarScan or GATK. Parsing MAF and writing VCF would be a big time investment for me, so I am looking for other tools which can do the job. Another, quite likely much easier method would be to get some genomic reads from G2, map them and call SNPs the conventional way. Still, if I already have a G2 genome it should be possible to extract differences (SNPs) in a default (VCF) format.
Thanks for your help.
<del>Don't</del> Do use -B for long query sequences. You can either parse raw mpileup or use bcftools just like what you do with short reads. bwa-mem will be better than bwa-sw, but right now bwa-sw is more stable. EDIT: mistake - please use -B (disable BAQ) for long sequences.
Okay good, I was confused because I was only repeating your advice on another list.