I am having trouble understanding the code snippet below, which I am using to carry out variant calling for SNP analysis. The snippet is:
bcftools mpileup -Ou -f Ref_1_20040804.fa ref_1.sorted.bam | bcftools call -mv -Ob -o ref_1_btool.bcf
bcftools view ref_1_btool.bcf > pile.vcf
bcftools call -cv ref_1_btool.bcf > variant.bcf
gatk SelectVariants -R ref_1 -V variant.bcf -select-type SNP -O ref_1_snps.vcf
Here, Ref_1_20040804.fa is the Reference Sequence. The code refers to a pileup.bcf file which I do not have. Any advice on how to proceed would be appreciated.
Hi Kevin. Thank you for the clarification; this makes more sense to me now.
I had picked up the code as I am quite new to this field.
I have another question regarding SNP analysis. How would you carry out the Base Recalibration and Apply BQSR steps for a bacterial genome?
I am sharing the code I have found, but it deals with human genome data.
Any advice on how to proceed further would be greatly appreciated.
Thanks!
GATK is not optimal for bacteria IMO. GATK is built and maintained around working with human genomes, so you're better off using other tools. There might be assemblers that work really well with bacteria (they have circular DNA, right?)
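As a rough sketch of one GATK-free route: the bcftools commands in the question already call variants, so the SNPs could probably be selected with bcftools view instead of gatk SelectVariants. The function name call_snps and the output file name below are made up for illustration, and the flags are only the ones from the question plus -v snps. bcftools call also has a --ploidy option that may be worth checking for a haploid genome; I have not tested this on your data.

```shell
# Sketch only: call variants and keep SNPs using bcftools alone.
# call_snps is a hypothetical helper name, not part of bcftools.
call_snps() {
    ref="$1"   # reference FASTA, e.g. Ref_1_20040804.fa
    bam="$2"   # coordinate-sorted BAM, e.g. ref_1.sorted.bam
    out="$3"   # output file, written as bgzip-compressed VCF

    # mpileup computes genotype likelihoods from the alignments,
    # call -mv runs the multiallelic caller and emits variant sites only,
    # view -v snps keeps SNP records and drops indels.
    bcftools mpileup -Ou -f "$ref" "$bam" \
        | bcftools call -mv -Ou \
        | bcftools view -v snps -Oz -o "$out"
}
```

It would be run as, for example: call_snps Ref_1_20040804.fa ref_1.sorted.bam ref_1_snps.vcf.gz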
Hi, for that, I may direct you to the GATK forum, unless any of my colleagues want to answer. I have not used GATK for a long time.