SNP Calling using Genome Analysis Toolkit
0
0
Entering edit mode
4.1 years ago
Fid_o ▴ 40

Greetings,

I want to do SNP analysis (variant call) using the steps below which I got from https://approachedinthelimit.wordpress.com/2015/10/09/variant-calling-with-gatk/

These steps use short reads. But I am working on assembled whole genome bacterial sequences, how do I go about it using assembled whole genome sequences?

Thank you


#Index reference
bwa index reference.fasta

#Sort reference
samtools faidx reference.fasta

#Create sequence dictionary
java -jar ~/bin/picard-tools-1.8.5/CreateSequenceDictionary.jar REFERENCE=reference.fasta OUTPUT=reference.dict

#Align reads and assign read group
bwa mem -R “@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:test\tSM:PA01” reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam

#Sort sam file
java -jar ~/bin/picard-tools-1.8.5/SortSam.jar I=aln.sam O=sorted.bam SORT_ORDER=coordinate

#Mark duplicates
java -jar ~/bin/picard-tools-version/MarkDuplicates.jar I=sorted.bam O=dedup.bam METRICS_FILE=metrics.txt

#Sort bam file
java -jar ~/bin/picard-tools-version/BuildBamIndex.jar INPUT=dedup.bam

#Create realignment targets
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I dedup.bam -o targetintervals.list

#Indel realignment
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I dedup.bam -targetIntervals targetintervals.list -o realigned.bam

#Call variants (HaplotypeCaller)
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 -stand_call_conf 30 -stand_emit_conf 10 -o raw.vcf

The resulting vcf file will contain your variant calls!

Then you can optionally filter the variants:

#Filter variants
~/bin/vcflib/bin/vcffilter -f ‘DP > 9’ -f ‘QUAL > 10’ raw.vcf > filtered.vcf

Or first split the raw.vcf file into SNPs and indels:

#Extract SNPs
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V raw.vcf -selectType SNP -o snps.vcf

#Extract Indels
java -jar ~/bin/GATK/GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V raw.vcf -selectType INDEL -o indels.vcf

SNP genome alignment assembly sequencing • 1.1k views
ADD COMMENT
0
Entering edit mode

I am working on assembled whole genome bacterial sequences

Would you like to tally the differences in the different sequences? I.e. multiple alignments and phylogenetic tree construction? Or something else?

ADD REPLY
0
Entering edit mode

Yes, absolutely that - tally the differences. That is what I want to do - multiple alignments and phylogenetic tree construction

ADD REPLY
0
Entering edit mode

GATK3.3, vcffilter , ... this is a very very old pipeline

ADD REPLY
0
Entering edit mode

Thanks Pierre, would you advise what newer pipeline there is?

ADD REPLY

Login before adding your answer.

Traffic: 2178 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6