I found this post but I'm still not sure how to do this: vcf from sorted bam
Here's my genome file and index:
../../db/genomes/human/GRCh38.p13/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
../../db/genomes/human/GRCh38.p13/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai
Here's my bam file:
201858321_S1.trimmomatic.GCA_000001405.15_GRCh38_no_alt_analysis_set.sorted.bam
Here's my SNPs file:
hg38.snp147.bed
How can I generate a VCF file that uses annotations from the hg38.snp147.bed
and my bam
file?
Are you familiar with the process of variant calling?
A bit. I've only really done it for prokaryotic organisms with
snippy
. Right now I'm trying the following:source activate vcftools_env && pv 201858321_S1.trimmomatic.GCA_000001405.15_GRCh38_no_alt_analysis_set.sorted.bam | samtools view -h | bcftools mpileup -f ../../db/genomes/human/GRCh38.p13/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --threads ${N_JOBS} - | gzip -c > 201858321_S1.trimmomatic.GCA_000001405.15_GRCh38_no_alt_analysis_set.sorted.vcf.gz
I thought that
pv
would give me a progress bar (it's not) or else I wouldn't have usedstdin
forbcftools
.I understand the
mpileup
will generate avcf
file but I'm wondering if there is a way to include annotations forSNPs
so I can getrs
identifier flags? If not my plan was to convert to bed file and then compare bed files.I don't really understand the difference between the different VCF files on slide 2 here: https://www.ebi.ac.uk/sites/ebi.ac.uk/files/content.ebi.ac.uk/materials/2014/140217_AgriOmics/dan_bolser_snp_calling.pdf