Variant calling and downstream analysis
1
0
Entering edit mode
7.5 years ago

Hi,

I have whole genome sequencing data of 1605 bacterial samples in paired-end reads fastq format. These samples come from 3 phenotypes. I have also a reference sequence fasta file. I have aligned 1600 samples to the reference genome and then I passed bam files to samtools mpileup:

samtools mpileup -g -f "ref.fa" -o mpileup.bcf -b <list of="" all="" 1605="" bam="" files="">

Then I call variants using this command: bcftools call -v -m --ploidy 1 -O b -f gq,gp -o variant.bcf mpileup.bcf

Now I have a huge bcf file containing 1605 samples and 312924 variant sites. How I can find variants that are specific to each phenotype? There are three phenotypes across 1605 samples.

SNP next-gen snp • 2.1k views
ADD COMMENT
2
Entering edit mode
7.5 years ago

Once you have VCF files, you can use BBMap's comparevcf.sh like this:

comparevcf.sh in=a.vcf,b.vcf,c.vcf out=a_minus_bc.vcf subtract

That will give you variants unique to A.

ADD COMMENT

Login before adding your answer.

Traffic: 1075 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6