How to set up my vcf with a minimum depth?
1
1
Entering edit mode
5.4 years ago
apl00028 ▴ 90

I am trying to set up the minimum depth which samtools mpileup considers a polymorphism for RNAseq samples. For that I did this:

I did an align with BWA:

bwa mem scaffold_filt_PV014_bwa PV009_1_cmv3.fq_pairs_R1.fastq PV009_2_cmv3.fq_pairs_R2.fastq > scaffold_filt_OD_PV009.sam

I change sam format to bam format and I sorted it:

samtools view -bS scaffold_filt_OD_PV009.sam | samtools sort - > scaffold_filt_OD_PV009.bam

I index the reference:

samtools faidx scaffold_filt_PV014_index.fa

I defined the snp with bcftool:

bcftools mpileup --redo-BAQ --min-BQ 30  --per-sample-mF --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR -f scaffold_filt_PV014_index.fa scaffold_filt_OD_PV009.bam | bcftools call --multiallelic-caller --variants-only -Ov > bcf/scaffold_filt_OD_PV009_q10_variant.vcf

I tried to select those polymorphisms with a depth of 20 or more:

cat scaffold_filt_OD_PV009_q10_variant.vcf  | grep -v "DP>=20" > scaffold_filt_OD_PV009_q10_variant_filter.vcf

But this last step has not any effect. So, What is my mistake? How can I solve it? What other alternatives can I do? Thanks in advantage.

SNP • 3.0k views
ADD COMMENT
1
Entering edit mode

bcftools has various filtering options, please check the manual towards filtering. I strongly recommend to NEVER do any grep or similar things with a VCF, always use dedicated tools for VCF/BCF manipulation, as you might corrupt your file. See for inspiration How to filter vcf file on minimum genotype depth and quality for each sample

ADD REPLY
0
Entering edit mode

I am trying it using this function of that link: vcftools --vcf scaffold_filt_OD_PV009_q10_variant.vcf --out output_prefix --min-meanDP 20 --recode --recode-INFO-all

but I get SNP with lower depth of 20 and filter some SNP with a depth higher than 20.

When I tried use this function: vcftools --vcf scaffold_filt_OD_PV009_q10_variant.vcf --out output_prefix --minDP 20 --recode --recode-INFO-all

It did not work.

What am I doing wrong? Thanks in advantage

ADD REPLY
1
Entering edit mode

For your information, you can avoid intermediate files and save time by piping the output of bwa mem directly to samtools and create a sorted bam file:

bwa mem genome.fa reads.fastq.gz | samtools sort -o sorted_alignment.bam

This requires that your samtools is quite recent, but you should always try to use up-to-date software.

ADD REPLY
1
Entering edit mode
5.4 years ago
apl00028 ▴ 90

Finaly I got my aim using this function:

 bcftools view --types snps --include 'INFO/DP>=20' scaffold_filt_OD_PV009_q10_variant.vcf > scaffold_filt_OD_PV009_q10_variant_filtered.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 1870 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6