Question

BCFtools consensus

0

Entering edit mode

2.4 years ago

Peerzada • 0

I am trying to incorporate variants from multisample vcf file using bcftools consensus as :-

First download variant and reference files as

bcftools view -Oz -r 7:30911853-30925516 "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz">aqp1.1000g.vcf.gz

tabix -p vcf aqp1.1000g.vcf.gz

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

samtools faidx GRCh38_reference.fa.gz

and then using for each sample .

#!/bin/bash
for sample in `bcftools view -h aqp1.1000g.vcf.gz | grep "^#CHROM" | cut -f10-`; do
  bcftools view -c1 -Oz -s $sample -o 1000g.$sample.vcf.gz aqp1.1000g.vcf.gz
  tabix -p vcf 1000g.$sample.vcf.gz    
  samtools faidx GRCh38_reference.fa.gz chr7:30911853-30925516 | bcftools consensus 1000g.$sample.vcf.gz -o 1000g.aqp1.$sample.fasta
done

I am getting error as :-

Note: the --sample option not given, applying all records regardless of the genotype
Warning: Sequence "chr7" not in 1000g.HG00096.vcf.gz
Applied 0 variants
Note: the --sample option not given, applying all records regardless of the genotype
Warning: Sequence "chr7" not in 1000g.HG00097.vcf.gz
Applied 0 variants
Note: the --sample option not given, applying all records regardless of the genotype
Warning: Sequence "chr7" not in 1000g.HG00099.vcf.gz
Applied 0 variants

Kindly help.

bcftools 1000genomes • 3.1k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 2.4 years ago by Peerzada • 0

0

Entering edit mode

this is a repost of your previous question at Concensus from 1000 genome project and the same answer that I provide in that question applies here: you need to make sure that both data files use the same chromosome names. the error message is telling you that your file 1000g.HG00099.vcf.gz does not have the sequence name chr7

ADD REPLY • link 2.4 years ago by cmdcolin ★ 4.0k

0

Entering edit mode

Where in vcf file should I change the chromosome name to chr7 as it is multisample vcf file and what about the error sample option not given.

ADD REPLY • link 2.4 years ago by Peerzada • 0

0

Entering edit mode

I wrote an answer to this post with an example

ADD REPLY • link 2.4 years ago by cmdcolin ★ 4.0k

score 0 · Answer 1 · 2022-07-17

0

Entering edit mode

2.4 years ago

cmdcolin ★ 4.0k

This script should do what you want. the only extra step i did was remove the chr prefix from the reference names from the FASTA with sed (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa), and then run your code pretty much unmodified

#!/bin/bash
bcftools view -Oz -r 7:30911853-30925516 "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz">aqp1.1000g.vcf.gz

tabix -p vcf aqp1.1000g.vcf.gz

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

samtools faidx GRCh38_full_analysis_set_plus_decoy_hla.fa

sed -e 's/^>chr/>/' GRCh38_full_analysis_set_plus_decoy_hla.fa > out.fa

samtools faidx out.fa

for sample in `bcftools view -h aqp1.1000g.vcf.gz | grep "^#CHROM" | cut -f10-`; do

  bcftools view -c1 -Oz -s $sample -o 1000g.$sample.vcf.gz aqp1.1000g.vcf.gz

  tabix -p vcf 1000g.$sample.vcf.gz

  samtools faidx out.fa 7:30911853-30925516  | bcftools consensus 1000g.$sample.vcf.gz -o 1000g.aqp1.$sample.fasta
done

ADD COMMENT • link 2.4 years ago by cmdcolin ★ 4.0k

0

Entering edit mode

I would also ask: what is your end goal?

ADD REPLY • link 2.4 years ago by cmdcolin ★ 4.0k

0

Entering edit mode

Thank you Sir for your response .I have to see the variation in the aqp1 gene in all thousand individuals and find that where the variation is located and How is it effecting the protein function . I got the fasta sequence incorporated with variants of the each individual. So how can I proceed further with the fasta files . Should I translate them and find out intronic and exonic region particular variant position . Kindly suggest ..

ADD REPLY • link 2.4 years ago by Peerzada • 0

0

Entering edit mode

you can use a tool such as variant effect predictor (https://uswest.ensembl.org/info/docs/tools/vep/index.html) or other variant effect predictors and put the VCF data into their web form. it may help to cut out all the samples e.g. zcat aqp1.1000g.vcf.gz|cut -f1-9 > out.vcf here are the results I generated with this https://uswest.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=w5X12mMSXbIWMepd-8458905 with that exact command

ADD REPLY • link 2.4 years ago by cmdcolin ★ 4.0k