Question

Consensus from 1000 Genomes Project

2.4 years ago

Peerzada • 0

Hello, I want to download the Aquaporin 1 gene sequence for all the individuals in the 1000 Genomes Project. I have tried bcftools and vcftools, but both give me errors. The Aquaporin 1 gene is located at chromosome 7:30911853-30925516. I first downloaded the VCF file for that region:

bcftools view -Oz -r 7:30911853-30925516 "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr7.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz" > aqp1.1000g.vcf.gz
tabix -p vcf aqp1.1000g.vcf.gz

Then I downloaded the reference FASTA sequence from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/ and named it human_ref.fa.gz.

Then I indexed the FASTA file:

samtools faidx human_ref.fa.gz

and then built each sample's sequence by applying that sample's variants to the reference:

#!/bin/bash

for sample in `bcftools view -h aqp1.1000g.vcf.gz | grep "^#CHROM" | cut -f10-`; do
  bcftools view -c1 -Oz -s $sample -o 1000g.$sample.vcf.gz aqp1.1000g.vcf.gz
  tabix -p vcf 1000g.$sample.vcf.gz
  samtools faidx human_ref.fa.gz 7:30911853-30925516 | bcftools consensus 1000g.$sample.vcf.gz -o 1000g.aqp1.$sample.fa
done

But this gives me the following error:

Note: the --sample option not given, applying all records regardless of the genotype
[W::fai_get_val] Reference 7:30911853-30925516 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in 7:30911853-30925516
Applied 0 variants

bcftools 1000genomes • 842 views

updated 17 months ago by Ram 44k • written 2.4 years ago by Peerzada • 0

It may be important for the reference sequence names to match exactly, e.g. both files should say either chr7 or just 7.
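
A rough sketch of how the names could be compared and, if needed, aligned. It assumes the file names from the question, that the FASTA uses chr7 while the VCF uses 7, and that `samtools faidx` has already written human_ref.fa.gz.fai; `make_chr_map`, chr_map.txt and aqp1.1000g.chr.vcf.gz are made-up names for illustration. It has not been run against the actual 1000 Genomes files.

```shell
#!/bin/bash
# Sketch only: input file names follow the question; chr_map.txt and
# aqp1.1000g.chr.vcf.gz are hypothetical names chosen for this example.

# Write an "old<TAB>new" table (1 -> chr1, ..., Y -> chrY), the
# two-column form that bcftools annotate --rename-chrs reads.
make_chr_map() {
  for c in $(seq 1 22) X Y; do
    printf '%s\tchr%s\n' "$c" "$c"
  done
}

# Only touch the real files if the tools and inputs are present.
if command -v bcftools >/dev/null 2>&1 && [ -f aqp1.1000g.vcf.gz ] && [ -f human_ref.fa.gz.fai ]; then
  # Sequence names in the FASTA index vs. names indexed in the VCF.
  cut -f1 human_ref.fa.gz.fai | head
  tabix -l aqp1.1000g.vcf.gz

  # Assumption: FASTA says chr7, VCF says 7, so rename the VCF side.
  make_chr_map > chr_map.txt
  bcftools annotate --rename-chrs chr_map.txt -Oz -o aqp1.1000g.chr.vcf.gz aqp1.1000g.vcf.gz
  tabix -p vcf aqp1.1000g.chr.vcf.gz
fi
```

If the two listings already agree, the rename step is unnecessary and only the region string passed to `samtools faidx` has to use the same naming.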

2.4 years ago by cmdcolin ★ 4.0k

I used chr7 for both files, and the error is now:

Note: the --sample option not given, applying all records regardless of the genotype
Warning: Sequence "chr7" not in 1000g.HG00111.vcf.gz
Applied 0 variants

Note: the --sample option not given, applying all records regardless of the genotype
Warning: Sequence "chr7" not in 1000g.HG00112.vcf.gz
Applied 0 variants

The error is the same for all samples.

2.4 years ago by Peerzada • 0
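
Regarding the "Note: the --sample option not given" line in both logs: `bcftools consensus` has `-s/--sample` and `-H/--haplotype` options, so one possible variation is to apply each sample's phased alleles directly from the multi-sample VCF. This is a hedged sketch, untested against the actual files. It assumes the contig names already agree (chr7 in both the FASTA and the VCF) and that the VCF is indexed; `out_name`, the `REF`/`VCF`/`REGION` variables and the `.hapN` suffix are illustrative and not from the thread.

```shell
#!/bin/bash
# Sketch only. Assumes the contig names already agree (chr7 in both files);
# out_name and the .hapN suffix are illustrative, not from the thread.
REF=human_ref.fa.gz
VCF=aqp1.1000g.vcf.gz
REGION=chr7:30911853-30925516

# Output FASTA name for one sample and one haplotype (1 or 2).
out_name() { printf '1000g.aqp1.%s.hap%s.fa\n' "$1" "$2"; }

# Only run the pipeline if the tools and inputs are present.
if command -v bcftools >/dev/null 2>&1 && [ -f "$VCF" ] && [ -f "$REF" ]; then
  # bcftools query -l prints one sample name per line.
  for sample in $(bcftools query -l "$VCF"); do
    for hap in 1 2; do
      # -s picks the genotype column, -H picks which phased allele to apply.
      samtools faidx "$REF" "$REGION" |
        bcftools consensus -s "$sample" -H "$hap" -o "$(out_name "$sample" "$hap")" "$VCF"
    done
  done
fi
```

Writing one FASTA per haplotype is a design choice for phased data; whether that or a single per-sample sequence is wanted depends on the downstream analysis.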