Question

Error using Tabix

0

Entering edit mode

6.1 years ago

AS • 0

Hi, I am trying to use vcf files downloaded from public database (NCBI SRA Bioproject:PRJNA486011). I am using only R to do that because I don’t know any other language. I have put the gz and the gz.tbi in the same folder.

>vcf.fl <- list.files("path/to/vcfs", full.names=TRUE, pattern=".vcf.gz$")
> tabx <- TabixFile("chr4_eva.vcf.gz", index=paste("chr4_eva.vcf.gz", "tbi", sep="."),yieldSize=10000)
> tabx
class: TabixFile 
path: chr4_eva.vcf.gz
index: chr4_eva.vcf.gz.tbi
isOpen: FALSE 
yieldSize: 10000 

> names(tabx)

 [1] ".self"        ".->path"      ".extptr"      ".->yieldSize" ".refClassDef" ".->index"     ".->.extptr"   "index"       
 [9] "path"         "yieldSize"   

> vcf.rng <- readVcf(tabx)

Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)

vcf tabix R • 2.7k views

ADD COMMENT • link updated 6.1 years ago by dariober 15k • written 6.1 years ago by AS • 0

1

Entering edit mode

Probably one file has notations ike 1,2,3...X,Y and your has chr1,chr2... Check that.

ADD REPLY • link 6.1 years ago by ATpoint 87k

0

Entering edit mode

I succeed to check the data of the .vcf.gz but I haven't found a way to look inside the tbi. Could you please indicate me a way to do it ?

ADD REPLY • link 6.1 years ago by AS • 0

0

Entering edit mode

You have to compare your VCF against the VCF you downloaded, not the index. Please show a subset of your VCF and of the downloaded VCFs.

ADD REPLY • link 6.1 years ago by ATpoint 87k

0

Entering edit mode

Why don't you index them quickly by yourself - this is very simple:

tabix -p vcf <file.name>

ADD REPLY • link 6.1 years ago by JJ ▴ 750

0

Entering edit mode

Thanks for the answers.

I have downloaded the vcf from : ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz. The size of the download file is ok compare to the size indicated on the website.

I have tried to look at the file and I found strange that the ID seem full of NA and no samples are detected.

VCFA <- read.vcfR("chr8_eva.vcf.gz")
VCFA
***** Object of Class vcfR *****
0 samples
1 CHROMs
1,294,347 variants
Object size: 294.3 Mb
NaN percent missing data
*****        *****         *****

head(VCFA)

[1] "***** Object of class 'vcfR' *****"
[1] "***** Meta section *****"
[1] "##fileformat=VCFv4.2"
[1] "##GATKCommandLine.SelectVariants=<ID=SelectVariants,CommandLineOptio [Truncated]"
[1] "##source=SelectVariants"
[1] "##FILTER=<ID=LowQual,Description=\"Low quality\">"
[1] "##FILTER=<ID=my_snp_filter,Description=\"QD < 2.0 || FS > 60.0 || MQ < 30.0\">"
[1] "##FORMAT=<ID=AD,Number=R,Type=Integer,Description=\"Allelic depths fo [Truncated]"
[1] "First 6 rows."
[1] 
[1] "***** Fixed section *****"
     CHROM POS  ID REF ALT QUAL    FILTER         
[1,] "8"   "6"  NA "A" "G" "46.53" NA             
[2,] "8"   "13" NA "A" "G" "92.17" NA             
[3,] "8"   "14" NA "A" "C" "29.71" "my_snp_filter"
[4,] "8"   "18" NA "T" "C" "71.66" NA             
[5,] "8"   "20" NA "A" "G" "55.56" NA             
[6,] "8"   "27" NA "A" "G" "55.15" NA             
[1] 
[1] "***** Genotype section *****"
<0 x 0 matrix>
[1] 
[1] "Unique GT formats:"
[1] "No gt slot present"
[1]

ADD REPLY • link updated 6.1 years ago by finswimmer 16k • written 6.1 years ago by AS • 0

score 0 · Answer 1 · 2019-02-03

The vcf file you got from ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz seems defective since its header does not contain the contig "8". As a quick and dirty fix you can add the missing contig with something like:

gunzip -c chr8_eva.vcf.gz | grep '^##' > chr8_eva.fix.vcf
echo "##contig=<ID=8>" >> chr8_eva.fix.vcf 
gunzip -c chr8_eva.vcf.gz | grep -v '^##' >> chr8_eva.fix.vcf
bgzip chr8_eva.fix.vcf

Now it should work:

VCFA <- readVcf("~/Downloads/chr8_eva.fix.vcf.gz") ## Seems ok

# Original vcf throws warning:
VCFA <- readVcf("~/Downloads/chr8_eva.vcf.gz")
Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)