Hi there,
I am trying to benchmark my pipeline (it analyzes germline WES and WGS data and generates VCF files for SNVs and indels). To do so, I got the WES data for this sample from here:
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/
and worked on these 2 datasets separately:
NIST7035_TAAGGCGA_L001_R1_001_trimmed.fastq.gz
NIST7035_TAAGGCGA_L001_R2_001_trimmed.fastq.gz
NIST7086_CGTACTAG_L002_R1_001_trimmed.fastq.gz
NIST7086_CGTACTAG_L002_R2_001_trimmed.fastq.gz
I also took the VCF file from the same link as my reference (gold standard):
project.NIST.hc.snps.indels.vcf
Then I used the following command to evaluate my VCF file (made from the above files with my pipeline):
java -Xmx4G -jar RTG.jar vcfeval -t Homo_sapiens.GRCh37.GATK.illumina.SDF -T 6 --baseline=[GIAB truth VCF] --calls=[SNV/INDEL VCF] --all-records --bed-regions=[Exome BED file]
I made the SDF folder Homo_sapiens.GRCh37.GATK.illumina.SDF using this command:
rtg format --output Homo_sapiens.GRCh37.GATK.illumina.SDF hg19.fasta
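Because the SDF above is named GRCh37 but built from hg19.fasta, it may be worth checking that contig names match before running vcfeval: UCSC hg19 FASTA files usually name chromosomes `chr1`, `chr2`, and so on, while GRCh37-based VCFs often use `1`, `2`. The sketch below is a plain-shell check under that assumption. The function names are hypothetical, and it assumes GNU gzip (whose `-cdf` flags also pass uncompressed files through unchanged) plus standard `grep`, `sed`, `cut`, `sort` and `comm`.

```shell
#!/bin/sh
# Sketch: report VCF contig names that are absent from a reference FASTA.
# Assumption: inputs are plain text or gzip/bgzip-compressed; GNU gzip is available.

# Print sequence names from FASTA header lines (text after '>' up to the first whitespace).
fasta_contigs() {
  gzip -cdf "$1" | grep '^>' | sed 's/^>//; s/[[:space:]].*//' | sort -u
}

# Print the distinct CHROM values used by VCF data lines (header lines skipped).
vcf_contigs() {
  gzip -cdf "$1" | grep -v '^#' | cut -f 1 | sort -u
}

# Print VCF contigs that the reference does not contain (empty output = names match).
missing_contigs() {
  ref_list=$(mktemp) || return 1
  vcf_list=$(mktemp) || { rm -f "$ref_list"; return 1; }
  fasta_contigs "$1" > "$ref_list"
  vcf_contigs "$2" > "$vcf_list"
  # comm -13 keeps only lines unique to the second (VCF) list.
  comm -13 "$ref_list" "$vcf_list"
  rm -f "$ref_list" "$vcf_list"
}
```

For example, `missing_contigs hg19.fasta project.NIST.hc.snps.indels.vcf` prints every CHROM value in the VCF with no matching FASTA sequence. For a multi-gigabyte FASTA, reading names from an existing `.fai` index (`cut -f1 hg19.fasta.fai`) is faster than scanning the whole file.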
As --baseline I used the VCF file above (the gold standard), and as --calls I used the VCF file that I made. I also got the BED file from the same link. When I run RTG.jar with the command above, I get this error:
Error: No sample name provided but baseline is a multi-sample VCF.
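The error says the baseline VCF holds more than one sample, so vcfeval cannot tell which column to compare. One quick way to see the sample names is the `#CHROM` header line, where every column after `FORMAT` (column 10 onward) is a sample. A minimal sketch, with a hypothetical function name and assuming GNU gzip (whose `-cdf` flags also pass uncompressed files through):

```shell
#!/bin/sh
# Sketch: list the sample names in a VCF (plain, gzip- or bgzip-compressed).
# Sample names are the fields after FORMAT (column 10 onward) on the #CHROM line.
vcf_samples() {
  gzip -cdf "$1" | grep -m 1 '^#CHROM' | cut -f 10- | tr '\t' '\n'
}
```

Running `vcf_samples project.NIST.hc.snps.indels.vcf` prints one name per line; more than one line confirms a multi-sample file. If bcftools is installed, `bcftools query -l FILE` prints the same list.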
Do you know how to fix this?
Thanks
Hi Sara,
I am also benchmarking my pipeline on the same dataset you used. My question is: why did you choose project.NIST.hc.snps.indels.vcf as the gold standard (truth VCF) rather than HG001_*.vcf.gz (located at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh37/)?
-Akshay
I would contact Len Trigg at RTG: https://www.realtimegenomics.com/products/rtg-tools
You might find an answer to your problem in this thread: vcfeval Error: No sample name provided but calls is a multi-sample VCF
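vcfeval's `--sample` option addresses this error directly: `--sample=NAME` selects the same sample name in both files, and `--sample=BASELINE_NAME,CALLS_NAME` selects a different sample in each. Below is a sketch that reuses the command from the question and adds `--sample` plus vcfeval's required `--output` directory. The wrapper function, the `DRY_RUN` switch, the output directory and all sample names are hypothetical placeholders; substitute the names found on each VCF's `#CHROM` line.

```shell
#!/bin/sh
# Sketch: run vcfeval against a multi-sample baseline by naming the sample(s) explicitly.
# Assumptions: RTG.jar path, SDF name, file names and sample names are placeholders.
# --sample=NAME picks NAME in both VCFs; --sample=BASELINE_NAME,CALLS_NAME picks one in each.
# Set DRY_RUN=1 to print the command instead of running it.
run_vcfeval() {
  baseline=$1 calls=$2 sample=$3 bed=$4 outdir=$5
  set -- java -Xmx4G -jar RTG.jar vcfeval \
    -t Homo_sapiens.GRCh37.GATK.illumina.SDF -T 6 \
    --baseline="$baseline" --calls="$calls" \
    --sample="$sample" \
    --all-records --bed-regions="$bed" \
    --output="$outdir"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 run_vcfeval project.NIST.hc.snps.indels.vcf.gz my_calls.vcf.gz NIST7035,my_sample exome.bed eval_NIST7035` would compare baseline sample `NIST7035` (an assumed name; confirm it in the header) with sample `my_sample` in your calls. Since the two WES datasets were processed separately, each call set would be paired with its own baseline sample.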