Extracting this data frame from a .vcf file
5.8 years ago
zizigolu ★ 4.3k

Hi,

I have a single .vcf file from whole-genome sequencing of tumour vs. normal samples from 21 patients.

I need a data frame like this as input for a tool that finds driver genes:

> head(mutations)
  sampleID chr      pos ref mut
1 Sample_1   1   871244   G   C
2 Sample_1   1  6648841   C   G
3 Sample_1   1 17557072   G   A
4 Sample_1   1 22838492   G   C
5 Sample_1   1 27097733   G   A
6 Sample_1   1 27333206   G   A

In the separate .vcf files for each patient I have the start, end, chromosome, ref, and variant allele. However, I am not sure how to get such a data frame from this big VCF.

Any help please?

Thank you

WGS R VCF • 3.7k views

This is a basic question; please invest some time in reading the bcftools manual. Or, if you prefer to stay in R, read about the vcfR package.

Thank you. I also tried vcfR:

> read.vcfR("trg.snp.pass.vcf")
Error in read.vcfR("trg.snp.pass.vcf") : 
  File: trg.snp.pass.vcf does not appear to be a VCF file.
  First line of file:
 trg.snp.pass.vcf 
  Should begin with:
##fileformat=VCFv 
In addition: Warning message:
In scan(file = file, what = character(), nmax = 1, sep = "\n", quiet = TRUE,  :
  embedded nul(s) found in input
> read.vcfR("trg.snp.pass.vcf.tar")
Error in read.vcfR("trg.snp.pass.vcf.tar") : 
  File: trg.snp.pass.vcf.tar does not appear to be a VCF file.
  First line of file:
 trg.snp.pass.vcf.tar 
  Should begin with:
##fileformat=VCFv 
In addition: Warning message:
In scan(file = file, what = character(), nmax = 1, sep = "\n", quiet = TRUE,  :
  embedded nul(s) found in input
>
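
A warning about embedded nuls, together with a first line that is just the file name, is what one would typically see when the file is a tar archive rather than plain text (a tar header starts with the member name padded with NUL bytes). That is an inference from the error text, not something confirmed in the thread. A minimal shell sketch of how one might check and unpack such a file, using a throwaway archive built on the spot (demo.vcf and demo.vcf.tar are hypothetical stand-ins):

```shell
# Build a tiny stand-in archive so the sketch is self-contained.
workdir=$(mktemp -d)
cd "$workdir"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo.vcf
tar -cf demo.vcf.tar demo.vcf
rm demo.vcf

# A plain-text VCF starts with "##fileformat=VCF"; a tar archive does not.
first_bytes=$(head -c 16 demo.vcf.tar | tr -d '\0')
case "$first_bytes" in
  '##fileformat=VCF'*) kind="vcf" ;;
  *)                   kind="not-vcf" ;;
esac
echo "demo.vcf.tar looks like: $kind"

# List the archive members, then extract them and re-check the first line.
members=$(tar -tf demo.vcf.tar)
tar -xf demo.vcf.tar
extracted_first=$(head -n 1 demo.vcf)
echo "extracted: $members, first line: $extracted_first"
```

If the extracted file begins with ##fileformat=VCFv, it is the one to pass to read.vcfR or bcftools.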

The bcftools query plugin and the SnpSift plugin in Galaxy can also do this.

5.8 years ago
zx8754 12k

Using bcftools:

bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' myFile.vcf > myFileLong.txt

Thank you. It says:

[fi1d18@cyan01 ~]$ [fi1d18@cyan01 ~]$ bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' trg.snp.pass.vcf > myFileLong.txt
-bash: [fi1d18@cyan01: command not found
[fi1d18@cyan01 ~]$ Failed to open trg.snp.pass.vcf: unknown file type

And when I tried it on the .vcf for a single sample, it says:

[fi1d18@cyan01 ~]$ [fi1d18@cyan01 ~]$ bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' LP2000104-DNA_A01_vs_LP2000101-DNA_A01.passed.somatic.indel.vcf > myFileLong.txt
bash: [fi1d18@cyan01: command not found
[fi1d18@cyan01 ~]$ Error: no such tag defined in the VCF header: FORMAT/GT

bash: [fi1d18@cyan01: command not found

Your command line doesn't start with bcftools. The first thing the shell tries to run is [fi1d18@cyan01. Make sure there are no stray characters before the command you want to run.

fin swimmer

Either provide the full path to bcftools or add the directory containing that executable to your $PATH: export PATH=$PATH:/dir_for_bcftools
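
A quick way to see whether the shell can find the executable at all is command -v. The sketch below wraps it in a small helper (check_tool is a hypothetical name, not part of bcftools), and /dir_for_bcftools remains a placeholder:

```shell
# Report whether a tool is reachable through $PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

echo "bcftools: $(check_tool bcftools)"

# To add a directory for the current session (the path is a placeholder):
#   export PATH="$PATH:/dir_for_bcftools"
```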

Sorry, bcftools is on my path, but both in Galaxy and on Linux I am getting this error:

Error: no such tag defined in the VCF header: FORMAT/GT

and Galaxy says:

Fatal error: Exit code 255 ()
Error: no such tag defined in the VCF header: INFO/REFt. FORMAT fields must be in square brackets, e.g. "[ REFt]"

The header of my VCF is this:

##bcftools_viewCommand=view -h c.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR

I don't know what is wrong with my VCF files, though.

Are you sure that this is the complete header?

bcftools is very strict about the VCF specification, so the first line must be:

##fileformat=VCFv4.1

(The version number can differ.)

For each contig you need an entry like this:

##contig=<ID=chr1,length=248956422>

For each key in the INFO and FORMAT columns you need an entry in the header. For GT it looks like this:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

So, are there more entries in the header?

fin swimmer
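
The header checks described above can be sketched with grep and sed alone. This is a rough illustration on a made-up header that declares DP but not GT; the stand-in file and the variable names are assumptions, not taken from the poster's data. A header without a GT declaration would be consistent with the "no such tag defined in the VCF header: FORMAT/GT" error.

```shell
# Build a small stand-in VCF header that lacks a FORMAT/GT declaration.
vcf=$(mktemp)
printf '%s\n' \
  '##fileformat=VCFv4.1' \
  '##contig=<ID=chr1,length=248956422>' \
  '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">' \
  > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n' >> "$vcf"

# 1. The very first line must declare the file format.
if head -n 1 "$vcf" | grep -q '^##fileformat=VCFv'; then
  has_fileformat=yes
else
  has_fileformat=no
fi

# 2. List the FORMAT keys that the header declares.
format_ids=$(grep '^##FORMAT=<ID=' "$vcf" | sed 's/^##FORMAT=<ID=\([^,>]*\).*/\1/')

# 3. Check whether GT is among them.
if printf '%s\n' "$format_ids" | grep -qx 'GT'; then
  has_gt=yes
else
  has_gt=no
fi

echo "fileformat line: $has_fileformat"
echo "declared FORMAT keys: $format_ids"
echo "FORMAT/GT declared: $has_gt"
rm -f "$vcf"
```

If GT is not declared, a query format that asks for %GT cannot work on that file, and the format string has to be limited to fields that are present.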

5.8 years ago
zizigolu ★ 4.3k
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' c.vcf

This solved the error.

The bcftools query plugin and the SnpSift plugin in Galaxy can also do this.
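
For the per-patient files, the sampleID / chr / pos / ref / mut layout from the question can also be assembled with plain awk, since those values sit in fixed VCF columns. The following is only a sketch under assumptions: the stand-in VCF is made up (its first two positions are borrowed from the example table in the question), and sample_id is a hypothetical label that in practice would be derived from each file name.

```shell
# Stand-in per-patient VCF with two records and no GT field (illustrative only).
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n'
  printf '1\t871244\t.\tG\tC\t.\tPASS\t.\tDP\t30\t28\n'
  printf '1\t6648841\t.\tC\tG\t.\tPASS\t.\tDP\t25\t31\n'
} > "$vcf"

sample_id="Sample_1"   # hypothetical label; in practice taken from the file name

# Skip header lines (those starting with '#') and print sampleID, chr, pos, ref, mut.
table=$(awk -F'\t' -v OFS='\t' -v id="$sample_id" '
  BEGIN { print "sampleID", "chr", "pos", "ref", "mut" }
  !/^#/ { print id, $1, $2, $4, $5 }
' "$vcf")

printf '%s\n' "$table"
rm -f "$vcf"
```

Running this once per patient file and concatenating the outputs (keeping a single header line) would give one long table. Records with several comma-separated ALT alleles are printed unchanged here and would need splitting first.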

Great that it worked out; accept the answer if this was the solution.

Adding %END to the format string (next to %POS) would return the end position as well as the start.

5.8 years ago

GATK has a tool for that, see VariantsToTable
