Question

Joining SnpEff and VCF

0


12.8 years ago

Pierre Lindenbaum 164k

I've extracted the distinct mutations from a set of VCF files:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    66480    .    AT    A    205    .    (...)
chr1    626686    .    CCT    C    124    .    (...)

and generated the predictions using SnpEff:

(...)
1    66481    *    -T    DEL    Hom    205    0        OR4F5.1    OR4F5    mRNA    NM_001005484            UPSTREAM: 2610 bases            
1    626687    *    -CT    DEL    Hom    124    0        OR4F29.1    OR4F29    mRNA    NM_001005221.2            UPSTREAM: 4652 bases        
1    626687    *    -CT    DEL    Hom    124    0        OR4F16.1    OR4F16    mRNA    NM_001005277.2            UPSTREAM: 4652 bases

Now I'd like to join those results with my VCFs. But, as you can see, SnpEff changes the way the alternate bases are defined: the deletion REF=AT, ALT=A at chr1:66480 in the VCF is reported as -T at 1:66481. Do you know any way to join those files?

Thanks.

EDIT:

Here is my temporary C++ solution; it prefixes each VCF line with the chromosome, position, reference and change written the way SnpEff writes them:

    (...)
    while(getline(in,line,'\n'))
     {
     if(line.empty()) continue;
     if(line[0]=='#')
         {
         cout << "#Chromo\tPosition\tReference\tChange\t" << line << endl;
         continue;
         }
     tokenizer.split(line,tokens);
     string chrom=tokens[0];
     if(chrom.compare(0,3,"chr")==0) chrom=chrom.substr(3);
     int pos;
     numeric_cast<int>(tokens[1].c_str(),&pos);
     string ref=tokens[3];
     string alt=tokens[4];
     if(ref.size()<alt.size()) /* A/AC = INSERTION */
         {
         assert(ref[0]==alt[0]);
         ref.assign("*");
         alt[0]='+';
         ++pos;
         }
     else if(ref.size()>alt.size())
         {
         assert(ref[0]==alt[0]);
         alt.assign(ref);
         alt[0]='-';
         ref.assign("*");
         ++pos;
         }
     else
         {
         //single SNP
         }
     cout     << chrom
        << "\t"<< pos
        << "\t" << ref
        << "\t" << alt
        << "\t" << line
        << endl;
     } (...)
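
Once each VCF record is rewritten in SnpEff's convention, the join itself reduces to a lookup on (chromosome, position, change). What follows is a minimal sketch of that idea, not a tested tool. It assumes the SnpEff text output carries chromosome, position and change in columns 1, 2 and 4, as in the sample above, and that each VCF line has a single ALT allele sharing its first base with REF. The names `splitTabs`, `keyFromVcf`, `keyFromSnpEff` and `joinSnpEffWithVcf` are hypothetical and not part of SnpEff.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Key shared by both files: chromosome, position, change (SnpEff style).
using Key = std::tuple<std::string, long, std::string>;

// Split a line on tabs; interior empty fields are kept.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> out;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t')) out.push_back(field);
    return out;
}

// Build the SnpEff-style key from one VCF data line (hypothetical helper).
Key keyFromVcf(const std::string& line) {
    std::vector<std::string> t = splitTabs(line);
    assert(t.size() >= 5);
    std::string chrom = t[0];
    if (chrom.compare(0, 3, "chr") == 0) chrom = chrom.substr(3);
    long pos = std::stol(t[1]);
    const std::string& ref = t[3];
    std::string change = t[4];
    if (ref.size() < change.size()) {         // insertion: A -> AC becomes +C
        assert(ref[0] == change[0]);
        change = "+" + change.substr(1);
        ++pos;
    } else if (ref.size() > change.size()) {  // deletion: AT -> A becomes -T
        assert(ref[0] == change[0]);
        change = "-" + ref.substr(1);
        ++pos;
    }                                         // SNP: keyed on ALT (assumption)
    return Key(chrom, pos, change);
}

// Build the key from one SnpEff text line (columns assumed as in the sample).
Key keyFromSnpEff(const std::string& line) {
    std::vector<std::string> t = splitTabs(line);
    assert(t.size() >= 4);
    return Key(t[0], std::stol(t[1]), t[3]);
}

// Index the VCF lines by key, then emit "SnpEff line <TAB> VCF line" per match.
std::vector<std::string> joinSnpEffWithVcf(const std::vector<std::string>& vcfLines,
                                           const std::vector<std::string>& effLines) {
    std::map<Key, std::string> vcfByKey;
    for (const std::string& l : vcfLines)
        if (!l.empty() && l[0] != '#') vcfByKey[keyFromVcf(l)] = l;
    std::vector<std::string> joined;
    for (const std::string& l : effLines) {
        if (l.empty() || l[0] == '#') continue;
        auto it = vcfByKey.find(keyFromSnpEff(l));
        if (it != vcfByKey.end()) joined.push_back(l + "\t" + it->second);
    }
    return joined;
}
```

Each returned line is a SnpEff prediction followed by the original VCF record, so a mutation with several predicted effects yields several output lines.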

format vcf • 4.7k views

ADD COMMENT • link updated 10.6 years ago by tranthach90 • 0 • written 12.8 years ago by Pierre Lindenbaum 164k

0


Hi everyone, I need help.

I ran VarScan to produce rice-snp.vcf, and when I run snpEff on that file I get an error:

./snpEff$ java -jar snpEff.jar rice7 rice-snp.vcf > s.eff.vcf

ERRORS: Some errors were detected
Error type      Number of errors
ERROR_CHROMOSOME_NOT_FOUND      330650

NEW VERSION!

        There is a new SnpEff version available:
                Version      : 3.6
                Release date : 2014-04-21
                Download URL : http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip

Thanks.

Format of the input VCF file:

CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample1
LOC_Os01g01070    1254    .    A    G    .    PASS    ADP=13;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01070    3850    .    A    G    .    PASS    ADP=11;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01070    4240    .    C    T    .    PASS    ADP=12;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01080    2809    .    T    C    .    PASS    ADP=11;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01090    435    .    G    A    .    PASS    ADP=15;WT=0;HET=1;HOM=0;NC=0    
.......

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 10.6 years ago by tranthach90 • 0

0


Don't post your question as an answer; it will be deleted.

Ask it as a new question.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 10.6 years ago by Istvan Albert 102k

score 4 · Answer 1 · 2012-02-22

4


12.8 years ago

Pablo ★ 1.9k

One solution is to use VCF output (as suggested in the other answer) and then split it into one effect per line using the vcfEffOnePerLine.pl script, which you can find in the 'scripts' directory of SnpEff's distribution.

ADD COMMENT • link 12.8 years ago by Pablo ★ 1.9k

0


Thanks, vcfEffOnePerLine was missing from my distribution.

ADD REPLY • link 12.8 years ago by Pierre Lindenbaum 164k

score 2 · Answer 2 · 2012-02-22

2


12.8 years ago

Aaronquinlan 12k

Why not use the SnpEff option to report its predictions in VCF format? That way, there is no need to join.

 java -Xmx4G -jar snpEff.jar eff -i vcf -o vcf GRCh37.63 sample.vcf > sample.annotated.vcf

ADD COMMENT • link 12.8 years ago by Aaronquinlan 12k

0


Because I want to keep one type of mutation per line. SnpEff puts all the possible effects in the INFO column.

ADD REPLY • link 12.8 years ago by Pierre Lindenbaum 164k

0


Ah, I see. It might be easier to just use awk to "expand" each VCF line for each annotation in the INFO field. You're right, the change in allele definition is irksome.
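
The expansion suggested here can also be sketched in C++ rather than awk. The following assumes SnpEff writes its effects under an `EFF=` key in the INFO column as a comma-separated list, with no commas inside any single effect. That layout, and the helper names `split`, `join` and `expandEffects`, are assumptions to check against the SnpEff version in use.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split on a delimiter; interior empty fields are kept.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string field;
    while (std::getline(in, field, delim)) out.push_back(field);
    return out;
}

// Join fields with a delimiter.
std::string join(const std::vector<std::string>& v, char delim) {
    std::string out;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i > 0) out += delim;
        out += v[i];
    }
    return out;
}

// Expand one annotated VCF data line into one line per effect.
// Precondition: a data line (not a header) with at least 8 columns.
std::vector<std::string> expandEffects(const std::string& line) {
    std::vector<std::string> cols = split(line, '\t');
    assert(cols.size() >= 8);            // INFO is the 8th VCF column
    std::vector<std::string> others;     // INFO entries other than EFF
    std::vector<std::string> effects;    // one entry per predicted effect
    for (const std::string& kv : split(cols[7], ';')) {
        if (kv.compare(0, 4, "EFF=") == 0) effects = split(kv.substr(4), ',');
        else others.push_back(kv);
    }
    if (effects.empty()) return {line};  // nothing to expand
    std::vector<std::string> result;
    for (const std::string& effect : effects) {
        std::vector<std::string> info = others;
        info.push_back("EFF=" + effect); // EFF goes last in the rebuilt INFO
        cols[7] = join(info, ';');
        result.push_back(join(cols, '\t'));
    }
    return result;
}
```

Every output line keeps the original VCF columns, so REF and ALT stay in VCF notation and no coordinate conversion is needed.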

ADD REPLY • link 12.8 years ago by Aaronquinlan 12k