Joining SnpEff And VCF
12.8 years ago

I've extracted the distinct mutations from a set of VCF files:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    66480    .    AT    A    205    .    (...)
chr1    626686    .    CCT    C    124    .    (...)

and generated the predictions using SnpEff:

(...)
1    66481    *    -T    DEL    Hom    205    0        OR4F5.1    OR4F5    mRNA    NM_001005484            UPSTREAM: 2610 bases            
1    626687    *    -CT    DEL    Hom    124    0        OR4F29.1    OR4F29    mRNA    NM_001005221.2            UPSTREAM: 4652 bases        
1    626687    *    -CT    DEL    Hom    124    0        OR4F16.1    OR4F16    mRNA    NM_001005277.2            UPSTREAM: 4652 bases

Now I'd like to join those results with my VCFs. But, as you can see, SnpEff changes the way the alternate bases are defined (and shifts the position of indels by one). Do you know of any way to join those files?

Thanks.

EDIT:

Here is my temporary C++ solution. It converts each VCF line to SnpEff's notation and prepends the converted columns to the original line:

    (...)
    while(getline(in,line,'\n'))
     {
     if(line.empty()) continue;
     if(line[0]=='#')
         {
         cout << "#Chromo\tPosition\tReference\tChange\t" << line << endl;
         continue;
         }
     tokenizer.split(line,tokens);
     string chrom=tokens[0];
     if(chrom.compare(0,3,"chr")==0) chrom=chrom.substr(3);
     int pos;
     numeric_cast<int>(tokens[1].c_str(),&pos);
     string ref=tokens[3];
     string alt=tokens[4];
     if(ref.size()<alt.size()) /* A/AC = INSERTION */
         {
         assert(ref[0]==alt[0]);
         ref.assign("*");
         alt[0]='+';
         ++pos;
         }
     else if(ref.size()>alt.size()) /* AC/A = DELETION */
         {
         assert(ref[0]==alt[0]);
         alt.assign(ref);
         alt[0]='-';
         ref.assign("*");
         ++pos;
         }
     else
         {
         //single SNP
         }
     cout     << chrom
        << "\t"<< pos
        << "\t" << ref
        << "\t" << alt
        << "\t" << line
        << endl;
     } (...)
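
For readers who want to reuse the conversion outside the read loop, here is a minimal sketch of the same rules as a standalone function. The names `SnpEffKey`, `toSnpEffKey` and `joinKey` are hypothetical and not part of SnpEff or the code above. Like the loop above, it assumes the shorter allele of an indel is a single anchor base shared by REF and ALT.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: the four leading columns of a SnpEff text row
// (chromosome, position, reference, change).
struct SnpEffKey {
    std::string chrom;
    int pos;
    std::string ref;
    std::string change;
};

// Convert VCF-style CHROM/POS/REF/ALT to the SnpEff-style notation,
// using the same rules as the loop above.
SnpEffKey toSnpEffKey(std::string chrom, int pos, std::string ref, std::string alt) {
    // SnpEff's output drops the "chr" prefix.
    if (chrom.compare(0, 3, "chr") == 0) chrom = chrom.substr(3);

    if (ref.size() < alt.size()) {
        // Insertion, e.g. A -> AC becomes "*" / "+C" at pos+1.
        assert(ref[0] == alt[0]);
        alt[0] = '+';
        ref = "*";
        ++pos;
    } else if (ref.size() > alt.size()) {
        // Deletion, e.g. AT -> A becomes "*" / "-T" at pos+1.
        assert(ref[0] == alt[0]);
        alt = ref;
        alt[0] = '-';
        ref = "*";
        ++pos;
    }
    // Same-length alleles (SNPs) are left unchanged.
    return {chrom, pos, ref, alt};
}

// Build a tab-separated key that can be compared with the first four
// columns of a SnpEff row.
std::string joinKey(const SnpEffKey& k) {
    return k.chrom + "\t" + std::to_string(k.pos) + "\t" + k.ref + "\t" + k.change;
}
```

With the two records from the question, `toSnpEffKey("chr1", 66480, "AT", "A")` yields the key `1 66481 * -T`, which matches the first four columns of the corresponding SnpEff row, so the two files can then be joined on that key.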
format vcf • 4.7k views

Hi everyone, please help!

# I ran VarScan to produce rice-snp.vcf
# and I want to annotate it with snpEff, but I get an error:

./snpEff$ java -jar snpEff.jar rice7 rice-snp.vcf > s.eff.vcf

ERRORS: Some errors were detected
Error type      Number of errors
ERROR_CHROMOSOME_NOT_FOUND      330650

NEW VERSION!

        There is a new SnpEff version available:
                Version      : 3.6
                Release date : 2014-04-21
                Download URL : http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip

# Thanks.

# Format of the input VCF file:

CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample1
LOC_Os01g01070    1254    .    A    G    .    PASS    ADP=13;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01070    3850    .    A    G    .    PASS    ADP=11;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01070    4240    .    C    T    .    PASS    ADP=12;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01080    2809    .    T    C    .    PASS    ADP=11;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01090    435    .    G    A    .    PASS    ADP=15;WT=0;HET=1;HOM=0;NC=0    
.......

Don't post your question as an answer; it will be deleted.

Ask it as a new question.

12.8 years ago
Pablo ★ 1.9k

One solution is to use VCF output (as suggested in the other answer) and then split it into one effect per line using the vcfEffOnePerLine.pl script, which you can find in the 'scripts' directory of SnpEff's distribution.


Thanks, vcfEffOnePerLine was missing from my distribution.

12.8 years ago

Why not use the SnpEff option to report its predictions in VCF format? That way, there is no need to join:

 java -Xmx4G -jar snpEff.jar eff -i vcf -o vcf GRCh37.63 sample.vcf > sample.annotated.vcf

Because I want to keep one type of mutation per line. SnpEff puts all the possible effects in the INFO column.


Ah, I see. It might be easier to just use awk to "expand" each VCF line for each annotation in the INFO field. You're right, the change in allele definition is irksome.
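
The "expand" idea can be sketched in C++ as well as awk. The following is only an illustration under stated assumptions: the annotations are taken to live in an INFO entry named `EFF=` with effects separated by commas (the layout used by SnpEff's VCF output at the time), INFO is column 8, and the helper names `splitOn` and `expandEffects` are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a string on a single-character delimiter.
static std::vector<std::string> splitOn(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::string item;
    std::istringstream in(s);
    while (std::getline(in, item, delim)) out.push_back(item);
    return out;
}

// Join strings with a single-character delimiter.
static std::string joinWith(const std::vector<std::string>& v, char delim) {
    std::string out;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i > 0) out += delim;
        out += v[i];
    }
    return out;
}

// Expand one VCF data line into one line per effect listed in the
// INFO entry "EFF=" (assumption: comma-separated effects, INFO in column 8).
// Header lines and lines without an EFF entry are returned unchanged.
std::vector<std::string> expandEffects(const std::string& line) {
    std::vector<std::string> cols = splitOn(line, '\t');
    if (line.empty() || line[0] == '#' || cols.size() < 8) return {line};

    std::vector<std::string> info = splitOn(cols[7], ';');
    std::size_t effIdx = info.size();
    for (std::size_t i = 0; i < info.size(); ++i) {
        if (info[i].compare(0, 4, "EFF=") == 0) { effIdx = i; break; }
    }
    if (effIdx == info.size()) return {line};

    std::vector<std::string> effects = splitOn(info[effIdx].substr(4), ',');
    if (effects.empty()) return {line};

    std::vector<std::string> out;
    for (const std::string& eff : effects) {
        info[effIdx] = "EFF=" + eff;      // keep a single effect in INFO
        cols[7] = joinWith(info, ';');    // rebuild the INFO column
        out.push_back(joinWith(cols, '\t'));
    }
    return out;
}
```

Each returned line keeps the original CHROM, POS, REF and ALT, so the allele notation is untouched and only the INFO column differs between the copies.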
