I've extracted the distinct mutations from a set of VCF files:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr1 66480 . AT A 205 . (...)
chr1 626686 . CCT C 124 . (...)
and generated the predictions using SnpEff:
(...)
1 66481 * -T DEL Hom 205 0 OR4F5.1 OR4F5 mRNA NM_001005484 UPSTREAM: 2610 bases
1 626687 * -CT DEL Hom 124 0 OR4F29.1 OR4F29 mRNA NM_001005221.2 UPSTREAM: 4652 bases
1 626687 * -CT DEL Hom 124 0 OR4F16.1 OR4F16 mRNA NM_001005277.2 UPSTREAM: 4652 bases
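Now I'd like to join those results with my VCFs. But, as you can see, SnpEff changes the way the variants are written: the chromosome loses its "chr" prefix, the position is shifted by one, and the deletion AT/A becomes */-T. Do you know any way to join those files?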
Thanks.
EDIT:
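Here is my temporary C++ solution. It prepends the SnpEff-style chromosome, position, reference and change to each VCF line, so that the two files can be joined on those columns: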
(...)
while(getline(in,line,'\n'))
{
    if(line.empty()) continue;
    if(line[0]=='#')
    {
        /* header line: add the names of the four new columns */
        cout << "#Chromo\tPosition\tReference\tChange\t" << line << endl;
        continue;
    }
    tokenizer.split(line,tokens);
    string chrom=tokens[0];
    /* SnpEff writes "1" where the VCF has "chr1" */
    if(chrom.compare(0,3,"chr")==0) chrom=chrom.substr(3);
    int pos;
    numeric_cast<int>(tokens[1].c_str(),&pos);
    string ref=tokens[3];
    string alt=tokens[4];
    if(ref.size()<alt.size()) /* A/AC = INSERTION */
    {
        assert(ref[0]==alt[0]);
        ref.assign("*");
        alt[0]='+'; /* AC becomes +C */
        ++pos; /* skip the anchor base */
    }
    else if(ref.size()>alt.size()) /* AC/A = DELETION */
    {
        assert(ref[0]==alt[0]);
        alt.assign(ref);
        alt[0]='-'; /* AC becomes -C */
        ref.assign("*");
        ++pos; /* skip the anchor base */
    }
    else
    {
        /* SNP: REF and ALT are kept as they are */
    }
    cout << chrom
        << "\t" << pos
        << "\t" << ref
        << "\t" << alt
        << "\t" << line
        << endl;
} (...)
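
Once both sides use the same coordinates, the join itself can be keyed on (chromosome, position, reference, change). The following is only a minimal, self-contained sketch of that idea. It assumes one ALT allele per VCF line, a single leading anchor base for indels, and that insertions are shifted by one like deletions (as in the snippet above). Key, vcfToSnpEffKey and annotationsFor are names made up for this example; they are not part of SnpEff.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Join key: chromosome (without "chr"), position, reference, change.
using Key = std::tuple<std::string, int, std::string, std::string>;

// Convert VCF CHROM/POS/REF/ALT into the SnpEff-style key.
// Assumption: single ALT allele; indels share one leading anchor base.
Key vcfToSnpEffKey(std::string chrom, int pos, std::string ref, std::string alt)
{
    if (chrom.compare(0, 3, "chr") == 0) chrom = chrom.substr(3);
    if (ref.size() < alt.size()) {
        // insertion, e.g. A/AT -> * / +T at pos+1
        assert(ref[0] == alt[0]);
        alt = "+" + alt.substr(1);
        ref = "*";
        ++pos;
    } else if (ref.size() > alt.size()) {
        // deletion, e.g. AT/A -> * / -T at pos+1
        assert(ref[0] == alt[0]);
        alt = "-" + ref.substr(1);
        ref = "*";
        ++pos;
    }
    // SNP: REF and ALT are left unchanged
    return Key(chrom, pos, ref, alt);
}

// Collect every annotation stored under a key. One variant may have
// several SnpEff rows (one per transcript), hence the multimap.
std::vector<std::string> annotationsFor(
    const std::multimap<Key, std::string>& index, const Key& key)
{
    std::vector<std::string> out;
    auto range = index.equal_range(key);
    for (auto it = range.first; it != range.second; ++it)
        out.push_back(it->second);
    return out;
}
```

To use it, fill the multimap with one entry per SnpEff row (key built from its first four columns, value being the rest of the row), then call annotationsFor(index, vcfToSnpEffKey(...)) for each VCF record. An empty result means the variant has no prediction.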