To whom it may concern,
I am trying to convert dbSNP data into predicted protein variants in FASTA format (in this case for the human gene FUS, ENSG00000089280).
In my example, I retrieved the rs numbers from NCBI dbSNP (query: (FUS[Gene Name]) AND pathogenic[Clinical Significance]) and exported the data to a .bed and/or VCF file.
My aim is to generate predicted protein sequences (e.g. the RefSeq sequence, but carrying the corresponding mutation) from human indels or single point mutations. I have been trying to get to grips with customProDB, but my R skills are limited so far.
Example .bed file:
track name=dbSNP_human description="dbSNP Build 142 ()" date="2015-04-19 10:00" taxId=9606 dbSnpBuild=142 URL="http://www.ncbi.nlm.nih.gov/snp" assembly= assemblyAccession=
chr16 31191407 31191408 rs121909667 0 +
chr16 31191417 31191418 rs121909668 0 +
chr16 31191409 31191410 rs121909669 0 +
chr16 31191418 31191419 rs121909671 0 +
chr16 31190397 31190398 rs186547381 0 +
chr16 31191088 31191089 rs267606831 0 +
chr16 31185060 31185061 rs267606832 0 +
chr16 31191426 31191427 rs267606833 0 +
chr16 31191051 31191052 rs387906627 0 +
chr16 31185030 31185031 rs387906628 0 +
chr16 31189157 31189158 rs387907274 0 +
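For reference, records like the ones above can be read with a few lines of Python. This is only a minimal sketch: the names BedRecord and parse_bed are made up for illustration, and the fields are assumed to be separated by whitespace. Note that BED coordinates are 0-based and half-open, so the 1-based genomic position of each SNP is start + 1. A BED file also carries positions only, not the alleles, so the VCF export is the one to use when the actual base change is needed.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    """One BED line (hypothetical helper type, not from any library)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # here: the rs number

    @property
    def pos_1based(self) -> int:
        # BED is 0-based half-open, so the 1-based position is start + 1.
        return self.start + 1


def parse_bed(lines):
    """Parse BED lines, skipping track/browser/comment lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, name = line.split()[:4]
        records.append(BedRecord(chrom, int(start), int(end), name))
    return records


# Two records copied from the example file above.
example = """\
track name=dbSNP_human
chr16 31191407 31191408 rs121909667 0 +
chr16 31191417 31191418 rs121909668 0 +
"""

for rec in parse_bed(example.splitlines()):
    print(rec.name, rec.chrom, rec.pos_1based)
```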
I would be very thankful if someone could help me to generate protein sequences from DNA .bed files (or similar) in the future.
Please shout if I have forgotten to mention anything important, or if my question needs to be moved to another part of the forum.
Many thanks,
Julian
Hi, thanks for the answer.
Thanks. The problem is automating the amino acid changes so that the sequences are created in a batch. Unfortunately, I am not experienced in scripting and therefore don't know where to start.
Thanks, Julian
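As a possible starting point for the batch step, here is a minimal Python sketch. It assumes the amino acid change for each variant is already known in the short form "R10C" (reference residue, 1-based position, new residue), for example taken from a variant annotation tool. The function names, the toy sequence, and the toy variant names are all made up for illustration and are not FUS data. It handles single substitutions only; indels and frameshifts would have to be applied at the transcript level and translated again.

```python
import re

# Matches changes such as "R10C": reference residue, position, new residue.
_CHANGE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(protein, change):
    """Return the protein sequence with one amino acid substitution applied."""
    match = _CHANGE.match(change)
    if match is None:
        raise ValueError(f"cannot parse change: {change!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} is outside the sequence")
    if protein[pos - 1] != ref:
        # Guards against using the wrong isoform or an off-by-one position.
        raise ValueError(f"expected {ref} at {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + alt + protein[pos:]


def to_fasta(header, sequence, width=60):
    """Format one FASTA record with fixed-width sequence lines."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(chunks) + "\n"


def batch_fasta(ref_name, protein, changes):
    """Build one FASTA record per variant; `changes` maps variant id to change."""
    records = []
    for variant_id, change in changes.items():
        mutated = apply_substitution(protein, change)
        records.append(to_fasta(f"{ref_name}|{variant_id}|{change}", mutated))
    return "".join(records)


# Toy reference sequence and toy variants (placeholders, not real data).
reference = "MKTAYIAKQR"
toy_changes = {"toy_var1": "K2E", "toy_var2": "R10C"}

print(batch_fasta("toy_protein", reference, toy_changes))
```

Checking the reference residue before each replacement is a deliberate choice: if the position refers to a different isoform than the sequence in hand, the script stops with an error instead of silently writing a wrong sequence.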