5.8 years ago
MatthewP
Hello, everyone! I want to use RNAEditor. It requires preparing several databases first, one of which is a VCF file for dbSNP. This is the command given in the documentation:
wget -qO- ftp://ftp.ensembl.org/pub/release-83/variation/vcf/homo_sapiens/Homo_sapiens.vcf.gz | gunzip -c | awk 'BEGIN{FS="\t";OFS="\t"};match($5,/\./){gsub(/\./,"N",$5)};$5 == "" && $1 !~ /^#/ {gsub("","N",$5)};$3 ~ /rs193922900/ {$5="TN"};$3 ~ /rs59736472/ {$5="AN"};$5 ~ /H/ {gsub(/H/,"N",$5)};{print $0}' > dbSNP.vcf
My question is: why does ALT need to be set to TN and AN for rs193922900 and rs59736472, respectively? Why do these two sites seem to be special for RNAEditor?
Thanks!
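For readers following along, the one-liner in the question can be unrolled roughly as follows. This is only a sketch of the ALT rewrites, not RNAEditor's own code: the function name `normalize_alt` and the sample record are made up for illustration, header lines are skipped explicitly for readability, and the empty-ALT rule is written as a plain assignment instead of `gsub("","N",$5)`.

```shell
# Hypothetical helper (not part of RNAEditor): applies the same ALT rewrites
# as the documented one-liner to VCF text read from stdin.
normalize_alt() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
    # Header lines pass through untouched (a simplification of the original).
    /^#/ { print; next }
    # Any literal "." in ALT (column 5) becomes "N".
    $5 ~ /\./ { gsub(/\./, "N", $5) }
    # An empty ALT becomes "N".
    $5 == "" { $5 = "N" }
    # The two hard-coded records get a fixed ALT value.
    $3 ~ /rs193922900/ { $5 = "TN" }
    $3 ~ /rs59736472/ { $5 = "AN" }
    # The letter "H" in ALT becomes "N".
    $5 ~ /H/ { gsub(/H/, "N", $5) }
    { print }'
}

# Made-up sample record: the ALT column "." is rewritten to "N".
printf '1\t100\trs1\tA\t.\n' | normalize_alt
```

Written this way, it is easier to see that the two rsIDs are simply overridden after the general `.` to `N` rule has run.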
I have no idea what TN and AN mean, but I have a guess as to why these SNPs are treated separately. They describe short tandem repeats (see rs193922900, rs59736472). The description of the variant does not conform to the VCF format (see the values on the dbSNP site in the RefSNP Alleles column). According to the help site, these types of variants are excluded from the current VCF version of dbSNP. But it might be that in the old version RNAEditor links to, STRs are included and cause problems.
Thanks finswimmer! I actually downloaded the newest version (release-95) of the dbSNP file. I checked the VCF file and found that these two sites are still in it. This may explain why they are set to TN and AN: the command sets every . to N.
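One way to repeat that check is sketched below. The helper name `find_sites` is hypothetical, and it assumes each rsID appears alone in the ID column (field 3); records listing several IDs in that column would need a looser match.

```shell
# Hypothetical helper: prints the VCF records read from stdin whose ID column
# (field 3) is exactly one of the two rsIDs discussed in this thread.
find_sites() {
  awk -F '\t' '$3 == "rs193922900" || $3 == "rs59736472"'
}

# Usage, assuming the Ensembl file has already been downloaded locally:
#   gunzip -c Homo_sapiens.vcf.gz | find_sites
```

Looking at the REF and ALT columns of the matching records shows what the command is actually overwriting.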