I have block-substitutions which I'd like to add to a 1000g-like file.
When looking at the YRI trio files (2010/07 indels), I do not find any sub calls, only ins and del calls.
Is this because subs are not supported or because they have been filtered out?
For structural variation, the VCF format admits the following types: INS, DEL, DUP, INV, and CNV. You can check this in the VCF v4.0 specification.
The reason why only indel calls are found in the pilot data may well be that the other analyses were simply not performed. Small-indel and SNP calling is almost straightforward to do, so I guess those were the only analyses carried out on the pilot data. I am not sure how deeply they will end up analyzing the final data, but I understood when the project was launched that they were going to describe "all possible variation found by sequencing".
There is no reason that the same format used for indels couldn't be used for straight substitutions. For indels, the position should be the last bp before the start of the indel, and the ref and alt allele strings should both contain that last base before the change,
e.g.
1 1000 . ATGCG A
would be a 4 base deletion.
If you wanted to describe a substitution, you could do
1 1000 . ATGCG AGCTA
which would be a 4 bp substitution.
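To make the convention above concrete, here is a minimal Python sketch that labels a REF/ALT pair written in this anchored style. The function name `classify_variant` and the label strings are hypothetical, made up for illustration; this is not part of any VCF library. It assumes both alleles start with the shared anchoring base, and it calls any equal-length change a substitution without checking that every base actually differs.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a REF/ALT pair that uses the anchored-indel convention.

    Hypothetical helper for illustration: assumes REF and ALT both begin
    with the last unchanged base before the variant.
    """
    if ref[0] != alt[0]:
        raise ValueError("REF and ALT must share the anchoring first base")
    if ref == alt:
        return "no change"

    # Drop the anchoring base and compare what is left.
    ref_tail, alt_tail = ref[1:], alt[1:]
    if ref_tail and not alt_tail:
        return f"{len(ref_tail)} bp deletion"
    if alt_tail and not ref_tail:
        return f"{len(alt_tail)} bp insertion"
    if len(ref_tail) == len(alt_tail):
        return f"{len(ref_tail)} bp substitution"
    return "complex (length-changing substitution)"


# The two records from the example above (CHROM POS ID REF ALT).
for line in ("1 1000 . ATGCG A", "1 1000 . ATGCG AGCTA"):
    chrom, pos, vid, ref, alt = line.split()[:5]
    print(chrom, pos, classify_variant(ref, alt))
```

Run on the two example records, this reports a 4 bp deletion and a 4 bp substitution respectively.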
The 1000 Genomes Project aims to find at least 95% of all variants down to a MAF of 1% (we will also find some variants at lower frequency, but a smaller percentage of them). Neither small nor large substitutions have been predicted yet, but that doesn't mean we won't do it, just that we are working on other problems at the moment.
written 14.2 years ago by Laura • updated 5.7 years ago by Ram