Need help interpreting vg call output
13 months ago
Maxine ▴ 50

I successfully obtained the output VCF using the following command:

vg call ${gbz_file} \
      -k ${pack_file} \
      -r ${snarls_file} \
      -s ${sample_name} \
      -t $SLURM_CPUS_PER_TASK \
      -z -a > ${output_vcf_file}

I need help interpreting the VCF data.

  1. It appears that the number of snarls is fewer than the number of SVs used to construct the graph pangenome. The number of snarls is approximately 1.28 million, which is inferred from the record count in ${output_vcf_file}. However, the number of SVs used to construct the graph is about 1.43 million. Can you explain why this situation occurred?

  2. I would like to count the number of structural variants of each type (INS, DEL, INV in this case). However, the output VCF does not annotate the variant type of each record. Is there a good way to address this?

  3. In the future, I will need to merge VCF files from multiple samples into a single multi-sample VCF. Can you recommend good workflows or tools for this?

Thank you for your assistance.

Maxine

vcf vg • 1.6k views
13 months ago

For 1), the reason is that sometimes SVs overlap each other in the genome. In this case, they will either be merged into a single snarl, or one snarl will nest within the other one. By default, vg call only produces calls for top-level (i.e. not nested) snarls.
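
To get a rough feel for how many input SVs collapse into shared sites, one can cluster the input records by overlap of their reference intervals. The following is a minimal Python sketch, assuming each SV has been reduced to a (chrom, start, end) tuple taken from POS and END; the helper name count_overlap_clusters and the toy intervals are hypothetical. It only approximates snarl merging, because real snarl boundaries depend on graph topology and nesting is not modeled here.

```python
from collections import defaultdict

def count_overlap_clusters(intervals):
    """Count groups of chained overlapping intervals, per chromosome.

    intervals: iterable of (chrom, start, end), 1-based inclusive coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    clusters = 0
    for spans in by_chrom.values():
        spans.sort()
        current_end = None
        for start, end in spans:
            if current_end is None or start > current_end:
                clusters += 1                        # starts a new cluster
                current_end = end
            else:
                current_end = max(current_end, end)  # extends the running cluster
    return clusters

# Toy input: two overlapping SVs, one isolated SV, one SV on another chromosome.
svs = [
    ("chr1", 100, 500),
    ("chr1", 400, 900),
    ("chr1", 2000, 2100),
    ("chr2", 100, 200),
]
print(count_overlap_clusters(svs))  # 3 clusters from 4 SVs
```

If the cluster count on the real input VCF is well below the record count, overlap alone would account for much of the gap between input SVs and called sites.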

For 2), I'm not aware of any tool that does this, unfortunately. It's a significant gap in the bioinformatics tooling (unless there is a tool that I don't know about). You could probably get most of the way there with awk or Python scripting, though.
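
As a starting point for such a script, here is a minimal Python sketch that labels each called ALT allele by comparing REF and ALT lengths. The 50 bp threshold, the label names, and the reverse-complement test for inversions are assumptions made for illustration, not vg conventions. An inversion is only recognized when the ALT is the exact reverse complement of the REF, and the parser assumes single-sample records with GT as the first FORMAT key.

```python
import re
from collections import Counter

_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]

def classify_allele(ref, alt, min_sv_len=50):
    """Return a coarse type label for one REF/ALT pair (assumed thresholds)."""
    diff = len(alt) - len(ref)
    if abs(diff) >= min_sv_len:
        return "INS" if diff > 0 else "DEL"
    if (diff == 0 and len(ref) >= min_sv_len
            and alt.upper() == revcomp(ref).upper()
            and alt.upper() != ref.upper()):
        return "INV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    return "SMALL_OR_COMPLEX"

def count_called_types(vcf_lines, min_sv_len=50):
    """Count types of ALT alleles that actually appear in the sample's GT."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        gt = fields[9].split(":")[0]  # assumes GT is the first FORMAT key
        called = {int(a) for a in re.split(r"[/|]", gt) if a.isdigit() and a != "0"}
        for idx in called:
            counts[classify_allele(ref, alts[idx - 1], min_sv_len)] += 1
    return counts

# Toy records (not real data): a homozygous deletion, a heterozygous
# insertion, and a site genotyped 0/0 that contributes nothing.
long_seq = "A" + "C" * 60
demo = [
    "\t".join(["chr1", "100", "s1", long_seq, "A", "30", "PASS", "DP=7", "GT:DP", "1/1:7"]),
    "\t".join(["chr1", "500", "s2", "A", long_seq, "30", "PASS", "DP=7", "GT:DP", "0/1:7"]),
    "\t".join(["chr1", "900", "s3", long_seq, "A,G", "30", "PASS", "DP=7", "GT:DP", "0/0:7"]),
]
print(dict(count_called_types(demo)))
```

Counting only alleles present in GT matters here because, as the records later in this thread show, a site can list ALT alleles that the sample does not carry.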

For 3), you probably want bcftools merge.

12 months ago
Maxine ▴ 50

After reading the output VCF in detail, I have some further observations that I would like to share.

  1. Among the samples, there are some variants with slightly different start positions, although they are very close (in most cases, the start positions are adjacent, for example, POS is 1000 for sample A and 1001 for sample B). Since these samples were called using the same snarl file, why does this occur? I even checked the input VCF used to construct the graph and found no two variants that were so closely positioned. Here are the actual records:
    sample A:
    NC_058080.1_1   59145   >89127697>89127735  ATAAATGGTAGAAAAATTGCAACAGGTTTTATTGAAAAAAAAACACTGGCTCAATTGGACATCACTATGATCATTTGCACTGGTATAAACGGTAGAAAAATTGCAACAGGTTTTTGAGAAACGGCAAAATTCCCAATTACATGTTCCCTGAATGCGGCAAAACTGTGCGCTGCTACTGATGCACTAAAATACACTTATTTGTGCATCTAAAATGTAGAAAAATTGCAACAGGTTTTAGGGAAACAGAAATGCACAATTGGACATCACCTGAGTGCATCCAAACTATACACTGTTATTGACGCACTTATAAATGGTAGAAAAATTGAAATAAGTTTTGAGAAACACCAAAATTTCAATAGGACGACCTCTGGGTGAAACAAAAATATACACTATTATGCTGCTGTTAATGCAGTAAAATATGATTATTGCACATATATAAAATTTTGAAAAATTGTAAGGCTGGCTTCTGTACAGTCTGTTGTCTTCTTAAGGATCTCTGTGCTGCAGTAGTCTGGACTAGGGCTCAGCTAAGGGGAGGCCTGCATGATTAGGGTTGGTTGTGAGGGTCAGGGAGGTTAGAGGTTAAGGGATTTGAATGTCTAGTAAAAGAGGGGGGGCATCGCATAGGAGAGGAGTTTGGGGGGCTATTTATGGGGGTGTCTTGTAGTGAGGGGTGCCATTTTGTGCACTGAGAAAAACTGGTAAGATGTCCCAGACACTTACCATCAATGATATGCTGGCTAGGCTAAGAGAGGCAGCGGCGGAGCGGGGCCATGAGTGGCTGAGCTCCCAGATGTCCGCCATATTGAGGGCTGAGGCAGCGGGTACGGTTAGCTCGCCTCCGGAAGGAAGACGGACATGGCTGGTACGGCCGCCCGCGCGCCTGAGTCCCAGTGAGACCCCCCGAGTTCGGCGCCGTGTCAGGAGCCCCTCCGGGGACCCTCCACCTCGGGGGAATGCAGGGCAGACTACATCCTCATCCTCCTGGCGTGGGAGGAATCCATGTGGCAGGCGTAACCCAGCGCAGGGCAGGAGGGCCCTCCCCTGCCCTCTCCGGTGAGCAGCGATGGATCGGGAGCGGCGGGATCCTTACCTTCTAGTGGGGTCAGGAGGCCTGCTCAGCGAGGGATGGACGGGAGGCATGTGGCACGCGAGG    
ATAAATGGTAGAAAAATTGCAACAGGTTTTATTGAAAAAAAAACACTGGCTCAATTGGACATCACTATGATCATTTGCACTGGTATAAACGGTAGAAAAATTGCAACAGGTTTTTGAGAAACGGCAAAATTCCCAATTACATGTTCCCTGAATGCGGCAAAACTGTGCGCTGCTACTGATGCACTAAAATACACTTATTTGTGCATCTAAAATGTAGAAAAATTGCAACAGGTTTTAGGGAAACAGAAATGCACAATTGGACATCACCTGAGTGCATCCAAACTATACACTGTTATTGACGCACTTATAAATGGTAGAAAAATTGAAATAAGTTTTGAGAAACACCAAAATTTCAATAGGACGACCTCTGGGTGAAACAAAAATATACACTATTATGCTGCTGTTAATGCAGTAAAATATGATTATTGCACATATATAAAATTTTGAAAAATTGTAAGGCTGGCTTCTGTACAGTCTGTTGTCTTCTTAAGGATCTCTGTGCTGCAGTAGTCTGGACTAGGGCTCAGCTAAGGGGAGGCCTGCATGATTAGGGTTGGTTGTGAGGGTCAGGGAGGTTAGAGGTTAAGGGATTTGAATGGTCTTGTAGTGAGGGGTGCCATTTTGTGCACTGAGAAAAACTGGTAAGATGTCCCAGACACTTACCATCAATGATATGCTGGCTAGGCTAAGAGAGGCAGCGGCGGAGCGGGGCCATGAGTGGCTGAGCTCCCAGATGTCCGCCATATTGAGGGCTGAGGCAGCGGGTACGGTTAGCTCGCCTCCGGAAGGAAGACGGACATGGCTGGTACGGCCGCCCGCGCGCCTGAGTCCCAGTGAGACCCCCCGAGTTCGGCGCCGTGTCAGGAGCCCCTCCGGGGACCCTCCACCTCGGGGGAATGCAGGGCAGACTACATCCTCATCCTCCTGGCGTGGGAGGAATCCATGTGGCAGGCGTAACCCAGCGCAGGGCAGGAGGGCCCTCCCCTGCCCTCTCCGGTGAGCAGCGATGGATCGGGAGCGGCGGGATCCTTACCTTCTAGTGGGGTCAGGAGGCCTGCTCAGCGAGGGATGGACGGGAGGCATGTGGCACGCGAGG,A  8.21742 PASS    AT=>89127697>89127698>89127699>89127700>89127701>89127702>89127703>89127704>89127705>89127706>89127707>89127708>89127709>89127710>89127711>89127712>89127713>89127714>89127715>89127716>89127717>89127718>89127719>89127720>89127721>89127722>89127723>89127724>89127725>89127726>89127727>89127728>89127729>89127730>89127731>89127732>89127733>89127734>89127735,>89127697>89127698>89127699>89127700>89127701>89127702>89127703>89127704>89127705>89127706>89127707>89127708>89127709>89127710>89127711>89127712>89127713>89127714>89127715>89127716>89127719>89127720>89127721>89127722>89127723>89127724>89127725>89127726>89127727>89127728>89127729>89127730>89127731>89127732>89127733>89127734>89127735,>89127697>89127735;DP=7    GT:DP:AD:GL:GQ:GP:XD:MAD    0/0:7:6,1,0:-3.03612,-2.11332,-4.33715,-3.03612,-4.33715,-13.582:13:-1.1474:11.9455:6
    sample B:
    NC_058080.1_1   59146   >89127697>89127735  TAAATGGTAGAAAAATTGCAACAGGTTTTATTGAAAAAAAAACACTGGCTCAATTGGACATCACTATGATCATTTGCACTGGTATAAACGGTAGAAAAATTGCAACAGGTTTTTGAGAAACGGCAAAATTCCCAATTACATGTTCCCTGAATGCGGCAAAACTGTGCGCTGCTACTGATGCACTAAAATACACTTATTTGTGCATCTAAAATGTAGAAAAATTGCAACAGGTTTTAGGGAAACAGAAATGCACAATTGGACATCACCTGAGTGCATCCAAACTATACACTGTTATTGACGCACTTATAAATGGTAGAAAAATTGAAATAAGTTTTGAGAAACACCAAAATTTCAATAGGACGACCTCTGGGTGAAACAAAAATATACACTATTATGCTGCTGTTAATGCAGTAAAATATGATTATTGCACATATATAAAATTTTGAAAAATTGTAAGGCTGGCTTCTGTACAGTCTGTTGTCTTCTTAAGGATCTCTGTGCTGCAGTAGTCTGGACTAGGGCTCAGCTAAGGGGAGGCCTGCATGATTAGGGTTGGTTGTGAGGGTCAGGGAGGTTAGAGGTTAAGGGATTTGAATGTCTAGTAAAAGAGGGGGGGCATCGCATAGGAGAGGAGTTTGGGGGGCTATTTATGGGGGTGTCTTGTAGTGAGGGGTGCCATTTTGTGCACTGAGAAAAACTGGTAAGATGTCCCAGACACTTACCATCAATGATATGCTGGCTAGGCTAAGAGAGGCAGCGGCGGAGCGGGGCCATGAGTGGCTGAGCTCCCAGATGTCCGCCATATTGAGGGCTGAGGCAGCGGGTACGGTTAGCTCGCCTCCGGAAGGAAGACGGACATGGCTGGTACGGCCGCCCGCGCGCCTGAGTCCCAGTGAGACCCCCCGAGTTCGGCGCCGTGTCAGGAGCCCCTCCGGGGACCCTCCACCTCGGGGGAATGCAGGGCAGACTACATCCTCATCCTCCTGGCGTGGGAGGAATCCATGTGGCAGGCGTAACCCAGCGCAGGGCAGGAGGGCCCTCCCCTGCCCTCTCCGGTGAGCAGCGATGGATCGGGAGCGGCGGGATCCTTACCTTCTAGTGGGGTCAGGAGGCCTGCTCAGCGAGGGATGGACGGGAGGCATGTGGCACGCGAGG 
TAAATGGTAGAAAAATTGCAACAGGTTTTATTGAAAAAAAAACACTGGCTCAATTGGACATCACTATGATCATTTGCACTGGTATAAACGGTAGAAAAATTGCAACAGGTTTTTGAGAAACGGCAAAATTCCCAATTACATGTTCCCTGAATGCGGCAAAACTGTGCGCTGCTACTGATGCACTAAAATACACTTATTTGTGCATCTAAAATGTAGAAAAATTGCAACAGGTTTTAGGGAAACAGAAATGCACAATTGGACATCACCTGAGTGCATCCAAACTATACACTGTTATTGACGCACTTATAAATGGTAGAAAAATTGAAATAAGTTTTGAGAAACACCAAAATTTCAATAGGACGACCTCTGGGTGAAACAAAAATATACACTATTATGCTGCTGTTAATGCAGTAAAATATGATTATTGCACATATATAAAATTTTGAAAAATTGTAAGGCTGGCTTCTGTACAGTCTGTTGTCTTCTTAAGGATCTCTGTGCTGCAGTAGTCTGGACTAGGGCTCAGCTAAGGGGAGGCCTGCATGATTAGGGTTGGTTGTGAGGGTCAGGGAGGTTAGAGGTTAAGGGATTTGAATGGTCTTGTAGTGAGGGGTGCCATTTTGTGCACTGAGAAAAACTGGTAAGATGTCCCAGACACTTACCATCAATGATATGCTGGCTAGGCTAAGAGAGGCAGCGGCGGAGCGGGGCCATGAGTGGCTGAGCTCCCAGATGTCCGCCATATTGAGGGCTGAGGCAGCGGGTACGGTTAGCTCGCCTCCGGAAGGAAGACGGACATGGCTGGTACGGCCGCCCGCGCGCCTGAGTCCCAGTGAGACCCCCCGAGTTCGGCGCCGTGTCAGGAGCCCCTCCGGGGACCCTCCACCTCGGGGGAATGCAGGGCAGACTACATCCTCATCCTCCTGGCGTGGGAGGAATCCATGTGGCAGGCGTAACCCAGCGCAGGGCAGGAGGGCCCTCCCCTGCCCTCTCCGGTGAGCAGCGATGGATCGGGAGCGGCGGGATCCTTACCTTCTAGTGGGGTCAGGAGGCCTGCTCAGCGAGGGATGGACGGGAGGCATGTGGCACGCGAGG 14.5674 PASS    AT=>89127697>89127698>89127699>89127700>89127701>89127702>89127703>89127704>89127705>89127706>89127707>89127708>89127709>89127710>89127711>89127712>89127713>89127714>89127715>89127716>89127717>89127718>89127719>89127720>89127721>89127722>89127723>89127724>89127725>89127726>89127727>89127728>89127729>89127730>89127731>89127732>89127733>89127734>89127735,>89127697>89127698>89127699>89127700>89127701>89127702>89127703>89127704>89127705>89127706>89127707>89127708>89127709>89127710>89127711>89127712>89127713>89127714>89127715>89127716>89127719>89127720>89127721>89127722>89127723>89127724>89127725>89127726>89127727>89127728>89127729>89127730>89127731>89127732>89127733>89127734>89127735;DP=10  GT:DP:AD:GL:GQ:GP:XD:MAD    1/1:10:1,8:-2.85943,-1.98197,-2.85943:18:-1.1128:13.2599:8
    input vcf:
    NC_058080.1_1   59145   Sniffles2.DEL.47M0  A   <DEL>   60  PASS    PRECISE;SVTYPE=DEL;SVLEN=-1155;END=60300;SUPPORT=4;COVERAGE=17,14,14,14,14;STRAND=+-;AC=4;STDEV_LEN=364.196;STDEV_POS=290.085;SUPP_VEC=000000001011 GT:GQ:DR:DV:ID  0/0:0:9:0:NULL  0/0:0:19:0:NULL 0/0:0:10:0:NULL 0/0:0:16:0:NULL 0/0:0:13:0:NULL 0/0:0:26:0:NULL 0/0:0:23:0:NULL 0/0:0:26:0:NULL 0/1:21:3:3:Sniffles2.DEL.3AD8FS0,Sniffles2.DEL.3AD8ES0,Sniffles2.DEL.3AD91S0    0/0:0:23:0:NULL 0/1:16:12:5:Sniffles2.DEL.4975DS0,Sniffles2.DEL.49761S0,Sniffles2.DEL.4975ES0,Sniffles2.DEL.4975CS0,Sniffles2.DEL.49760S0   1/1:31:1:15:Sniffles2.DEL.3E33CS0,Sniffles2.DEL.3E341S0,Sniffles2.DEL.3E33DS0
    
    From this example, it is evident that the allele traversal as a path in the graph is identical for sample A and sample B.
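
Since POS can differ while the snarl is the same, it may be safer to match records across samples by the ID column (the snarl's boundary nodes, >89127697>89127735 above) rather than by POS. The following is a minimal Python sketch, assuming tab-separated single-sample records; the helper names index_by_snarl and shifted_sites are hypothetical, and the toy records keep the real CHROM, POS and ID but use shortened, made-up alleles.

```python
from collections import defaultdict

def index_by_snarl(samples):
    """Map (chrom, snarl_id) -> {sample_name: POS} across per-sample VCF lines.

    samples: dict of sample name -> iterable of VCF text lines.
    """
    sites = defaultdict(dict)
    for name, lines in samples.items():
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, snarl_id = line.split("\t")[:3]
            sites[(chrom, snarl_id)][name] = int(pos)
    return sites

def shifted_sites(sites):
    """Return snarls whose POS differs between the samples that report them."""
    return {key: by_sample for key, by_sample in sites.items()
            if len(set(by_sample.values())) > 1}

# Toy records: same snarl ID, POS differs by one, alleles shortened.
samples = {
    "A": ["NC_058080.1_1\t59145\t>89127697>89127735\tATAAAT\tATAT,A\t8.2\tPASS\tDP=7\tGT\t0/0"],
    "B": ["NC_058080.1_1\t59146\t>89127697>89127735\tTAAAT\tTAT\t14.6\tPASS\tDP=10\tGT\t1/1"],
}
for (chrom, snarl_id), by_sample in shifted_sites(index_by_snarl(samples)).items():
    print(chrom, snarl_id, by_sample)
```

In the real records above, sample A's REF equals sample B's REF with one extra leading base, and only sample A's ALT list contains the full deletion allele A. That pattern is consistent with an extra anchor base being emitted when one allele would otherwise be empty, but this is an inference from these records and is not confirmed in the thread.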

The content was too long, so I had to split it in two.

  1. All the variants in my input VCF are biallelic. I did not perform any augmentation before calling, so I would expect that the output VCF should not contain any new POS or genotypes. However, in reality, the output VCF contains many multi-allelic variants. For instance, the variant mentioned in the previous example appears as a multi-allelic variant in sample C, whereas the input VCF indicates that it is biallelic.

    sample C:
    NC_058080.1_1   59145   >89127697>89127735  ATAAATGGTAGAAAAATTGCAACAGGTTTTATTGAAAAAAAAACACTGGCTCAATTGGACATCACTATGATCATTTGCACTGGTATAAACGGTAGAAAAATTGCAACAGGTTTTTGAGAAACGGCAAAATTCCCAATTACATGTTCCCTGAATGCGGCAAAACTGTGCGCTGCTACTGATGCACTAAAATACACTTATTTGTGCATCTAAAATGTAGAAAAATTGCAACAGGTTTTAGGGAAACAGAAATGCACAATTGGACATCACCTGAGTGCATCCAAACTATACACTGTTATTGACGCACTTATAAATGGTAGAAAAATTGAAATAAGTTTTGAGAAACACCAAAATTTCAATAGGACGACCTCTGGGTGAAACAAAAATATACACTATTATGCTGCTGTTAATGCAGTAAAATATGATTATTGCACATATATAAAATTTTGAAAAATTGTAAGGCTGGCTTCTGTACAGTCTGTTGTCTTCTTAAGGATCTCTGTGCTGCAGTAGTCTGGACTAGGGCTCAGCTAAGGGGAGGCCTGCATGATTAGGGTTGGTTGTGAGGGTCAGGGAGGTTAGAGGTTAAGGGATTTGAATGTCTAGTAAAAGAGGGGGGGCATCGCATAGGAGAGGAGTTTGGGGGGCTATTTATGGGGGTGTCTTGTAGTGAGGGGTGCCATTTTGTGCACTGAGAAAAACTGGTAAGATGTCCCAGACACTTACCATCAATGATATGCTGGCTAGGCTAAGAGAGGCAGCGGCGGAGCGGGGCCATGAGTGGCTGAGCTCCCAGATGTCCGCCATATTGAGGGCTGAGGCAGCGGGTACGGTTAGCTCGCCTCCGGAAGGAAGACGGACATGGCTGGTACGGCCGCCCGCGCGCCTGAGTCCCAGTGAGACCCCCCGAGTTCGGCGCCGTGTCAGGAGCCCCTCCGGGGACCCTCCACCTCGGGGGAATGCAGGGCAGACTACATCCTCATCCTCCTGGCGTGGGAGGAATCCATGTGGCAGGCGTAACCCAGCGCAGGGCAGGAGGGCCCTCCCCTGCCCTCTCCGGTGAGCAGCGATGGATCGGGAGCGGCGGGATCCTTACCTTCTAGTGGGGTCAGGAGGCCTGCTCAGCGAGGGATGGACGGGAGGCATGTGGCACGCGAGG    
A,ATAAATGGTAGAAAAATTGCAACAGGTTTTATTGAAAAAAAAACACTGGCTCAATTGGACATCACTATGATCATTTGCACTGGTATAAACGGTAGAAAAATTGCAACAGGTTTTTGAGAAACGGCAAAATTCCCAATTACATGTTCCCTGAATGCGGCAAAACTGTGCGCTGCTACTGATGCACTAAAATACACTTATTTGTGCATCTAAAATGTAGAAAAATTGCAACAGGTTTTAGGGAAACAGAAATGCACAATTGGACATCACCTGAGTGCATCCAAACTATACACTGTTATTGACGCACTTATAAATGGTAGAAAAATTGAAATAAGTTTTGAGAAACACCAAAATTTCAATAGGACGACCTCTGGGTGAAACAAAAATATACACTATTATGCTGCTGTTAATGCAGTAAAATATGATTATTGCACATATATAAAATTTTGAAAAATTGTAAGGCTGGCTTCTGTACAGTCTGTTGTCTTCTTAAGGATCTCTGTGCTGCAGTAGTCTGGACTAGGGCTCAGCTAAGGGGAGGCCTGCATGATTAGGGTTGGTTGTGAGGGTCAGGGAGGTTAGAGGTTAAGGGATTTGAATGGTCTTGTAGTGAGGGGTGCCATTTTGTGCACTGAGAAAAACTGGTAAGATGTCCCAGACACTTACCATCAATGATATGCTGGCTAGGCTAAGAGAGGCAGCGGCGGAGCGGGGCCATGAGTGGCTGAGCTCCCAGATGTCCGCCATATTGAGGGCTGAGGCAGCGGGTACGGTTAGCTCGCCTCCGGAAGGAAGACGGACATGGCTGGTACGGCCGCCCGCGCGCCTGAGTCCCAGTGAGACCCCCCGAGTTCGGCGCCGTGTCAGGAGCCCCTCCGGGGACCCTCCACCTCGGGGGAATGCAGGGCAGACTACATCCTCATCCTCCTGGCGTGGGAGGAATCCATGTGGCAGGCGTAACCCAGCGCAGGGCAGGAGGGCCCTCCCCTGCCCTCTCCGGTGAGCAGCGATGGATCGGGAGCGGCGGGATCCTTACCTTCTAGTGGGGTCAGGAGGCCTGCTCAGCGAGGGATGGACGGGAGGCATGTGGCACGCGAGG  22.2838 PASS    AT=>89127697>89127698>89127699>89127700>89127701>89127702>89127703>89127704>89127705>89127706>89127707>89127708>89127709>89127710>89127711>89127712>89127713>89127714>89127715>89127716>89127717>89127718>89127719>89127720>89127721>89127722>89127723>89127724>89127725>89127726>89127727>89127728>89127729>89127730>89127731>89127732>89127733>89127734>89127735,>89127697>89127735,>89127697>89127698>89127699>89127700>89127701>89127702>89127703>89127704>89127705>89127706>89127707>89127708>89127709>89127710>89127711>89127712>89127713>89127714>89127715>89127716>89127719>89127720>89127721>89127722>89127723>89127724>89127725>89127726>89127727>89127728>89127729>89127730>89127731>89127732>89127733>89127734>89127735;DP=7    GT:DP:AD:GL:GQ:GP:XD:MAD    1/2:7:1,1,5:-4.29935,-3.60807,-3.08519,-10.3865,-3.60807,-4.29935:6:-1.28391:12.2142:1
    

    Why is this happening? Are these multiallelic calls reliable?

  2. I would like to hear your opinion on whether to enable nested calling mode. Based on my understanding, using short-read data to call nested variants may increase the false positive rate (inaccurate calls). If my goal is the most accurate calling possible and I don't mind sacrificing the number of calls (some nested variants not being called), then I should not enable it. Is that right?

Thanks for your attention.

Maxine


The reason for the multiallelic records is the same as my answer for 1) above: biallelic variants can overlap each other in the genome. When this happens, vg call will treat it as one variant locus with multiple alleles. I'm not sure why the position would shift, though. Maybe glenn.hickey knows better.

12 months ago
glenn.hickey ▴ 520

Variants can shift around a bit in vg construct, so it's possible your graph and VCF are not identical, though they will be equivalent. And yes, overlaps will cause multiallelic records like the ones you are seeing: overlapping variants get lumped into one site, and your alleles each span that merged site.

Calling nested variants is no more or less accurate than calling non-nested variants. It's just that the VCF representation becomes more difficult to deal with, as you can have the same variant showing up in two different records. There are tools like vcfbub to deal with this, but you really need to make sure that you take this quirk of the representation into account when using the VCF.

For some ideas on how to normalize nested VCFs, look here: https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/mc-pangenomes/hprc-v1.1-mc.md#vcf-postprocessing for an example of using vcfbub and vcfwave.

Finally, if you are building your graph from a VCF, you can run vg call on exactly that VCF with -v. You just need to make sure that your graph was built with vg construct -a to do this. This way your calls will be written exactly in terms of your input variants.


The "vg call -v" command did not work for me: every variant in the VCF passed to -v failed to be called. I suspect this might be because I constructed a separate graph for each chromosome, following the tutorial "Working with a Whole Genome Variation Graph".

Anyway, I'll provide the command and some of the VG call warning messages below, hoping that this information is sufficiently detailed:

Construction for each chromosome:
    vg construct -f -S -a \
    -t ${SLURM_CPUS_PER_TASK} \
    -R ${chrom} \
    -r ${ref_genome} \
    -v ${input_vcf} \
    > ${chrom}.${vg_pre}.vg

Node ID coordination:
vg ids -j $(cat ${chrom_list} | while read chrom; do echo "${chrom}.${vg_pre}.vg"; done)

index for xg:
vg index -t ${SLURM_CPUS_PER_TASK} -p \
        -x ${vg_pre}.16in1.xg \
        $(cat ${chrom_list} | while read chrom; do echo "${chrom}.${vg_pre}.vg"; done)

pruned vg then index for gcsa:
vg index -t ${SLURM_CPUS_PER_TASK} -p \
    --temp-dir ${temp_path} \
    -g ${vg_pre}.16in1.gcsa \
    $(cat ${chrom_list} | while read chrom; do echo "pruned.${chrom}.${vg_pre}.vg"; done) 

vg call:
  vg call ${xg_file} \
    -k ${pack_file} \
    -r ${snarls_file} \
    -t ${SLURM_CPUS_PER_TASK} \
    -v ${input_vcf} \
    > ${output_vcf}
----------------------------------------------
Most variants failed to be called, like this:
[VCFTraversalFinder] Warning: No alt path (prefix=_alt_e025a9b83d893c574eeab3a2991ab82e907c6e60_) found in graph for variant.  It will be ignored:
NC_008410.1 13489   Sniffles2.INS.9M2EF T   CATTTGGATTTGAAAGCAGCCGCATGATACTGACCATTTTGTCGACGTTGTTTGGACTCTTCTTATATGTCTCAAATTTTACTGATGAGGCTCTTTTTTCTTAGTATAATTAGTACTAATGACTTCCAAATCATAAGAATTTTCTTGGTTAAACCCCGAGAGAAAATAATGACCTATATTGTTCTTCTTACCTTAATATTGTTCAGTTTTAGCCTCATCAGCTTCTGACTACCAACTATTAGCTCGGCTCAGAAAAACTCTCACCTTATGAGTGCGGCTTTGACCCGCTAGGATCCGCCCGCCTCCCATATTCCATGCGGTTCTTTCTAGTGCTATTCTTTTTCTCCTCTTTGACCTAGAAAATTGCCCTCCTCTCCCCACCCCCTGAGCTGCACAACTTCCCTATCCCACCCGGTCTATCTTCTTGCTCAGTAATCCTAATTCTTTTAACCTTAGGGTTTGTCTATGAGTGACTTCAAGGAGGCCTGAGAATGAGCTGAATAAGGAGTTAGTTCTTAAAAAAGACAGCTGATTTCGACTCAGCAAAATTATGGTTTAACCCCATAGCGCCTTTTGATAACAAACGAATCCTGATTACTCTCTTCCACATTTATATTGAGCCTATTGGCCTGTCATTTCACCGGGCGCCCTGCTCTCAGCCCTGCTCTGCCTAGGAGGGCATGATACTCTCAAAATCTTCATTGGCCTGGCCTCTGGGCCTAAAAATTTGCTTAATCGCCCCCCTAATGTACCTATTGCCATGTAAAACTATGTCTGCATGTGAGGCGGGCTCGGCCTCTCCCTCATTAATCGCCACTGCTCGAACCCACGGCTCAGATAATTTGAATACCTTAAACCTCCTGCAATGCTAATAATATTATCCCTCTTGCCACCCTCCCTCCTCTCAACCTGACTGGCTCCCTTCAAAGCGGTTGTGAGAAATTATCACAACCCAGACCTTAGTTTTGCTGTCATGTCAACAACCTGGTTAATAACTCAAGAAACTAGCCCATTTTTAAACAACAAAACTATTTTACTATTGACGAGATCTATACCTTCCTTTTAGTGTTAACTTGCTGACTTACAGCACCCACAACTCCTAGCTAGCCAAGTAAACTGTCCAGGAGCCAATTTCACGACAACGAGCCTATATTTTTACCATTATTATACTTCAGTCACAACTCTACTAGCATTTCTGGCAGCCAATATAATTTATTTTTTATTATATTTGAAGCCACAATAATCCCAACCCCCCCTAATTGTAATTACTCGTATGAGGGGCCCAAAAAGAACGCATGTTAGCCGGAAAAACCTACCTAATTTTTTACACCTTATTTGGTCCTGGCCTTCTCACAGCCCTCTGGTACTTTCACCGAGACCTTTGGAACTTGCTCAAGCCACTAGCAAAATTCTTTCCCGGCAAATACCACCCTGTCAACCTGCTCGTTAAACCTGGGTGGCTTGCTTGTCTTATTGCTTCTTGGTCAAAATACCTCTATATGGCGTCCACCTTTGACTACCGAAGGCACATGTCGAATCCCCCATCGCGGGTCTATATTCTTGGCAGGAACCCTCCTCTAACTTGGGGGGCTATGGGATCTTGCGAAATAAAACGCTTTCATTACTGACTCCTTTACACCCCTGGCTCAACCACTTATCGTTTCCTCTCTATTTGGCGTGGCTCCTTTCAGCAATTCTATGTTCCCGTCAAACGGACTCATAAATCCCTAATTGTTCGCATTCTCATCCGTAAGTCACATAGGACTAGATAGAGGATGCGCACGCAAGAGGATTAACTCTACAGAATGAAGCATCACCGGGTCAGTGATAGTGTTATTAATTTCTCACGCCTTGTTCCTCAGGCCATTTGTGCTTGCTAAAACCTCATTCATGAACGAACATCTCCCGCTAAACTTAATTTTACTTCAAAGGACCTCAAATTTATTTTCCCACTGG
CTGCAGCGTGATGACTTCTCGCAGCACTCTTATGAATTGGCCCTCATCACCCTCCCCAAAACTTCATTGGGGAAATATCAATTTTAACATCTTATTTCAGTGATCCACATAACACTTACTCTAACCGGGTTAGGCATCATCTTTACAAACTGCCTACTCTTTGTACATATTTTGGGCCTCACAACGAGAGCACCTTCCATCCACTTACACCTACTCCACCCCCACACCCCGGACCCCCTCCTCCTTTCATCCAAACTCCCCCTCCCTCCTTCTACACTGCTATAACCCCCCAACAAACCCTCCTACCTCCACTTATTTAACAACACGCCCCTCCGCCCCCTCCCCCACTCCCACCCTCCTACCCCGTCTCCTCCCCCCCCACTCTCAAAATCCCCCCCCTACCCCTGTCTTGTGATTTACCCCCGAGAAGAGACCCGTCTACATGACTCTCTCCGTTTCACCGAGGCACCTCGACCCGCGCAATGCACTCGAGAAACTGCTAATTACTCGCCACTGAAGTTCAACTCCTCAGCAGGCTCCGCCCCCCCGCCAGAACCTTCCAAAATACATGTACCCCTTTGAGTACCCGCCATTGAGCTTCCCCCCACAAATCTGCCACTAACATACCCCCCCCCATAAATCCTAACTCAGCCCATTACTATCCTCCTCTACTCCTGCCTCCTCCCCCTTCCCTAAACCTCCAAACTCAACAGACCCTTCCCCACCTCAAAAACAAAAACCCGCTGCCAATTACAAACAGGCTTTTCTCATCTCATTAATCCTGCTATTTAATTATTAATGAAAACTCCCAACCCCACCAACAATTTCATGAAAATGATTTAATATTCTGATTACCCCATCAATCTGACCGTTCAGCTTGATCAATACTCTATTCTTTTTATCCAATCGCCCTAAATAGTCTCGTGATGCATTATTGAGTATTCATTATGATACATACACAATGACAACAAATCCAACTCTTTTTCAAATATCTTCATTATCATCTTAACTGGCAATGATACTTTAGTGTCAGCTGGAAAACCTCCTAATGCTATTTATTGGGTGGGAGGGTGGTCGGGATTATTATCATAGCCTCCTCATCGGTGGTACTTCACACGAAAGCAATGCTGGTGCAGCGCGCTTCAAGCGTTCCGTTTTAACCGGAGTGGGAGATATTGGATTTCTGTTCGCCATATTCTGACTAATTTCCTCCTAAGACTCTATTGCCCTAAACTTTATCTTTTTCAATGGAAAGGCCCCACCCCCCTTCTACTAGCCTCATTATTGCAGCCGCCAGCAATTCAGCTCAATTTGGCCTCCCAGCCTTGACTTGGCTTCCGCGATAGAAGGCCGCCACCCCGTATCCGCCCTACTACAACTCTAGCCATAGTTGTTGCGGAGTATTTCCTTCTTCATCGCATCCACCCTTTAATTAAAGATAACCAAACTGCCCTAAACCACCTGCCTATGCTTAGGGGCATTCACTGTGTTTGCTGTACATGTGCTTCTAACCCAAAAATGACGTACAAAAAATTTATGCCTTTTCAACATCAAGTCAGCTTGGTCTGATAGTGGTAGCAATTGGCCTAAACATGCCCCACCTAGCATTTCTTTCATATCTTCACCCACGCTTTTTCAAGGCAATGCTTTTCTTATGGCTCAGGGTCTATCATTCATAGCCTCATCGACGAACTAGGCTATTCGATAAATAGGGGGCTTACAAAAAACTGCCATTTTCAACACAAGTGTGACAATCGGCAGCCTAAGCCCTCATGGGAACCCCCTACTCGCGGGCTTTTTCTCTAAAGACGCATATTATGAGGCAATCAACACCGCCAAACGTAAATGCATGGGCCTACATTAATATCATTGCCACCTCATTCACAGCTGTCTTACCTACGGGTAATTTTCTTTTGCTATCTTTAGACCACCCTCGATTTCCTCCCGCCTCCTCTATTAACGAAAATAACCCACTATATCAGAACCAATTAAACGTCTTG
CCGTTGGCAGCATTATTGTGGCCTTTTATTAAATCAGATAATTCCCATCCTCACCAATAAACTTTATAAACAATACCAACCTATTCTTAAAGTTACTGCGATCGCAGTGACTTCCTAGGCTTCCTCATTGCCCTAGACTTAGCTAACGTTTCCTGAACTAAATCGACCGAACAAACAAACCACTCCAAAACAATCAACACCTCTTTTTACCCAACCACCATTCATCGGTCCTCCCACTTATACCATAGACATATGCCTGCGCTTCTCATCTCAATTAATCGGACACCCTCAGGCTAGAAAAAAATTTGGACCAAAAGGCTTAGCAGAACTCCAACTGCCCCATTATAAAAAATCAAGAAATCCAACGAGACAAATTAGAAAACTTACACTCACATCTTTATCTTCACCGCTATTTTATGCTTAATTGTCTTCAGCTCGCTCAAGACCCCACCTTAATGGCCCCGAACTACCTCATATACCGCAAACAGTGTTAATAGTAAAGCTCACAACTGCTATTAATACTCCCCCGCCTCAAAAAAATATTATTGCCACCCGTACCAATCCCCCGCATAAAAACCCCAAACAGCATCCACCACCTCCACTGATTACTGACACCCCATATACTAGAGATGAGAGATATCACCATACCCCAAGCAGCTCATAGATTGTACACAACAATGTCGACCAAAACAAACCCCTCCCCCCCAGGCCTCAGGGTATGGCTCAGCCACAAAGCCGCAGAGTAGGCAAAAACTACTATTTTCCACCTAAGTAAACCAAAAAACAAAACTAAACATAAAAAAGAAGCCCCCCCATCTATCCAAATCTAAAACACCCCGCCCCAGCGGAGCCCACCAACCCCAGAGCAGCAGATAAGGAGAATGGTTTGAAGCCACCGCCAACAGGCCAACAATCAACCCAAGCTCGAATAATACAGTCATAATTCTACAAGGACTTTAAACCTAGACACAGTCCTGAAAAATGTTGTTGTATCAACTATAAGAACTCTAATGGCCCCGTACTTCGCAAAAAACCCAGCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTCTAGTAAAAATTATTAATAAACTCATTTATTGACCTCCCTGCACCCACTAACTTTCGTCATTGTGGAAACTTTGGGGTCTTTCTGGGGTCTGCCTAATTGCCCAAATCGTTACTGGGTTATTCCTAGCGGATACACTTACAACGCTGACACATCTATAGATTCTCATCCGTAGCCCTATTTGCCGAGATGTAAAACAAACGGCTGGCTGCTTCGTAATCTTTCATGCAAACGGCGCCTCATTTTTTCTTCATCTGCATCTACCTCCACATCGGAGAGTGTTTGTACTATGGTTCCTTTCTTATTTAAAGAAACTGAAATATTGGTGTCATTCTCCTATTCCTGGTTCTATGGCTGCAGCATTCGTAGGCTACGTCCTCCCATGAGGACAAATGTCCTTCTGAGGAGCAAACGTAATTACAAACCTTCTCTCCGCTGGCCCCCTATATTGGAAACTTCCGGAAACTGTTCAATGATCTGGGGCGGGTGTTTCAGTAGGACACGCTACTCTGACACGATTTTTCACATTTACTTCATCCTGCCGTTTTCATTGCAGGCGCCTCCATACTTCACCTTCTATTCTTACATCCAAACAGGATCTTCCAATCAACAGGTCTTCACCCCAACTTCGACAGATCCCCTTCCTCGCCTGTTACTCTATAAAGATCTCTTCGGTTCGCAATCATACTTGCCCCTACTTGCCCTACTATCCACTTTTGCCCCCAACCTCCTGGGTGACCCAGACAACTTTACACCAGCCAACCGCTGGTCACCCCCCGCACATCAAGCCAGAGTGATACTTCTTGTCGCTTACGCCATTTCTTCGCTCGATCCCAAATTAAATGGGGGGTCTTGGGCTCTTCTTATTTCTCTATCATAATCCTCTTCCTTCATGCCCTTCCCCATACCTCCAAACAGCGACCCTTATGT
TCCGGCCTGGCAAAGCTCTTCCTTTTGAACACTAGTAGGCCAACACCCTAATCCTGACCTGAATCGGAGGTCAGCCAGTAGAAGACCCCTTGTGATAATTGGTCAACTCGCCTCTATTCTCCTACTTGCTTAAATCTTTGTCATCTTGTATCCCCTTACTCGGGACTCACAGAGAACAAATACCCAACTCAGCAGCGACCCACTGGAAGTTACTGTCCAGATGGCACAATTGCTCAACAGACGACCCACTGGAAGTTACTGTCCAAAGACAATTGCTTGCAGTCGACCCACTGGAAGTTACTGTCCAAATGACAATTGCCTTAGTAACTAAATTTCACTCCAATAACCTGCTTATTTAACATGTCATTTAATTCAGTTGCTATACTTCAACAATATTAACCGTCACTAACATGTCTATTTCAATATTATGGTGGCAGTATATGTTCCTGCATTGCGCGGTGTACATATTACTGTATGTATAATAAGACATACTATGTTACTCGCGCATGCAGGACTTTACTGCCCACCATAATATGAAATAGACATGTTAGTGACGGTTTAATATTGTTGAAGTATAGCAGCAACTG 60  PASS    PRECISE;SVTYPE=INS;SVLEN=6543;SUPPORT=47;COVERAGE=298,286,287,282,283;STRAND=+;AC=2;STDEV_LEN=0;STDEV_POS=0;SUPP_VEC=000010000000
----------------------------------------------------------------
Deletion variants with a symbolic ALT failed like this (deletions of this kind were fine when constructing the graph):
[VCFTraversalFinder] Warning: Unable to canonicalize symbolic variant because no reference fasta was given:
NC_058090.1 100355765   Sniffles2.DEL.103C2ME   G   <DEL>   57  PASS    PRECISE;SVTYPE=DEL;SVLEN=-61;END=100355826;SUPPORT=2;COVERAGE=12,12,12,12,13;STRAND=+-;AC=3;STDEV_LEN=64.703;STDEV_POS=377.885;SUPP_VEC=111101110100
----------------------------------------------------------------
The same error occurred for symbolic inversions as well:
[VCFTraversalFinder] Warning: Unable to canonicalize symbolic variant because no reference fasta was given:
NC_058080.1_1   1327675 Sniffles2.INV.3EFM0 C   <INV>   60  PASS    PRECISE;SVTYPE=INV;SVLEN=243;END=1327918;SUPPORT=2;COVERAGE=8,4,1,1,6;STRAND=+-;AC=7;STDEV_LEN=284.992;STDEV_POS=8.122;SUPP_VEC=101011111111
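
The warnings above say that the symbolic <DEL> and <INV> records could not be converted to explicit sequences because no reference FASTA was available to vg call. To illustrate what such a conversion involves, here is a minimal Python sketch. It is not vg's implementation: it assumes the common VCF convention that POS is the base just before the event and END is the last affected base, and the function name canonicalize and the toy reference are hypothetical.

```python
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]

def canonicalize(ref_seq, pos, end, svtype):
    """Return explicit (REF, ALT) strings for a symbolic DEL or INV.

    ref_seq: chromosome sequence as a string
    pos, end: 1-based VCF POS and INFO END
    """
    anchor = ref_seq[pos - 1]   # base at POS, kept in both alleles
    body = ref_seq[pos:end]     # bases POS+1..END, the affected span
    if svtype == "DEL":
        return anchor + body, anchor
    if svtype == "INV":
        return anchor + body, anchor + revcomp(body)
    raise ValueError(f"unsupported SVTYPE: {svtype}")

# Toy reference, not real data.
toy_ref = "GATTACAGGCT"
print(canonicalize(toy_ref, 3, 7, "DEL"))
print(canonicalize(toy_ref, 3, 7, "INV"))
```

Under this convention the span length is END minus POS, which matches the two records above (61 bp for the deletion, 243 bp for the inversion). Whichever tool does the conversion needs the chromosome sequence, which is why the warning points at the missing reference FASTA.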