Combining VG graphs
8 months ago
AshleeThomson

Hi everyone,

I have a query about combining graphs made using VG.

I have 10 samples. For each sample I have built graphs using VG (.pg), one per chromosome, with the sample's variants embedded. First, I want to combine the chromosome graphs of each sample so that I have only one graph per sample. Then I want to combine the 10 sample graphs into one big graph that contains all the variants from the 10 samples.

I was wondering what the best way to go about this is. I know vg has a 'combine' function, but I wasn't sure whether that is what is needed, or whether I can 'cat' the chromosome graphs together first and then use 'vg combine' for the final construction.

Thanks in advance.

7 months ago
Mayank

Greetings,

I do not know how you can do this with vg, but I have done it using MoMI-G (https://github.com/MoMI-G/MoMI-G/). This is also based on vg but makes it much easier to graph multiple reads together.

Hope this helps.

7 months ago
LauferVA

Hi Ashlee -

I made a little bash script with two sections: a file IO and chromosome management section (section 1) and a core vg workflow section (section 2). Section 1 assumes that the starting files are named something like sample1_chrom1.pg and that the combined graph for each sample will be named something like sample1_combined.pg.

My implementation is a bit clunky because the user has to specify which samples are female instead of the script determining this on the fly. But since I think the part you care about most is the vg commands, I did not optimize that.

How do those look to you?

#!/bin/bash
set -euo pipefail

#### Section 1: File IO and chromosome bookkeeping in bash

# Define list of chrs
chromosomes=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y) # Change depending on organism

# Samples with no chrY graph (female). Sample 5 is only an example, so edit this list.
# A female sample has two X chromosomes, but its graph still needs just one chrX graph:
# passing the same chrX file twice to vg combine would only duplicate that graph.
female_samples=(5)

# Print the chromosome graph files of one sample, one per line, skipping chrY for female samples
function chromosome_graphs() {
  local sample=$1
  local chrom
  for chrom in "${chromosomes[@]}"; do
    if [[ "$chrom" == "Y" && " ${female_samples[*]} " == *" ${sample} "* ]]; then
      continue
    fi
    echo "sample${sample}_chrom${chrom}.pg"
  done
}

# Give the chromosome graphs of one sample a joint node ID space.
# vg ids -j must receive all the graphs in a single call; it rewrites them in place.
# No -m here: -m takes a file name and writes a node mapping for vg prune.
# Optional with recent vg, whose combine resolves ID conflicts the same way.
function ensure_unique_ids_within_sample() {
  local sample=$1
  local graphs
  mapfile -t graphs < <(chromosome_graphs "$sample")
  vg ids -j "${graphs[@]}"
}

# Combine the chromosome graphs of one sample into one graph (vg combine writes to stdout)
function combine_chromosome_graphs() {
  local sample=$1
  local graphs
  mapfile -t graphs < <(chromosome_graphs "$sample")
  vg combine "${graphs[@]}" > "sample${sample}_combined.pg"
}

for sample in {1..10}; do
  ensure_unique_ids_within_sample "$sample" # could add error checking prior to next line
  combine_chromosome_graphs "$sample"
done

#### Section 2: Core vg steps, here:

combined_graphs=()
for sample in {1..10}; do
  combined_graphs+=("sample${sample}_combined.pg")
done

# Ensure unique IDs across all combined sample graphs (optional, see above)
vg ids -j "${combined_graphs[@]}"

# Combine all sample graphs into one final graph
vg combine "${combined_graphs[@]}" > final_combined.pg

# Validate the combined graph, then report its size (nodes, edges) and total sequence length.
# (vg stats -a expects a GAM alignment file, not a graph.)
vg validate final_combined.pg
vg stats -z -l final_combined.pg
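The hard-coded female list in section 1 could instead be determined on the fly from which files exist, which also provides the error check hinted at in the script comments. A minimal sketch, assuming the sampleN_chromC.pg naming above and that a sample without a Y chromosome simply has no sampleN_chromY.pg file; `graphs_for_sample` and `combine_sample` are hypothetical helper names, not vg commands.

```shell
# Chromosome graphs expected per sample (change depending on organism)
chromosomes=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y)

# Print the graph files to combine for one sample, one per line.
# chrY is optional: if sample<N>_chromY.pg is absent, the sample is assumed
# to have no Y graph (e.g. female). Any other missing graph is reported on
# stderr and makes the function return 1, as a pre-flight check before vg runs.
graphs_for_sample() {
  local sample=$1 chrom file status=0
  for chrom in "${chromosomes[@]}"; do
    file="sample${sample}_chrom${chrom}.pg"
    if [[ -e "$file" ]]; then
      echo "$file"
    elif [[ "$chrom" != "Y" ]]; then
      echo "missing: $file" >&2
      status=1
    fi
  done
  return "$status"
}

# Hypothetical replacement for the hard-coded female check: combine whatever
# chromosome graphs exist for the sample, or stop if any required one is missing.
combine_sample() {
  local sample=$1 list graphs
  list=$(graphs_for_sample "$sample") || return 1
  mapfile -t graphs <<< "$list"
  vg combine "${graphs[@]}" > "sample${sample}_combined.pg"
}
```

The per-sample loop in section 1 could then call `combine_sample "$sample"` and stop (or skip the sample) on a non-zero return.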

By way of caveats:

  1. I used generative AI to help me generate the bash syntax quickly and accurately as I do not spend a lot of time with bash. I think your question is mainly about section 2, but others may need help with section 1 in the future.
  2. vg combine is supposed to keep all your annotations, but I am not sure whether it supports .pg files. I would do a small test run first to check that it keeps all the annotations. If it does not, I suppose the workarounds would be either to convert back to .vg, combine, and then regenerate the packed graph, or to use another solution like the intriguing answer provided by @2951abec3 (though I do not know whether that retains the annotations either).
  3. You may need significant computational resources for this - but you seem on top of that :-).
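For the test run mentioned in caveat 2, one quick check is whether every path name embedded in the input graphs (paths are the main annotation a vg graph carries) also appears in the combined graph. A minimal sketch, assuming `vg paths -x GRAPH -L` lists the path names of your .pg files (check `vg paths --help` for your vg version); `missing_paths` is a hypothetical helper name.

```shell
# Print path names that are present in the input graphs but missing from the
# combined graph. Prints nothing if every path survived.
# Usage: missing_paths COMBINED_GRAPH INPUT_GRAPH...
missing_paths() {
  local combined=$1 g
  shift
  # comm -23 keeps lines only in the first (sorted) list; C locale keeps
  # sort and comm using the same ordering
  LC_ALL=C comm -23 \
    <(for g in "$@"; do vg paths -x "$g" -L; done | LC_ALL=C sort -u) \
    <(vg paths -x "$combined" -L | LC_ALL=C sort -u)
}
```

For example, `missing_paths sample1_combined.pg sample1_chrom*.pg` should print nothing if every chromosome path made it into the per-sample graph. This only compares path names, not path contents.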

If vg combine does not, in fact, support .pg files, it might be worth opening an issue on the GitHub page to see how the creators would handle the combine and annotation problems efficiently. What's your ultimate application (I'm just nosy)? :-)

Best, Vincent

Oh wow, thank you! I'll go through this and see how it goes. VG does support PGs, so there shouldn't be an issue. My aim is to compare alignment accuracy when mapping sequencing reads to graphs that contain different degrees of variation. The first graph will include only the "default" variation, while the final graph will contain the "default" variation plus the variation from 10 individuals. I thought it would be faster to create individual graphs and combine them, rather than align 1 sample at a time to the graph. The hypothesis is that more variation in the graph leads to better alignment.
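One thing worth checking for this goal (one graph holding the default variation plus the variants of all 10 samples): vg combine places its input graphs side by side in one file with non-overlapping node IDs; it does not align or merge shared sequence. Assuming each per-sample graph was built from the same reference, combining the 10 sample graphs therefore yields 10 separate copies of each chromosome rather than one chromosome graph carrying everyone's variants. An alternative is to merge the per-sample VCFs first and construct one graph per chromosome from the merged VCF. A minimal sketch, assuming bgzipped per-sample VCFs named sample1.vcf.gz, sample2.vcf.gz, and so on, plus a reference FASTA whose contig names match the VCFs (all names hypothetical); `run` and `build_merged_graphs` are hypothetical helpers, and by default the sketch only prints the commands.

```shell
# Sketch: merge the per-sample VCFs, then build one graph per chromosome so that
# all samples' variants share a single copy of the reference.
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = run them

# run OUT CMD...: run CMD with stdout redirected to OUT ("-" = no redirect)
run() {
  local out=$1
  shift
  if [[ "$DRY_RUN" == 1 ]]; then
    if [[ "$out" == "-" ]]; then echo "$*"; else echo "$* > $out"; fi
  elif [[ "$out" == "-" ]]; then
    "$@"
  else
    "$@" > "$out"
  fi
}

# build_merged_graphs REF_FASTA N_SAMPLES CHROM...
build_merged_graphs() {
  local ref=$1 n_samples=$2
  shift 2
  local vcfs=() graphs=() s c
  for ((s = 1; s <= n_samples; s++)); do
    vcfs+=("sample${s}.vcf.gz")
    run - bcftools index -f "sample${s}.vcf.gz"   # bcftools merge needs indexed inputs
  done
  run - bcftools merge -Oz -o merged.vcf.gz "${vcfs[@]}"
  run - bcftools index -t merged.vcf.gz           # tabix index for region queries
  for c in "$@"; do
    # -R: build only this contig; -C: treat the region as a plain contig name
    run "merged_${c}.vg" vg construct -r "$ref" -v merged.vcf.gz -R "$c" -C
    graphs+=("merged_${c}.vg")
  done
  # Chromosome graphs are disjoint anyway, so vg combine fits this step
  run merged_all.vg vg combine "${graphs[@]}"
  run merged_all.pg vg convert -p merged_all.vg
}
```

For example, `build_merged_graphs ref.fa 10 chr{1..22} chrX chrY` prints the full command list for review; setting `DRY_RUN=0` runs it.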

That makes a lot of sense to me.

VG supports PGs even when you run vg combine? If so, that is awesome. I was future-worried, so to speak, about that, so that's good news :-)

AshleeThomson, how did it work for you?

VAL
