Question

Incorrect gvcfServerPath for reference genome

21 months ago

wcs98 • 0

Hello

I'm using the latest PHG and I'm on step 3: imputing haplotypes. I put an incorrect gvcfServerPath in my initial genomeData file, and the pipeline now stops at the HaplotypeGraphBuilderPlugin with this message:

[pool-1-thread-1] DEBUG net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin - genome path variable must be a semi-colon separated string, with the first portion indicating the server address, e.g. server;/path/to/file. Error on genomePath

Is there any way to correct this path in the database, or do I have to rerun the entire pipeline from the beginning?

PHG • 786 views

GenoMax · Answer 1 · 2023-02-17

Yes, you can correct it in the database. Connect to your database and find the ids of the rows you'd like to change with a query something like:

select id, genome_path, genome_file from genome_file_data where type=2;

That query will show you the paths for the gvcf files (rows with type=2 are the entries for just the gvcf files).

You can then change the path for each id you want to fix with a command like:

update genome_file_data set genome_path='host;/my/gvcf/path' where id=20;
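
Before running the update, it can help to check that the new value has the shape the error message asks for: a server portion, a semicolon, then a path. Here is a minimal Python sketch of such a check. The function name `looks_like_genome_path` is made up for illustration, and this is not PHG's own validation code, only an approximation based on the wording of the error message.

```python
def looks_like_genome_path(value: str) -> bool:
    """Return True if value resembles 'server;/path/to/file'.

    Hypothetical helper: it approximates the rule stated in the PHG
    error message. PHG's real check may be stricter or looser.
    """
    # Split on the first semicolon only; sep is "" when none is present.
    server, sep, path = value.partition(";")
    # Require the separator, a non-empty server, and a non-empty path.
    return sep == ";" and bool(server.strip()) and bool(path.strip())


print(looks_like_genome_path("host;/my/gvcf/path"))  # has a server portion
print(looks_like_genome_path("/my/gvcf/path"))       # missing the server portion
```

A value that fails this check would most likely trigger the same HaplotypeGraphBuilderPlugin message again, so it is worth fixing before it goes into the update statement.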

Example from a test db I have:

phgsmallseq=# select id,genome_path, genome_file from genome_file_data where type=2;
 id |               genome_path               |                   genome_file                    
----+-----------------------------------------+--------------------------------------------------
  2 | localhost;/Users/lcj34/temp/gvcfRemote/ | Ref_Assembly.gvcf.gz
  4 | localHost;/remoteGvcfs/                 | LineA.gvcf.gz
  6 | localHost;/remoteGvcfs/                 | LineB.gvcf.gz
  7 | localhost;/Users/lcj34/temp/gvcfRemote/ | Ref_haplotype_caller_output_filtered.g.vcf.gz
  8 | localhost;/Users/lcj34/temp/gvcfRemote/ | RefA1_haplotype_caller_output_filtered.g.vcf.gz
  9 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineB1_haplotype_caller_output_filtered.g.vcf.gz
 10 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineA_haplotype_caller_output_filtered.g.vcf.gz
 11 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineB_haplotype_caller_output_filtered.g.vcf.gz
 12 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineA1_haplotype_caller_output_filtered.g.vcf.gz
(9 rows)

phgsmallseq=#

phgsmallseq=# update genome_file_data set genome_path='132.236.88.4;/Users/lcj34/temp/gvcfRemote/' where id=2;
UPDATE 1
phgsmallseq=#

Then check by listing them again:

phgsmallseq=# select id,genome_path, genome_file from genome_file_data where type=2 order by id;
 id |                genome_path                 |                   genome_file                    
----+--------------------------------------------+--------------------------------------------------
  2 | 132.236.88.4;/Users/lcj34/temp/gvcfRemote/ | Ref_Assembly.gvcf.gz
  4 | localHost;/remoteGvcfs/                    | LineA.gvcf.gz
  6 | localHost;/remoteGvcfs/                    | LineB.gvcf.gz
  7 | localhost;/Users/lcj34/temp/gvcfRemote/    | Ref_haplotype_caller_output_filtered.g.vcf.gz
  8 | localhost;/Users/lcj34/temp/gvcfRemote/    | RefA1_haplotype_caller_output_filtered.g.vcf.gz
  9 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineB1_haplotype_caller_output_filtered.g.vcf.gz
 10 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineA_haplotype_caller_output_filtered.g.vcf.gz
 11 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineB_haplotype_caller_output_filtered.g.vcf.gz
 12 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineA1_haplotype_caller_output_filtered.g.vcf.gz
(9 rows)

phgsmallseq=#
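
If many rows share the same wrong server portion, updating them one id at a time gets tedious. The sketch below shows one way to swap the prefix on all matching gvcf rows in a single transaction. It is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory stand-in table that models just the columns seen in the queries above (`id`, `genome_path`, `genome_file`, `type`), not the real PHG schema, and the variable names are my own. On a real database you would adapt the same update statement to your own connection.

```python
import sqlite3

# In-memory stand-in for the PHG database. Only the columns that appear
# in the queries above are modelled; the real table has its own schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table genome_file_data "
    "(id integer primary key, genome_path text, genome_file text, type integer)"
)
conn.executemany(
    "insert into genome_file_data values (?, ?, ?, ?)",
    [
        (2, "localhost;/Users/lcj34/temp/gvcfRemote/", "Ref_Assembly.gvcf.gz", 2),
        (4, "localHost;/remoteGvcfs/", "LineA.gvcf.gz", 2),
        (7, "localhost;/Users/lcj34/temp/gvcfRemote/",
         "Ref_haplotype_caller_output_filtered.g.vcf.gz", 2),
    ],
)
conn.commit()

old_prefix = "localhost;"     # server portion to replace (case-sensitive match)
new_prefix = "132.236.88.4;"  # corrected server portion

# "with conn" runs the update in a transaction: it commits on success
# and rolls back if an exception is raised.
with conn:
    cur = conn.execute(
        "update genome_file_data "
        "set genome_path = ? || substr(genome_path, ?) "
        "where type = 2 and substr(genome_path, 1, ?) = ?",
        (new_prefix, len(old_prefix) + 1, len(old_prefix), old_prefix),
    )
    updated = cur.rowcount  # number of rows whose path was rewritten

# List the gvcf rows again to confirm the change.
rows = conn.execute(
    "select id, genome_path from genome_file_data where type = 2 order by id"
).fetchall()
print(updated)
for row in rows:
    print(row)
```

Matching on the prefix rather than on ids means rows with a different server portion (here the `localHost;` entry) are left alone, so check the listing afterwards the same way as above.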