Incorrect gvcfServerPath for reference genome
21 months ago
wcs98 • 0

Hello

I'm using the latest PHG and I'm on step 3: imputing haplotypes. I put an incorrect gvcfServerPath in my initial genomeData file, and the pipeline now stops at the HaplotypeGraphBuilderPlugin with this message:

[pool-1-thread-1] DEBUG net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin - genome path variable must be a semi-colon separated string, with the first portion indicating the server address, e.g. server;/path/to/file. Error on genomePath

Is there any way to correct this path in the database, or do I have to rerun the entire pipeline from the beginning?

21 months ago
lcj34 ▴ 420

Yes, you can correct it in the database. Connect to your database and find the ids of the rows you'd like to change with a query like this:

select id, genome_path, genome_file from genome_file_data where type=2;

That query will show you the paths for the gvcf files (rows with type 2 are the entries for the gvcf files only).
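To spot which of the listed paths would trip the "server;/path/to/file" check, something like the following Python sketch can help. It is only inferred from the error message: the regular expression, the function name, and the assumption that the path portion must be absolute are mine, not taken from the PHG code. The entry for id 20 is a made-up bad value for illustration.

```python
import re

# Shape inferred from the error message: "server;/path/to/file".
# Assumptions (not confirmed by PHG): exactly one semicolon, a non-empty
# server portion without whitespace, and a path portion starting with "/".
GENOME_PATH_RE = re.compile(r"^[^;\s]+;/[^;]*$")


def looks_like_valid_genome_path(genome_path: str) -> bool:
    """Return True if the value has the 'server;/path' shape."""
    return bool(GENOME_PATH_RE.match(genome_path))


# id -> genome_path, as returned by the select query
paths = {
    2: "localhost;/Users/lcj34/temp/gvcfRemote/",
    4: "localHost;/remoteGvcfs/",
    20: "/my/gvcf/path",  # hypothetical bad entry: server portion is missing
}

# Collect the ids whose path does not match the expected shape.
bad_ids = [row_id for row_id, path in paths.items()
           if not looks_like_valid_genome_path(path)]
print(bad_ids)
```

The ids it reports are the ones to fix with the update command.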

You can then update the path for each id you want to change with a command like:

update genome_file_data set genome_path='host;/my/gvcf/path' where id=20;

Example from a test db I have:

phgsmallseq=# select id,genome_path, genome_file from genome_file_data where type=2;
 id |               genome_path               |                   genome_file                    
----+-----------------------------------------+--------------------------------------------------
  2 | localhost;/Users/lcj34/temp/gvcfRemote/ | Ref_Assembly.gvcf.gz
  4 | localHost;/remoteGvcfs/                 | LineA.gvcf.gz
  6 | localHost;/remoteGvcfs/                 | LineB.gvcf.gz
  7 | localhost;/Users/lcj34/temp/gvcfRemote/ | Ref_haplotype_caller_output_filtered.g.vcf.gz
  8 | localhost;/Users/lcj34/temp/gvcfRemote/ | RefA1_haplotype_caller_output_filtered.g.vcf.gz
  9 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineB1_haplotype_caller_output_filtered.g.vcf.gz
 10 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineA_haplotype_caller_output_filtered.g.vcf.gz
 11 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineB_haplotype_caller_output_filtered.g.vcf.gz
 12 | localhost;/Users/lcj34/temp/gvcfRemote/ | LineA1_haplotype_caller_output_filtered.g.vcf.gz
(9 rows)

phgsmallseq=#

phgsmallseq=# update genome_file_data set genome_path='132.236.88.4;/Users/lcj34/temp/gvcfRemote/' where id=2;
UPDATE 1
phgsmallseq=#

Then check by listing them again:

phgsmallseq=# select id,genome_path, genome_file from genome_file_data where type=2 order by id;
 id |                genome_path                 |                   genome_file                    
----+--------------------------------------------+--------------------------------------------------
  2 | 132.236.88.4;/Users/lcj34/temp/gvcfRemote/ | Ref_Assembly.gvcf.gz
  4 | localHost;/remoteGvcfs/                    | LineA.gvcf.gz
  6 | localHost;/remoteGvcfs/                    | LineB.gvcf.gz
  7 | localhost;/Users/lcj34/temp/gvcfRemote/    | Ref_haplotype_caller_output_filtered.g.vcf.gz
  8 | localhost;/Users/lcj34/temp/gvcfRemote/    | RefA1_haplotype_caller_output_filtered.g.vcf.gz
  9 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineB1_haplotype_caller_output_filtered.g.vcf.gz
 10 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineA_haplotype_caller_output_filtered.g.vcf.gz
 11 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineB_haplotype_caller_output_filtered.g.vcf.gz
 12 | localhost;/Users/lcj34/temp/gvcfRemote/    | LineA1_haplotype_caller_output_filtered.g.vcf.gz
(9 rows)

phgsmallseq=#
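If several rows need the same correction, the update and the follow-up check can be scripted. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with a toy in-memory table that models just the columns shown above, whereas the session above is PostgreSQL and the real genome_file_data table has more to it. The helper name update_genome_paths and the new path value are hypothetical.

```python
import sqlite3

# Toy in-memory table: only the columns from the listing above are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genome_file_data ("
    "id INTEGER PRIMARY KEY, genome_path TEXT, genome_file TEXT, type INTEGER)"
)
conn.executemany(
    "INSERT INTO genome_file_data VALUES (?, ?, ?, ?)",
    [
        (2, "localhost;/Users/lcj34/temp/gvcfRemote/", "Ref_Assembly.gvcf.gz", 2),
        (4, "localHost;/remoteGvcfs/", "LineA.gvcf.gz", 2),
        (6, "localHost;/remoteGvcfs/", "LineB.gvcf.gz", 2),
    ],
)


def update_genome_paths(conn, new_path, ids):
    """Set genome_path for the given gvcf rows; return the number of rows changed."""
    changed = 0
    with conn:  # commits on success, rolls back if any statement raises
        for row_id in ids:
            cur = conn.execute(
                "UPDATE genome_file_data SET genome_path = ? "
                "WHERE id = ? AND type = 2",
                (new_path, row_id),
            )
            changed += cur.rowcount
    return changed


# Hypothetical corrected value for rows 4 and 6.
changed = update_genome_paths(conn, "132.236.88.4;/remoteGvcfs/", [4, 6])

# Re-list the gvcf rows to confirm the change, as in the check above.
rows = conn.execute(
    "SELECT id, genome_path FROM genome_file_data WHERE type = 2 ORDER BY id"
).fetchall()
print(changed, rows)
```

Running all updates inside one transaction means that a failure partway through leaves every row as it was.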

Thank you very much, this worked perfectly. I was also able to fix incorrect paths I had for the other gvcfs.
