Hello,
I am trying to extract SNPs for the parental assemblies from my DB.
Here is the docker command I am using:
sudo docker run --name hapPaths_assemblies --rm \
-v /home/kathrynmichel/KaepplerDeLeon-T330-Mount/Assemblies/PHG/Nov2020_B73v5/:/phg/ \
-t maizegenetics/phg:latest \
/tassel-5-standalone/run_pipeline.pl -debug -Xmx100G \
-GetHapIdsForTaxonPlugin -configFile /phg/config.txt \
-taxaList /phg/assemblyTaxa.txt \
-outputDir /phg/hapPaths/ -methods CONSENSUS_mxDiv0.0001 -endPlugin >logs/getHapPaths_mummer4_7.log
The assemblyTaxa.txt file contains two extra progeny lines for the sake of testing.
When I run with -methods set to either mummer4 or CONSENSUS_mxDiv0.0001, I get this output (note that the graph filtered on the taxa list ends up with 0 nodes and 0 reference ranges):
/tassel-5-standalone/lib/kotlinx-coroutines-core-1.3.0.jar:/tassel-5-standalone/lib/scala-library-2.10.1.jar:/tassel-5-standalone/lib/jfreesvg-3.2.jar:/tassel-5-standalone/lib/trove-3.0.3.jar:/tassel-5-standalone/lib/slf4j-api-1.7.10.jar:/tassel-5-standalone/lib/ejml-0.23.jar:/tassel-5-standalone/lib/postgresql-9.4-1201.jdbc41.jar:/tassel-5-standalone/lib/kotlin-stdlib-1.3.50.jar:/tassel-5-standalone/lib/biojava-core-4.0.0.jar:/tassel-5-standalone/lib/sqlite-jdbc-3.8.5-pre1.jar:/tassel-5-standalone/lib/ahocorasick-0.2.4.jar:/tassel-5-standalone/lib/mail-1.4.jar:/tassel-5-standalone/lib/junit-4.10.jar:/tassel-5-standalone/lib/biojava-alignment-4.0.0.jar:/tassel-5-standalone/lib/fastutil-8.2.2.jar:/tassel-5-standalone/lib/json-simple-1.1.1.jar:/tassel-5-standalone/lib/log4j-1.2.13.jar:/tassel-5-standalone/lib/commons-codec-1.10.jar:/tassel-5-standalone/lib/biojava-phylo-4.0.0.jar:/tassel-5-standalone/lib/gs-core-1.3.jar:/tassel-5-standalone/lib/itextpdf-5.1.0.jar:/tassel-5-standalone/lib/jhdf5-14.12.5.jar:/tassel-5-standalone/lib/phg.jar:/tassel-5-standalone/lib/commons-math3-3.4.1.jar:/tassel-5-standalone/lib/guava-22.0.jar:/tassel-5-standalone/lib/jcommon-1.0.23.jar:/tassel-5-standalone/lib/htsjdk-2.19.0.jar:/tassel-5-standalone/lib/forester-1.038.jar:/tassel-5-standalone/lib/colt-1.2.0.jar:/tassel-5-standalone/lib/javax.json-1.0.4.jar:/tassel-5-standalone/lib/jfreechart-1.0.19.jar:/tassel-5-standalone/lib/slf4j-simple-1.7.10.jar:/tassel-5-standalone/lib/gs-ui-1.3.jar:/tassel-5-standalone/lib/snappy-java-1.1.1.6.jar:/tassel-5-standalone/sTASSEL.jar
Memory Settings: -Xms512m -Xmx100G
Tassel Pipeline Arguments: -debug -GetHapIdsForTaxonPlugin -configFile /phg/config.txt -taxaList /phg/assemblyTaxa.txt -outputDir /phg/hapPaths/ -methods CONSENSUS_mxDiv0.0001 -endPlugin
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.64 Date: July 9, 2020
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 91022 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_212
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 24
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -GetHapIdsForTaxonPlugin, -configFile, /phg/config.txt, -taxaList, /phg/assemblyTaxa.txt, -outputDir, /phg/hapPaths/, -methods, CONSENSUS_mxDiv0.0001, -endPlugin, -runfork1]
net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin: time: Nov 23, 2020 19:11:12
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
GetHapIdsForTaxonPlugin Parameters
configFile: /phg/config.txt
taxaList: [B73_Assembly,B84_assembly,LH145_assembly,NKH8431_assembly,PHB47_assembly,PHJ40_assembly,W10004_0084,W10004_0010]
outputDir: /phg/hapPaths/
methods: CONSENSUS_mxDiv0.0001
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/SSphgDB_B73v5 host: 128.104.239.49:22 user: kathryn type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/SSphgDB_B73v5
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_ranges INNER JOIN ref_range_ref_range_method on ref_range_ref_range_method.ref_range_id=reference_ranges.ref_range_id INNER JOIN methods on ref_range_ref_range_method.method_id = methods.method_id AND methods.method_type = 7 ORDER BY reference_ranges.ref_range_id
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 71354
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 1.042750554 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON gamete_haplotypes.gameteid = gametes.gameteid INNER JOIN genotypes on gametes.genoid = genotypes.genoid ORDER BY gamete_haplotypes.gamete_grp_id;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 64
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.005584256 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: CONSENSUS_mxDiv0.0001 range group method: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, sequence, seq_hash, seq_len FROM haplotypes WHERE method_id = 5;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 206856
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 71354
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 9.964170025 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested number of nodes: 206856 number of reference ranges: 71354
[pool-1-thread-1] DEBUG net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin - Filter graph on taxaList.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: creating edges from nodes.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: time: 3.1329E-5 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph number of edges: 0 number of nodes: 0 number of reference ranges: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin - GetHapIdsForTaxonPlugin: finished in 11.79106549 seconds
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin: time: Nov 23, 2020 19:11:24
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin: time: Nov 23, 2020 19:11:24: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
Thoughts? Zack mentioned that ImportHaplotypePathFilePlugin and PathsToVCF are the next steps after this. Is there a newer way to accomplish these steps? Thank you!
For future reference, from Peter:
"The taxa list file has to be one taxon per line, no commas. The comma-separated list is correct on the command line, but the file has to be one taxon per line no comma." Also, save it as a .txt file.
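If the taxa file was originally written as a single comma-separated line, something like the following could convert it to the one-taxon-per-line layout Peter describes. This is only a minimal sketch assuming a POSIX shell; the input file name assemblyTaxa_commas.txt and the three sample taxa written by printf are placeholders for illustration, not the actual file from the run above.

```shell
# Placeholder input: a taxa file written as one comma-separated line.
# (Hypothetical file name and contents, for illustration only.)
printf 'B73_Assembly, B84_assembly,LH145_assembly\n' > assemblyTaxa_commas.txt

# Split on commas, trim surrounding whitespace, and drop empty lines,
# so the output has exactly one taxon per line and no commas.
tr ',' '\n' < assemblyTaxa_commas.txt \
  | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
  | grep -v '^$' > assemblyTaxa.txt

# Inspect the result before passing it to -taxaList.
cat assemblyTaxa.txt
```

Checking the file with cat (or wc -l) before rerunning the plugin is a cheap way to confirm that the number of lines matches the number of taxa you expect.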