CWL: outputBinding with secondaryFiles (actually a Dockstore issue)
8.0 years ago
kr2

Hi,

I have a CWL tool definition that needs to return a set of files (the BAM plus several companion files that share its name but have different extensions), so I have defined my outputs as:

outputs:
  mapped_out:
    type: File
    outputBinding:
      glob: $(inputs.sample).bam
    secondaryFiles:
      - .bai
      - .bas
      - .md5
      - .met
      - .maptime
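For reference, CWL resolves each string entry in secondaryFiles against the primary file matched by the glob: a plain suffix such as .bai is appended to the full file name, and each leading ^ first strips one extension (so ^.bai would give insilico_21.bai instead). A minimal Python sketch of that rule, assuming inputs.sample is "insilico_21"; `secondary_names` is a hypothetical helper for illustration, not part of cwltool, and expression-valued secondaryFiles are ignored:

```python
def secondary_names(primary: str, patterns: list[str]) -> list[str]:
    """Expand CWL secondaryFiles suffix patterns against a primary file name.

    Hypothetical helper: for each leading '^', drop the last extension (the last
    '.' and everything after it) if there is one, then append the rest of the pattern.
    """
    names = []
    for pattern in patterns:
        base = primary
        while pattern.startswith("^"):
            pattern = pattern[1:]
            if "." in base:
                base = base[: base.rindex(".")]
        names.append(base + pattern)
    return names


# Assumes inputs.sample == "insilico_21", so glob: $(inputs.sample).bam matched this file.
primary = "insilico_21.bam"
print(secondary_names(primary, [".bai", ".bas", ".md5", ".met", ".maptime"]))
print(secondary_names(primary, ["^.bai"]))  # caret form, for comparison
```

With the suffixes used here, every companion file is expected to be named insilico_21.bam.<suffix>, which matches what cwltool reports further down.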

I've tried a couple of variations of the input JSON:

{
  ...
  "mapped_out": {
    "path": "/tmp/mapped.bam",
    "class": "File"
  },
  ...
}

This yielded only one file, provisioned to /tmp/mapped.bam.

This version (based on alea-createGenome.cwl & alea-alignReads-job.json) didn't stage anything:

{
  ...
  "mapped_out": "/tmp/mapped",
  ...
}

Everything seems to have completed on the cwltool side:

Final process status is success
{
    "mapped_out": {
        "checksum": "sha1$53bb0c4abb07013393891cb50a3feec4c6381304", 
        "basename": "insilico_21.bam", 
        "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam", 
        "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam", 
        "secondaryFiles": [
            {
                "checksum": "sha1$ef6f2cf70e11d7d0be17b79dfb02eb1277e43b41", 
                "basename": "insilico_21.bam.bai", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bai", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bai", 
                "class": "File", 
                "size": 1370120
            }, 
            {
                "checksum": "sha1$4bf5068040c0e2a350aa21fa299f6567230bfbeb", 
                "basename": "insilico_21.bam.bas", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bas", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bas", 
                "class": "File", 
                "size": 1973
            }, 
            {
                "checksum": "sha1$4a60424144f5283c4e9cf74deb214597cac8bae8", 
                "basename": "insilico_21.bam.md5", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.md5", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.md5", 
                "class": "File", 
                "size": 32
            }, 
            {
                "checksum": "sha1$63139bed16686c6be0dd5469342af1dac8795260", 
                "basename": "insilico_21.bam.met", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.met", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.met", 
                "class": "File", 
                "size": 1521
            }, 
            {
                "checksum": "sha1$39f641f432b510034fb96b3e73569f5fc1824521", 
                "basename": "insilico_21.bam.maptime", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.maptime", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.maptime", 
                "class": "File", 
                "size": 279
            }
        ], 
        "class": "File", 
        "size": 42245405
    }
}

Any help gratefully received.

Thanks, Keiran
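The output object cwltool printed above already lists every companion file under "secondaryFiles", so the files were produced; what matters is what happens to them afterwards. A minimal Python sketch that walks such a File object and lists the primary path followed by each secondary path. The sample object is a trimmed copy of the one above (checksums, sizes and the launcher directory prefix removed), and `collect_paths` is a hypothetical helper:

```python
import json


def collect_paths(file_obj: dict) -> list[str]:
    """Return the primary path followed by all secondaryFiles paths (recursively)."""
    paths = [file_obj["path"]]
    for sec in file_obj.get("secondaryFiles", []):
        paths.extend(collect_paths(sec))
    return paths


# Trimmed version of the cwltool output object shown above.
output = json.loads("""
{
  "mapped_out": {
    "class": "File",
    "path": "outputs/insilico_21.bam",
    "secondaryFiles": [
      {"class": "File", "path": "outputs/insilico_21.bam.bai"},
      {"class": "File", "path": "outputs/insilico_21.bam.bas"},
      {"class": "File", "path": "outputs/insilico_21.bam.md5"},
      {"class": "File", "path": "outputs/insilico_21.bam.met"},
      {"class": "File", "path": "outputs/insilico_21.bam.maptime"}
    ]
  }
}
""")

for p in collect_paths(output["mapped_out"]):
    print(p)
```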


Is $(inputs.sample) a string?

Why are you writing the JSON manually? Is the tool itself CWL-aware and producing a cwl.output.json file?


I'm attempting to complete the input JSON file. Dockstore gives the following template:

$ dockstore tool convert entry2json --entry quay.io/wtsicgp/dockstore-cgpmap:1.0.2
{
  "reference": {
    "path": "fill me in",
    "class": "File"
  },
  "bams_in": "fill me in",
  "cram": false,
  "mapped_out": {
    "path": "fill me in",
    "class": "File"
  },
  "bwa": " -Y -K 100000000",
  "bwa_idx": {
    "path": "fill me in",
    "class": "File"
  },
  "sample": "fill me in",
  "scramble": ""
}

Can I do the same with cwltool? I can't see any options indicating this.
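Before launching from a template like the one above, it may help to check that no "fill me in" placeholder was left behind. A minimal Python sketch, assuming the job is plain JSON; `unfilled` is a hypothetical helper, and the sample job below is a shortened, partly filled copy of the Dockstore template:

```python
import json

PLACEHOLDER = "fill me in"  # the placeholder string used in the Dockstore template


def unfilled(node, prefix=""):
    """Yield the key paths of values that still equal the template placeholder."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from unfilled(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from unfilled(value, f"{prefix}/{i}")
    elif node == PLACEHOLDER:
        yield prefix


template = json.loads("""{
  "reference": {"path": "fill me in", "class": "File"},
  "bams_in": "fill me in",
  "cram": false,
  "sample": "insilico_21"
}""")
print(list(unfilled(template)))  # ['/reference/path', '/bams_in']
```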


Sure, but you asked a question about the outputs section :-)


Is the "sure" here in reference to my question about whether cwltool can generate an input JSON?

8.0 years ago
denis.yuen

Ah, I think I understand the confusion here. Apologies, since I think we created it.

1) Dockstore input JSON can (optionally) include output parameters in order to provision files to locations such as S3, icgc-storage or FTP. This is an artifact of Dockstore's beginnings in the pan-cancer project, where we always wrote workflows of the form "download from GNOS/S3 -> do processing -> upload to GNOS/S3".

In other words, you should be able to do this to upload bamstats_report, an output, to S3:

$ cat sample_configs.json
{
  "bam_input": {
    "class": "File",
    "path": "https://s3.amazonaws.com/oconnor-test-bucket/sample-data/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam"
  },
  "bamstats_report": {
    "class": "File",
    "path": "s3://oicr.temp/bamstats.zip"
  }
}
$ dockstore tool launch --entry quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0 --json sample_configs.json

And you should be able to do this to simply leave the results in place on your local host:

$ cat sample_configs2.json
{
  "bam_input": {
    "class": "File",
    "path": "https://s3.amazonaws.com/oconnor-test-bucket/sample-data/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam"
  }
}
$ dockstore tool launch --entry quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0 --json sample_configs2.json

This is a red herring though.

2) It looks like Dockstore has a bug (or missing feature): we probably missed that output parameters in the CWL can also specify secondary files. While cwltool appears to generate the secondary files correctly (they sit next to insilico_21.bam in /home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/), they aren't being moved along to /tmp/mapped as we would have expected.

We're adding this as an issue: https://github.com/ga4gh/dockstore/issues/544
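To make the expected behaviour concrete: a provisioner that honours secondaryFiles would copy the primary file to the requested destination and each secondary file to that destination plus its suffix, so with a destination of /tmp/mapped.bam, insilico_21.bam.bai would land at /tmp/mapped.bam.bai. A minimal Python sketch for local copies only; `provision_with_secondaries` is hypothetical and is not Dockstore's code (real provisioning also handles remote targets such as S3 or FTP):

```python
import os
import shutil
import tempfile


def provision_with_secondaries(file_obj: dict, dest: str) -> list[str]:
    """Copy a CWL File object and its secondaryFiles to a local destination.

    Hypothetical sketch, not Dockstore's implementation. Each secondary file keeps
    its suffix relative to the primary basename, e.g. insilico_21.bam.bai -> dest + ".bai".
    """
    primary_base = os.path.basename(file_obj["path"])
    copied = [shutil.copy(file_obj["path"], dest)]
    for sec in file_obj.get("secondaryFiles", []):
        sec_base = os.path.basename(sec["path"])
        if not sec_base.startswith(primary_base):
            raise ValueError(f"cannot derive a suffix for {sec_base}")
        suffix = sec_base[len(primary_base):]
        copied.append(shutil.copy(sec["path"], dest + suffix))
    return copied


# Usage: fake cwltool outputs in a temporary directory, then provision them.
with tempfile.TemporaryDirectory() as work:
    outputs = os.path.join(work, "outputs")
    os.mkdir(outputs)
    names = ["insilico_21.bam", "insilico_21.bam.bai", "insilico_21.bam.md5"]
    for name in names:
        open(os.path.join(outputs, name), "w").close()
    cwl_output = {
        "class": "File",
        "path": os.path.join(outputs, names[0]),
        "secondaryFiles": [
            {"class": "File", "path": os.path.join(outputs, n)} for n in names[1:]
        ],
    }
    result = provision_with_secondaries(cwl_output, os.path.join(work, "mapped.bam"))
    copied_names = [os.path.basename(p) for p in result]
    all_exist = all(os.path.exists(p) for p in result)

print(copied_names)  # ['mapped.bam', 'mapped.bam.bai', 'mapped.bam.md5']
```

Deriving the suffix from the primary basename keeps the sketch independent of the CWL patterns themselves; it simply refuses secondary files whose names do not start with the primary name.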

8.0 years ago

Edited to add: this appears to be a Dockstore-specific problem, so you should contact them.

If you want to provide an input file with secondaryFiles, copy the general format of your last code block. The checksum, location, size and basename fields don't need to be provided.

Here is a clean YAML version using relative paths:

mapped_out:
    class: File
    path: insilico_21.bam
    secondaryFiles:
        - class: File
          path: insilico_21.bam.bai
        - class: File
          path: insilico_21.bam.bas
        - class: File
          path: insilico_21.bam.md5
        - class: File
          path: insilico_21.bam.met
        - class: File
          path: insilico_21.bam.maptime
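Rather than typing each entry by hand, the same structure (class and path only) can be generated from the primary path plus the list of suffixes. A minimal Python sketch that emits JSON, since that is what the Dockstore launcher was given; `file_with_secondaries` is a hypothetical helper and `some_bam_input` is a placeholder for whichever input parameter actually takes the file:

```python
import json


def file_with_secondaries(path: str, suffixes: list[str]) -> dict:
    """Build a minimal CWL File object (class + path only) with secondaryFiles."""
    return {
        "class": "File",
        "path": path,
        "secondaryFiles": [{"class": "File", "path": path + s} for s in suffixes],
    }


# "some_bam_input" is a hypothetical input parameter name.
job = {
    "some_bam_input": file_with_secondaries(
        "insilico_21.bam", [".bai", ".bas", ".md5", ".met", ".maptime"]
    )
}
print(json.dumps(job, indent=2))
```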

Hmm, I think this may be a Dockstore issue, as it 'massages' the initial JSON before handing it to cwltool:

The original, as recommended:

$ json_pp < Dockstore3.json 
{
   "reference" : {
      "class" : "File",
      "path" : "/tmp/core_ref_GRCh37d5.tar.gz"
   },
   "bwa_idx" : {
      "path" : "/tmp/bwa_idx_GRCh37d5.tar.gz",
      "class" : "File"
   },
   "bams_in" : [
      {
         "class" : "File",
         "path" : "/tmp/insilico_21.bam"
      }
   ],
   "mapped_out" : {
      "path" : "/tmp/mapped.bam",
      "class" : "File",
      "secondaryFiles" : [
         {
            "path" : "/tmp/mapped.bam.bai",
            "class" : "File"
         },
         {
            "path" : "/tmp/mapped.bam.bas",
            "class" : "File"
         },
         {
            "path" : "/tmp/mapped.bam.md5",
            "class" : "File"
         },
         {
            "class" : "File",
            "path" : "/tmp/mapped.bam.met"
         },
         {
            "path" : "/tmp/mapped.bam.maptime",
            "class" : "File"
         }
      ]
   },
   "sample" : "insilico_21",
   "cram" : false
}

What Dockstore passes on to cwltool:

$ json_pp < /home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/workflow_params.json
{
   "cram" : false,
   "bwa_idx" : {
      "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/inputs/adf6a2b5-3006-42ce-a1bb-3d084c8229f3/bwa_idx_GRCh37d5.tar.gz",
      "class" : "File"
   },
   "mapped_out" : {
      "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/outputs/mapped_out",
      "class" : "File"
   },
   "bams_in" : [
      {
         "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/inputs/7a8bb7c9-98c9-4dc3-82bd-131e0d798bff/insilico_21.bam",
         "class" : "File"
      }
   ],
   "reference" : {
      "class" : "File",
      "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/inputs/6baa8f3d-c46b-4642-8836-5ff8fb5a16b7/core_ref_GRCh37d5.tar.gz"
   },
   "sample" : "insilico_21"
}

mapped_out is an output, not an input -- it does not belong in your input document.
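When running the tool with plain cwltool, that advice amounts to dropping any key that names an output parameter (here just mapped_out, taken from the CWL outputs section above) before writing the job file. A minimal Python sketch; `strip_outputs` is a hypothetical helper, and with Dockstore such output entries are instead used as provisioning targets, as described in the earlier answer:

```python
def strip_outputs(job: dict, output_ids: set[str]) -> dict:
    """Return a copy of the job without keys that name output parameters."""
    return {k: v for k, v in job.items() if k not in output_ids}


job = {
    "reference": {"class": "File", "path": "/tmp/core_ref_GRCh37d5.tar.gz"},
    "bwa_idx": {"class": "File", "path": "/tmp/bwa_idx_GRCh37d5.tar.gz"},
    "bams_in": [{"class": "File", "path": "/tmp/insilico_21.bam"}],
    "mapped_out": {"class": "File", "path": "/tmp/mapped.bam"},
    "sample": "insilico_21",
    "cram": False,
}
inputs_only = strip_outputs(job, {"mapped_out"})
print(sorted(inputs_only))  # ['bams_in', 'bwa_idx', 'cram', 'reference', 'sample']
```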
