PLAN_CWL_STDIN_REDIRECT

Plan: Fix CWL stdin redirection (stdin type shortcut and general)

Problem

CWL conformance test stdin_shorcut (v1.1) fails — output is empty (size 0). The tool uses file1: stdin shortcut syntax, but the underlying bug affects all CWL stdin redirection, not just the shortcut form.

cwltool desugars file1: stdinfile1: File + stdin: $(inputs.file1.path) at load time (load_tool.py:179-194), so both forms are identical by the time Galaxy sees them. The real issue is that Galaxy doesn’t resolve the stdin path to the actual staged file location.

Test tool

# wc-tool-shortcut.cwl (v1.1)
class: CommandLineTool
requirements:
  - class: DockerRequirement
    dockerPull: debian:stretch-slim
inputs:
  file1: stdin          # desugared to file1: File + stdin: $(inputs.file1.path)
outputs:
  output:
    type: File
    outputBinding: { glob: output }
baseCommand: [wc]
stdout: output

Compare with long form (wc-tool.cwl) which uses stdin: $(inputs.file1.path) explicitly — same underlying mechanism.

How cwltool handles stdin

At job creation (command_line_tool.py:1012-1022)

stdin_eval = builder.do_eval(self.tool["stdin"])  # → "/some/tmp/path/file1"
j.stdin = stdin_eval
if j.stdin:
    reffiles.append({"class": "File", "path": j.stdin})  # adds to pathmapper

The evaluated expression returns an absolute path to where cwltool’s builder placed the input file. This path is then added to the pathmapper’s reffiles so it gets staged.

At execution (job.py:286-292)

stdin_path = None
if self.stdin is not None:
    rmap = self.pathmapper.reversemap(self.stdin)  # logical → staged path
    if rmap is None:
        raise WorkflowException(f"{self.stdin} missing from pathmapper")
    else:
        stdin_path = rmap[1]  # use STAGED path, not logical path

Key: cwltool does a pathmapper reverse lookup to convert the logical stdin path to the actual staged file path before using it.

How Galaxy handles stdin (and where it breaks)

JobProxy.stdin (parser.py:576-580)

@property
def stdin(self):
    if self.is_command_line_job:
        return self.cwl_job().stdin  # raw path from expression evaluation
    else:
        return None

Returns whatever cwltool’s expression evaluator produced — an absolute path in cwltool’s temporary space, NOT the Galaxy staging directory.

_cwl_build_command_line() (evaluation.py:1354-1366)

cwl_stdin = cwl_job_proxy.stdin
# ...
if cwl_stdin:
    command_line += f' < "{cwl_stdin}"'  # uses raw path directly!

Galaxy appends the unresolved path to the command line. This path points to cwltool’s temporary directory, not where Galaxy actually staged the input file. At execution time, the file isn’t there → empty stdin → empty output.

Contrast with stdout/stderr

cwl_stdout and cwl_stderr work because they’re just filenames (e.g. output, stderr.txt), not paths requiring resolution. They’re relative to the working directory, which Galaxy sets up correctly. stdin is different — it’s an input file path that must resolve to a real file.

Fix

Approach: resolve stdin through pathmapper in JobProxy

The pathmapper is available on the cwl_job object. Do the same reverse lookup cwltool does.

Change 1: lib/galaxy/tool_util/cwl/parser.pyJobProxy.stdin

@property
def stdin(self):
    if self.is_command_line_job:
        cwl_job = self.cwl_job()
        stdin_path = cwl_job.stdin
        if stdin_path and hasattr(cwl_job, "pathmapper"):
            rmap = cwl_job.pathmapper.reversemap(stdin_path)
            if rmap:
                return rmap[1]  # staged path
        return stdin_path
    else:
        return None

This mirrors cwltool’s job.py:286-292 logic. The reversemap() returns (type, staged_path) — we want [1] (the staged path).

Alternative: resolve in evaluation.py

Could also do the resolution in _cwl_build_command_line() instead of in JobProxy.stdin. But resolving in the proxy keeps the abstraction cleaner — callers of stdin always get a usable path.

Change 2: verify staging includes stdin file

cwltool adds stdin to reffiles (command_line_tool.py:1022) which feeds into the pathmapper. Galaxy calls process.stage_files(cwl_job.pathmapper, ...) at parser.py:719, which should already stage pathmapper entries. Verify that the stdin file actually appears in cwl_job.pathmapper entries.

If it doesn’t, Galaxy may need to explicitly add it:

# In stage_files(), before calling process.stage_files():
if cwl_job.stdin:
    # Ensure stdin file is in the pathmapper for staging
    ...

But this is likely unnecessary — cwltool adds it to reffiles before creating the pathmapper, so it should already be there.

Testing

Red-to-green for stdin_shorcut (v1.1):

# Currently in RED_TESTS, requires Docker for this specific test tool
# Run via integration test harness or directly:
source .venv/bin/activate
PYTHONPATH=lib:$PYTHONPATH pytest -s -v \
  lib/galaxy_test/api/cwl/test_cwl_conformance_v1_1.py \
  -k stdin_shorcut

Also verify:

Debugging approach

Add temporary logging to JobProxy.stdin to see:

  1. What cwl_job().stdin returns (the raw path)
  2. What cwl_job.pathmapper.reversemap() returns
  3. Whether the stdin file exists at the staged path
@property
def stdin(self):
    if self.is_command_line_job:
        cwl_job = self.cwl_job()
        stdin_path = cwl_job.stdin
        log.debug(f"CWL stdin raw path: {stdin_path}")
        if stdin_path and hasattr(cwl_job, "pathmapper"):
            rmap = cwl_job.pathmapper.reversemap(stdin_path)
            log.debug(f"CWL stdin reversemap: {rmap}")
            if rmap:
                log.debug(f"CWL stdin staged path exists: {os.path.exists(rmap[1])}")
                return rmap[1]
        return stdin_path
    else:
        return None

Unresolved questions