Collection: cleanup after map-over failure

Use __FILTER_EMPTY_DATASETS__ or __FILTER_FAILED_DATASETS__ after a map-over step when empty or failed elements would break downstream collection steps.

Draft pattern. Revised 2026-05-03 (Rev 2).

Pattern health: warn

  • IWC exemplar anchors: 3 abstract workflow anchors declared.
  • Foundry verification fixture: no structural verification fixture yet.
  • Pattern map coverage: 2 pattern maps link here.
  • Metadata contract: pattern frontmatter matches the site contract.

Tool

Use Galaxy built-in collection filters:

  • __FILTER_EMPTY_DATASETS__ drops or replaces elements whose datasets are empty.
  • __FILTER_FAILED_DATASETS__ drops or replaces elements whose jobs failed.

These are content/state cleanup gates. They are not identifier-list filters; use sync-collections-by-identifier when the keep/drop set comes from collection element names.

When to reach for it

Use these filters immediately after a tool maps over a collection, when some elements may be unusable for the next step.

Use __FILTER_EMPTY_DATASETS__ when a per-element tool succeeds but produces a zero-line or zero-byte dataset. This is common after tabular, awk, BED, or search-style extraction where “no hits” is a valid per-sample result but the downstream tool expects non-empty input.

Use __FILTER_FAILED_DATASETS__ when a per-element job may fail and downstream processing should continue on the successful elements.

Use the replacement form when downstream shape must stay stable. A sentinel dataset keeps an element slot instead of shortening the collection.
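
As a rough sketch of where the filter sits, the fragment below wires a drop-form filter between a mapped-over step and the next collection consumer. It follows the `in:`/`source:` style used elsewhere on this page. The step labels, the `samples` input, the two `hypothetical_*` tool ids, the `hits` output, and the filter's `output` name are all assumptions for illustration and are not taken from an IWC workflow.

```yaml
# Illustrative only: step labels and non-filter tool ids are hypothetical.
steps:
  extract_hits:
    tool_id: hypothetical_extract_tool     # mapped over the sample collection
    in:
      - id: input
        source: samples                    # hypothetical workflow input (a list collection)
    out:
      - id: hits
  drop_empty:
    tool_id: __FILTER_EMPTY_DATASETS__     # no replacement wired, so empty elements are removed
    in:
      - id: input
        source: extract_hits/hits
  summarize:
    tool_id: hypothetical_summary_tool     # receives only the non-empty elements
    in:
      - id: input
        source: drop_empty/output          # output name is an assumption
```

Swapping the tool id for __FILTER_FAILED_DATASETS__ gives the same shape for the failed-job case.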

Parameters

  • input: collection to inspect.
  • replacement: optional connected dataset. If omitted, bad elements are removed. If supplied, bad elements are replaced with this dataset.

The simple drop form has no meaningful knobs beyond choosing empty vs failed. The authoring decision is which failure mode you are guarding and whether collection length must be preserved.

Idiomatic shapes

Drop empty elements before the next collection consumer:

tool_id: __FILTER_EMPTY_DATASETS__
tool_state:
  input: { __class__: ConnectedValue }  # wired to the mapped-over collection

Drop failed elements before aggregation:

tool_id: __FILTER_FAILED_DATASETS__
tool_state:
  input: { __class__: ConnectedValue }  # wired to the mapped-over collection

Replace failed elements with a sentinel dataset:

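tool_id: __FILTER_FAILED_DATASETS__
in:
  # collection that may contain failed elements
  - id: input
    source: argNorm on Groot output
  # sentinel dataset substituted for each failed element
  - id: replacement
    source: _unlabeled_step_8/outfile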

Pitfalls

  • Empty is not failed. Use __FILTER_EMPTY_DATASETS__ for elements whose jobs succeeded but wrote empty files, and __FILTER_FAILED_DATASETS__ for elements in the error state (shown red in the Galaxy history).
  • Dropping changes collection length. If a downstream zip or sibling comparison assumes one-to-one alignment, resync siblings or use a replacement sentinel.
  • Replacement changes data semantics. The sentinel becomes real downstream input, so choose a value the next tool treats as controlled no-result data.
  • Do not confuse this with __FILTER_FROM_FILE__; that tool filters by identifier list and does not inspect dataset state.
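
The length pitfall can be made concrete with a small sketch. Here the filtered collection is later compared element by element with an unfiltered sibling, so the filter wires a `replacement` to keep every element slot. The step labels (`per_sample_calls`, `per_sample_reference`, `empty_placeholder`), the `hypothetical_compare_tool` with its `left` and `right` inputs, and the filter's `output` name are assumptions for illustration only.

```yaml
# Illustrative only: labels and the compare tool are hypothetical.
steps:
  keep_slots:
    tool_id: __FILTER_FAILED_DATASETS__
    in:
      - id: input
        source: per_sample_calls/outfile      # may contain failed elements
      - id: replacement
        source: empty_placeholder/outfile     # sentinel fills each failed slot
  compare_siblings:
    tool_id: hypothetical_compare_tool        # assumes element i of each input belongs together
    in:
      - id: left
        source: keep_slots/output             # same length as the unfiltered sibling
      - id: right
        source: per_sample_reference/outfile
```

With the drop form instead, `left` would be shorter than `right`, and the sibling would need resyncing before the comparison.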

See also

IWC exemplars (3 anchors)

microbiome/pathogen-identification/pathogen-detection-pathogfair-samples-aggregation-and-visualisation/Pathogen-Detection-PathoGFAIR-Samples-Aggregation-and-Visualisation (high)

Shows multiple collection inputs passing through FILTER_FAILED before downstream aggregation.

amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-rrna-prediction/mgnify-amplicon-pipeline-v5-rrna-prediction (high)

Uses repeated FILTER_EMPTY steps after awk/search reshapes before the next consumer.

microbiome/metagenomic-raw-reads-amr-analysis/metagenomic-raw-reads-amr-analysis (medium)

Shows the rare replacement form, which preserves collection shape with a sentinel file.