Collection: split identifier via rules

Use Apply Rules regex columns to split one collection identifier into nested list identifiers.

draft pattern

Revised: 2026-05-03
Rev: 2
Patterns: collection-swap-nesting-with-apply-rules, collection-build-list-paired-with-apply-rules
Molds: implement-galaxy-tool-step
Related: iwc-transformations-survey, galaxy-apply-rules-dsl

Pattern health

warn

IWC exemplar anchors: 1 abstract workflow anchor declared.
Foundry verification fixture: No structural verification fixture yet.
Pattern map coverage: 1 pattern map link here.
Metadata contract: Pattern frontmatter matches the site contract.

Tool

Use __APPLY_RULES__ to turn a flat list into a nested list:list by splitting each element identifier into two parts.

When to reach for it

Use this when identifiers encode two nesting axes in one string, such as sampleA_rep1, and downstream tools need sampleA -> rep1 nesting.

Do not use this for swapping two existing nesting levels; use collection-swap-nesting-with-apply-rules. Do not use this to make forward/reverse pairs; use collection-build-list-paired-with-apply-rules when one parsed axis is a paired-end role.

This page is about deriving list nesting from one identifier. Use regex-relabel-via-tabular when the collection shape is already right and only labels need cleanup.

Parameters

The corpus shape uses two parallel add_column_regex rules. Both target the same identifier column with the same expression, and each emits one capture group as a new column. Do not collapse them into a single group_count: 2 rule when following the IWC exemplar.

Conceptual Apply Rules shape:

tool_id: __APPLY_RULES__
tool_state:
  rules:
    - type: add_column_metadata
      value: identifier0
    - type: add_column_regex
      target_column: 0
      expression: "^(.*)_([^_]*)$"
      replacement: "\\1"
    - type: add_column_regex
      target_column: 0
      expression: "^(.*)_([^_]*)$"
      replacement: "\\2"
  mapping:
    list_identifiers: [1, 2]
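
To make the rule sequence above concrete, here is a minimal Python sketch that mimics it outside Galaxy. It is not Galaxy's implementation: the helper name `split_identifiers` and the sample identifiers are hypothetical, and Python's `re.sub` stands in for the replacement step on the assumption that `\1` and `\2` select the capture groups the way the conceptual YAML suggests.

```python
import re

# Same expression as both add_column_regex rules in the conceptual shape.
PATTERN = re.compile(r"^(.*)_([^_]*)$")


def split_identifiers(identifiers):
    """Hypothetical helper: emulate the three rules, then nest by columns 1 and 2."""
    rows = []
    for identifier in identifiers:
        row = [identifier]                      # column 0: identifier0
        row.append(PATTERN.sub(r"\1", row[0]))  # column 1: everything before the last underscore
        row.append(PATTERN.sub(r"\2", row[0]))  # column 2: everything after the last underscore
        rows.append(row)

    # Emulate list_identifiers: [1, 2] -> outer key from column 1, inner key from column 2.
    nested = {}
    for identifier, outer, inner in rows:
        nested.setdefault(outer, {})[inner] = identifier
    return nested


flat = ["sampleA_rep1", "sampleA_rep2", "sampleB_rep1"]
nested = split_identifiers(flat)
print(nested)
# {'sampleA': {'rep1': 'sampleA_rep1', 'rep2': 'sampleA_rep2'}, 'sampleB': {'rep1': 'sampleB_rep1'}}
```

Both regex columns are derived from column 0, never from each other, which mirrors the two parallel rules targeting the original identifier column.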

Pitfalls

Use two regex rules, not one group_count: 2 rule, for corpus parity.
Target the original identifier column both times.
^(.*)_([^_]*)$ splits on the last underscore; use a stricter regex if identifiers can contain multiple separators.
Validate unmatched behavior instead of silently creating empty nesting keys.
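
The last two pitfalls can be checked before the workflow runs. The sketch below is a pre-flight check written in plain Python, not something Apply Rules does itself. The function name `check_identifiers` and the example identifiers are hypothetical, and how Galaxy treats a non-matching row is deliberately left out because this page does not specify it.

```python
import re

# Same expression as the pattern's add_column_regex rules.
PATTERN = re.compile(r"^(.*)_([^_]*)$")


def check_identifiers(identifiers):
    """Hypothetical pre-flight check: return (splits, problems) for a list of identifiers."""
    splits = {}
    problems = []
    for identifier in identifiers:
        match = PATTERN.fullmatch(identifier)
        if match is None:
            # No underscore at all, so the expression cannot split this identifier.
            problems.append((identifier, "no underscore to split on"))
            continue
        outer, inner = match.groups()
        if not outer or not inner:
            # The expression matched, but one side of the split is empty.
            problems.append((identifier, "empty nesting key"))
            continue
        splits[identifier] = (outer, inner)
    return splits, problems


splits, problems = check_identifiers(
    ["sampleA_rep1", "patient_01_rep2", "control", "sampleC_"]
)
print(splits)    # patient_01_rep2 splits on its last underscore: ('patient_01', 'rep2')
print(problems)  # 'control' does not match; 'sampleC_' yields an empty inner key
```

If `patient_01_rep2` should instead nest as `patient` -> `01_rep2`, the expression needs to be stricter than the one shown here.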

See also

iwc-transformations-survey — Apply Rules Shape B and candidate boundary.
galaxy-apply-rules-dsl — add_column_regex and list_identifiers details.
collection-swap-nesting-with-apply-rules — regroup existing list:list axes.
collection-build-list-paired-with-apply-rules — paired-end variant.

IWC Exemplars (1 anchor)

epigenetics/average-bigwig-between-replicates/average-bigwig-between-replicates [high]

Splits flat bigWig identifiers into sample-prefix and replicate-suffix nesting with two regex-derived columns.

Incoming References (7)

Collection: build list paired with Apply Rules (related pattern) - Use Apply Rules to promote identifier columns into a list:paired collection, with optional cleanup first.
Collection: swap nesting with Apply Rules (related pattern) - Use Apply Rules to regroup a list:list collection by swapping outer and inner identifier columns.
Galaxy: collection patterns (related pattern) - Use this MOC to choose corpus-grounded Galaxy collection transformation patterns.
Reshape, relabel, and remap by collection axis (related pattern) - Use Apply Rules and deterministic relabeling when domain fan-out creates the wrong map-over axis.
Galaxy Apply Rules DSL (related note) - Reference for Galaxy's Apply Rules DSL: rule operations, mapping operations, composition patterns, pitfalls.
IWC Transformations Survey (related note) - Corpus survey of collection-shape transformations across IWC: built-in collection ops, toolshed transformers, and the multi-step recipes that bracket map-over.
Nextflow: grouped channel to regrouped Galaxy collection (implemented_by_patterns) - Route Nextflow groupTuple, transpose, and grouped tuple payloads to Galaxy collection reshape patterns when the key is a real axis.