Collection: split identifier via rules
Tool
Use __APPLY_RULES__ to turn a flat list into a nested list:list by splitting each element identifier into two parts.
When to reach for it
Use this when identifiers encode two nesting axes in one string, such as sampleA_rep1, and downstream tools need sampleA -> rep1 nesting.
Do not use this for swapping two existing nesting levels; use collection-swap-nesting-with-apply-rules. Do not use this to make forward/reverse pairs; use collection-build-list-paired-with-apply-rules when one parsed axis is a paired-end role.
This page is about deriving list nesting from one identifier. Use regex-relabel-via-tabular when the collection shape is already right and only labels need cleanup.
Parameters
The corpus shape uses two parallel add_column_regex rules, each producing one new column from one capture group. Do not collapse them into a single rule with group_count: 2 when following the IWC exemplar.
Conceptual Apply Rules shape:
tool_id: __APPLY_RULES__
tool_state:
  rules:
    - type: add_column_metadata
      value: identifier0
    - type: add_column_regex
      target_column: 0
      expression: "^(.*)_([^_]*)$"
      replacement: "\\1"
    - type: add_column_regex
      target_column: 0
      expression: "^(.*)_([^_]*)$"
      replacement: "\\2"
  mapping:
    list_identifiers: [1, 2]
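To make the column bookkeeping concrete, the following is a minimal Python sketch that imitates the three rules outside Galaxy. It is not Galaxy code: the names apply_rules and nest are hypothetical, and it assumes the replacement field behaves like a regex substitution over the whole identifier, modeled here with Python's re.sub.

```python
import re

# Same expression as in the rules above.
EXPRESSION = r"^(.*)_([^_]*)$"


def apply_rules(identifiers):
    """Return (outer, inner) nesting keys for each flat element identifier."""
    # add_column_metadata / identifier0: the identifier becomes column 0.
    rows = [[identifier] for identifier in identifiers]
    # Two add_column_regex rules, one capture group each, both reading column 0.
    for replacement in (r"\1", r"\2"):
        for row in rows:
            row.append(re.sub(EXPRESSION, replacement, row[0]))
    # list_identifiers: [1, 2] -> column 1 is the outer key, column 2 the inner key.
    return [(row[1], row[2]) for row in rows]


def nest(identifiers):
    """Group inner keys under their outer key to show the list:list shape."""
    nested = {}
    for outer, inner in apply_rules(identifiers):
        nested.setdefault(outer, []).append(inner)
    return nested


print(nest(["sampleA_rep1", "sampleA_rep2", "sampleB_rep1"]))
# → {'sampleA': ['rep1', 'rep2'], 'sampleB': ['rep1']}
```

Column 0 is never overwritten, which is why both regex rules can target it and why the mapping refers to columns 1 and 2.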
Pitfalls
- Use two regex rules, not one group_count: 2 rule, for corpus parity.
- Target the original identifier column both times.
- ^(.*)_([^_]*)$ splits on the last underscore; use a stricter regex if identifiers can contain multiple separators.
- Validate unmatched behavior instead of silently creating empty nesting keys.
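The last two pitfalls can be checked before the workflow runs. The sketch below is one possible pre-flight check in Python, not part of Galaxy: split_identifier and the STRICT pattern are hypothetical, and STRICT assumes identifiers should contain exactly one underscore.

```python
import re

# Expression from this page: splits on the last underscore.
SPLIT_LAST = re.compile(r"^(.*)_([^_]*)$")
# Hypothetical stricter variant: exactly one underscore, both parts non-empty.
STRICT = re.compile(r"^([^_]+)_([^_]+)$")


def split_identifier(identifier, pattern=STRICT):
    """Return (outer, inner) or raise instead of yielding an empty key."""
    match = pattern.fullmatch(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not match {pattern.pattern!r}")
    outer, inner = match.groups()
    if not outer or not inner:
        raise ValueError(f"{identifier!r} would produce an empty nesting key")
    return outer, inner


print(split_identifier("sampleA_rep1"))          # → ('sampleA', 'rep1')
print(split_identifier("s_1_rep2", SPLIT_LAST))  # → ('s_1', 'rep2')
```

With SPLIT_LAST, an identifier such as sampleA_ matches but leaves the inner key empty, so the helper rejects it; with STRICT, s_1_rep2 is rejected because it contains two separators.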
See also
- iwc-transformations-survey: Apply Rules Shape B and candidate boundary.
- galaxy-apply-rules-dsl: add_column_regex and list_identifiers details.
- collection-swap-nesting-with-apply-rules: regroup existing list:list axes.
- collection-build-list-paired-with-apply-rules: paired-end variant.