
Tabular: to collection by row

Use split_file_to_collection with split_by: col to fan a tabular dataset out into collection elements, one per row or key.

Status: draft pattern

Revised: 2026-05-03
Rev: 2
Patterns: tabular-concatenate-collection-to-table, sync-collections-by-identifier
Molds: implement-galaxy-tool-step
Related: iwc-transformations-survey, nextflow-to-galaxy-channel-shape-mapping

Pattern health: warn

IWC exemplar anchors: 3 abstract workflow anchors declared.
Foundry verification fixture: no structural verification fixture yet.
Pattern map coverage: 2 pattern maps link here.
Metadata contract: pattern frontmatter matches the site contract.


Tool

Use toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2.

The IWC-attested shape is split_by: col, which splits a tabular file into a dataset collection with one element per row or key. id_col chooses the column whose value becomes the element identifier, and match_regex and sub_regex clean or extract that identifier.
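
The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's code: it assumes tab-separated rows, assumes the regex pair is applied with Python's re.sub, and assumes rows whose cleaned identifier is equal are grouped into one element (the "per key" case). split_by_column and header_lines are hypothetical names; only id_col, match_regex and sub_regex mirror the documented parameters.

```python
import re


def split_by_column(lines, id_col, match_regex=r"(.*)", sub_regex=r"\1", header_lines=0):
    """Sketch: group tab-separated lines into {identifier: element_lines}.

    Hypothetical helper, not split_file_to_collection's implementation.
    id_col is 1-indexed, mirroring split_parms.id_col.
    """
    if id_col < 1:
        raise ValueError("id_col is 1-indexed; use 1 for the first column")
    header = lines[:header_lines]  # assumption: header rows are copied into every element
    elements = {}  # dicts keep insertion order, so elements follow first appearance
    for line in lines[header_lines:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        identifier = re.sub(match_regex, sub_regex, fields[id_col - 1])
        # Rows sharing a cleaned identifier land in the same element.
        elements.setdefault(identifier, list(header)).append(line)
    return elements


table = ["SRR100_1\tliver\n", "SRR200_1\tbrain\n", "SRR100_2\tliver\n"]
print(split_by_column(table, id_col=1, match_regex=r"^(SRR\d+)_\d+$"))
```

With that regex, the two SRR100 rows form one element and the SRR200 row forms another; with the pass-through pair (.*) and \1, each distinct raw value would become its own element.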

This is the inverse of tabular-concatenate-collection-to-table, where collapse_dataset turns a collection into one tabular dataset.
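
Going the other way, the concatenation half of that round trip can be sketched too. collapse_elements is a hypothetical helper, not collapse_dataset's implementation, and it ignores that tool's options; it assumes every element starts with the same header_lines header rows and keeps them only once.

```python
def collapse_elements(elements, header_lines=0):
    """Sketch: concatenate {identifier: lines} back into one table.

    Hypothetical helper; assumes each element starts with the same
    header_lines header rows, which are emitted only once.
    """
    table = []
    for position, lines in enumerate(elements.values()):
        # Keep the header from the first element, skip it for the rest.
        table.extend(lines if position == 0 else lines[header_lines:])
    return table


elements = {
    "S1": ["id\tv\n", "S1\ta\n", "S1\tc\n"],
    "S2": ["id\tv\n", "S2\tb\n"],
}
print("".join(collapse_elements(elements, header_lines=1)), end="")
```

Because splitting groups rows by key, a split followed by a collapse returns the same rows but not necessarily in their original order: here both S1 rows come out together even if the S2 row sat between them in the source table.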

When to reach for it

Use this when a single manifest, sample sheet, accession list, or combined results table must become a collection so the next tool can map over each row/key.

Do not use this when the input is already a collection. Do not use this for collection-to-table row-binding; use tabular-concatenate-collection-to-table.

Parameters

tool_id: toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2
tool_state:
  input: { __class__: ConnectedValue }
  split_parms:
    split_by: col
    id_col: "1"
    match_regex: (.*)
    sub_regex: \1

split_parms.split_by: set to col to split by a tabular column.
split_parms.id_col: 1-indexed column whose value becomes the collection element identifier.
split_parms.match_regex / sub_regex: regular expression and replacement applied to the id_col value to produce the final identifier. The pair shown above, (.*) and \1, keeps the value unchanged.
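
Because the identifier is whatever match_regex and sub_regex produce, it can help to preview the mapping on a few values before running the workflow. This sketch assumes Python-style regular expressions with \1 backreferences, matching the sub_regex value in the configuration above; preview_identifiers is a hypothetical helper.

```python
import re


def preview_identifiers(values, match_regex=r"(.*)", sub_regex=r"\1"):
    """Sketch: map each raw id_col value to the identifier the regex pair yields."""
    return {value: re.sub(match_regex, sub_regex, value) for value in values}


# The (.*) / \1 pair from the configuration above leaves values unchanged.
print(preview_identifiers(["sampleA", "sampleB"]))
# A stricter pair extracts the run accession from a file-style name.
print(preview_identifiers(["SRR100_1.fastq", "SRR200_1.fastq"], r"^(SRR\d+)_\d+\.fastq$"))
```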

Pitfalls

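id_col is 1-indexed: "1" means the first column.
Pick a stable, unique identifier column. Duplicate values, including distinct raw values that match_regex/sub_regex reduce to the same string, collide on a single element identifier, so you no longer get one element per row.
Regex cleanup is not merely cosmetic: the cleaned identifiers are the metadata that downstream steps match on, for example when syncing collections by identifier.
Headers matter. If the table has a header row, make sure each split element gets the header behavior the downstream tool expects; an unhandled header row can otherwise be read as data.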
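
The first three pitfalls can be checked mechanically before wiring the step into a workflow. check_split_plan is a hypothetical pre-flight helper, not part of the tool; it assumes tab-separated input, Python re.sub semantics for the regex pair, and a caller who knows whether the first line is a header.

```python
import re
from collections import defaultdict


def check_split_plan(lines, id_col, match_regex=r"(.*)", sub_regex=r"\1", has_header=False):
    """Sketch: return warnings about id_col range and identifier collisions."""
    if id_col < 1:
        return ["id_col is 1-indexed; 0 or negative values are not valid column numbers"]
    body = lines[1:] if has_header else lines  # do not treat the header as a key
    raw_by_identifier = defaultdict(list)  # cleaned identifier -> raw values
    warnings = []
    for row_number, line in enumerate(body, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < id_col:
            warnings.append(f"row {row_number}: {len(fields)} column(s), id_col={id_col}")
            continue
        raw = fields[id_col - 1]
        raw_by_identifier[re.sub(match_regex, sub_regex, raw)].append(raw)
    for identifier, raws in raw_by_identifier.items():
        if len(raws) > 1:
            warnings.append(f"identifier {identifier!r} shared by {len(raws)} rows: {raws}")
    return warnings


rows = ["run\ttissue\n", "SRR1_1\tliver\n", "SRR1_2\tliver\n", "SRR2_1\tbrain\n"]
for warning in check_split_plan(rows, 1, r"^(SRR\d+)_\d+$", has_header=True):
    print(warning)
```

Whether a shared identifier is a bug depends on intent: grouping several rows under one key can be exactly what the downstream step needs, so the sketch reports collisions rather than rejecting them.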

See also

iwc-transformations-survey — Recipe J and candidate boundary.
tabular-concatenate-collection-to-table — inverse operation using collapse_dataset.
sync-collections-by-identifier — downstream collection alignment by element identifiers.

IWC exemplars (3 anchors)

data-fetching/sra-manifest-to-concatenated-fastqs/sra-manifest-to-concatenated-fastqs (high)

Splits a one-column list of SRA accessions so fasterq_dump can run once per accession.

sars-cov-2-variant-calling/sars-cov-2-variation-reporting/variation-reporting (high)

Splits a combined per-clade VCF table into per-clade collection elements.

epigenetics/consensus-peaks/consensus-peaks-chip-sr (high)

Turns a sample-list tabular into a collection-shaped input for downstream processing.

Incoming References (7)

Galaxy: collection patterns (related pattern): Use this MOC to choose corpus-grounded Galaxy collection transformation patterns.
Galaxy: tabular patterns (related pattern): Use this MOC to choose corpus-grounded Galaxy tabular transformation patterns.
Manifest to mapped collection lifecycle (related pattern): Use a manifest or table to build a collection, map a tool per row, then relabel or reshape outputs.
Iwc Map Over Lifecycle Survey (related pattern): Survey of IWC map-over lifecycle recipes, with a Nextflow-to-Galaxy crosswalk for collection construction, cleanup, reshape, reduce, and publish phases.
Iwc Transformations Survey (related note): Corpus survey of collection-shape transformations across IWC: built-in collection ops, toolshed transformers, and the multi-step recipes that bracket map-over.
Nextflow-to-Galaxy channel shape mapping (related note): Maps common Nextflow channel, tuple, and path shapes to Galaxy dataset and collection shapes.
Nextflow: samplesheet rows to Galaxy collections (implemented_by_patterns): Route Nextflow samplesheet row streams and repeated tuple inputs to Galaxy list, paired, or list:paired collections.