Tabular: concatenate collection to table

Tool

toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0. The tabular survey found 44 step instances, making this the dominant bridge from collections to tabular data: it row-binds per-element tabular outputs into one dataset.

When to reach for it

Use this when a Galaxy dataset collection of tabular-like files must become one tabular dataset for downstream Cut1, Filter1, datamash_ops, tp_find_and_replace, or reporting steps.

Do not use this for plain two-file concatenation; when the input is not a collection, use tp_cat or legacy cat1 instead. Do not use this when each collection element should become a column; use tabular-pivot-collection-to-wide. For grouped collapse within one table, use tabular-group-and-aggregate-with-datamash.

Parameters

input_list: connected collection input.
filename.add_name: whether to inject the collection element identifier into output rows.
filename.place_name: where/how to place the element identifier when add_name: true.
one_header: whether to keep only one header row across all collection elements.

The canonical headered-table shape is:

tool_id: toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0
tool_state:
  filename:
    add_name: true
    place_name: same_multiple
  input_list: { __class__: ConnectedValue }
  one_header: true
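
The effect of these parameters can be modeled in a few lines of Python. This is a rough sketch of the semantics described above, not the tool's implementation: `collapse` and its arguments are hypothetical names, the identifier is assumed to be prepended as a tab-separated first column, and the `element` label added to the header row is a placeholder assumption.

```python
def collapse(elements, add_name=True, one_header=True):
    """Row-bind {identifier: text} tabular elements into one table.

    Hypothetical model of add_name with place_name: same_multiple.
    """
    out = []
    for index, (identifier, text) in enumerate(elements.items()):
        rows = text.splitlines()
        if one_header and rows:
            # Treat the first line of every element as a header.
            header, rows = rows[0], rows[1:]
            if index == 0:
                # Assumption: the kept header gets a placeholder label
                # for the new identifier column.
                out.append(f"element\t{header}" if add_name else header)
        for row in rows:
            # same_multiple: every data row carries the element identifier.
            out.append(f"{identifier}\t{row}" if add_name else row)
    return "\n".join(out) + "\n"


samples = {
    "sampleA": "pos\tref\talt\n100\tA\tG\n",
    "sampleB": "pos\tref\talt\n200\tC\tT\n",
}
print(collapse(samples), end="")
```

In this model, calling `collapse(samples, add_name=False, one_header=False)` keeps both header rows and drops the identifiers, which is the shape the pitfalls below warn about.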

Idiomatic shapes

Per-sample tabulars to one annotated table:

tool_state:
  filename:
    add_name: true
    place_name: same_multiple
  input_list: { __class__: ConnectedValue }
  one_header: true

Anchored by the SARS-CoV-2 variation reporting IWC exemplar.

Collection concat with no element identifier:

tool_state:
  filename:
    add_name: false
  input_list: { __class__: ConnectedValue }
  one_header: false

Anchored by the MAPseq-to-ampvis2 IWC exemplar.

Headerless outputs with row provenance:

tool_state:
  filename:
    add_name: true
    place_name: same_multiple
  input_list: { __class__: ConnectedValue }
  one_header: false

Anchored by the influenza consensus and subtyping IWC exemplar.

Pitfalls

add_name: false loses provenance. The loss is silent: nothing fails at this step, even if downstream steps need sample or element identity.
one_header: false duplicates headers when each collection element has its own header row.
one_header: true on headerless data may drop a real first row. Only enable it when inputs have headers.
place_name: same_multiple is the row-provenance idiom. It repeats the element name so each output row carries identity.
place_name: same_once is a different shape. Use it only when downstream expects block labels, not per-row identity.
This is row-bind, not wide pivot. If each collection element should become a column, use tabular-pivot-collection-to-wide.
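
The two one_header pitfalls both come down to knowing whether the inputs are headered. A pre-flight heuristic along these lines can help decide; `shared_first_line` is a hypothetical helper, not part of Galaxy or the tool, and it assumes element contents are available as strings.

```python
def shared_first_line(elements):
    """Return True if every non-empty element starts with the same line.

    Heuristic only: an identical first line across elements suggests a
    header row, but headerless data with identical first rows would also
    pass, and a single-element collection passes trivially.
    """
    first_lines = {
        text.splitlines()[0]
        for text in elements.values()
        if text.strip()  # skip empty elements
    }
    return len(first_lines) == 1


headered = {"a": "pos\tref\n1\tA\n", "b": "pos\tref\n2\tC\n"}
headerless = {"a": "1\tA\n", "b": "2\tC\n"}
print(shared_first_line(headered), shared_first_line(headerless))
```

A True result is a hint to consider one_header: true; a False result suggests leaving it off so no real data row is dropped.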

Legacy alternative

For non-collection two-file concatenation, older workflows may use tp_cat or legacy core cat1. Those are not replacements for this pattern because they do not carry collection element identity.