Tabular: compute a new column
Tool
toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.1 (“Compute on rows”, historically “Add a column”). 93 step occurrences in the surveyed IWC corpus — the canonical computed-column tool. Source: $TOOLS_IUC/tools/column_maker/column_maker.xml (the toolshed owner is devteam for historical reasons; modern source lives in tools-iuc).
When to reach for it
Insert, replace, or append a column whose value is a Python expression over the existing cN columns. Multiple expressions can be sequenced inside a single tool step (each one operates on the running output of the previous). Use this for arithmetic, simple type coercion, and string concatenation.
If the row decision needs a multi-line conditional or split/gsub, prefer awk (see the awk recipe sub-pages cross-referenced from iwc-tabular-operations-survey). If columns and a row predicate are computed together, prefer tabular-sql-query.
If the computed table value is only an intermediate scalar or boolean that will be read back with param_value_from_file, keep the column_maker mechanics here but follow derive-parameter-from-file or conditional-gate-on-nonempty-result downstream. MGnify uses this shape for c1 != 0 before reading the result as a boolean.
Parameters
tool_state has three top-level fields that matter for authoring: error_handling (a <section> in the wrapper), ops (a conditional keyed on header_lines_select, holding expressions), and the optional avoid_scientific_notation boolean. expressions is a <repeat> nested under ops — flattening it to the top level produces YAML that won’t roundtrip.
-
error_handling— set as below:error_handling: auto_col_types: <see table> fail_on_non_existent_columns: true non_computable: action: --fail-on-non-computable # see enum belowfail_on_non_existent_columns: trueis uniform across all 51 corpus instances.non_computable.actionaccepts five values per the wrapper:--fail-on-non-computable(default; 49/51 in corpus),--skip-non-computable(2/51, both inconsensus-from-variation.gxwf.ymlwhere BED-coordinate arithmetic can legitimately yield non-numerics),--keep-non-computable,--non-computable-blank, and--non-computable-default(which requires anon_computable.default_valuesub-field, e.g.nan,NA,.). -
ops.header_lines_select: select with valuesyes/no. Withyes, the first input line is passed through unchanged and each expression must include anew_column_name(used as the header label). Withno, expressions must omitnew_column_nameand the first line is computed like any other. -
ops.expressions: repeat list of{ cond, add_column: { mode, pos }, new_column_name? }entries, evaluated left-to-right. Expressions can reference columns added by earlier expressions in the same step. -
add_column.mode: select with values""(Append),I(Insert),R(Replace). -
add_column.pos: 1-indexed integer (min: 1per wrapper) forIandR; the empty string""for Append (the wrapper rendersposas a hidden empty param whenmode: ""). -
avoid_scientific_notation: optional top-level boolean (defaultfalse). Settrueto force decimal output for floats; otherwise small floats render as1e-13.
The strict auto_col_types rule
auto_col_types controls whether bare cN references are coerced to numeric when the expression demands it. Corpus distribution is 48 true / 3 false. Pick by what the expression does to its cN references:
| Expression kind | auto_col_types |
|---|---|
Arithmetic on raw cN ((c18 + c19) / c6, round(...)) | true |
Pure string concatenation (c5 + '>' + c6) | false |
Arithmetic with explicit casts (int(c2) - …, float(cN)) — the expression handles its own type coercion | false |
| Mixed | split into two expressions: entries with different settings |
Rationale: with true, c5 + c6 performs numeric addition (silently turning a string concat into 0+0 if columns are non-numeric); with false and no explicit cast, c18 + c19 is string concat, which silently produces "3.13.4" instead of 6.4. Both bugs are silent. Explicit int() / float() is the third escape hatch.
Canonical exemplars to memorize:
- Arithmetic on raw
cN,auto_col_types: true— the SARS-CoV-2 variation reporting workflow computesAF = round((c18 + c19) / c6, 6)and insertsAFcaller. - String concat,
auto_col_types: false— the same workflow appendschangeandchange_with_posfrom string concatenation expressions. - Explicit-cast arithmetic,
auto_col_types: false— the SARS-CoV-2 consensus-from-variation workflow usesint(...)expressions with--skip-non-computable.
Idiomatic shapes
Insert + replace in one step, arithmetic on raw cN:
tool_id: toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.1
tool_state:
error_handling:
auto_col_types: true
fail_on_non_existent_columns: true
non_computable:
action: --fail-on-non-computable
ops:
header_lines_select: yes
expressions:
- cond: c7
add_column:
mode: I # insert
pos: "8"
new_column_name: AFcaller
- cond: round((c18 + c19) / c6, 6)
add_column:
mode: R # replace
pos: "7"
new_column_name: AF
Anchored by the SARS-CoV-2 variation reporting IWC exemplar.
String-concat new columns (append), auto_col_types: false:
tool_id: toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.1
tool_state:
error_handling:
auto_col_types: false
fail_on_non_existent_columns: true
non_computable:
action: --fail-on-non-computable
ops:
header_lines_select: yes
expressions:
- cond: c5 + '>' + c6
add_column:
mode: ""
pos: ""
new_column_name: change
- cond: c3 + ':' + c19
add_column:
mode: ""
pos: ""
new_column_name: change_with_pos
Anchored by the SARS-CoV-2 variation reporting IWC exemplar.
Pitfalls
- Wrong
auto_col_types. See the table above — the failure mode is silent and downstream-only. - Flattening
expressions:totool_statetop level. The actual nesting istool_state.ops.expressionswithtool_state.ops.header_lines_selectas a sibling.error_handlingandavoid_scientific_notationare siblings ofops, not nested inside it. Flat shapes don’t roundtrip. new_column_namepaired withheader_lines_select: no. The wrapper only exposesnew_column_nameinside theheader_lines_select: yesbranch. Authoring a step withheader_lines_select: noANDnew_column_nameproduces YAML the tool form won’t accept.- Mixing arithmetic and string concat in one entry. Split into two
expressions:entries with differentauto_col_typesrather than reaching forstr(...)inside the expression. - Skipping
error_handling. Wrapper defaults differ from corpus defaults; without explicitfail_on_non_existent_columns: true, non-existent column refs may pass through depending onnon_computable.action. add_column.mode/pos.I(insert) andR(replace) take a 1-indexed integerpos; append usesmode: ""andpos: "". Off-by-one inposshifts every downstreamcNreference in subsequent expressions in the same step.Add a column(addValue/1.0.1) — a different, legacy tool that adds a constant column only. Do not confuse withcolumn_maker/Add_a_column1.
Legacy alternative
toolshed.g2.bx.psu.edu/repos/devteam/add_value/addValue/1.0.1 (“Add a column”, 56 step occurrences) is the legacy constant-column-only tool — heavily used in older VGP workflows. For new work, prefer column_maker/Add_a_column1 even for constant columns; it carries the same error_handling story and unifies one tool.
See also
- iwc-tabular-operations-survey — corpus survey, §7 decision record for the
auto_col_typesrule. - iwc-parameter-derivation-survey — compute-then-parameterize seam.
- tabular-cut-and-reorder-columns — pure column projection without computation.
- tabular-sql-query — when project + compute + filter need to fuse.
- derive-parameter-from-file — when a one-value dataset must become a typed runtime parameter.