Tabular: relabel by row counter
Tool
toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool. This is the operation-named awk recipe for replacing row content with deterministic labels derived from NR.
When to reach for it
Use this when a workflow needs synthetic row labels like sample_0, sample_1, … or sample_1, sample_2, … based only on input row order.
Do not use this for simple computed columns; use tabular-compute-new-column. Do not use this when labels should come from collection element identifiers; use collection-aware provenance such as tabular-concatenate-collection-to-table.
Parameters
code: awk program. A one-line quoted string is enough for this operation.infile: connected input.variables: usually[]in awk examples.
The corpus exemplar uses toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.5+galaxy3.
Idiomatic shapes
Corpus-observed sample_0 first row:
tool_id: toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/9.5+galaxy3
tool_state:
code: '{gsub( $0 ,"sample_" (NR-1)); print}'
infile: { __class__: ConnectedValue }
variables: []
Anchored by the MAGs binning evaluation IWC exemplar.
Clearer equivalent for new whole-row replacement:
tool_state:
code: '{print "sample_" (NR-1)}'
infile: { __class__: ConnectedValue }
variables: []
One-based labels:
tool_state:
code: '{print "sample_" NR}'
infile: { __class__: ConnectedValue }
variables: []
Keep a header and relabel only data rows:
tool_state:
code: 'NR==1 {print; next} {print "sample_" (NR-1)}'
infile: { __class__: ConnectedValue }
variables: []
Pitfalls
- Zero-based vs one-based labels. Corpus emits
sample_0first because it usesNR-1. UseNRforsample_1. - Headers shift counters. Decide whether the first line is preserved, dropped, or counted before picking
NRvsNR-1. - Avoid
gsub($0, ...)for new work. It works in the exemplar, but treats the current row as a regex pattern.print "sample_" ...is clearer when replacing the whole row. - Whole-row replacement destroys original columns. Only use when downstream wants labels only.
- Row order becomes semantic. Any upstream sort or filter changes assigned sample numbers.
- No collection provenance. If labels should come from element identifiers, do not synthesize them from row order.
See also
- iwc-tabular-operations-survey — §2g and §7 awk split decision.
- tabular-prepend-header — header-preserving variants.
- tabular-compute-new-column — preserve rows while adding/replacing columns.
- tabular-concatenate-collection-to-table — preserve collection element identity instead of synthesizing row labels.