Tabular: filter rows by column value

Use Filter1 with a Python expression over cN columns to drop rows. Highest-frequency tabular row filter in IWC.

Revised 2026-05-03, Rev 2

Pattern health

warn
  • IWC exemplar anchors

    3 abstract workflow anchors declared.

  • Foundry verification fixture

    No structural verification fixture yet.

  • Pattern map coverage

    1 pattern map link here.

  • Metadata contract

    Pattern frontmatter matches the site contract.

Tool

Filter1 (Galaxy core; bundled, no toolshed owner). Display name “Filter data on any column using simple expressions”. Source: $GALAXY/tools/stats/filtering_1_1_0.xml.

When to reach for it

One-shot row filter on tabular input. The row predicate is a single Python expression over c1, c2, … column references; the output preserves column order and types. By far the most common row-filter idiom in IWC (33 step occurrences in the surveyed corpus, second only to regex grep variants).

If the predicate needs join/sort/grouping semantics, switch to tabular-sql-query or to datamash_ops. If the predicate is a substring/regex over a single column or whole line, prefer tabular-filter-by-regex.

Parameters

  • cond: text param, evaluated per row as a Python expression over c1, c2, … (1-indexed) column references. Comparison, boolean, and arithmetic operators all work; function calls (len, range, etc.) and .split(...) are supported per the wrapper help. Quote string literals. Boolean operators must be lowercase: uppercase OR/AND are read as names, not boolean operators, so the expression is invalid.
  • header_lines: integer (the workflow YAML conventionally quotes it: "0", "1"). Set to "1" whenever the input has a header row. Filter1 evaluates cond per row; rows whose evaluation raises an exception are dropped from the output and counted in the dataset’s “Condition/data issue” metadata. That count is visible in the history-item info panel, so the drop is recorded in the metadata but invisible in the data flow.
  • The connected input port is the tabular dataset.
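The per-row behavior described above can be approximated in plain Python. This is a minimal sketch, not Galaxy's implementation: filter_rows is a hypothetical helper, the numeric cast is a per-value guess that may not match how Filter1 types columns, and copying header lines to the output unevaluated is an assumption.

```python
def filter_rows(text, cond, header_lines=0):
    """Toy per-row filter: keep rows for which `cond` evaluates true.

    Hypothetical helper for illustration only; not Galaxy's code.
    """
    def cast(value):
        # Assumption: guess int/float per value; otherwise keep the string.
        for kind in (int, float):
            try:
                return kind(value)
            except ValueError:
                pass
        return value

    code = compile(cond, "<cond>", "eval")
    kept, skipped = [], 0
    for i, line in enumerate(text.splitlines()):
        if i < header_lines:
            kept.append(line)  # assumption: headers are copied, not evaluated
            continue
        fields = line.split("\t")
        env = {f"c{n}": cast(v) for n, v in enumerate(fields, start=1)}
        try:
            if eval(code, {"__builtins__": {}}, env):
                kept.append(line)
        except Exception:
            skipped += 1  # the row vanishes from the output; only the count remains
    return kept, skipped


sample = "chrom\tpos\tfilter\nchr1\t50\tPASS\nchr1\t150\tq10\nchr2\t200\t."

kept, skipped = filter_rows(sample, "c3=='PASS' or c3=='.'", header_lines=1)
print(len(kept), skipped)  # 3 0  (header plus two matching rows)

kept, skipped = filter_rows(sample, "c2 > 100", header_lines=0)
print(len(kept), skipped)  # 2 1  (the header row raised and was skipped)
```

The second call shows why the skip count matters: the header disappears from the output without any error in the data stream.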

Idiomatic shapes

Single-column equality, header-aware:

tool_id: Filter1
tool_state:
  cond: c4=='PASS' or c4=='.'
  header_lines: '1'

Anchored by the SARS-CoV-2 variation reporting IWC exemplar.

Predicate produced upstream and wired in via ConnectedValue (lets a workflow author parameterize the predicate without a runtime parameter). Note that this corpus example has header_lines: "0" because the upstream rule already accounts for the header — the header_lines setting is independent of the cond source:

tool_id: Filter1
tool_state:
  cond: { __class__: ConnectedValue }   # from a prior "generate filter rule" step
  header_lines: '0'

Anchored by the consensus peaks ATAC/CUT&RUN IWC exemplar.
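For intuition about what an upstream "generate filter rule" step has to produce, here is a minimal Python sketch that composes a cond string from a column index and a list of allowed values. build_disjunction is a hypothetical helper, not a Galaxy tool; the exemplar workflows build the rule with their own upstream steps, which may work differently.

```python
def build_disjunction(column, values):
    """Return a cond string such as "c4=='PASS' or c4=='.'".

    Hypothetical helper for illustration; not part of Galaxy.
    """
    if column < 1:
        raise ValueError("column references are 1-indexed")
    if not values:
        raise ValueError("at least one value is required")
    # repr() quotes each literal, so the result never contains bare names.
    return " or ".join(f"c{column}=={value!r}" for value in values)


cond = build_disjunction(4, ["PASS", "."])
print(cond)  # c4=='PASS' or c4=='.'
```

Quoting through repr() avoids the unquoted-literal problem by construction, and the lowercase or is fixed in the template rather than typed by hand.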

Pitfalls

  • Forgetting header_lines. Default "0". With a header row present, the header is evaluated like any other row; under a numeric comparison the header typically raises and is skipped, while under a string comparison it may pass through. Either way, downstream row-count assertions end up off by one.
  • String quoting. cond: c4==PASS parses as a name lookup on PASS, not a string comparison. Always quote literal strings.
  • Uppercase operators. In cond: c1=='X' OR c1=='Y', OR is read as a name, not a boolean operator, so the expression is invalid (plain Python rejects it with a SyntaxError). Use lowercase and, or, not.
  • No new columns. Filter1 is a filter; new or computed columns belong in tabular-compute-new-column.
  • Implicit column-type coercion. cN values are strings until an operator forces int/float; coercion failures skip the row (counted in the dataset metadata, not surfaced in the output stream). If drops that are invisible in the data flow are unacceptable, pre-clean upstream or use tabular-sql-query.

All three anchored corpus filters are either an equality/disjunction over a string column or a ConnectedValue rule. Numeric-range predicates are unattested: if you reach for cN > X, you’re slightly off the corpus path, so verify behavior on a sample.
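Several of these pitfalls can be reproduced on a single row in plain Python before committing a predicate to a workflow. The sketch below only shows how Python itself treats each expression; try_cond is a hypothetical helper, and how Filter1 reports each failure may differ.

```python
def try_cond(cond, **columns):
    """Evaluate cond against one row; return the result or the exception name.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    try:
        return bool(eval(cond, {"__builtins__": {}}, columns))
    except Exception as exc:  # SyntaxError is an Exception subclass too
        return type(exc).__name__


print(try_cond("c4=='PASS'", c4="PASS"))        # True
print(try_cond("c4==PASS", c4="PASS"))          # NameError (unquoted literal)
print(try_cond("c1=='X' OR c1=='Y'", c1="X"))   # SyntaxError (uppercase OR)
print(try_cond("c2 > 100", c2="pos"))           # TypeError (header text vs number)
print(try_cond("c4!='FAIL'", c4="filter"))      # True (a header can pass a string test)
```

The last two lines are the two header outcomes: a numeric comparison raises on header text, while a string comparison can let the header through as if it were data.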

See also

IWC exemplars (3 anchors)

sars-cov-2-variant-calling/sars-cov-2-variation-reporting/variation-reporting (high)

Shows a literal Filter1 predicate over a string status column with one header line.

sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/consensus-from-variation (high)

Shows a connected predicate generated upstream with header_lines set independently.

epigenetics/consensus-peaks/consensus-peaks-atac-cutandrun (high)

Shows another connected predicate shape where the filter rule is supplied at runtime.

Incoming References (7)

  • Parameter: compose runtime text parameter (related pattern): Use compose_text_param to build connected text expressions from constants plus runtime scalar values.
  • Galaxy: tabular patterns (related pattern): Use this MOC to choose corpus-grounded Galaxy tabular transformation patterns.
  • Tabular: filter rows by regex (related pattern): Use tp_grep_tool for whole-line regex row filters on tabular input. Grep1 is the legacy alternative.
  • Tabular: SQL query (related pattern): Use query_tabular when SQL semantics justify it: windows, joins, anti-joins, or fused project+compute over tabulars.
  • Iwc Parameter Derivation Survey (related pattern): Corpus survey of Galaxy workflow recipes that turn upstream data, metadata, or small files into runtime parameters.
  • Iwc Tabular Operations Survey (related note): Corpus survey of tabular tools and operations across IWC workflows; map for the operation pattern hierarchy on row/column data manipulation.
  • Nextflow-to-Galaxy channel shape mapping (related note): Maps common Nextflow channel, tuple, and path shapes to Galaxy dataset and collection shapes.