Home Pattern

Tabular: SQL query

Use query_tabular when SQL semantics justify it: windows, joins, anti-joins, or fused project+compute over tabulars.

Revised
2026-05-03
Rev
2

Pattern health

warn
  • IWC exemplar anchors

    4 abstract workflow anchors declared.

  • Foundry verification fixture

    No structural verification fixture yet.

  • Pattern map coverage

    1 pattern map link here.

  • Metadata contract

    Pattern frontmatter matches the site contract.

Tabular: SQL query

Tool

toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2 (“Query Tabular”). 16 step occurrences in the surveyed IWC corpus. This is the SQL operation in the tabular hierarchy: powerful, but deliberately narrow.

When to reach for it

Use query_tabular when the operation is genuinely SQL-shaped: window functions, multi-table joins, anti-joins, or fused project + compute + filter over one or more tabular inputs.

For a one-table Python predicate, use tabular-filter-by-column-value. For whole-line regex filtering, use tabular-filter-by-regex. For pure projection, use tabular-cut-and-reorder-columns. For a simple computed column, use tabular-compute-new-column.

Parameters

  • sqlquery: main SQL query. Tables are named t1, t2, … unless tables[].tbl_opts.table_name sets explicit names.
  • query_result.header: yes / no; controls whether the main output has a header.
  • tables: repeat list of inputs to load into SQLite.
  • tables[].table: connected tabular input.
  • tables[].input_opts.linefilters: load-time preprocessing before SQLite import. Corpus filters include comment, skip, prepend_dataset_name, prepend_line_num, and normalize.
  • tables[].tbl_opts.table_name: optional SQL table name. Blank means default t1, t2, …
  • tables[].tbl_opts.column_names_from_first_line: whether the first row supplies column names.
  • tables[].tbl_opts.col_names: comma-separated named columns for SQL.
  • tables[].tbl_opts.load_named_columns: whether to load named columns rather than positional cN columns.
  • tables[].tbl_opts.indexes: optional SQLite indexes for join-heavy queries.
  • addqueries.queries: optional extra SQL outputs. Check this before assuming sqlquery is the only result.
  • workdb: usually workdb.sqlite.

Idiomatic shapes

Single-table project + compute with a window function:

tool_id: toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2
tool_state:
  query_result:
    header: no
  sqlquery: |-
    SELECT c1, c2, c3, c3 * 100 / SUM(c3) OVER() AS relative_abundance
    FROM t1;
  tables:
    - table: { __class__: ConnectedValue }
      input_opts:
        linefilters:
          - filter:
              filter_type: comment
              comment_char: "35"
          - filter:
              filter_type: prepend_dataset_name
      tbl_opts:
        table_name: ""
        column_names_from_first_line: false
        col_names: ""
        load_named_columns: false
        indexes: []
  workdb: workdb.sqlite

Anchored by the MAPseq-to-ampvis2 IWC exemplar.

Multi-table SQL join with named tables:

tool_id: toolshed.g2.bx.psu.edu/repos/iuc/query_tabular/query_tabular/3.3.2
tool_state:
  query_result:
    header: yes
    header_prefix: ""
  sqlquery: |-
    SELECT pep.mpep, prot.prot
    FROM pep
    INNER JOIN prot on pep.mpep=prot.pep
  tables:
    - table: { __class__: ConnectedValue }
      input_opts:
        linefilters: []
      tbl_opts:
        table_name: pep
        column_names_from_first_line: false
        col_names: mpep
        load_named_columns: false
        indexes: []
    - table: { __class__: ConnectedValue }
      input_opts:
        linefilters: []
      tbl_opts:
        table_name: prot
        column_names_from_first_line: false
        col_names: pep,prot
        load_named_columns: false
        indexes: []
  workdb: workdb.sqlite

Anchored by the clinical metaproteomics verification IWC exemplar.

Pitfalls

  • Do not use SQL for simple filters or cuts. The survey decision keeps this page narrow; simple row predicates and projections have clearer operation pages.
  • Header handling is split across fields. query_result.header, tables[].tbl_opts.column_names_from_first_line, and load filters like skip are independent.
  • Line filters run before SQL. prepend_dataset_name, prepend_line_num, and normalize change column positions before the query sees the table.
  • Blank table_name is meaningful. Blank means default t1; named joins need explicit table names.
  • comment_char: "35" means #. The corpus YAML uses the ASCII code string, not the literal #.
  • Extra outputs can hide in addqueries. Some workflows emit multiple query results from one step.
  • Version pins vary. 3.3.0 and 3.3.2 appear in corpus. Prefer the newest available pin unless preserving an existing workflow.

See also

IWC exemplars4 anchors

IWC Exemplars

amplicon/amplicon-mgnify/mapseq-to-ampvis2/mapseq-to-ampvis2high

Uses a single-table query with SUM window function for relative abundance.

amplicon/amplicon-mgnify/mgnify-amplicon-pipeline-v5-rrna-prediction/mgnify-amplicon-pipeline-v5-rrna-predictionhigh

Shows one input with main query plus multiple addqueries outputs.

proteomics/clinicalmp/clinicalmp-verification/clinicalmp-verificationhigh

Shows a two-table INNER JOIN with named tables.

proteomics/clinicalmp/clinicalmp-discovery/iwc-clinicalmp-discovery-workflowhigh

Shows a three-table anti-join with load filters and indexes.

Incoming References (9)

  • Galaxy: tabular patternsrelated pattern— Use this MOC to choose corpus-grounded Galaxy tabular transformation patterns.
  • Tabular: compute a new columnrelated pattern— Use column_maker (Add_a_column1) with strict error_handling to insert/replace a computed column. Per-expression-kind auto_col_types rule.
  • Tabular: cut and reorder columnsrelated pattern— Use Cut1 with a comma-separated cN list to project — and reorder — columns. Listing out of order is the canonical reorder idiom.
  • Tabular: filter rows by column valuerelated pattern— Use Filter1 with a Python expression over cN columns to drop rows. Highest-frequency tabular row filter in IWC.
  • Tabular: filter rows by regexrelated pattern— Use tp_grep_tool for whole-line regex row filters on tabular input. Grep1 is the legacy alternative.
  • Tabular: group and aggregaterelated pattern— Use datamash_ops for grouped tabular aggregation: multi-column grouping, collapse, countunique, min/max, and reductions.
  • Tabular: join on keyrelated pattern— Use tp_easyjoin_tool for two-tabular key joins; use tp_multijoin_tool for many files and query_tabular for SQL joins.
  • Iwc Tabular Operations Surveyrelated note— Corpus survey of tabular tools and operations across IWC workflows; map for the operation pattern hierarchy on row/column data manipulation.
  • Nextflow-to-Galaxy channel shape mappingrelated note— Maps common Nextflow channel, tuple, and path shapes to Galaxy dataset and collection shapes.