
Let's say I create a Dagster pipeline with the following solids:

  1. Execute SQL query from file and get results
  2. Write results to a table

I want to do this for, say, 10 different tables in parallel, with each table requiring a different SQL query. What would be the best approach?

Binh Pham

1 Answer


One approach is to use solid factories. run_query_solid_factory and write_results_solid_factory are factories that take inputs (such as the name and the query or table) and return a solid that can run in the pipeline. summary_report waits for all the upstream solids to complete before printing out summary info.

from dagster import pipeline, solid

def run_query_solid_factory(name, query):
    # Returns a solid that runs the given SQL query and emits its result.
    @solid(name=name)
    def _run_query(context):
        context.log.info(query)
        return 'result'

    return _run_query

def write_results_solid_factory(name, table):
    # Returns a solid that writes the upstream query result to the given table.
    @solid(name=name)
    def _write_results(context, query_result):
        context.log.info(table)
        context.log.info(query_result)
        return 'success'

    return _write_results

@solid
def summary_report(context, statuses):
    # Fan-in: receives the collected statuses from every write_results solid.
    context.log.info(' '.join(statuses))

# Named my_pipeline so it doesn't shadow the imported @pipeline decorator.
@pipeline
def my_pipeline():
    solid_output_handles = []
    queries = [('table', 'query'), ('table2', 'query2')]
    for table, query in queries:
        # Solid names must be unique valid identifiers, so key them off the
        # table name rather than the raw SQL text.
        get_data = run_query_solid_factory('run_query_for_{}'.format(table), query)
        write_results = write_results_solid_factory('write_results_to_table_{}'.format(table), table)
        solid_output_handles.append(write_results(get_data()))

    summary_report(solid_output_handles)

(Screenshots in the original answer: DAG structure; DAG execution logs.)
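To actually get the ten table branches running in parallel, the pipeline also needs an executor that supports it; the default in-process executor runs solids one at a time. Here is a minimal sketch of launching a parallel run, assuming a legacy Dagster version where the multiprocess executor and filesystem storage are selected via run config (the exact config keys and the max_concurrent value are assumptions and varied across releases):

from dagster import execute_pipeline, reconstructable

# The multiprocess executor needs a reconstructable pipeline so each worker
# process can re-import it; my_pipeline must be defined at module scope.
result = execute_pipeline(
    reconstructable(my_pipeline),
    run_config={
        # Run up to 10 solids at once in separate processes (illustrative value).
        'execution': {'multiprocess': {'config': {'max_concurrent': 10}}},
        # Multiprocess execution requires persistent intermediate storage.
        'storage': {'filesystem': {}},
    },
)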

Previous Answer:

I would recommend creating a composite_solid that consists of a solid that handles (1) and a solid that handles (2). Then you can alias the composite_solid once for each of the 10 tables, which will let you pass in the SQL query via config (see the tutorial).
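That earlier approach might look roughly like the sketch below. All names here are hypothetical, and it assumes a legacy Dagster version where @solid accepts a config_schema and invocations can be renamed with .alias():

from dagster import composite_solid, pipeline, solid

@solid(config_schema={'query': str})
def run_query(context):
    # The SQL text arrives via config rather than via a factory closure.
    context.log.info(context.solid_config['query'])
    return 'result'

@solid(config_schema={'table': str})
def write_results(context, query_result):
    context.log.info(context.solid_config['table'])
    context.log.info(query_result)
    return 'success'

@composite_solid
def query_and_write():
    # Composite wiring: step (1) feeds step (2).
    return write_results(run_query())

@pipeline
def aliased_pipeline():
    for table in ['table', 'table2']:
        # One alias per table; each alias is configured independently.
        query_and_write.alias('query_and_write_{}'.format(table))()

Each alias then gets its own query and table values under the solids section of the run config.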

Catherine Wu