Questions tagged [dagster]

Dagster is an open source system for building modern data applications.

Dagster, by Elementl, is a set of abstractions for building self-describing, testable, and reliable data applications. It uses functional data programming, gradual/optional typing, and testability to facilitate composition of data applications from DAGs of solids, its basic computational unit.

39 questions
4
votes
1 answer

How to use dictionary yielded from other solid in a composite Solid?

For example, I have a solid named initiate_load that yields a dictionary and an integer, something like: @solid( output_defs=[ OutputDefinition(name='l_dict', is_required=False), OutputDefinition(name='l_int',…
Atif
  • 682
  • 1
  • 5
  • 18
3
votes
1 answer

How would you parameterize Dagster pipelines to run same solids with multiple different configurations/assets?

Let's say I create a Dagster pipeline with the following solids: (1) execute a SQL query from a file and get the results; (2) write the results to a table. I want to do this for, say, 10 different tables in parallel, each table requiring a different SQL query. What would…
Binh Pham
  • 33
  • 4
2
votes
1 answer

Testing a dagster pipeline

Summary: Dagster run configurations for Dagit vs. PyTest appear to be incompatible for my project. I've been getting errors trying to run pytest on a pipeline and I'd really appreciate any pointers. I've consistently gotten errors of the…
jumbolaya
  • 23
  • 4
2
votes
1 answer

How to avoid running the rest of a dagster pipeline under certain conditions

Say I have two solids in Dagster connected in a pipeline. The first solid may do some processing and generate a valid input so that the rest of the pipeline executes, or generate an invalid input that should not be processed further. To achieve this…
ElBrocas
  • 159
  • 1
  • 9
2
votes
1 answer

NoneType Error when trying to make a custom BeautifulSoup Dagster Type

I've been messing around with @dagster_type and was trying to make a custom HtmlSoup type. Basically a fancy @dagster_type wrapper around a BeautifulSoup Object. import requests from bs4 import BeautifulSoup from dagster import ( dagster_type, …
JohnMav
  • 31
  • 4
2
votes
1 answer

Caching Dagster's pipeline results

Is there a way to cache the output of the solids in the pipeline in such a way that if I run the same pipeline but with a slightly different configuration (think hyper-parameter tuning), certain initial steps in the pipelines that are unaffected by…
moomima
  • 915
  • 8
  • 11
1
vote
1 answer

Docstrings for Dagster wrapped functions not appearing in Sphinx

I've got a project written in Python that uses PySpark and now Dagster. We are using Sphinx to build the documentation and napoleon to parse the Google-style docstrings. We've started including prewrapped dagster solids like the…
1
vote
1 answer

Dagster loop over solid's output

I have a Dagster pipeline consisting of two solids (reproducible example below). The first (return_some_list) outputs a list of some objects. The second solid (print_num) accepts an element from the first list (not the full list) and does some…
cyau
  • 125
  • 6
1
vote
2 answers

Executing a solid when at least one of the required inputs is given

As an input, I would like to retrieve data based on user input, or "randomly" from a DB if no user input is given. All other downstream tasks of the pipeline would be the same. Therefore, I would like to create a pipeline starting with solids A and…
noeljbf
  • 13
  • 2
1
vote
1 answer

Dagster: how to reexecute failed steps of a pipeline?

I created a test pipeline and it fails mid-way. I want to programmatically re-execute it but starting at the failed step of the pipeline and move forward. I do not want to repeat execution of the earlier, successful steps. from dagster import…
sophros
  • 8,714
  • 5
  • 30
  • 57
1
vote
1 answer

What is the value of $repositoryLocationName when running ExecutePipeline in Dagster's GraphQL API?

I am attempting to launch a Dagster pipeline run with the GraphQL API. I have Dagit running locally and a working pipeline that I can trigger via the playground. However, I am now trying to trigger the pipeline via GraphQL Playground, available at…
Atticus
  • 57
  • 6
1
vote
2 answers

Dagster failure notification systems

Is there a way in Dagster to receive notifications when certain events occur, such as failures? For example, is there an integration available with a tool like Sentry?
ElBrocas
  • 159
  • 1
  • 9
1
vote
1 answer

Is the working directory of the dagster main process different from that of the scheduler processes?

I'm having an issue with the loading of a file from dagster code (setup, not pipelines). Say I have the following project structure: pipelines -app/ --environments ----schedules.yaml --repository.py --repository.yaml When I run dagit while inside…
ElBrocas
  • 159
  • 1
  • 9
1
vote
1 answer

Error: 'dagster.core.types.runtime' is not a package

Installed dagster for the first time in a conda environment and tried to run the airline demo as described here. The following are the steps I followed. conda env create -n dagster python=3.7 conda activate dagster pip install dagster dagit git…
1
vote
2 answers

Core compute for solid returned an output multiple times

I am very new to Dagster and I can't find an answer to my question in the docs. I have 2 solids: one that yields (str, str) tuples parsed from an XML file, and another that just consumes the tuples and stores objects in a DB with the corresponding fields set.…