Ask AI

You are viewing an unreleased or outdated version of the documentation

Changelog#

1.6.11 (core) / 0.22.11 (libraries)#

Bugfixes#

  • Fixed an issue where dagster dev or the Dagster UI would display an error when loading jobs created with op or asset selections.

1.6.10 (core) / 0.22.10 (libraries)#

New#

  • Latency improvements to the scheduler when running many simultaneous schedules.

Bugfixes#

  • The performance of loading the Definitions snapshot from a code server when large @multi_asset s are in use has been drastically improved.
  • The snowflake quickstart example project now renames the “by” column to avoid reserved snowflake names. Thanks @jcampbell!
  • The existing group name (if any) for an asset is now retained if the_asset.with_attributes is called without providing a group name. Previously, the existing group name was erroneously dropped. Thanks @ion-elgreco!
  • [dagster-dbt] Fixed an issue where Dagster events could not be streamed from dbt source freshness.
  • [dagster university] Removed redundant use of MetadataValue in Essentials course. Thanks @stianthaulow!
  • [ui] Increased the max number of plots on the asset plots page to 100.

Breaking Changes#

  • The tag_keys argument on DagsterInstance.get_run_tagsis no longer optional. This has been done to remove an easy way of accidentally executing an extremely expensive database operation.

Dagster Cloud#

  • The maximum number of concurrent runs across all branch deployments is now configurable. This setting can now be set via GraphQL or the CLI.
  • [ui] In Insights, fixed display of table rows with zero change in value from the previous time period.
  • [ui] Added deployment-level Insights.
  • [ui] Fixed an issue causing void invoices to show up as “overdue” on the billing page.
  • [experimental] Branch deployments can now indicate the new and modified assets in the branch deployment as compared to the main deployment. To enable this feature, turn on the “Enable experimental branch deployment asset graph diffing” user setting.

1.6.9 (core) / 0.22.9 (libraries)#

New#

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
  • [embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across pageloads which should greatly improve search performance for very large orgs.
  • [ui] groups/code locations in the asset graph’s sidebar are now sorted alphabetically.

Bugfixes#

  • Fixed issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fix a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions#

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow github actions to checkout branch from forked repo for docs changes (ci fix) (thanks hainenber)!

Experimental#

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation#

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
  • Improved guides and reference to better running multiple isolated agents with separate queues on ECS.

Dagster Cloud#

  • Microsoft Teams is now supported for alerts. Documentation
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)#

Bugfixes#

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental#

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)#

New#

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.

Bugfixes#

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental#

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] New Asset Decorator @sling_assets and Resource SlingConnectionResource have been added for the [dagster-embedded-elt.sling](http://dagster-embedded-elt.sling) package. Deprecated build_sling_asset, SlingSourceConnection and SlingTargetConnection.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation#

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud#

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling kubernetes services if the agent was interrupted during the middle of being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O versus resources more prominent.

0.14.16#

New#

  • AssetsDefinition.from_graph now accepts a partitions_def argument.
  • @asset-decorated functions can now accept variable keyword arguments.
  • Jobs executed in ECS tasks now report the health status of the ECS task
  • The CLI command dagster instance info now prints the current schema migration state for the configured instance storage.
  • [dagster-dbt] You can now configure a docs_url on the dbt_cli_resource. If this value is set, AssetMaterializations associated with each dbt model will contain a link to the dbt docs for that model.
  • [dagster-dbt] You can now configure a dbt_cloud_host on the dbt_cloud_resource, in the case that your dbt cloud instance is under a custom domain.

Bugfixes#

  • Fixed a bug where InputContext.upstream_output was missing the asset_key when it referred to an asset outside the run.
  • When specifying a selection parameter in AssetGroup.build_job(), the generated job would include an incorrect set of assets in certain situations. This has been fixed.
  • Previously, a set of database operational exceptions were masked with a DagsterInstanceSchemaOutdated exception if the instance storage was not up to date with the latest schema. We no longer wrap these exceptions, allowing the underlying exceptions to bubble up.
  • [dagster-airbyte] Fixed issue where successfully completed Airbyte syncs would send a cancellation request on completion. While this did not impact the sync itself, if alerts were set up on that connection, they would get triggered regardless of if the sync was successful or not.
  • [dagster-azure] Fixed an issue where the Azure Data Lake Storage adls2_pickle_io_manager would sometimes fail to recursively delete a folder when cleaning up an output.
  • Previously, if two different jobs with the same name were provided to the same repo, and one was targeted by a sensor/schedule, the job provided by the sensor/schedule would silently overwrite the other job instead of failing. In this release, a warning is fired when this case is hit, which will turn into an error in 0.15.0.
  • Dagit will now display workspace errors after reloading all repositories.

Breaking Changes#

  • Calls to instance.get_event_records without an event type filter is now deprecated and will generate a warning. These calls will raise an exception starting in 0.15.0.

Community Contributions#

  • @multi_asset now supports partitioning. Thanks @aroig!
  • Orphaned process detection now works correctly across a broader set of platforms. Thanks @aroig!
  • [K8s] Added a new max_concurrent field to the k8s_job_executor that limits the number of concurrent Ops that will execute per run. Since this executor launches a Kubernetes Job per Op, this also limits the number of concurrent Kuberenetes Jobs. Note that this limit is per run, not global. Thanks @kervel!
  • [Helm] Added a new externalConfigmap field as an alternative to dagit.workspace.servers when running the user deployments chart in a separate release. This allows the workspace to be managed outside of the main Helm chart. Thanks @peay!
  • Removed the pin on markupsafe<=2.0.1. Thanks @bollwyvl!

0.14.15#

New#

  • Sensors / schedules can now return a list of RunRequest objects instead of yielding them.
  • Repositories can now contain asset definitions and source assets for the same asset key.
  • OpExecutionContext (provided as the context argument to Ops) now has fields for, run, job_def, job_name, op_def, and op_config. These replace pipeline_run, pipeline_def, etc. (though they are still available).
  • When a job is partitioned using an hourly, daily, weekly, or monthly partitions definition, OpExecutionContext now offers a partition_time_window attribute, which returns a tuple of datetime objects that mark the bounds of the partition’s time window.
  • AssetsDefinition.from_graph now accepts a partitions_def argument.
  • [dagster-k8s] Removed an unnecessary dagster-test-connection pod from the Dagster Helm chart.
  • [dagster-k8s] The k8s_job_executor now polls the event log on a ~1 second interval (previously 0.1). Performance testing showed that this reduced DB load while not significantly impacting run time.
  • [dagit] Removed package pins for Jinja2 and nbconvert.
  • [dagit] When viewing a list of Runs, tags with information about schedules, sensors, and backfills are now more visually prominent and are sorted to the front of the list.
  • [dagit] The log view on Run pages now includes a button to clear the filter input.
  • [dagit] When viewing a list of Runs, you can now hover over a tag to see a menu with an option to copy the tag, and in filtered Run views, an option to add the tag to the filter.
  • [dagit] Configuration editors throughout Dagit now display clear indentation guides, and our previous whitespace indicators have been removed.
  • [dagit] The Dagit Content-Security-Policy has been moved from a <meta> tag to a response header, and several more security and privacy related headers have been added as well.
  • [dagit] Assets with multi-component key paths are always shown as foo/bar in dagit, rather than appearing as foo > bar in some contexts.
  • [dagit] The Asset graph now includes a “Reload definitions” button which reloads your repositories.
  • [dagit] On all DAGs, you can hold shift on the keyboard to switch from mouse wheel / touch pad zooming to panning. This makes it much easier to scroll horizontally at high speed without click-drag-click-drag-click-drag.
  • [dagit] a --log-level flag is now available in the dagit cli for controlling the uvicorn log level.
  • [dagster-dbt] The load_assets_from_dbt_project() and load_assets_from_dbt_manifest() utilities now have a use_build_command parameter. If this flag is set, when materializing your dbt assets, Dagster will use the dbt build command instead of dbt run. Any tests run during this process will be represented with AssetObservation events attached to the relevant assets. For more information on dbt build, see the dbt docs.
  • [dagster-dbt] If a dbt project successfully runs some models and then fails, AssetMaterializations will now be generated for the successful models.
  • [dagster-snowflake] The new Snowflake IO manager, which you can create using build_snowflake_io_manager offers a way to store assets and op outputs in Snowflake. The PandasSnowflakeTypeHandler stores Pandas DataFrames in Snowflake.
  • [helm] dagit.logLevel has been added to values.yaml to access the newly added dagit --log-level cli option.

Bugfixes#

  • Fixed incorrect text in the error message that’s triggered when building a job and an asset can’t be found that corresponds to one of the asset dependencies.
  • An error is no longer raised when an op/job/graph/other definition has an empty docstring.
  • Fixed a bug where pipelines could not be executed if toposort<=1.6 was installed.
  • [dagit] Fixed an issue in global search where rendering and navigation broke when results included objects of different types but with identical names.
  • [dagit] server errors regarding websocket send after close no longer occur.
  • [dagit] Fixed an issue where software-defined assets could be rendered improperly when the dagster and dagit versions were out of sync.

Community Contributions#

  • [dagster-aws] PickledObjectS3IOManager now uses list_objects to check the access permission. Thanks @trevenrawr!

Breaking Changes#

  • [dagster-dbt] The asset definitions produced by the experimental load_assets_from_dbt_project and load_assets_from_dbt_manifest functions now include the schemas of the dbt models in their asset keys. To revert to the old behavior: dbt_assets = load_assets_from_dbt_project(..., node_info_to_asset_key=lambda node_info: AssetKey(node_info["name"]).

Experimental#

  • The TableSchema API is no longer experimental.

Documentation#

  • Docs site now has a new design!
  • Concepts pages now have links to code snippets in our examples that use those concepts.

0.14.14#

New#

  • When viewing a config schema in the Dagit launchpad, default values are now shown. Hover over an underlined key in the schema view to see the default value for that key.
  • dagster, dagit, and all extension libraries (dagster-*) now contain py.typed files. This exposes them as typed libraries to static type checking tools like mypy. If your project is using mypy or another type checker, this may surface new type errors. For mypy, to restore the previous state and treat dagster or an extension library as untyped (i.e. ignore Dagster’s type annotations), add the following to your configuration file:
[mypy-dagster]  (or e.g. mypy-dagster-dbt)
follow_imports = "skip"
  • Op retries now surface the underlying exception in Dagit.
  • Made some internal changes to how we store schema migrations across our different storage implementations.
  • build_output_context now accepts an asset_key argument.
  • They key argument to the SourceAsset constructor now accepts values that are strings or sequences of strings and coerces them to AssetKeys.
  • You can now use the + operator to add two AssetGroups together, which forms an AssetGroup that contains a union of the assets in the operands.
  • AssetGroup.from_package_module, from_modules, from_package_name, and from_current_module now accept an extra_source_assets argument that includes a set of source assets into the group in addition to the source assets scraped from modules.
  • AssetsDefinition and AssetGroup now both expose a to_source_assets method that return SourceAsset versions of their assets, which can be used as source assets for downstream AssetGroups.
  • Repositories can now include multiple AssetGroups.
  • The new prefixed method on AssetGroup returns a new AssetGroup where a given prefix is prepended to the asset key of every asset in the group.
  • Dagster now has a BoolMetadataValue representing boolean-type metadata. Specifying True or False values in metadata will automatically be casted to the boolean type.
  • Tags on schedules can now be expressed as nested JSON dictionaries, instead of requiring that all tag values are strings.
  • If an exception is raised during an op, Dagster will now always run the failure hooks for that op. Before, certain system exceptions would prevent failure hooks from being run.
  • mapping_key can now be provided as an argument to build_op_context/build_solid_context. Doing so will allow the use of OpExecutionContext.get_mapping_key().

Bugfixes#

  • [dagit] Previously, when viewing a list of an asset’s materializations from a specified date/time, a banner would always indicate that it was a historical view. This banner is no longer shown when viewing the most recent materialization.
  • [dagit] Special cron strings like @daily were treated as invalid when converting to human-readable strings. These are now handled correctly.
  • The selection argument to AssetGroup.build_job now uses > instead of . for delimiting the components within asset keys, which is consistent with how selection works in Dagit.
  • [postgres] passwords and usernames are now correctly url quoted when forming a connection string. Previously spaces were replaced with +.
  • Fixed an issue where the celery_docker_executor would sometimes fail to execute with a JSON deserialization error when using Dagster resources that write to stdout.
  • [dagster-k8s] Fixed an issue where the Helm chart failed to work when the user code deployment subchart was used in a different namespace than the main dagster Helm chart, due to missing configmaps.
  • [dagster-airbyte] When a Dagster run is terminated while executing an Airbyte sync operation, the corresponding Airbyte sync will also be terminated.
  • [dagster-dbt] Log output from dbt cli commands will no longer have distracting color-formatting characters.
  • [dagit] Fixed issue where multi_assets would not show correct asset dependency information.
  • Fixed an issue with the sensor daemon, where the sensor would sometimes enter a race condition and overwrite the sensor status.

Community Contributions#

  • [dagster-graphql] The Python DagsterGraphQLClient now supports terminating in-progress runs using client.terminate_run(run_id). Thanks @Javier162380!

Experimental#

  • Added an experimental view of the Partitions page / Backfill page, gated behind a feature flag in Dagit.

0.14.13#

New#

  • [dagster-k8s] You can now specify resource requests and limits to the K8sRunLauncher when using the Dagster helm chart, that will apply to all runs. Before, you could only set resource configuration by tagging individual jobs. For example, you can set this config in your values.yaml file:
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 128Mi
  • [dagster-k8s] Specifying includeConfigInLaunchedRuns: true in a user code deployment will now launch runs using the same namespace and service account as the user code deployment.
  • The @asset decorator now accepts an op_tags argument, which allows e.g. providing k8s resource requirements on the op that computes the asset.
  • Added CLI output to dagster api grpc-health-check (previously it just returned via exit codes)
  • [dagster-aws] The emr_pyspark_step_launcher now supports dynamic orchestration, RetryPolicys defined on ops, and re-execution from failure. For failed steps, the stack trace of the root error will now be available in the event logs, as will logs generated with context.log.info.
  • Partition sets and can now return a nested dictionary in the tags_fn_for_partition function, instead of requiring that the dictionary have string keys and values.
  • [dagit] It is now possible to perform bulk re-execution of runs from the Runs page. Failed runs can be re-executed from failure.
  • [dagit] Table headers are now sticky on Runs and Assets lists.
  • [dagit] Keyboard shortcuts may now be disabled from User Settings. This allows users with certain keyboard layouts (e.g. QWERTZ) to inadvertently avoid triggering unwanted shortcuts.
  • [dagit] Dagit no longer continues making some queries in the background, improving performance when many browser tabs are open.
  • [dagit] On the asset graph, you can now filter for multi-component asset keys in the search bar and see the “kind” tags displayed on assets with a specified compute_kind.
  • [dagit] Repositories are now displayed in a stable order each time you launch Dagster.

Bugfixes#

  • [dagster-k8s] Fixed an issue where the Dagster helm chart sometimes failed to parse container images with numeric tags. Thanks @jrouly!
  • [dagster-aws] The EcsRunLauncher now registers new task definitions if the task’s execution role or task role changes.
  • Dagster now correctly includes setuptools as a runtime dependency.
  • In can now accept asset_partitions without crashing.
  • [dagit] Fixed a bug in the Launchpad, where default configuration failed to load.
  • [dagit] Global search now truncates the displayed list of results, which should improve rendering performance.
  • [dagit] When entering an invalid search filter on Runs, the user will now see an appropriate error message instead of a spinner and an alert about a GraphQL error.

Documentation#

  • Added documentation for partitioned assets
  • [dagster-aws] Fixed example code of a job using secretsmanager_resource.

0.14.12#

Bugfixes#

  • Fixed an issue where the Launchpad in Dagit sometimes incorrectly launched in an empty state.