Changelog#

1.6.11 (core) / 0.22.11 (libraries)#

Bugfixes#

  • Fixed an issue where dagster dev or the Dagster UI would display an error when loading jobs created with op or asset selections.

1.6.10 (core) / 0.22.10 (libraries)#

New#

  • Latency improvements to the scheduler when running many simultaneous schedules.

Bugfixes#

  • The performance of loading the Definitions snapshot from a code server when large @multi_assets are in use has been drastically improved.
  • The snowflake quickstart example project now renames the “by” column to avoid reserved snowflake names. Thanks @jcampbell!
  • The existing group name (if any) for an asset is now retained if the_asset.with_attributes is called without providing a group name. Previously, the existing group name was erroneously dropped. Thanks @ion-elgreco!
  • [dagster-dbt] Fixed an issue where Dagster events could not be streamed from dbt source freshness.
  • [dagster university] Removed redundant use of MetadataValue in Essentials course. Thanks @stianthaulow!
  • [ui] Increased the max number of plots on the asset plots page to 100.

Breaking Changes#

  • The tag_keys argument on DagsterInstance.get_run_tags is no longer optional. This removes an easy way of accidentally executing an extremely expensive database operation.

Dagster Cloud#

  • The maximum number of concurrent runs across all branch deployments is now configurable. This setting can now be set via GraphQL or the CLI.
  • [ui] In Insights, fixed display of table rows with zero change in value from the previous time period.
  • [ui] Added deployment-level Insights.
  • [ui] Fixed an issue causing void invoices to show up as “overdue” on the billing page.
  • [experimental] Branch deployments can now indicate the new and modified assets in the branch deployment as compared to the main deployment. To enable this feature, turn on the “Enable experimental branch deployment asset graph diffing” user setting.

1.6.9 (core) / 0.22.9 (libraries)#

New#

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0.
  • [embedded-elt] @sling_assets now accepts an optional name parameter for the underlying op.
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across pageloads which should greatly improve search performance for very large orgs.
  • [ui] Groups/code locations in the asset graph’s sidebar are now sorted alphabetically.

Bugfixes#

  • Fixed issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fixed a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable.
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions#

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow github actions to checkout branch from forked repo for docs changes (ci fix) (thanks hainenber)!

Experimental#

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation#

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
  • Improved the guides and reference to better explain running multiple isolated agents with separate queues on ECS.

Dagster Cloud#

  • Microsoft Teams is now supported for alerts. See the documentation for details.
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)#

Bugfixes#

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental#

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)#

New#

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
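
As a rough illustration of the DAGSTER_IS_DEV_CLI note above, a subprocess could branch on whether the variable is present. The release note only says that the variable is set, so this sketch checks for presence rather than a particular value. The helper name launched_from_dagster_dev and its env parameter are hypothetical and not part of Dagster.

```python
import os


def launched_from_dagster_dev(env=None):
    """Return True if DAGSTER_IS_DEV_CLI is present in the environment.

    Assumption: only presence is checked, because the release note does
    not state which value `dagster dev` assigns to the variable.
    """
    env = os.environ if env is None else env
    return "DAGSTER_IS_DEV_CLI" in env


if launched_from_dagster_dev():
    print("Started under `dagster dev`; using development settings.")
else:
    print("Not started under `dagster dev`.")
```

Passing a plain dict as env makes the check easy to exercise in tests without touching the real process environment.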

Bugfixes#

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental#

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] A new asset decorator, @sling_assets, and a new resource, SlingConnectionResource, have been added for the dagster-embedded-elt.sling package. build_sling_asset, SlingSourceConnection, and SlingTargetConnection are now deprecated.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation#

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud#

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
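
The "Overdue" rule for observable source assets described above can be sketched with the standard library alone. This is an illustration of the comparison, not Dagster's implementation: the function name is_overdue is hypothetical, and the allowed lag is passed as a plain number of minutes standing in for whatever the freshness policy permits.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(data_time, now, maximum_lag_minutes):
    """Return True if the latest data time is older than the allowed lag.

    data_time stands in for the latest "dagster/data_time" metadata value.
    """
    return now - data_time > timedelta(minutes=maximum_lag_minutes)


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 3, 1, 11, 30, tzinfo=timezone.utc)  # 30 minutes old
stale = datetime(2024, 3, 1, 10, 0, tzinfo=timezone.utc)   # 120 minutes old

print(is_overdue(fresh, now, 60))
print(is_overdue(stale, now, 60))
```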

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling kubernetes services if the agent was interrupted during the middle of being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
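
To make the lexicographical ordering of partition keys concrete, the sketch below sorts keys as plain strings, which is how Python's built-in sorted orders them. It only illustrates what lexicographical order means for typical keys and is not Dagster's submission code. The function name submission_order is hypothetical.

```python
def submission_order(partition_keys):
    """Return partition keys in lexicographical (plain string) order."""
    return sorted(partition_keys)


# Zero-padded date keys sort chronologically as strings.
print(submission_order(["2024-01-10", "2024-01-02", "2024-01-09"]))

# Unpadded numeric keys do not sort numerically: "10" comes before "2".
print(submission_order(["10", "9", "2"]))
```

If the run order matters, zero-padded or ISO-formatted keys keep string order and natural order aligned.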

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
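
The waiting behavior described for skip_on_not_all_parents_updated_since_cron can be pictured as a simple timestamp comparison. The sketch below is a standard-library illustration of that idea under stated assumptions and does not use Dagster's API. The function name should_skip and the dictionary of parent timestamps are hypothetical, and a parent that has never been updated is represented as None.

```python
from datetime import datetime


def should_skip(latest_cron_tick, parent_last_updated):
    """Return True while any parent has not been updated since the tick.

    parent_last_updated maps a parent name to its last update time,
    or to None if the parent has never been updated.
    """
    return any(
        updated is None or updated < latest_cron_tick
        for updated in parent_last_updated.values()
    )


tick = datetime(2024, 2, 1, 0, 0)  # latest tick of a daily cron schedule

waiting = {"orders": datetime(2024, 2, 1, 0, 5), "users": datetime(2024, 1, 31, 23, 0)}
ready = {"orders": datetime(2024, 2, 1, 0, 5), "users": datetime(2024, 2, 1, 0, 7)}

print(should_skip(tick, waiting))  # "users" predates the tick
print(should_skip(tick, ready))    # both parents updated after the tick
```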

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O versus resources more prominent.

1.0.8 (core) / 0.16.8 (libraries)#

New#

  • With the new cron_schedule argument to TimeWindowPartitionsDefinition, you can now supply arbitrary cron expressions to define time window-based partition sets.
  • Graph-backed assets can now be subsetted for execution via AssetsDefinition.from_graph(my_graph, can_subset=True).
  • RunsFilter is now exported in the public API.
  • [dagster-k8s] The dagster-user-deployments.deployments[].schedulerName Helm value for specifying custom Kubernetes schedulers will now also apply to run and step workers launched for the given user deployment. Previously it would only apply to the grpc server.

Bugfixes#

  • In some situations, default asset config was ignored when a subset of assets were selected for execution. This has been fixed.
  • Added a pin to grpcio in dagster to address an issue with the recent 1.48.1 grpcio release that was sometimes causing Dagster code servers to hang.
  • Fixed an issue where the “Latest run” column on the Instance Status page sometimes displayed an older run instead of the most recent run.

Community Contributions#

  • In addition to a single cron string, cron_schedule now also accepts a sequence of cron strings. If a sequence is provided, the schedule will run for the union of all execution times for the provided cron strings, e.g., ['45 23 * * 6', '30 9 * * 0'] for a schedule that runs at 11:45 PM every Saturday and 9:30 AM every Sunday. Thanks @erinov1!
  • Added an optional boolean config, install_default_libraries, to databricks_pyspark_step_launcher. It allows running Databricks jobs without installing the default Dagster libraries. Thanks @nvinhphuc!
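
To show what "the union of all execution times" means for the two cron strings quoted above, here is a small standard-library sketch. It is not how Dagster evaluates cron expressions: the parser is hypothetical and handles only weekly expressions of the form "minute hour * * day-of-week" with numeric fields, and all function names are made up for illustration.

```python
from datetime import datetime, timedelta


def parse_simple_cron(expr):
    """Parse 'M H * * D' with numeric minute, hour, and day-of-week."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    if day_of_month != "*" or month != "*":
        raise ValueError("this sketch only handles weekly expressions")
    return int(minute), int(hour), int(day_of_week)


def weekly_tick(expr, monday):
    """Return the single tick of a weekly expression in the week of `monday`."""
    minute, hour, cron_dow = parse_simple_cron(expr)
    # Cron counts 0 = Sunday ... 6 = Saturday; convert to days after Monday.
    days_after_monday = (cron_dow - 1) % 7
    return monday + timedelta(days=days_after_monday, hours=hour, minutes=minute)


def union_of_ticks(exprs, monday):
    """Return the sorted union of ticks across all expressions."""
    return sorted({weekly_tick(expr, monday) for expr in exprs})


monday = datetime(2024, 1, 1)  # 2024-01-01 is a Monday
ticks = union_of_ticks(["45 23 * * 6", "30 9 * * 0"], monday)
for tick in ticks:
    print(tick.isoformat())
```

Using a set before sorting means that two expressions producing the same time contribute a single tick, which matches the idea of a union.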

Experimental#

  • [dagster-k8s] Added additional configuration fields (container_config, pod_template_spec_metadata, pod_spec_config, job_metadata, and job_spec_config) to the experimental k8s_job_op that can be used to add additional configuration to the Kubernetes pod that is launched within the op.

1.0.7 (core) / 0.16.7 (libraries)#

New#

  • Several updates to the Dagit run timeline view: your time window preference will now be preserved locally, there is a clearer “Now” label to delineate the current time, and upcoming scheduled ticks will no longer be batched with existing runs.
  • [dagster-k8s] ingress.labels is now available in the Helm chart. Any provided labels are appended to the default labels on each object (helm.sh/chart, app.kubernetes.io/version, and app.kubernetes.io/managed-by).
  • [dagster-dbt] Added support for two types of dbt nodes: metrics, and ephemeral models.
  • When constructing a GraphDefinition manually, InputMapping and OutputMapping objects should be directly constructed.

Bugfixes#

  • [dagster-snowflake] Pandas is no longer imported when dagster_snowflake is imported. Instead, it’s only imported when using functionality inside dagster-snowflake that depends on pandas.
  • Recent changes to run_status_sensors caused sensors that only monitored jobs in external repositories to also monitor all jobs in the current repository. This has been fixed.
  • Fixed an issue where "unhashable type" errors could be spawned from sensor executions.
  • [dagit] Clicking between assets in different repositories from asset groups and asset jobs now works as expected.
  • [dagit] The DAG rendering of composite ops with more than one input/output mapping has been fixed.
  • [dagit] Selecting a source asset in Dagit no longer produces a GraphQL error.
  • [dagit] Viewing “Related Assets” for an asset run now shows the full set of assets included in the run, regardless of whether they were materialized successfully.
  • [dagit] The Asset Lineage view has been simplified and lets you know if the view is being clipped and more distant upstream/downstream assets exist.
  • Fixed erroneous experimental warnings being thrown when using with_resources alongside source assets.

Breaking Changes#

  • [dagit] The launchpad tab is no longer shown for Asset jobs. Asset jobs can be launched via the “Materialize All” button shown on the Overview tab. To provide optional configuration, hold shift when clicking “Materialize”.
  • The arguments to InputMapping and OutputMapping APIs have changed.

Community Contributions#

  • The ssh_resource can now accept configuration from environment variables. Thanks @cbini!
  • Spelling corrections in migrations.md. Thanks @gogi2811!

1.0.6 (core) / 0.16.6 (libraries)#

New#

  • [dagit] nbconvert is now installed as an extra in Dagit.
  • Multiple assets can be monitored for materialization using the multi_asset_sensor (experimental).
  • Run status sensors can now monitor jobs in external repositories.
  • The config argument of define_asset_job now works if the job contains partitioned assets.
  • When configuring sqlite-based storages in dagster.yaml, you can now point to environment variables.
  • When emitting RunRequests from sensors, you can now optionally supply an asset_selection argument, which accepts a list of AssetKeys to materialize from the larger job.
  • [dagster-dbt] load_assets_from_dbt_project and load_assets_from_dbt_manifest now support the exclude parameter, allowing you to more precisely specify which resources to load from your dbt project. (Thanks @flvndh!)
  • [dagster-k8s] schedulerName is now available for all deployments in the Helm chart for users who use a custom Kubernetes scheduler.

Bugfixes#

  • Previously, types for multi-assets would display incorrectly in Dagit when specified. This has been fixed.
  • In some circumstances, viewing nested asset paths in Dagit could lead to unexpected empty states. This was due to incorrect slicing of the asset list, and has been fixed.
  • Fixed an issue in Dagit where the dialog used to wipe materializations displayed broken text for assets with long paths.
  • [dagit] Fixed the Job page to change the latest run tag and the related assets to bucket repository-specific jobs. Previously, runs from jobs with the same name in different repositories would be intermingled.
  • Previously, if you launched a backfill for a subset of a multi-asset (e.g. dbt assets), all assets would be executed on each run, instead of just the selected ones. This has been fixed.
  • [dagster-dbt] Previously, if you configured a select parameter on your dbt_cli_resource, this would not get passed into the corresponding invocations of certain context.resources.dbt.x() commands. This has been fixed.

1.0.4 (core) / 0.16.4 (libraries)#

New#

  • Assets can now be materialized to storage conditionally by setting output_required=False. If this is set and no result is yielded from the asset, Dagster will not create an asset materialization event, the I/O manager will not be invoked, downstream assets will not be materialized, and asset sensors monitoring the asset will not trigger.
  • JobDefinition.run_request_for_partition can now be used inside sensors that target multiple jobs (Thanks Metin Senturk!)
  • The environment variable DAGSTER_GRPC_TIMEOUT_SECONDS now allows for overriding the default timeout for communications between host processes like dagit and the daemon and user code servers.
  • Import time for the dagster module has been reduced by approximately 50% in initial measurements.
  • AssetIn now accepts a dagster_type argument, for specifying runtime checks on asset input values.
  • [dagit] The column names on the Activity tab of the asset details page no longer reference the legacy term “Pipeline”.
  • [dagster-snowflake] The execute_query method of the snowflake resource now accepts a use_pandas_result argument, which fetches the result of the query as a Pandas dataframe. (Thanks @swotai!)
  • [dagster-shell] Made the execute and execute_script_file utilities in dagster_shell part of the public API (Thanks Fahad Khan!)
  • [dagster-dbt] load_assets_from_dbt_project and load_assets_from_dbt_manifest now support the exclude parameter. (Thanks @flvndh!)

Bugfixes#

  • [dagit] Removed the x-frame-options response header from Dagit, allowing the Dagit UI to be rendered in an iframe.
  • [fully-featured project example] Fixed the duckdb IO manager so the comment_stories step can load data successfully.
  • [dagster-dbt] Previously, if a select parameter was configured on the dbt_cli_resource, it would not be passed into invocations of context.resources.dbt.run() (and other similar commands). This has been fixed.
  • [dagster-ge] An incompatibility between dagster_ge_validation_factory and dagster 1.0 has been fixed.
  • [dagstermill] Previously, updated arguments and properties to DagstermillExecutionContext were not exposed. This has since been fixed.

Documentation#

  • The integrations page on the docs site now has a section for links to community-hosted integrations. The first linked integration is @silentsokolov’s Vault integration.

1.0.3 (core) / 0.16.3 (libraries)#

New#

  • Failure now has an allow_retries argument, allowing a means to manually bypass retry policies.
  • dagstermill.get_context and dagstermill.DagstermillExecutionContext have been updated to reflect stable dagster-1.0 APIs. pipeline/solid referencing arguments / properties will be removed in the next major version bump of dagstermill.
  • TimeWindowPartitionsDefinition now exposes a get_cron_schedule method.

Bugfixes#

  • In some situations where an asset was materialized, that asset depended on a partitioned asset, and the upstream partitioned asset wasn’t part of the run, the partition-related methods of InputContext returned incorrect values or failed erroneously. This has been fixed.
  • Schedules and sensors with the same names but in different repositories no longer affect each other’s idempotence checks.
  • In some circumstances, reloading a repository in Dagit could lead to an error that would crash the page. This has been fixed.

Community Contributions#

  • @will-holley added an optional key argument to GCSFileManager methods to set the GCS blob key, thank you!
  • Fix for sensors in fully featured example, thanks @pwachira!

Documentation#