datafold-sdk
The datafold-sdk enables our CI capabilities without using dbt
Datafold allows you to trigger data diffs from CI using the datafold-sdk. This lets you integrate Datafold into your CI with arbitrary pipeline orchestrators, although it often requires some glue code to wire, for example, Airflow to Datafold. First, go to the CI settings in Datafold.
Next, click "Add new CI config":
Now that the connection has been set up, you'll see the Datafold check on the pull request:
Now, you need to let Datafold know which tables to diff. First, set the credentials using the environment variable DATAFOLD_APIKEY. For self-hosted Datafold, you need to set DATAFOLD_HOST as well.
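As a minimal sketch of the credential step, glue code can check the two environment variables before it tries to submit anything. The helper name `load_datafold_credentials` is hypothetical and not part of the datafold-sdk; only the variable names DATAFOLD_APIKEY and DATAFOLD_HOST come from the text above.

```python
import os


def load_datafold_credentials(env=None):
    """Return (api_key, host) read from the environment.

    Hypothetical helper, not part of datafold-sdk.
    DATAFOLD_APIKEY is required. DATAFOLD_HOST is only needed for
    self-hosted Datafold and comes back as None when it is unset.
    """
    env = os.environ if env is None else env
    api_key = env.get("DATAFOLD_APIKEY")
    if not api_key:
        raise RuntimeError("DATAFOLD_APIKEY is not set")
    host = env.get("DATAFOLD_HOST") or None
    return api_key, host
```

Failing early with a clear message is usually easier to debug in a CI log than an authentication error from a later step.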
From the CLI you can run:
This will request a single diff for the BEERS table between the BEERS and BEERS_DEV schemas. You can add multiple tables to the array.
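The array of tables described above could be assembled from glue code along these lines. This is a sketch under assumptions: the key names `original` and `modified`, the `database.schema.table` naming, and the database name `MYDB` are illustrative and should be checked against your datafold-sdk version, which may also expect further fields such as a primary key.

```python
import json


def build_diff_request(table_pairs):
    """Serialize (original, modified) table pairs as a JSON array.

    Hypothetical helper. The key names "original" and "modified"
    are assumptions about the expected request format.
    """
    return json.dumps(
        [{"original": original, "modified": modified}
         for original, modified in table_pairs],
        indent=2,
    )


# One entry per table to diff; MYDB is a placeholder database name.
payload = build_diff_request([
    ("MYDB.BEERS.BEERS", "MYDB.BEERS_DEV.BEERS"),
])
print(payload)
```

Adding more tables only means appending more pairs to the list, which is convenient when the orchestrator already knows which tables a pipeline run has touched.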
Running the command triggers the diff, and the results are posted in the pull request.
Also, for the Pythonistas, you can do it directly from Python:
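A hedged sketch of what the Python route might look like follows. The import path `datafold.sdk.ci`, the names `run_diff` and `CiDiff`, and their keyword arguments are assumptions about the SDK's interface and should be verified against the installed version. The wrapper `trigger_ci_diff` and the `primary_key` column passed to it are hypothetical.

```python
import os


def trigger_ci_diff(ci_config_id, pr_num, tables):
    """Request one diff per (original, modified, primary_key) entry.

    Hypothetical wrapper. The import below and the keyword
    arguments of run_diff and CiDiff are assumed, not confirmed.
    """
    # Imported lazily so this module loads even without the SDK.
    from datafold.sdk.ci import CiDiff, run_diff

    diffs = [
        CiDiff(original=original, modified=modified, primary_key=primary_key)
        for original, modified, primary_key in tables
    ]
    return run_diff(
        host=os.environ.get("DATAFOLD_HOST"),  # only set when self-hosted
        api_key=os.environ["DATAFOLD_APIKEY"],
        ci_config_id=ci_config_id,  # the id shown after saving the CI config
        pr_num=pr_num,
        diffs=diffs,
    )
```

An orchestrator task, for example in Airflow, could call `trigger_ci_diff` with the CI config id, the pull request number, and the list of tables produced by the pipeline run.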
In the CI config form, we first select the repository. Then we choose the data source and give the config a name so we can easily remember the settings. The other settings are optional; more information can be found under the question-mark icon. After saving, we'll have a CI config id that we need later on.