GitHub

This section describes how to set up automatic testing of pull requests (PRs) on GitHub.


Last updated 2 years ago


The Datafold GitHub integration automates regression testing of ETL code with Data Diff. On every pull request that changes the ETL code, the Datafold GitHub App runs a data diff to evaluate how the code change affects the data produced. A summary of the diff is posted in the pull request discussion, and the detailed diff view can then be explored in the Datafold App.
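To illustrate the mechanism of posting a summary in a pull request discussion, here is a minimal sketch of how any GitHub App could build such a request against GitHub's REST API endpoint for issue comments (pull request comments use the same endpoint). This is not Datafold's implementation: the function name, the repository names, the PR number, and the summary text are all hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_pr_comment_request(owner, repo, pr_number, summary, token):
    """Build (but do not send) a request that adds a comment to a pull request.

    Hypothetical helper for illustration only; not part of Datafold or its SDK.
    """
    # Pull request discussions are served by the issue comments endpoint.
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": summary}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # All values below are placeholders, not real identifiers.
    req = build_pr_comment_request(
        "example-org", "etl-repo", 42, "Data diff summary (placeholder)", "<token>"
    )
    print(req.get_method(), req.full_url)
```

With the GitHub App installed, Datafold handles this step for you; the sketch only shows why the app needs write access to pull requests.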

The GitHub permissions requested by the Datafold App are:

"contents": "read"
"metadata": "read"
"statuses": "write"
"pull_requests": "write"
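
If you want to audit an installation against the list above, the following sketch compares a set of granted permissions with the four that Datafold requests. It assumes you already have the granted permissions as a dictionary (for example, copied from the `permissions` object that GitHub reports for an app installation); the function name and the sample input are hypothetical.

```python
# Permissions requested by the Datafold App, as listed above.
REQUIRED_PERMISSIONS = {
    "contents": "read",
    "metadata": "read",
    "statuses": "write",
    "pull_requests": "write",
}

# GitHub access levels, ordered from weakest to strongest.
ACCESS_RANK = {"read": 1, "write": 2, "admin": 3}


def missing_permissions(granted):
    """Return the required permissions that `granted` does not satisfy.

    Hypothetical helper for illustration; `granted` maps a permission
    name to an access level such as "read" or "write".
    """
    missing = {}
    for scope, needed in REQUIRED_PERMISSIONS.items():
        have = granted.get(scope)
        # An absent or unknown level ranks below every required level.
        if ACCESS_RANK.get(have, 0) < ACCESS_RANK[needed]:
            missing[scope] = needed
    return missing


if __name__ == "__main__":
    # Sample input: pull_requests is only "read", statuses is absent.
    sample = {"contents": "read", "metadata": "read", "pull_requests": "read"}
    print(missing_permissions(sample))
```

An empty result means the granted permissions cover everything in the list; a higher level (such as "write" where "read" is requested) also counts as sufficient.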

Installing the Datafold GitHub App

The easiest way to integrate with Datafold is to allow access to the repositories using the GitHub App.

You have to be an admin of your GitHub organization to be able to install the GitHub App.

Go to the Git settings screen in Datafold and click Install GitHub App, then follow the installation steps. When asked which repositories to grant access to, select the specific repositories the app needs rather than all repositories.

When the process is complete, you are returned to the settings screen.

Use the Refresh button after an administrator changes which repositories the app can access. It synchronizes Datafold with any administrative changes made on GitHub.

After you select the repository and click Save, you are returned to the Git settings screen.

If you are on an on-prem deployment, you should first create the GitHub App. See "GitHub integration for Datafold on-prem". Then proceed with the current tutorial.

Now you can set up the CI integration. For more information, please refer to "dbt integration".
Example attachment of datadiff to GitHub Pull Request