
Integrate Databricks with Dotdigital

Connect your Databricks Lakehouse Platform to Dotdigital to sync contact, product, and order data directly into your account.

Written by Bartlomiej Rekosiewicz
Updated today

Databricks is a cloud-based analytics and data storage platform built on Apache Spark. It lets you combine data from multiple systems into a single source of truth. Connecting Databricks to Dotdigital lets you sync contact, product, and order data without restructuring your tables or creating duplicates, so you can target all of your customers with personalised emails, SMS, and WhatsApp messages.

This integration supports syncing contacts, products, and orders.

Order syncing requires a fixed schema.

Public preview

This feature is currently in public preview.

To join the preview, contact your Customer Success representative.


Before you start

You must have:

  • A Databricks workspace.

  • An SQL Warehouse (formerly SQL Endpoint).

  • A Databricks personal access token.

  • Permissions for the catalog, schema, and tables or views you want to sync.

  • A table or view containing your contacts and/or products.

  • A dedicated Orders table in the required schema.
    See Order data structure requirements below.

  • A timestamp column of type TIMESTAMP for incremental syncing.

  • Confirmation from your Databricks admin on whether Unity Catalog is required.

Service Principal requirements

If you authenticate using OAuth 2.0, you must create a Service Principal in Databricks.

Your Service Principal must have:

  • Workspace user permission.

  • CAN USE permission on the SQL Warehouse.

  • View-level access on each table, view, or schema you want to sync.
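As a rough sketch, assuming your workspace uses Unity Catalog, the data access described above could be granted with statements like the following. The catalog `main`, the schema `sales`, and the application ID are placeholders, and SELECT is assumed to be the privilege that provides read access. The CAN USE permission on the SQL Warehouse is typically assigned in the warehouse's permission settings rather than with SQL.

```sql
-- Placeholder application ID of the Service Principal; replace with your own.
-- Assumes Unity Catalog; catalog and schema names are hypothetical.

-- Allow the Service Principal to see the catalog and schema.
GRANT USE CATALOG ON CATALOG main
  TO `00000000-0000-0000-0000-000000000000`;

GRANT USE SCHEMA ON SCHEMA main.sales
  TO `00000000-0000-0000-0000-000000000000`;

-- Read access to every table and view in the schema.
GRANT SELECT ON SCHEMA main.sales
  TO `00000000-0000-0000-0000-000000000000`;
```

If you prefer to expose only specific objects, grant SELECT on each table or view individually instead of on the whole schema.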

The Databricks integration syncs data into Dotdigital only.

To sync data out of Dotdigital, use the Firehose integration.

You can use both integrations together if you need two‑way data movement.


What you can do

  • Import contact data from Databricks to build targeted segments and campaigns.

  • Import product data to support dynamic content, recommendations, and personalised offers.

  • Import order data to bring purchase history into Dotdigital, supporting RFM analysis, ecommerce segmentation, and personalised automations.

  • Connect to Databricks views to sync consolidated datasets that combine multiple sources.
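As an illustration of a consolidated dataset, the sketch below defines a view that joins two sources into one contact record. All catalog, schema, table, and column names are hypothetical, and both source tables are assumed to have an `updated_at` column of type TIMESTAMP.

```sql
-- Hypothetical view combining CRM and loyalty data into one row per contact.
CREATE OR REPLACE VIEW main.marketing.dotdigital_contacts AS
SELECT
  c.email,
  c.first_name,
  c.last_name,
  l.loyalty_tier,
  -- Use the most recent change from either source as the row timestamp.
  GREATEST(c.updated_at, COALESCE(l.updated_at, c.updated_at)) AS updated_at
FROM main.crm.customers AS c
LEFT JOIN main.loyalty.members AS l
  ON l.email = c.email;
```

Taking the later of the two timestamps means a change in either source moves the row's timestamp forward.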


1. Integrate Databricks with Dotdigital

Start the integration by finding Databricks in the self‑serve integrations menu.

  1. Go to Connect > Integrations.

  2. In the left menu, under SHOW, select Self‑serve.

  3. Find Databricks, then select +ADD.


2. Connect your account

Set up the connection between Dotdigital and Databricks.

  1. Read the integration information, then select NEXT.

  2. For Connect to Databricks, select Connect account.

  3. Enter the following details:

    • Databricks instance (workspace URL)

    • Client secret

    • Client ID

  4. Select CREATE.

  5. Select NEXT.


3. Set up your datasets

You can configure Contacts, Products, and Orders independently, but the setup flow is linear.

You move through each dataset in order, starting with Contacts.

At each stage, you must choose Yes or No:

  • If you select Yes, the wizard guides you through the setup steps for that dataset.

  • If you select No, the wizard skips the detailed steps and moves on to the next dataset.

  • You cannot jump directly to Products or Orders without first answering the Contacts step.

Loading data source options may take up to a few minutes. This is expected, as the system retrieves the available options from Databricks.


Contacts setup

1. Sync contacts

Choose whether you want to sync contact data from Databricks into Dotdigital.

If you already get your contacts from another source, such as a CRM, select No.

  1. Select Yes or No.

  2. Select NEXT.

2. Select data source for Contacts

Select where your contact data lives in Databricks.

  1. Select your Warehouse.

  2. Select your Database.

  3. Select your Table or View.

  4. Select NEXT.

3. Select timestamp for Contacts

Choose the TIMESTAMP column used for incremental sync.

  • The column must be of type TIMESTAMP.

  • Rows without a timestamp cannot be synced.

  • Only rows with a timestamp newer than the last successful sync are imported.

  1. Select your timestamp column.

  2. Select NEXT.
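Before you choose a column, you may want to check it in Databricks. The queries below are a minimal sketch: the view name and the `updated_at` column are hypothetical, and the last query only illustrates the idea of an incremental filter, not the query Dotdigital actually runs.

```sql
-- Confirm that the column's data type is listed as timestamp.
DESCRIBE TABLE main.marketing.dotdigital_contacts;

-- Count rows that cannot sync because they have no timestamp.
SELECT COUNT(*) AS rows_without_timestamp
FROM main.marketing.dotdigital_contacts
WHERE updated_at IS NULL;

-- Illustration only: rows changed after a hypothetical last sync time.
SELECT *
FROM main.marketing.dotdigital_contacts
WHERE updated_at > TIMESTAMP '2024-01-01 00:00:00';
```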

4. Map data fields

Map Databricks columns to Dotdigital contact fields.

  • Contacts require an email address or mobile number.

  • Optional fields can be mapped.

  • Custom fields cannot be added.

  • Orders do not use mapping.

  1. Select a Dotdigital field.

  2. Select the matching Databricks column.

  3. Repeat as needed.

  4. Select NEXT.
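Because contacts require an email address or mobile number, it can help to look for rows that have neither. This is a sketch only: the view name and the `email` and `mobile_number` columns are hypothetical stand-ins for your own.

```sql
-- Count contact rows that have neither an email address nor a mobile number.
SELECT COUNT(*) AS contacts_without_identifier
FROM main.marketing.dotdigital_contacts
WHERE (email IS NULL OR TRIM(email) = '')
  AND (mobile_number IS NULL OR TRIM(mobile_number) = '');
```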

5. Set sync frequency

  1. Select how often you want to sync your contacts.

  2. Select NEXT.


Products setup

1. Sync product catalogue

Choose whether to sync product data into Dotdigital.

  1. Select Yes or No.

  2. Select NEXT.

2. Select data source for Products

Select where your product data lives in Databricks.

  1. Select your Warehouse.

  2. Select your Database.

  3. Select your Table or View.

  4. Select NEXT.

3. Select timestamp for Products

Choose the column that tracks when each record was last updated.

  • The column must be type TIMESTAMP, not STRING or DATETIME.

  • Rows without timestamps cannot be synced.

  • Only rows with timestamps newer than the last successful sync are imported.

  1. Select your timestamp column.

  2. Select NEXT.
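If your last-updated value is stored as a STRING, one option is to expose it through a view that converts it to TIMESTAMP. The sketch below assumes a hypothetical source table `main.catalogue.products_raw` with a STRING column `last_modified`; all names are placeholders.

```sql
-- Hypothetical view that converts a STRING column to TIMESTAMP.
-- TRY_CAST returns NULL instead of failing when a value cannot be converted.
CREATE OR REPLACE VIEW main.marketing.dotdigital_products AS
SELECT
  sku,
  name,
  price,
  TRY_CAST(last_modified AS TIMESTAMP) AS updated_at
FROM main.catalogue.products_raw;

-- List source values that could not be converted, so you can fix them.
SELECT last_modified
FROM main.catalogue.products_raw
WHERE last_modified IS NOT NULL
  AND TRY_CAST(last_modified AS TIMESTAMP) IS NULL;
```

Rows whose value cannot be converted end up with no timestamp in the view, so they would not be synced until the source value is corrected.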

4. Map data fields

Some product fields are greyed out because they’re mandatory. You must map these fields for the product sync to work, and you can’t remove or change them. You cannot skip this step.

  1. Select a Dotdigital field.

  2. Select the Databricks column that matches the Dotdigital field.

  3. Add additional mappings as required.

  4. Select NEXT.

5. Set sync frequency

Choose how often your data syncs. Dotdigital checks for updates on your chosen schedule, but only imports rows with timestamps newer than the last successful sync.

  1. Select your sync frequency.

  2. Select NEXT.

To change the schedule later, edit your integration settings.

To pause syncing, disable the integration and re‑enable it when you are ready.
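Since only rows with a newer timestamp are imported, a change that leaves the timestamp untouched would presumably not be picked up. A minimal sketch of updating both together, assuming a hypothetical Delta table `main.catalogue.products` with an `updated_at` column of type TIMESTAMP:

```sql
-- Change a product and move its timestamp forward in the same statement.
UPDATE main.catalogue.products
SET price = 24.99,
    updated_at = current_timestamp()
WHERE sku = 'SKU-123';
```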


Orders setup

1. Sync orders

Order syncing brings purchase history into Dotdigital.

Order data must follow a fixed schema containing:

  • Order‑level fields (ID, totals, currency, purchase date)

  • Customer identifier fields

  • Billing and delivery details

  • A nested items field for order lines

The integration does not work if your table does not follow the required structure.

  1. Select Yes or No.

  2. Select NEXT.

You cannot manually map order fields. The integration requires an exact schema because Dotdigital parses the structure directly.
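To compare an existing table with the required structure, you can inspect it in Databricks first. This sketch uses the example table name from the schema shown later in this article; replace it with your own.

```sql
-- List column names and data types to compare against the required schema.
DESCRIBE TABLE `rainbow`.`IsOrderTable`;

-- Show the data type of the nested line-items column.
SELECT typeof(PRODUCTS) AS products_type
FROM `rainbow`.`IsOrderTable`
LIMIT 1;
```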

2. Select data source for Orders

Select where your order data lives in Databricks.

  1. Select your Warehouse.

  2. Select your Database.

  3. Select your Table or View.

  4. Select NEXT.

3. Set sync frequency

Choose how often your data syncs. Dotdigital checks for updates on your chosen schedule, but only imports rows with timestamps newer than the last successful sync.

  1. Select your sync frequency.

  2. Select FINISH.

To change the schedule later, edit your integration settings.

To pause syncing, disable the integration and re‑enable it when you are ready.

The Orders sync uses a fixed schema, so Dotdigital reads the structure of your table directly instead of asking you to map fields manually. This ensures every order record is imported consistently, and avoids issues that can occur when custom field names or formats don’t match expected ecommerce standards. If your table doesn’t follow the required structure, update it before starting the sync.

Order data structure requirements

Databricks order syncing requires your table to follow a defined schema.

Your table must include:

  • Order‑level fields

  • Customer identifier fields

  • Billing and delivery address fields

  • A PRODUCTS or line‑items field containing ARRAY, STRUCT, or JSON data

If your data does not follow this structure, create a new table that matches the required schema.

CREATE OR REPLACE TABLE `rainbow`.`IsOrderTable` (
  CREATED TIMESTAMP NOT NULL,
  UPDATED TIMESTAMP NOT NULL,
  ID STRING,
  ORDER_TOTAL DECIMAL(18, 2) NOT NULL,
  PAYMENT STRING,
  DELIVERY_METHOD STRING,
  DELIVERY_TOTAL DECIMAL(18, 2),
  CURRENCY STRING NOT NULL,
  ORDER_STATUS STRING,
  EMAIL STRING NOT NULL,
  QUOTE_ID STRING,
  PURCHASE_DATE DATE NOT NULL,
  BILLING_ADDRESS_1 STRING,
  BILLING_ADDRESS_2 STRING,
  BILLING_CITY STRING,
  BILLING_COUNTRY STRING,
  BILLING_POSTCODE STRING,
  DELIVERY_ADDRESS_1 STRING,
  DELIVERY_ADDRESS_2 STRING,
  DELIVERY_CITY STRING,
  DELIVERY_COUNTRY STRING,
  DELIVERY_POSTCODE STRING,
  -- Match your JSON structure directly:
  PRODUCTS ARRAY<STRUCT<
    NAME STRING,
    PRICE DECIMAL(18, 2),
    SKU STRING,
    QTY INT
  >>,
  ORDER_SUBTOTAL DECIMAL(18, 2) NOT NULL,
  BASE_SUBTOTAL_INCL_TAX DECIMAL(18, 2),
  DISCOUNT_AMOUNT DECIMAL(18, 2),
  COUPONCODE STRING
)
USING DELTA
COMMENT 'Orders table created';
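If your orders and order lines live in separate flat tables, the sketch below shows one way to load them into the table above by nesting the lines into the PRODUCTS array. The source tables `shop.orders` and `shop.order_lines` and their columns are hypothetical, the purchase date is assumed to be the date the order was created, and the statement assumes your runtime supports INSERT with a column list. Columns left out of the list are not populated.

```sql
-- Load orders from hypothetical flat tables into the required structure.
INSERT INTO `rainbow`.`IsOrderTable` (
  CREATED, UPDATED, ID, ORDER_TOTAL, CURRENCY, EMAIL,
  PURCHASE_DATE, PRODUCTS, ORDER_SUBTOTAL
)
SELECT
  o.created_at,
  o.updated_at,
  CAST(o.order_id AS STRING),
  CAST(o.total AS DECIMAL(18, 2)),
  o.currency,
  o.email,
  CAST(o.created_at AS DATE),          -- assumption: purchase date = creation date
  l.products,
  CAST(o.subtotal AS DECIMAL(18, 2))
FROM shop.orders AS o
JOIN (
  -- Collapse the order lines into one array of structs per order.
  SELECT
    order_id,
    collect_list(named_struct(
      'NAME',  product_name,
      'PRICE', CAST(unit_price AS DECIMAL(18, 2)),
      'SKU',   sku,
      'QTY',   CAST(quantity AS INT)
    )) AS products
  FROM shop.order_lines
  GROUP BY order_id
) AS l
  ON l.order_id = o.order_id;
```

The struct field names and types are written to match the PRODUCTS definition in the table, so each order arrives with its lines already nested.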
