Azure Databricks Dev Environment Setup & Job Migration Guide

Document Version: 1.0
Last Updated: 02-02-2026
Maintained By: Fauzaan


This guide walks you through setting up a complete Development environment in Azure Databricks, securing it with Unity Catalog, and programmatically migrating jobs from Production.

Phase 1: Azure Infrastructure Setup

1. Create a Dev Storage Account

We need a dedicated storage account to act as the data lake for your development environment.

  1. Log in to the Azure Portal.
  2. Search for Storage accounts and click + Create.
  3. Basics Tab:
    • Subscription: Select your subscription.
    • Resource Group: Select an existing RG or create a new one (e.g., rg-databricks-dev).
    • Storage account name: Give it a unique name (e.g., stdatalakedev01).
    • Region: Same region as your Databricks workspace.
    • Performance: Standard.
    • Redundancy: LRS (Locally-redundant storage) is usually sufficient for Dev.
  4. Advanced Tab:
    • Hierarchical namespace: Enable this (Critical for ADLS Gen2).
  5. Click Review + create -> Create.

2. Create an Access Connector for Databricks

This resource allows Databricks to access your storage account securely using a Managed Identity.

  1. In Azure Portal, search for Access Connector for Azure Databricks.
  2. Click + Create.
  3. Basics:
    • Resource Group: Same as above.
    • Name: e.g., connector-databricks-dev.
    • Region: Same region.
  4. Click Review + create -> Create.
  5. Important: Once created, go to the resource, click Overview, and copy the Resource ID of the Access Connector (it looks like /subscriptions/.../resourceGroups/.../providers/Microsoft.Databricks/accessConnectors/...). You will paste this into Unity Catalog later.

3. Grant IAM Roles (Storage Access)

Now, give the Access Connector permission to touch the data in the storage account.

  1. Go back to your Dev Storage Account (stdatalakedev01).
  2. Click Access Control (IAM) on the left sidebar.
  3. Click + Add -> Add role assignment.
  4. Role: Search for Storage Blob Data Contributor and select it.
  5. Members:
    • Select Managed Identity.
    • Click + Select members.
    • Subscription: Your subscription.
    • Managed Identity: Select Access Connector for Azure Databricks.
    • Select the connector you created in Step 2 (connector-databricks-dev).
  6. Click Select -> Review + assign.
Note: Why this role? Storage Blob Data Contributor is required for Unity Catalog to read and write data. The standard "Contributor" role is not enough for data operations.


Phase 2: Databricks Workspace & Unity Catalog

1. Set Up New Databricks Premium Workspace

  1. In Azure Portal, search for Azure Databricks.
  2. Click + Create.
  3. Basics:
    • Workspace name: e.g., adb-dev-workspace.
    • Pricing Tier: Premium (Required for Unity Catalog).
    • Managed Resource Group: Leave default.
  4. Click Review + create -> Create.
  5. Once created, click Launch Workspace.

2. Connect to Metastore (Enable Unity Catalog)

Prerequisite: You must be a Metastore Admin to perform this step.

  1. Go to your Account Console (click your email in the top right of the workspace -> Manage Account).
  2. Click Data on the sidebar.
  3. Click on your existing Metastore name.
  4. Go to the Workspaces tab inside the Metastore settings.
  5. Click Assign to workspace.
  6. Select your new Dev workspace (adb-dev-workspace) and click Assign.
    • This enables Unity Catalog for the new dev environment.

3. Create External Location

This tells Unity Catalog "Here is a valid place to store data."

  1. Open your Dev Workspace.
  2. Click Catalog in the left sidebar.
  3. Click + Add -> Add external location.
  4. Name: e.g., ext_loc_dev_storage.
  5. Storage URL: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/
    • Tip: You need to create a container (e.g., data) in your storage account first.
    • URL Example: abfss://data@stdatalakedev01.dfs.core.windows.net/
  6. Access Connector ID: Paste the Resource ID of the Access Connector you created in Phase 1, Step 2.
  7. Click Create.
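The Storage URL above follows a fixed pattern, so it can be assembled (and sanity-checked) programmatically before pasting it into the UI. A minimal sketch; the helper name build_abfss_url is ours, not part of any SDK:

```python
def build_abfss_url(container: str, storage_account: str, path: str = "") -> str:
    """Build an ADLS Gen2 abfss:// URL in the form Unity Catalog expects."""
    url = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
    return url + path.lstrip("/")

# Example using the names from this guide
print(build_abfss_url("data", "stdatalakedev01"))
# abfss://data@stdatalakedev01.dfs.core.windows.net/
```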

4. Create the deal-dev Catalog

Now we create the logical container for your tables.

  1. Open the Catalog Explorer (click Catalog in the left sidebar).
  2. Click Create Catalog.
  3. Name: deal-dev.
  4. Storage location: Select the external location created above (ext_loc_dev_storage).
    • This ensures all managed tables in this catalog go to your new Dev storage account.
  5. Click Create.
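The same catalog can be created from a notebook with SQL instead of the UI, using CREATE CATALOG ... MANAGED LOCATION. A hedged sketch: the helper create_catalog_sql is ours, the storage URL reuses the example account from this guide, and backticks are used because the name deal-dev contains a hyphen:

```python
def create_catalog_sql(catalog_name: str, managed_location_url: str) -> str:
    """Build a Unity Catalog CREATE CATALOG statement (run it via spark.sql in a notebook)."""
    # Backticks quote the name, since `deal-dev` contains a hyphen
    return (
        f"CREATE CATALOG IF NOT EXISTS `{catalog_name}` "
        f"MANAGED LOCATION '{managed_location_url}'"
    )

sql = create_catalog_sql("deal-dev", "abfss://data@stdatalakedev01.dfs.core.windows.net/")
print(sql)
# In a Databricks notebook you would then run: spark.sql(sql)
```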

Phase 3: Job Migration (Prod to Dev)

Migrating a job involves extracting the JSON configuration from Production, cleaning it, updating specific fields (like Cluster IDs), and pushing it to the Dev environment.

Prerequisites for Migration

  1. Cluster ID: Create a cluster in your Dev Workspace and copy its ID (e.g., 0202-100055-3utkplff).
  2. Access Token: Generate a PAT (Personal Access Token) in your Dev Workspace.
    • Go to Settings -> Developer -> Access tokens -> Manage -> Generate new token.
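Instead of copying the cluster ID from the UI, it can also be looked up through the Clusters API (GET /api/2.0/clusters/list) using the PAT from step 2. A sketch; summarize_clusters is our own helper for pulling names and IDs out of the response, and the live call is left commented out:

```python
DEV_WORKSPACE_URL = "<dev_workspace_url>"  # e.g. https://adb-xxxx.azuredatabricks.net
DEV_TOKEN = "<your_pat>"

def summarize_clusters(resp_json: dict) -> list[tuple[str, str]]:
    """Extract (cluster_name, cluster_id) pairs from a clusters/list response."""
    return [(c["cluster_name"], c["cluster_id"]) for c in resp_json.get("clusters", [])]

# Live call (requires the `requests` package and a valid PAT):
# import requests
# resp = requests.get(
#     f"{DEV_WORKSPACE_URL}/api/2.0/clusters/list",
#     headers={"Authorization": f"Bearer {DEV_TOKEN}"},
# )
# for name, cluster_id in summarize_clusters(resp.json()):
#     print(name, cluster_id)
```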

Step 1: Prepare the JSON Payload

Open the job you want to migrate in the Production workspace, click the three-dot menu, and select View as code. Pick JSON and click Get. From the returned JSON, keep only the contents of the settings key. Update any environment-specific values (such as cluster IDs), then run the code below in a Databricks notebook cell. Once the JSON file has been written, run the job-creation code from Step 2 in another cell.

Example:

%py
import json

# Your cleaned JSON data
# NOTE: Ensure 'existing_cluster_id' matches your NEW DEV CLUSTER ID
job_data = {
    "name": "RetentiaAI Monthly Prediction - DEV",
    "email_notifications": {
        "on_success": ["<email>"],
        "on_failure": ["<email>"],
        "no_alert_for_skipped_runs": False
    },
    "timeout_seconds": 0,
    "schedule": {
        "quartz_cron_expression": "24 0 11 1 * ?",
        "timezone_id": "Asia/Karachi",
        "pause_status": "PAUSED"  # Good practice to keep PAUSED in Dev initially
    },
    "max_concurrent_runs": 1,
    "tasks": [
        {
            "task_key": "feature_engineering",
            "run_if": "ALL_SUCCESS",
            "notebook_task": {
                # Ensure these paths exist in your Dev Workspace!
                "notebook_path": "/Workspace/Datascience/ML Models/reb-droppers-prediction/DREB Model - Feature Engineering",
                "source": "WORKSPACE"
            },
            "existing_cluster_id": "<cluster_id>",  # REPLACE WITH DEV CLUSTER ID
            "timeout_seconds": 0
        },
        # ... your other tasks, depending on the job
    ],
    "format": "MULTI_TASK"
}

# Save it to the local driver (temporary)
with open('/tmp/<pipeline_name>_job_config.json', 'w') as f:
    json.dump(job_data, f)

print("JSON saved to /tmp/<pipeline_name>_job_config.json")
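Rather than hand-editing the copied JSON, the cleanup can also be done programmatically: fetch the Prod job with GET /api/2.1/jobs/get, then rewrite only the environment-specific fields. A sketch under this guide's assumptions; extract_dev_settings is our own helper and it only rewrites the fields called out above (cluster IDs and the schedule pause status):

```python
import copy

def extract_dev_settings(prod_job: dict, dev_cluster_id: str) -> dict:
    """Take the response of GET /api/2.1/jobs/get and return a cleaned payload
    suitable for POST /api/2.1/jobs/create in the Dev workspace."""
    settings = copy.deepcopy(prod_job["settings"])  # only the settings key is reused
    for task in settings.get("tasks", []):
        if "existing_cluster_id" in task:
            task["existing_cluster_id"] = dev_cluster_id  # point at the Dev cluster
    if "schedule" in settings:
        settings["schedule"]["pause_status"] = "PAUSED"  # keep Dev jobs paused initially
    return settings

# Example with a trimmed prod response
prod_job = {
    "job_id": 123,
    "settings": {
        "name": "RetentiaAI Monthly Prediction",
        "schedule": {"quartz_cron_expression": "24 0 11 1 * ?", "pause_status": "UNPAUSED"},
        "tasks": [{"task_key": "feature_engineering", "existing_cluster_id": "prod-cluster"}],
    },
}
dev_settings = extract_dev_settings(prod_job, "0202-100055-3utkplff")
print(dev_settings["tasks"][0]["existing_cluster_id"])  # 0202-100055-3utkplff
```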

Step 2: Create the Job via API

This script reads the saved JSON and pushes it to the Dev Workspace using the Jobs API.

%py
import requests
import json

# Configuration for the NEW workspace
# Replace this URL with your actual Dev Workspace URL
TARGET_WORKSPACE_URL = "<dev_workspace_url>"

# Personal Access Token for the target (DEV) workspace
TARGET_TOKEN = "<your_pat>"

# Load the JSON we saved in the previous step
with open('/tmp/<pipeline_name>_job_config.json', 'r') as f:
    payload = json.load(f)

# Send POST request to create the job
response = requests.post(
    f"{TARGET_WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TARGET_TOKEN}"},
    json=payload
)

if response.status_code == 200:
    print(f"Success! New Job ID: {response.json()['job_id']}")
    print(f"View Job here: {TARGET_WORKSPACE_URL}/#job/{response.json()['job_id']}")
else:
    print(f"Error {response.status_code}: {response.text}")

Checklist Before Running
  1. Notebook Paths: Ensure the notebooks listed in notebook_path have been imported into the Dev Workspace at the exact same location.

  2. Cluster ID: Double-check that existing_cluster_id matches a valid, running cluster in the Dev environment.

  3. Token: Ensure the TARGET_TOKEN has permissions to create jobs.
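The mechanical parts of this checklist can be automated before calling the API. A minimal sketch; validate_payload is our own helper and only catches mechanical mistakes (missing tasks, unreplaced <placeholder> values), not wrong notebook paths or token permissions:

```python
import json

def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a jobs/create payload (empty list = looks OK)."""
    problems = []
    if not payload.get("tasks"):
        problems.append("payload has no tasks")
    for task in payload.get("tasks", []):
        cid = task.get("existing_cluster_id", "")
        if not cid or cid.startswith("<"):
            problems.append(f"task {task.get('task_key', '?')}: cluster ID not set")
    # Any surviving "<placeholder>" shows up as a literal '<' in the serialized JSON
    if "<" in json.dumps(payload):
        problems.append("payload still contains <placeholder> values")
    return problems

ok = {"name": "job", "tasks": [{"task_key": "t1", "existing_cluster_id": "0202-100055-3utkplff"}]}
bad = {"name": "job", "tasks": [{"task_key": "t1", "existing_cluster_id": "<cluster_id>"}]}
print(validate_payload(ok))   # []
print(validate_payload(bad))  # two problems reported
```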