Getting Started with Databricks
    • 13 Nov 2024
    • 3 Minutes to read
    • Dark
      Light

    Getting Started with Databricks

    • Dark
      Light

    Article summary

    Acante's Secure Data Access platform is specifically tailored to simplify security and governance for the Databricks platform. To integrate Acante with your Databricks implementation, Acante has provided a set of Notebooks to create all the necessary roles, configurations and gather the necessary meta-data. Databricks documentation: https://docs.databricks.com/en/index.html

    Deployment Prerequisites

    • Databricks CLI should be installed (v0.222.0 or higher) with profile created for workspaces being onboarded (documentation)
    • Access to Databricks admin credentials
    • External volume where Acante can create a meta-data catalog and volume

    Sequence of Steps in Databricks

    1. Log in to the Acante UI, navigate to the Configurations --> Databricks section and Download the Notebooks
      Screenshot 2024-04-30 at 2.40.39 PM.png

    There are three notebooks in the folder:

    • Acante Provisioning notebook: this notebook creates the service principals and onboards the workspace creating the necessary resources (cluster, volume for metadata, etc.) and jobs to collect metadata.
    • Acante Metadata Discovery notebook: this notebook collects the necessary metadata (schemas, users, configurations, and so on). It does NOT read any of the actual data in your Databricks.
    • Acante Input Variables notebook: this notebook has the inputs for Acante Provisioning notebook
    1. Log in to the primary Databricks workspace you are onboarding as Account Admin
    2. Import the three notebooks:

    Import_notebook.png

    Browse and multi-select the downloaded notebooks & click Import
    Import notebooks_2.png

    1. Open the Acante Input Variables notebook
      Input variables-20241028.png

      • Add the Databricks Account ID as described in the Databricks documentation
      • Add Scope & Key for a Databricks secret in the current workspace with the Account Administrator (should be the current user) credentials. The recommended option is to use OAuth tokens (instead of the legacy Password option). Instructions to create the token & secret are as follows:
        • [FOR AWS] databricks auth login --host https://accounts.cloud.databricks.com --account-id <DATABRICKS_ACCOUNT_ID>
        • [FOR AZURE] databricks auth login --host https://accounts.azuredatabricks.net --account-id <DATABRICKS_ACCOUNT_ID>
        • Using databricks CLI version v0.222.0 or higher, run :
          databricks auth token --profile ACCOUNT-<DATABRICKS_ACCOUNT_ID>
        • Take output from the step above and create a Databricks Scope + Secret (note: the "--profile " below is required if the default databricks cli profile does not point to the primary workspace being onboarded)
          • databricks secrets --profile <profile-name> create-scope acante_scope
          • databricks secrets --profile <profile-name> put-secret acante_scope acante_admin_token --string-value <secret>
        • Note: By default, the generated token in expires after 1 hour, so the token has to be updated each time this provisioning script is run
      • Enable Audit Logging: This flags turns on the ACCESS schema in Unity Catalog and enables verbose audit logs on all onboarded workspaces to read audit logs. Provides the necessary historical observability for Acante to analyze data access patterns.
      • If you want to onboard additional Workspaces, enter the names as a comma separated list
    2. Open the Acante Provisioning notebook

    3. Run all to run this notebook. This will automatically setup all resources and permissions.

    Run_all.png

    1. Note the output of this cell "Workspace Onboarding Parameters. The json output will be used as a configuration in Acante.
    2. ONLY for Private Link: if you have restricted network access to the Databricks Control Plane via Private Link, then follow the steps here to provide access to Acante.

    Providing the Configuration Inputs in the Acante UI

    1. Log in to the Acante UI and navigate to the Configurations page.

    Configurations pages for Databricks.png

    1. Click on Select Account and Add New Account . Alternately, you can also modify configurations for a previously configured account
      Add new account configuration.png

    2. Find and enter the configurations from your Databricks console into the input box
      Configuration input box.png

    3. Add the Databricks Account Name from your Databricks console

    Account_name.png

    1. Next, add information for the Shared Acante Service Principal. In your Databricks console, go to User Management > Service Principals and click on the acante-principal-shared
      List of service principals.png

    Click on Generate Secret to get the required configuration inputs:
    Acante shared service principal secret.png

    • Client ID: this is ID for Shared Service Principal that the Acante Cloud can assume to pull the collected metadata
    • Secret Key: OAuth secret created
    1. Collect the Workspace Onboarding Parameters which is JSON output from Cell 6 . Copy paste the entire JSON at the top of the JSON editor box in the Configuration box.

    Add Workspace inputs.png

    1. Press Save

    2. Acante will check the connection for each of the workspaces and confirm that it could connect to each of them or if any of them had connection errors.
      Configuration confirmation.png

    3. The Acante Cloud immediately triggers a Metadata Discovery process to pull all the Catalogs and their schemas. You can start seeing the analysis results in the product tabs shortly.

    4. If the backend S3 buckets for Databricks are encrypted using Customer Managed Keys, please follow the step here