Databricks Connector
The Databricks component allows you to list, import, and automate Databricks jobs.
Prerequisites
- Version 9.2.9 or later
- Component Connection Management version 1.0.0.3
- Privileges Required to use Connections
- Privileges Required to use Databricks
Contents of the Component
Object Type | Name | Description |
---|---|---|
Application | GLOBAL.Redwood.REDWOOD.Databricks | Integration connector with the Databricks system |
ConstraintDefinition | REDWOOD.Redwood_DatabricksConnectionConstraint | Constraint for Databricks Connection fields |
ExtensionPoint | REDWOOD.Redwood_DatabricksConnection | Databricks Connector |
Process Definition | REDWOOD.Redwood_Databricks_ImportJob | Import a job from Databricks |
Process Definition | REDWOOD.Redwood_Databricks_RunJob | Run a job in Databricks |
Process Definition | REDWOOD.Redwood_Databricks_RunJob_Template | Template definition to run a job in Databricks |
Process Definition | REDWOOD.Redwood_Databricks_ShowJobs | List all existing jobs in Databricks |
Process Definition Type | REDWOOD.Redwood_Databricks | Databricks Connector |
Library | REDWOOD.Redwood_Databricks | Library for Databricks connector |
Process Definitions
Redwood_Databricks_ImportJob
Import a job from Databricks. Imports one or more Databricks jobs as RunMyJobs Process Definitions. Specify a Name Filter to control what processes are imported, and Generation Settings to control the attributes of the imported definitions.
Parameters
Tab | Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
---|---|---|---|---|---|---|---|
Parameters | connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | ||
Parameters | filter | Job Name Filter | This filter limits the jobs returned to those whose names match the filter. Wildcards * and ? are allowed. | String | In | ||
Parameters | overwrite | Overwrite Existing Definition | When set to Yes, a definition that already exists with the same name as the one generated for the imported object is overwritten by the new import. When set to No, the import for that job is skipped if a definition with the same name already exists. | String | In | N | Y,N |
Generation Settings | targetPartition | Partition | The Partition to create the new definitions in. | String | In | ||
Generation Settings | targetApplication | Application | The Application to create the new definitions in. | String | In | ||
Generation Settings | targetQueue | Default Queue | The default Queue to assign to the generated definitions. | String | In | ||
Generation Settings | targetPrefix | Definition Name Prefix | The prefix to add onto the name of the imported Databricks Job to create the definition name. | String | In | CUS_DBCKS_ |
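For illustration, here is a minimal Python sketch of how a definition name could be derived from a Databricks job name and the targetPrefix parameter above. The sanitization rule and the helper function are assumptions for the sketch; the connector's actual naming logic is not documented here.

```python
# Hypothetical sketch: derive a RunMyJobs-style definition name from a
# Databricks job name plus the Definition Name Prefix (targetPrefix).
# The sanitization rule below is an assumption, not the connector's code.
import re

def definition_name(job_name: str, prefix: str = "CUS_DBCKS_") -> str:
    """Build a definition name from a Databricks job name."""
    # Replace characters that are not letters, digits, or underscores.
    sanitized = re.sub(r"[^A-Za-z0-9_]", "_", job_name)
    return prefix + sanitized

print(definition_name("Nightly ETL (prod)"))  # -> CUS_DBCKS_Nightly_ETL__prod_
```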
Redwood_Databricks_RunJob
Runs a Databricks job and monitors it until completion. The RunMyJobs Process will remain in a Running state until the Databricks job completes. If the Databricks Job succeeds, the RunMyJobs process will complete successfully. If the Databricks Job fails, the RunMyJobs process will complete in Error, and any available error information is written to the stdout.log file. Parameters are available on the definition to pass input parameters for the different types of Databricks tasks. For example, adding a value to the Python Parameters parameter will make that parameter available to all Python tasks in the Databricks Job. If the job does not require parameters for a certain task type, leave that parameter empty. See the parameters table below for more information.
Parameters
Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
---|---|---|---|---|---|---|
connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | ||
jobId | Job Id to run | The Id of the Databricks job to execute | String | In | ||
sparkJarParameters | Spark Jar Parameters | An array of Spark Jar Parameters to be used on the Databricks Job | String | In | ||
sparkSubmitParameters | Spark Submit Parameters | An array of Spark Submit Parameters to be used on the Databricks Job | String | In | ||
notebookParameters | Notebook Parameters | An array of key=value pairs of Notebook Parameters to be used on the Databricks Job | String | In | ||
pythonParameters | Python Parameters | An array of Python Parameters to be used on the Databricks Job | String | In | ||
pythonNamedParameters | Python Named Parameters | An array of key=value pairs of Python Named Parameters to be used on the Databricks Job | String | In | ||
sqlParameters | SQL Parameters | An array of key=value pairs of SQL Parameters to be used on the Databricks Job | String | In | ||
dbtParameters | DBT Parameters | An array of DBT Parameters to be used on the Databricks Job | String | In | ||
pipelineFullRefresh | Pipeline Full Refresh | Whether to perform a full refresh on the Databricks pipeline job | String | In | | Y=Yes, N=No |
runId | Databricks Run Id | The Run Id of the executed Job on the Databricks side | String | Out |
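The run-and-monitor behavior described above can be approximated with the public Databricks Jobs REST API. The following Python sketch is illustrative only: HOST, TOKEN, the job id, and the 30-second polling interval are placeholder assumptions, not the connector's internal implementation.

```python
# Minimal sketch of the run-and-monitor pattern, assuming direct use of
# the Databricks Jobs REST API. HOST and TOKEN are placeholders.
import time
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def run_job_and_wait(job_id: int, notebook_params: dict | None = None) -> None:
    # Trigger the job; notebook_params are passed to all notebook tasks,
    # mirroring the Notebook Parameters field above.
    resp = requests.post(
        f"{HOST}/api/2.1/jobs/run-now",
        headers=HEADERS,
        json={"job_id": job_id, "notebook_params": notebook_params or {}},
    )
    resp.raise_for_status()
    run_id = resp.json()["run_id"]  # corresponds to the runId Out parameter

    # Poll until the run leaves the active states.
    while True:
        run = requests.get(
            f"{HOST}/api/2.1/jobs/runs/get",
            headers=HEADERS,
            params={"run_id": run_id},
        ).json()
        state = run["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            break
        time.sleep(30)  # assumed polling interval

    # The connector surfaces failure as an Error completion with details in stdout.log.
    if state.get("result_state") != "SUCCESS":
        raise RuntimeError(f"Run {run_id} failed: {state.get('state_message', '')}")

run_job_and_wait(123, {"env": "prod"})  # hypothetical job id and parameter
```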
Redwood_Databricks_RunJob_Template
This template definition is provided to facilitate creating definitions that run specific Databricks jobs. Its functionality and parameters are the same as those of the Redwood_Databricks_RunJob definition. To create a definition, choose New (from template) from the context menu of Redwood_Databricks_RunJob_Template.
note
To provide a default value for the Connection parameter of the template, you must use the full Business Key of the Connection: EXTConnection:<Partition>.<ConnectionName>. Example: EXTConnection:GLOBAL.MyDatabricksConnection
Parameters
Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
---|---|---|---|---|---|---|
connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | ||
jobId | Job Id to run | The Id of the Databricks job to execute | String | In | ||
sparkJarParameters | Spark Jar Parameters | An array of Spark Jar Parameters to be used on the Databricks Job | String | In | ||
sparkSubmitParameters | Spark Submit Parameters | An array of Spark Submit Parameters to be used on the Databricks Job | String | In | ||
notebookParameters | Notebook Parameters | An array of key=value pairs of Notebook Parameters to be used on the Databricks Job | String | In | ||
pythonParameters | Python Parameters | An array of Python Parameters to be used on the Databricks Job | String | In | ||
pythonNamedParameters | Python Named Parameters | An array of key=value pairs of Python Named Parameters to be used on the Databricks Job | String | In | ||
sqlParameters | SQL Parameters | An array of key=value pairs of SQL Parameters to be used on the Databricks Job | String | In | ||
dbtParameters | DBT Parameters | An array of DBT Parameters to be used on the Databricks Job | String | In | ||
pipelineFullRefresh | Pipeline Full Refresh | Whether to perform a full refresh on the Databricks pipeline job | String | In | | Y=Yes, N=No |
runId | Databricks Run Id | The Run Id of the executed Job on the Databricks side | String | Out |
Redwood_Databricks_ShowJobs
List all existing jobs in Databricks. Fetches information about the available Databricks Jobs. Job properties for the returned jobs are written to the stdout.log file, to the file named listing.rtx, and to the Out parameter Job Listing.
Parameters
Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
---|---|---|---|---|---|---|
connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | ||
filter | Job Name Filter | This filter limits the jobs returned to those whose names match the filter. Wildcards * and ? are allowed. | String | In | ||
listing | Job listing | The listing of all jobs available that match the input filter (or any if no input filter was provided) | Table | Out |
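As a rough equivalent of this listing outside RunMyJobs, the sketch below pages through the Databricks Jobs API and applies the wildcard filter client-side; Python's fnmatch happens to support the same * and ? wildcards as the Job Name Filter parameter. HOST and the token are placeholder assumptions.

```python
# Sketch: fetch all jobs via the Databricks Jobs API, then apply the
# wildcard Job Name Filter client-side. HOST and token are placeholders.
from fnmatch import fnmatch
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

def list_jobs(name_filter: str = "*") -> list[dict]:
    jobs, page_token = [], None
    while True:
        params = {"limit": 100}
        if page_token:
            params["page_token"] = page_token
        page = requests.get(
            f"{HOST}/api/2.1/jobs/list", headers=HEADERS, params=params
        ).json()
        for job in page.get("jobs", []):
            name = job["settings"]["name"]
            if fnmatch(name, name_filter):
                jobs.append({"job_id": job["job_id"], "name": name})
        page_token = page.get("next_page_token")
        if not page_token:
            break
    return jobs

for job in list_jobs("Nightly*"):  # hypothetical filter
    print(job["job_id"], job["name"])
```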
Procedure
Create a Connection To Databricks
- Navigate to Custom > Connections and choose the option to create a new connection.
- Choose Databricks Connection under Select a Connection Type.
- Choose Next or Basic Properties. This screen is common to all components; here you create a Queue and Process Server for your Databricks connection. All required settings are set automatically.
- Choose Next or Security. This screen is common to all components; here you specify which Roles can access the connection information.
- Choose Next or Databricks Connection Properties. This screen is specific to the Databricks component. For Databricks, you choose between Basic and Personal Access Token authentication:
- For Basic authentication, you specify URL, Username, and your Password.
- For Personal Access Token authentication, you specify URL, Username, and your Access Token.
- Navigate to Environment > Process Server, locate your Databricks Process Server, and start it; ensure it reaches status Running.
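If the Process Server does not reach status Running, one way to rule out credential problems is to test the URL and Personal Access Token directly against the Databricks REST API, outside RunMyJobs. A minimal sketch, with HOST and TOKEN as placeholders:

```python
# Optional sanity check (not part of the component): verify that the URL
# and Personal Access Token can reach the Databricks Jobs API.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

resp = requests.get(
    f"{HOST}/api/2.1/jobs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"limit": 1},
)
resp.raise_for_status()  # HTTP 200 means the URL and token are valid
print("Connection OK; sample response:", resp.json())
```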
Listing Databricks Jobs
- Navigate to Definitions > Processes.
- Choose Submit from the context-menu of Redwood_Databricks_ShowJobs.
- Select the connection in the Connection field, specify an optional name filter in the Job Name Filter parameter, and choose Submit to list all available jobs.
Locating Connection Settings
- Navigate to Custom > Connections.
- The first line is used for filtering connections; you can filter on Connection Type, Name, or Description simply by starting to type.