Delta Sharing: an open sharing protocol

Delta Sharing is an open standard for secure data sharing. Nexalis uses Delta Sharing to give users seamless access to shared datasets across different environments: you can query the data directly from Python, Java, or Spark without complex integration steps, enabling data sharing across organizations.

Credential File

When Nexalis shares a dataset with you, you will receive an email with a one-time download link. Clicking this link takes you to a Databricks portal where you can download a credential file, which is required for connecting your applications to the shared data. Store it safely, as there is no option to re-download it later. The link opens the Databricks Delta Sharing token page; click the blue download box, and a file named “config.share” will start downloading automatically. The file should contain:
{
  "shareCredentialsVersion": 1,
  "bearerToken": "<SHARING_TOKEN>",
  "endpoint": "https://<DELTA_SHARING_ENDPOINT>",
  "expirationTime": "2024-06-18T22:36:37.792Z"
}
This credential file is essential for authentication and contains:
  • shareCredentialsVersion: protocol version used.
  • bearerToken: token that grants you access.
  • endpoint: the Delta Sharing endpoint URL.
  • expirationTime: when the credential becomes invalid.
Save this file securely (for example, in a protected path that is not under version control). It cannot be re-downloaded later: you will reference it whenever you connect to Delta Sharing from your applications, and once it expires you will need to request a new file from Nexalis; there is no way to refresh it yourself.
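
To confirm the credential file works before wiring it into a pipeline, you can list the tables it grants access to with the delta-sharing Python client. This is an optional sketch; it assumes the delta-sharing package is installed (pip install delta-sharing) and that config.share sits in the current directory:

import delta_sharing

# Path to the credential file Nexalis provided (adjust if stored elsewhere).
profile_path = "./config.share"

# The client reads the endpoint and bearer token from the profile file.
client = delta_sharing.SharingClient(profile_path)

# Print every table the credential grants access to.
for table in client.list_all_tables():
    print(f"{table.share}.{table.schema}.{table.name}")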

Consuming the data

Nexalis provides data through the open Delta Sharing protocol, which is not limited to Python: you can access the same shared datasets from other languages and tools, such as R, Scala, Java, and BI platforms. The tutorial below focuses on Python and Spark because they are the most common choices for data analysis and pipelines, but you are free to use other environments. For additional examples, refer to the official Delta Sharing documentation and tutorials.
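
For example, small pulls do not even require Spark: the delta-sharing Python client can load a shared table straight into pandas. A minimal sketch, assuming the delta-sharing package is installed and using the table name format Nexalis provides (described under "What Nexalis Provides" below):

import delta_sharing

# Fully qualified table URL: <profile path>#<user_name>.<client_name>.<table>
table_url = "./config.share#<user_name>.<client_name>.<table>"

# limit (available in recent delta-sharing releases) keeps a quick inspection small.
pdf = delta_sharing.load_as_pandas(table_url, limit=100)
print(pdf.head())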

Python Example

Delta Sharing with Python and Spark — How to Run Locally

Nexalis provides you with a secure credential file (config.share) and the fully qualified table names you are allowed to access. You can read these shared tables on your machine in two ways:
  • Method A — Embedded Python: run from a normal Python process and create a SparkSession in-process. Best for ad-hoc exploration and small pulls.
  • Method B — spark-submit: run a standalone local Spark job with explicit packages and classpath. Best for larger data, repeatable jobs, and especially for real-time/“only new data” streaming using Spark Structured Streaming.

What Nexalis Provides

Nexalis will share with you:
  • The credential profile file: config.share (keep it safe and do not alter it).
  • One or more Delta Sharing table names in the format:
./config.share#<user_name>.<client_name>.<table>
Replace <user_name> with your assigned Nexalis username.
Replace <client_name> with your organization’s client name.
Replace <table> with the shared table name (a short example of assembling this URL follows below).
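
The table URL is simply the credential file path joined to the fully qualified table name with a "#". A small sketch assembling it from hypothetical placeholder values:

# Assemble the Delta Sharing table URL from the pieces Nexalis provides.
# The placeholder values below are illustrative; substitute your own.
profile_path = "./config.share"
user_name = "<user_name>"      # your assigned Nexalis username
client_name = "<client_name>"  # your organization's client name
table = "<table>"              # the shared table name

table_url = f"{profile_path}#{user_name}.{client_name}.{table}"
print(table_url)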

Prerequisites

  • A supported Java runtime (JDK 8/11/17). Make sure JAVA_HOME points to it (a quick environment check is sketched after this list).
  • For Method A: Python 3.8+ and pip.
  • For Method B: Local Spark 3.4.2 (see installation steps below).
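
Optionally, you can verify these prerequisites from Python before starting. This sketch only checks what is visible from your shell; the version requirements themselves are as listed above:

import os
import shutil
import subprocess

# Both methods need a JDK visible to Spark.
print("JAVA_HOME =", os.environ.get("JAVA_HOME", "(not set)"))
print("java on PATH:", shutil.which("java") or "(not found)")
if shutil.which("java"):
    subprocess.run(["java", "-version"], check=False)  # prints to stderr on most JDKs

# Only relevant for Method B (spark-submit).
print("SPARK_HOME =", os.environ.get("SPARK_HOME", "(not set)"))
print("spark-submit on PATH:", shutil.which("spark-submit") or "(not found)")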

Method A — Embedded Python (Spark created in-process)

When to use: quick exploration, notebooks, small to medium pulls, minimal setup. How it works: install Python libraries, keep config.share next to your script, start a SparkSession inside Python, and query the shared table.

Steps

  1. Install the required libraries:
pip install delta-sharing pyspark pandas
  2. Place config.share in a safe location (for example, next to your script).
  3. Build the table URL in your script:
table_url = "./config.share#<user_name>.<client_name>.<table>"
  4. Minimal example (batch read):
from pyspark.sql import SparkSession

# Table URL built in step 3.
table_url = "./config.share#<user_name>.<client_name>.<table>"

spark = (
    SparkSession.builder
    .appName("DeltaSharingLocal")
    # Pull in the Delta Sharing connector so the "deltasharing" source is
    # available in-process (same version used with spark-submit in Method B).
    .config("spark.jars.packages", "io.delta:delta-sharing-spark_2.12:0.6.4")
    .getOrCreate()
)

df = (
    spark.read.format("deltasharing")
    .load(table_url)
    # tsConnector is epoch milliseconds; adjust the window to your needs.
    .where("tsConnector > 1742258615000 AND siteName = 'siteXYZ' AND dataPoint = 'ACTIVE_POWER'")
    .select("siteName", "dataPoint", "value", "unit", "tsConnector")
)

pdf = df.toPandas()
print(pdf.head())
Notes:
  • tsConnector is an epoch timestamp in milliseconds. Adjust filters to your time window (a conversion helper is sketched after these notes).
  • This method produces a batch snapshot. For larger pulls or continuous updates, use Method B.
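
To build the time filter, convert your window boundary to epoch milliseconds. A small helper sketch; the example date and filter values are arbitrary:

from datetime import datetime, timezone

def to_epoch_ms(dt: datetime) -> int:
    # tsConnector values are epoch milliseconds (UTC).
    return int(dt.timestamp() * 1000)

since = to_epoch_ms(datetime(2025, 3, 18, tzinfo=timezone.utc))
condition = f"tsConnector > {since} AND siteName = 'siteXYZ' AND dataPoint = 'ACTIVE_POWER'"
print(condition)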

Method B — spark-submit (standalone Spark)

When to use: heavier data, repeatable jobs with full logs, or when you need real-time “only new data” ingestion. How it works: install Spark locally, then launch your script with spark-submit, including the Delta Sharing connector package.

One-time Spark Setup

wget https://archive.apache.org/dist/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz
tar xzf spark-3.4.2-bin-hadoop3.tgz
export SPARK_HOME=$PWD/spark-3.4.2-bin-hadoop3
export PATH=$SPARK_HOME/bin:$PATH

Batch launch template (snapshot reads)

spark-submit \
  --packages io.delta:delta-sharing-spark_2.12:0.6.4 \
  your_script.py \
  --delta_table_path './config.share#<user_name>.<client_name>.<table>'
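
The template above passes --delta_table_path to your_script.py. The following is an illustrative sketch of what such a batch script might look like; the argument name and output path are conventions for this example, not a fixed Nexalis interface:

# your_script.py -- minimal batch reader launched via spark-submit.
import argparse

from pyspark.sql import SparkSession

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--delta_table_path", required=True,
                        help="./config.share#<user_name>.<client_name>.<table>")
    args = parser.parse_args()

    # The Delta Sharing connector is supplied by --packages on spark-submit.
    spark = SparkSession.builder.appName("DeltaSharingBatch").getOrCreate()

    df = spark.read.format("deltasharing").load(args.delta_table_path)

    df.printSchema()
    print("row count:", df.count())

    # Example: persist a snapshot locally for downstream work.
    df.write.mode("overwrite").parquet("/tmp/nexalis_batch_out/parquet")

    spark.stop()

if __name__ == "__main__":
    main()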

Real-Time Refresh with Structured Streaming

For continuous ingestion of only new data, use Spark Structured Streaming. Unlike periodic batch jobs, a streaming job runs continuously, processing micro-batches (e.g., every 30 seconds). Spark automatically tracks progress in a checkpoint so that previously processed rows are not re-read.

Minimal streaming example

from pyspark.sql import SparkSession

table_url = "./config.share#<user_name>.<client_name>.<table>"

spark = (
    SparkSession.builder
    .appName("DeltaSharingStreaming")
    .getOrCreate()
)

# Define the streaming DataFrame
stream_df = (
    spark.readStream
    .format("deltasharing")
    .load(table_url)
    .where("siteName = 'siteXYZ'")
)

# Example sink: console (demo)
query = (
    stream_df.writeStream
    .format("console")
    .outputMode("append")
    .trigger(processingTime="30 seconds")
    .option("truncate", "false")
    .option("checkpointLocation", "/tmp/nexalis_stream_out/_chkpt")
    .start()
)

query.awaitTermination()

Writing Streaming Data to Outputs

Every Structured Streaming job must define a sink (output destination). Common options:

Console (testing/debugging)

Displays records in the terminal.
query = (
    stream_df.writeStream
    .format("console")
    .outputMode("append")
    .start()
)

Parquet / Delta Files (persistent storage)

Stores results for later queries or integration into pipelines.
query = (
    stream_df.writeStream
    .format("parquet")
    .option("path", "/tmp/nexalis_stream_out/parquet")
    .option("checkpointLocation", "/tmp/nexalis_stream_out/_chkpt")
    .start()
)
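
Once the stream has produced a few micro-batches, the Parquet output can be queried like any other local dataset. A small sketch using the same output path as above; tsConnector is assumed to be present, as in the batch example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InspectStreamOutput").getOrCreate()

# Read back whatever the streaming query has written so far.
out = spark.read.parquet("/tmp/nexalis_stream_out/parquet")
out.orderBy("tsConnector", ascending=False).show(20, truncate=False)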

Database / API (integration with dashboards or apps)

Pushes each micro-batch into an external system.
# Requires the appropriate JDBC driver on the Spark classpath
# (for example, added via --packages or --jars on spark-submit).
def save_to_db(batch_df, batch_id):
    batch_df.write \
        .format("jdbc") \
        .option("url", "jdbc:postgresql://dbserver/mydb") \
        .option("dbtable", "streaming_results") \
        .option("user", "dbuser") \
        .option("password", "dbpass") \
        .mode("append") \
        .save()

query = (
    stream_df.writeStream
    .foreachBatch(save_to_db)
    .start()
)
⚠️ Always configure a checkpointLocation to ensure Spark tracks what data has already been processed.

Launching with spark-submit

spark-submit \
  --packages io.delta:delta-sharing-spark_2.12:0.6.4 \
  streaming_reader.py \
  --delta_table_path './config.share#<user_name>.<client_name>.<table>'

Why Streaming is Different from Periodic Batch

  • Periodic batch (cron + spark-submit): each run starts fresh. Without custom logic (like tracking the last timestamp), it may re-read old data; a minimal watermark sketch follows after this list.
  • Structured Streaming (spark.readStream): one long-lived Spark job. It tracks progress in checkpoints and automatically processes only new data each trigger.
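
If you do use the periodic-batch pattern, the "custom logic" usually amounts to persisting the largest tsConnector seen so far and filtering on it in the next run. A minimal sketch of that idea using a local watermark file; the file path, output location, and column names follow the examples above and are arbitrary choices for this example:

import os

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

WATERMARK_FILE = "/tmp/nexalis_incremental_out/last_ts.txt"
table_url = "./config.share#<user_name>.<client_name>.<table>"

def read_watermark() -> int:
    # Largest tsConnector processed by the previous run (0 on the first run).
    if os.path.exists(WATERMARK_FILE):
        with open(WATERMARK_FILE) as f:
            return int(f.read().strip())
    return 0

def write_watermark(ts: int) -> None:
    os.makedirs(os.path.dirname(WATERMARK_FILE), exist_ok=True)
    with open(WATERMARK_FILE, "w") as f:
        f.write(str(ts))

spark = SparkSession.builder.appName("DeltaSharingIncrementalBatch").getOrCreate()

last_ts = read_watermark()

# Read only rows newer than the last processed timestamp.
df = (
    spark.read.format("deltasharing")
    .load(table_url)
    .where(F.col("tsConnector") > last_ts)
)

new_max = df.agg(F.max("tsConnector")).first()[0]
if new_max is not None:
    df.write.mode("append").parquet("/tmp/nexalis_incremental_out/parquet")
    write_watermark(new_max)

spark.stop()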

Choosing a Method

Method A — Embedded Python
  • Best for: quick exploration, notebooks, small to medium data pulls.
  • Limitations: not optimized for large-scale jobs; limited logging and monitoring.
  • Example use cases: interactive analysis, prototyping in Jupyter notebooks, testing queries.

Method B — spark-submit (Batch)
  • Best for: large repeatable snapshots, ETL jobs, controlled pipelines.
  • Limitations: requires Spark installation and setup; each run starts fresh (may re-read old data if not handled).
  • Example use cases: scheduled data extractions, periodic reporting, building data pipelines.

Method B — spark-submit (Streaming)
  • Best for: near real-time ingestion and continuous updates.
  • Limitations: more complex setup; requires checkpointing and monitoring; long-running job.
  • Example use cases: real-time dashboards, alerting systems, streaming ETL into databases.

Checklist to Avoid Pitfalls

  1. Java/Spark prerequisites: ensure JAVA_HOME points to a compatible JDK (8/11/17).
  2. Version alignment: keep Spark at 3.4.2 and the Delta Sharing connector at io.delta:delta-sharing-spark_2.12:0.6.4.
  3. Path to config.share: use a correct relative or absolute path.
  4. Time filters: tsConnector is in epoch milliseconds. Convert your time windows.
  5. Local parallelism: add --master local[*] if you want Spark to use all cores.
  6. Streaming durability: always configure a checkpointLocation.
  7. Table access: make sure <user_name>.<client_name>.<table> matches exactly what Nexalis shared with you.