Tensorlake is a serverless platform for building data applications and agents in Python that ingest and transform unstructured data before landing it in Snowflake's SQL database or Cortex Search. Building orchestration and ingestion on Tensorlake lets you write complex distributed applications in Python, where the orchestration and compute logic live together. This is an alternative to orchestrating with SQL while keeping compute logic in Snowflake Table Functions.
Tensorlake's applications automatically behave like durable queues, so you don't need to set up Kafka or another queuing system to manage ingestion. The cluster automatically scales up to process data as it is ingested.
We present blueprints for production-ready Snowflake integration patterns, with code you can deploy in under 2 minutes to experience the integration.
The Tensorlake Application receives document URLs over HTTP, uses an OCR API to parse each document, calls an LLM for structured extraction, and then uses Snowflake's Python connector to write the structured data into your Snowflake database. Once the data is inside Snowflake, you can run any analytics you like on it.
The Application is written in Python, without any external orchestration engine, so you can build and test it like any other application. You can use any OCR API in the Application, or even run open-source OCR models on GPUs by annotating the OCR function with a GPU-enabled hardware resource.
Tensorlake automatically queues requests and scales out the cluster; no extra configuration is required to handle spiky ingestion.
Try out the code here.
This solution combines a Tensorlake serverless extraction application with a Streamlit interactive query interface to create an intelligent Wikipedia knowledge base. The Tensorlake application (extract-wikipedia) accepts a page type (like "actors"), uses BeautifulSoup and Requests to crawl Wikipedia pages, then leverages LangChain with OpenAI to intelligently parse HTML, chunk content, and extract structured information (birth dates, career highlights, key events). Everything is stored in Snowflake with both structured data tables and text embeddings.
The Streamlit application (query-wikipedia) provides an interactive UI for querying this knowledge base. It orchestrates a sophisticated two-phase search using Snowflake Cortex - first querying structured data, then using those results as filters for semantic search through text embeddings. This hybrid approach delivers highly relevant results that OpenAI GPT-4 transforms into natural language answers.
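To make the two-phase pattern concrete, here is a minimal sketch of what such a query can look like. The ACTORS and ACTOR_CHUNKS tables, their columns, and the embedding model are illustrative assumptions for this post, not the app's actual schema:

# Sketch of a two-phase hybrid search with Snowflake Cortex.
# ACTORS, ACTOR_CHUNKS, their columns, and the embedding model are
# illustrative assumptions, not the actual application schema.
import json
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)

def hybrid_search(question: str, decade_start: int) -> list[tuple]:
    cur = conn.cursor()
    # Phase 1: filter on the structured fields the extraction app stored.
    cur.execute(
        "SELECT name FROM ACTORS WHERE YEAR(birth_date) BETWEEN %s AND %s",
        (decade_start, decade_start + 9),
    )
    names = [row[0] for row in cur.fetchall()]
    # Phase 2: semantic search over embeddings, restricted to phase-1 hits.
    cur.execute(
        """
        SELECT name, chunk,
               VECTOR_COSINE_SIMILARITY(
                   embedding,
                   SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', %s)
               ) AS score
        FROM ACTOR_CHUNKS
        WHERE name IN (SELECT value::string FROM TABLE(FLATTEN(input => PARSE_JSON(%s))))
        ORDER BY score DESC
        LIMIT 5
        """,
        (question, json.dumps(names)),
    )
    return cur.fetchall()

The top-scoring chunks can then be handed to GPT-4 to phrase as a natural-language answer.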
The Tensorlake extraction app runs serverlessly with automatic orchestration - no infrastructure management needed. The Streamlit app provides a user-friendly interface that data teams can customize and extend. Together, they create a powerful system where Tensorlake handles the heavy lifting of data extraction while Streamlit delivers an elegant query experience.
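On the Streamlit side, the shape of the interface is simple. Here is a minimal sketch, with a stub answer_question standing in for the two-phase search and GPT-4 summarization (the real query-wikipedia app is more involved):

# Sketch of the Streamlit front end; answer_question is a stub for the
# two-phase Cortex search and GPT-4 summarization described above.
import streamlit as st

def answer_question(question: str) -> tuple[str, list[str]]:
    # Stub: the real app runs the hybrid search and asks GPT-4 to
    # phrase the top chunks as a natural-language answer.
    return f"(demo) You asked: {question}", ["https://en.wikipedia.org/..."]

st.title("Wikipedia Knowledge Base")
question = st.text_input("Ask a question about the crawled pages")
if question:
    with st.spinner("Searching Snowflake..."):
        answer, sources = answer_question(question)
    st.write(answer)
    st.caption("Sources: " + ", ".join(sources))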
Try out the code here.
For more complete documentation on the Tensorlake platform (including serverless Applications and Document AI), visit the Tensorlake docs.
- Snowflake account
- Tensorlake API key (get one here)
- Python 3.9+
Integrating Snowflake into a Tensorlake Application is easy. To get started, install the Tensorlake SDK:
# Install the Tensorlake SDK
pip install tensorlake

Then, store your Snowflake secrets on the Tensorlake platform:
# Set Tensorlake Secrets for deployed Applications
tensorlake secrets set SNOWFLAKE_ACCOUNT='YOUR_SNOWFLAKE_ACCOUNT'
tensorlake secrets set SNOWFLAKE_USER='YOUR_SNOWFLAKE_USER'
tensorlake secrets set SNOWFLAKE_PASSWORD='YOUR_SNOWFLAKE_PASSWORD'
tensorlake secrets set SNOWFLAKE_WAREHOUSE='YOUR_SNOWFLAKE_WAREHOUSE'
tensorlake secrets set SNOWFLAKE_DATABASE='YOUR_SNOWFLAKE_DATABASE'
tensorlake secrets set SNOWFLAKE_SCHEMA='YOUR_SNOWFLAKE_SCHEMA'
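With the secrets stored, the Application will also need a table to write into. Here is a minimal one-time setup sketch you might run locally; the DOCUMENTS table name and its columns are illustrative assumptions, and it assumes the same connection values are exported in your local environment:

# One-time local setup: create the table the Application will write into.
# DOCUMENTS and its columns are illustrative assumptions.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)
conn.cursor().execute(
    """
    CREATE TABLE IF NOT EXISTS DOCUMENTS (
        parse_id    VARCHAR,
        source_url  VARCHAR,
        extracted   VARIANT,  -- structured fields from the LLM
        ingested_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
    )
    """
)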
Define an image to ensure your Application has access to the libraries it needs to complete the task:

# Define the image for the application
image = (
Image(base_image="python:3.11-slim", name="snowflake-sec")
.run("pip install snowflake-connector-python pandas pyarrow")
)

Your entrypoint function is defined by the @application() decorator. This decorator tells the Tensorlake platform which function starts the Application. The entrypoint function's name is also the name of the Application; for example, when deployed, this Application will be called document_ingestion:
# Specify the entry point to the application
@application()
@function()
def document_ingestion(document_url: str) -> None:
    # Some code

Each function that is part of your Application needs the @function() decorator, which accepts two parameters: secrets and image.
# Specify any secrets needed for this function
# Specify the image needed for this function
@function(
secrets=[
"TENSORLAKE_API_KEY",
"SNOWFLAKE_ACCOUNT",
"SNOWFLAKE_USER",
"SNOWFLAKE_PASSWORD",
"SNOWFLAKE_WAREHOUSE"
],
image=image
)
def write_to_snowflake(parse_id: str) -> None:
    # Some code
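Putting the pieces together, a minimal end-to-end Application might look like the sketch below. This is a sketch, not the exact example code: parse_document and extract_fields are hypothetical stand-ins for whichever OCR API and LLM you use, the DOCUMENTS table matches the setup snippet above, and we assume declared secrets are exposed to the function as environment variables.

# Sketch of a complete Application; stand-in helpers are marked below.
import json
import os

import snowflake.connector
from tensorlake import Image, application, function  # import path may vary by SDK version

image = (
    Image(base_image="python:3.11-slim", name="snowflake-sec")
    .run("pip install snowflake-connector-python pandas pyarrow")
)

def parse_document(document_url: str) -> str:
    # Hypothetical stand-in: call your OCR API and return a parse id.
    raise NotImplementedError

def extract_fields(parse_id: str) -> dict:
    # Hypothetical stand-in: call an LLM to extract structured fields.
    raise NotImplementedError

@application()
@function(image=image)
def document_ingestion(document_url: str) -> None:
    parse_id = parse_document(document_url)
    write_to_snowflake(parse_id, document_url)

@function(
    secrets=[
        "SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD",
        "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_DATABASE", "SNOWFLAKE_SCHEMA",
    ],
    image=image,
)
def write_to_snowflake(parse_id: str, document_url: str) -> None:
    fields = extract_fields(parse_id)
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
        database=os.environ["SNOWFLAKE_DATABASE"],
        schema=os.environ["SNOWFLAKE_SCHEMA"],
    )
    # PARSE_JSON is not allowed in a VALUES clause, so use INSERT ... SELECT.
    conn.cursor().execute(
        "INSERT INTO DOCUMENTS (parse_id, source_url, extracted) "
        "SELECT %s, %s, PARSE_JSON(%s)",
        (parse_id, document_url, json.dumps(fields)),
    )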
Once you have written your Application, you can deploy it with the Tensorlake deploy CLI command:

tensorlake deploy document_ingestion.py

Once deployed, you can trigger your Application by posting to its HTTP endpoint. With Tensorlake's auto-scaling platform you can make hundreds of thousands of requests, and Tensorlake will automatically scale to handle them all, then automatically scale back.
# Basic curl request to Tensorlake application endpoint
curl -X POST https://api.tensorlake.ai/applications/document_ingestion \
-H "Authorization: Bearer YOUR_TENSORLAKE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/document.pdf"
}'
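For bulk ingestion you can drive the same endpoint programmatically. Below is a minimal sketch using the requests library and a thread pool; the endpoint mirrors the curl example above, and doc_urls is a placeholder for your own list of documents.

# Sketch: fan out many ingestion requests to the deployed Application.
# Tensorlake queues and scales server-side; the client only submits work.
import os
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "https://api.tensorlake.ai/applications/document_ingestion"
HEADERS = {"Authorization": f"Bearer {os.environ['TENSORLAKE_API_KEY']}"}

def submit(url: str) -> int:
    resp = requests.post(ENDPOINT, headers=HEADERS, json={"document_url": url})
    return resp.status_code

# Placeholder list; in practice these come from your own document store.
doc_urls = [f"https://example.com/document-{i}.pdf" for i in range(1000)]
with ThreadPoolExecutor(max_workers=32) as pool:
    statuses = list(pool.map(submit, doc_urls))
print(f"{statuses.count(200)} of {len(doc_urls)} requests accepted")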
Tensorlake is a complete serverless application platform that revolutionizes complex ETL:

- Replace Complex ETL Stacks: No more Airflow, Kafka, or queue management. Tensorlake applications automatically behave like durable queues with built-in orchestration
- Python-Native Development: Build data applications in pure Python instead of wrestling with SQL expressions and UDFs
- Auto-Scaling Infrastructure: Clusters automatically scale from 0 to thousands of workers as data flows increase
- Direct Snowflake Integration: Land processed data directly into Snowflake tables or Cortex Search without intermediate storage
As part of the Tensorlake platform, you also get comprehensive data processing, extraction, and ingestion out of the box:
- Multi-Modal Data Handling: Process documents, images, presentations, spreadsheets, and raw text in unified workflows
- Adaptive Processing: Dynamic model orchestration that adapts to data complexity in real time
- Layout-Aware Understanding: Preserves document structure, tables, and semantic relationships
- Guaranteed Processing: Durable execution ensures every piece of data is processed without drops or failures
- SOC 2 Type II certified infrastructure
- HIPAA/GDPR compliant processing
- Row-level security in Snowflake
- Audit logging for all operations
- Tensorlake Slack: Join our community
- GitHub Issues: Report bugs or request features
- Enterprise Support: [email protected]

