Tensorlake is a serverless platform for building data applications and agents in Python that ingest and transform unstructured data before landing it in Snowflake's SQL database or Cortex Search. Building orchestration and ingestion on Tensorlake lets you write complex distributed applications in Python, where the orchestration and compute logic live together. This is an alternative to orchestrating with SQL while keeping compute logic in Snowflake Table Functions.
Tensorlake's applications automatically behave like durable queues, so you don't need to set up Kafka or another queuing system to manage ingestion. The cluster automatically scales up to process data as it is ingested.
We present blueprints for production-ready Snowflake integration patterns, with code you can deploy in under 2 minutes to experience the integration.
The Tensorlake Application receives document URLs over HTTP, uses an OCR API to parse each document, calls an LLM for structured extraction, and then uses Snowflake's Python connector to write the structured data into your Snowflake database. Once the data is inside Snowflake, you can run any analytics you like on it.
The Application is written in Python, without any external orchestration engine, so you can build and test it like any other application. You can use any OCR API in the Application, or even run open-source OCR models on GPUs by annotating the OCR function with a GPU-enabled hardware resource.
Tensorlake automatically queues requests and scales out the cluster; no extra configuration is required to handle spiky ingestion.
Try out the code here.
This solution combines a Tensorlake serverless extraction application with a Streamlit interactive query interface to create an intelligent Wikipedia knowledge base. The Tensorlake application (extract-wikipedia) accepts a page type (like "actors"), uses BeautifulSoup and Requests to crawl Wikipedia pages, then leverages LangChain with OpenAI to intelligently parse HTML, chunk content, and extract structured information (birth dates, career highlights, key events). Everything is stored in Snowflake with both structured data tables and text embeddings.
The Streamlit application (query-wikipedia) provides an interactive UI for querying this knowledge base. It orchestrates a sophisticated two-phase search using Snowflake Cortex - first querying structured data, then using those results as filters for semantic search through text embeddings. This hybrid approach delivers highly relevant results that OpenAI GPT-4 transforms into natural language answers.
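To make the two-phase pattern concrete, here is a minimal sketch of what such a query can look like. The ACTORS and ACTOR_CHUNKS tables, their columns, and the embedding model are illustrative assumptions for this post, not the app's actual schema:

# Sketch of a two-phase hybrid search with Snowflake Cortex.
# ACTORS, ACTOR_CHUNKS, their columns, and the embedding model are
# illustrative assumptions, not the actual application schema.
import json
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)

def hybrid_search(question: str, decade_start: int) -> list[tuple]:
    cur = conn.cursor()
    # Phase 1: filter on the structured fields the extraction app stored.
    cur.execute(
        "SELECT name FROM ACTORS WHERE YEAR(birth_date) BETWEEN %s AND %s",
        (decade_start, decade_start + 9),
    )
    names = [row[0] for row in cur.fetchall()]
    # Phase 2: semantic search over embeddings, restricted to phase-1 hits.
    cur.execute(
        """
        SELECT name, chunk,
               VECTOR_COSINE_SIMILARITY(
                   embedding,
                   SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', %s)
               ) AS score
        FROM ACTOR_CHUNKS
        WHERE name IN (SELECT value::string FROM TABLE(FLATTEN(input => PARSE_JSON(%s))))
        ORDER BY score DESC
        LIMIT 5
        """,
        (question, json.dumps(names)),
    )
    return cur.fetchall()

The top-scoring chunks can then be handed to GPT-4 to phrase as a natural-language answer.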
The Tensorlake extraction app runs serverlessly with automatic orchestration - no infrastructure management needed. The Streamlit app provides a user-friendly interface that data teams can customize and extend. Together, they create a powerful system where Tensorlake handles the heavy lifting of data extraction while Streamlit delivers an elegant query experience.
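On the Streamlit side, the shape of the interface is simple. Here is a minimal sketch, with a stub answer_question standing in for the two-phase search and GPT-4 summarization (the real query-wikipedia app is more involved):

# Sketch of the Streamlit front end; answer_question is a stub for the
# two-phase Cortex search and GPT-4 summarization described above.
import streamlit as st

def answer_question(question: str) -> tuple[str, list[str]]:
    # Stub: the real app runs the hybrid search and asks GPT-4 to
    # phrase the top chunks as a natural-language answer.
    return f"(demo) You asked: {question}", ["https://en.wikipedia.org/..."]

st.title("Wikipedia Knowledge Base")
question = st.text_input("Ask a question about the crawled pages")
if question:
    with st.spinner("Searching Snowflake..."):
        answer, sources = answer_question(question)
    st.write(answer)
    st.caption("Sources: " + ", ".join(sources))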
Try out the code here.
For more complete documentation on the Tensorlake platform (including serverless Applications and Document AI), visit the Tensorlake docs.
- Snowflake account
- Tensorlake API key (get one here)
- Python 3.9+
Integrating Snowflake into a Tensorlake Application is easy. To get started, install the Tensorlake SDK:
# Install the Tensorlake SDK
pip install tensorlake

Then, store your Snowflake secrets on the Tensorlake platform:
# Set Tensorlake Secrets for deployed Applications
tensorlake secrets set SNOWFLAKE_ACCOUNT='YOUR_SNOWFLAKE_ACCOUNT'
tensorlake secrets set SNOWFLAKE_USER='YOUR_SNOWFLAKE_USER'
tensorlake secrets set SNOWFLAKE_PASSWORD='YOUR_SNOWFLAKE_PASSWORD'
tensorlake secrets set SNOWFLAKE_WAREHOUSE='YOUR_SNOWFLAKE_WAREHOUSE'
tensorlake secrets set SNOWFLAKE_DATABASE='YOUR_SNOWFLAKE_DATABASE'
tensorlake secrets set SNOWFLAKE_SCHEMA='YOUR_SNOWFLAKE_SCHEMA'
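With the secrets stored, the Application will also need a table to write into. Here is a minimal one-time setup sketch you might run locally; the DOCUMENTS table name and its columns are illustrative assumptions, and it assumes the same connection values are exported in your local environment:

# One-time local setup: create the table the Application will write into.
# DOCUMENTS and its columns are illustrative assumptions.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)
conn.cursor().execute(
    """
    CREATE TABLE IF NOT EXISTS DOCUMENTS (
        parse_id    VARCHAR,
        source_url  VARCHAR,
        extracted   VARIANT,  -- structured fields from the LLM
        ingested_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
    )
    """
)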
Define an image to ensure your Application has access to the libraries it needs to complete the task:

# Define the image for the application
image = (
Image(base_image="python:3.11-slim", name="snowflake-sec")
.run("pip install snowflake-connector-python pandas pyarrow")
)

Your entrypoint function is defined by the @application() decorator. This decorator tells the Tensorlake platform which function starts the Application. The entrypoint function's name is also the name of the Application; for example, when deployed, this Application will be called document_ingestion:
# Specify the entry point to the application
@application()
@function()
def document_ingestion(document_url: str) -> None:
    # Some code

Each function that is part of your Application needs the @function() decorator, which accepts two parameters: secrets and image.
# Specify any secrets needed for this function
# Specify the image needed for this function
@function(
secrets=[
"TENSORLAKE_API_KEY",
"SNOWFLAKE_ACCOUNT",
"SNOWFLAKE_USER",
"SNOWFLAKE_PASSWORD",
"SNOWFLAKE_WAREHOUSE"
],
image=image
)
def write_to_snowflake(parse_id: str) -> None:
    # Some code
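Putting the pieces together, a minimal end-to-end Application might look like the sketch below. This is a sketch, not the exact example code: parse_document and extract_fields are hypothetical stand-ins for whichever OCR API and LLM you use, the DOCUMENTS table matches the setup snippet above, and we assume declared secrets are exposed to the function as environment variables.

# Sketch of a complete Application; stand-in helpers are marked below.
import json
import os

import snowflake.connector
from tensorlake import Image, application, function  # import path may vary by SDK version

image = (
    Image(base_image="python:3.11-slim", name="snowflake-sec")
    .run("pip install snowflake-connector-python pandas pyarrow")
)

def parse_document(document_url: str) -> str:
    # Hypothetical stand-in: call your OCR API and return a parse id.
    raise NotImplementedError

def extract_fields(parse_id: str) -> dict:
    # Hypothetical stand-in: call an LLM to extract structured fields.
    raise NotImplementedError

@application()
@function(image=image)
def document_ingestion(document_url: str) -> None:
    parse_id = parse_document(document_url)
    write_to_snowflake(parse_id, document_url)

@function(
    secrets=[
        "SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD",
        "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_DATABASE", "SNOWFLAKE_SCHEMA",
    ],
    image=image,
)
def write_to_snowflake(parse_id: str, document_url: str) -> None:
    fields = extract_fields(parse_id)
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
        database=os.environ["SNOWFLAKE_DATABASE"],
        schema=os.environ["SNOWFLAKE_SCHEMA"],
    )
    # PARSE_JSON is not allowed in a VALUES clause, so use INSERT ... SELECT.
    conn.cursor().execute(
        "INSERT INTO DOCUMENTS (parse_id, source_url, extracted) "
        "SELECT %s, %s, PARSE_JSON(%s)",
        (parse_id, document_url, json.dumps(fields)),
    )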
Once you have written your Application, you can deploy it with the Tensorlake deploy CLI command:

tensorlake deploy document_ingestion.py

Once deployed, you can trigger your Application by posting to its HTTP endpoint. With Tensorlake's auto-scaling platform you can make hundreds of thousands of requests, and Tensorlake will automatically scale to handle them all, then automatically scale back.
# Basic curl request to Tensorlake application endpoint
curl -X POST https://api.tensorlake.ai/applications/document_ingestion \
-H "Authorization: Bearer YOUR_TENSORLAKE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/document.pdf"
}'
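For bulk ingestion you can drive the same endpoint programmatically. Below is a minimal sketch using the requests library and a thread pool; the endpoint mirrors the curl example above, and doc_urls is a placeholder for your own list of documents.

# Sketch: fan out many ingestion requests to the deployed Application.
# Tensorlake queues and scales server-side; the client only submits work.
import os
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "https://api.tensorlake.ai/applications/document_ingestion"
HEADERS = {"Authorization": f"Bearer {os.environ['TENSORLAKE_API_KEY']}"}

def submit(url: str) -> int:
    resp = requests.post(ENDPOINT, headers=HEADERS, json={"document_url": url})
    return resp.status_code

# Placeholder list; in practice these come from your own document store.
doc_urls = [f"https://example.com/document-{i}.pdf" for i in range(1000)]
with ThreadPoolExecutor(max_workers=32) as pool:
    statuses = list(pool.map(submit, doc_urls))
print(f"{statuses.count(200)} of {len(doc_urls)} requests accepted")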
Tensorlake is a complete serverless application platform that revolutionizes complex ETL:

- Replace Complex ETL Stacks: No more Airflow, Kafka, or queue management. Tensorlake applications automatically behave like durable queues with built-in orchestration
- Python-Native Development: Build data applications in pure Python instead of wrestling with SQL expressions and UDFs
- Auto-Scaling Infrastructure: Clusters automatically scale from 0 to thousands of workers as data flows increase
- Direct Snowflake Integration: Land processed data directly into Snowflake tables or Cortex Search without intermediate storage
As part of the Tensorlake platform, you also get comprehensive data processing, extraction, and ingestion out of the box:
- Multi-Modal Data Handling: Process documents, images, presentations, spreadsheets, and raw text in unified workflows
- Adaptive Processing: Dynamic model orchestration that adapts to data complexity in real time
- Layout-Aware Understanding: Preserves document structure, tables, and semantic relationships
- Guaranteed Processing: Durable execution ensures every piece of data is processed without drops or failures
- SOC 2 Type II certified infrastructure
- HIPAA/GDPR compliant processing
- Row-level security in Snowflake
- Audit logging for all operations
- Tensorlake Slack: Join our community
- GitHub Issues: Report bugs or request features
- Enterprise Support: [email protected]

