
Commit 29b3c61

denik and HariGS-DB authored and committed
acc: Include full output for default-python/classic (databricks#2391)
## Tests

Include full output of default-python/classic so it can be used as a base for diffs in cloud tests (databricks#2383).
1 parent 1a81aba commit 29b3c61

File tree

20 files changed: +533 −2 lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Typings for Pylance in Visual Studio Code
# see https://github.com/microsoft/pyright/blob/main/docs/builtins.md
from databricks.sdk.runtime import *
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
    "recommendations": [
        "databricks.databricks",
        "ms-python.vscode-pylance",
        "redhat.vscode-yaml"
    ]
}
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
{
    "python.analysis.stubPath": ".vscode",
    "jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
    "jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
    "python.testing.pytestArgs": [
        "."
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "python.analysis.extraPaths": ["src"],
    "files.exclude": {
        "**/*.egg-info": true,
        "**/__pycache__": true,
        ".pytest_cache": true,
    },
}
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
# my_default_python

The 'my_default_python' project was generated by using the default-python template.

## Getting started

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Authenticate to your Databricks workspace, if you have not done so already:
   ```
   $ databricks configure
   ```

3. To deploy a development copy of this project, type:
   ```
   $ databricks bundle deploy --target dev
   ```
   (Note that "dev" is the default target, so the `--target` parameter
   is optional here.)

   This deploys everything that's defined for this project.
   For example, the default template would deploy a job called
   `[dev yourname] my_default_python_job` to your workspace.
   You can find that job by opening your workspace and clicking on **Workflows**.

4. Similarly, to deploy a production copy, type:
   ```
   $ databricks bundle deploy --target prod
   ```

   Note that the default job from the template has a schedule that runs every day
   (defined in resources/my_default_python.job.yml). The schedule
   is paused when deploying in development mode (see
   https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

5. To run a job or pipeline, use the "run" command:
   ```
   $ databricks bundle run
   ```

6. Optionally, install the Databricks extension for Visual Studio Code for local development from
   https://docs.databricks.com/dev-tools/vscode-ext.html. It can configure your
   virtual environment and set up Databricks Connect for running unit tests locally.
   When not using these tools, consult your development environment's documentation
   and/or the documentation for Databricks Connect for manually setting up your environment
   (https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html).

7. For documentation on the Databricks asset bundles format used
   for this project, and for CI/CD configuration, see
   https://docs.databricks.com/dev-tools/bundles/index.html.
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# This is a Databricks asset bundle definition for my_default_python.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: my_default_python
  uuid: [UUID]

include:
  - resources/*.yml

targets:
  dev:
    # The default target uses 'mode: development' to create a development copy.
    # - Deployed resources get prefixed with '[dev my_user_name]'
    # - Any job schedules and triggers are paused by default.
    # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
    mode: development
    default: true
    workspace:
      host: [DATABRICKS_URL]

  prod:
    mode: production
    workspace:
      host: [DATABRICKS_URL]
      # We explicitly deploy to /Workspace/Users/[USERNAME] to make sure we only have a single copy.
      root_path: /Workspace/Users/[USERNAME]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [USERNAME]
        level: CAN_MANAGE
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Fixtures

This folder is reserved for fixtures, such as CSV files.

Below is an example of how to load fixtures as a data frame:

```
import pandas as pd
import os

def get_absolute_path(*relative_parts):
    if 'dbutils' in globals():
        base_dir = os.path.dirname(dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get())  # type: ignore
        path = os.path.normpath(os.path.join(base_dir, *relative_parts))
        return path if path.startswith("/Workspace") else "/Workspace" + path
    else:
        return os.path.join(*relative_parts)

csv_file = get_absolute_path("..", "fixtures", "mycsv.csv")
df = pd.read_csv(csv_file)
display(df)
```
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
.databricks/
build/
dist/
__pycache__/
*.egg-info
.venv/
scratch/**
!scratch/README.md
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
[pytest]
testpaths = tests
pythonpath = src
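
With `testpaths = tests` and `pythonpath = src`, pytest collects tests from `tests/` and can import the package directly from `src/` without installing the wheel. As a rough illustration only (the file name and body below are hypothetical and not part of this commit; `get_taxis` is referenced elsewhere in this diff), a test might look like:

```
# tests/main_test.py -- hypothetical sketch, not one of the files in this commit.
from databricks.connect import DatabricksSession

# `pythonpath = src` in pytest.ini makes this import work without installing the wheel.
from my_default_python import main


def test_get_taxis():
    # Assumes databricks-connect is installed and a workspace profile is configured.
    spark = DatabricksSession.builder.getOrCreate()
    assert main.get_taxis(spark).count() > 0
```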
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
## requirements-dev.txt: dependencies for local development.
##
## For defining dependencies used by jobs in Databricks Workflows, see
## https://docs.databricks.com/dev-tools/bundles/library-dependencies.html

## Add code completion support for DLT
databricks-dlt

## pytest is the default package used for testing
pytest

## Dependencies for building wheel files
setuptools
wheel

## databricks-connect can be used to run parts of this project locally.
## See https://docs.databricks.com/dev-tools/databricks-connect.html.
##
## databricks-connect is automatically installed if you're using the Databricks
## extension for Visual Studio Code
## (https://docs.databricks.com/dev-tools/vscode-ext/dev-tasks/databricks-connect.html).
##
## To manually install databricks-connect, either follow the instructions
## at https://docs.databricks.com/dev-tools/databricks-connect.html
## to install the package system-wide, or uncomment the line below to install a
## version of db-connect that corresponds to the Databricks Runtime version used
## for this project.
#
# databricks-connect>=15.4,<15.5
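
For reference, once databricks-connect is installed and a workspace profile is configured, project code can run locally against a remote cluster. A minimal sketch (the sample table is just an example, not something this template requires):

```
from databricks.connect import DatabricksSession

# Connects using the default profile or environment-based configuration.
spark = DatabricksSession.builder.getOrCreate()
spark.read.table("samples.nyctaxi.trips").limit(5).show()
```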
@@ -0,0 +1,50 @@
# The main job for my_default_python.
resources:
  jobs:
    my_default_python_job:
      name: my_default_python_job

      trigger:
        # Run this job every day, exactly one day from the last run; see https://docs.databricks.com/api/workspace/jobs/create#trigger
        periodic:
          interval: 1
          unit: DAYS

      email_notifications:
        on_failure:
          - [USERNAME]

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/notebook.ipynb

        - task_key: refresh_pipeline
          depends_on:
            - task_key: notebook_task
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_default_python_pipeline.id}

        - task_key: main_task
          depends_on:
            - task_key: refresh_pipeline
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_default_python
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_default_python package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ../dist/*.whl

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            data_security_mode: SINGLE_USER
            autoscale:
              min_workers: 1
              max_workers: 4
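
The `python_wheel_task` above runs the `main` entry point of the `my_default_python` package, which is wired up via `entry_points` in setup.py later in this diff. The entry-point module itself is not reproduced in this excerpt; a hypothetical sketch of what such a module could look like:

```
# src/my_default_python/main.py -- hypothetical sketch; the real module is not shown in this excerpt.
from pyspark.sql import SparkSession, DataFrame


def get_taxis(spark: SparkSession) -> DataFrame:
    # Example query against a sample table; the actual implementation may differ.
    return spark.read.table("samples.nyctaxi.trips")


def main():
    # On a job cluster, an active Spark session is available to the task.
    spark = SparkSession.builder.getOrCreate()
    get_taxis(spark).show(5)


if __name__ == "__main__":
    main()
```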
@@ -0,0 +1,14 @@
# The main pipeline for my_default_python
resources:
  pipelines:
    my_default_python_pipeline:
      name: my_default_python_pipeline
      ## Specify the 'catalog' field to configure this pipeline to make use of Unity Catalog:
      # catalog: catalog_name
      target: my_default_python_${bundle.target}
      libraries:
        - notebook:
            path: ../src/dlt_pipeline.ipynb

      configuration:
        bundle.sourcePath: ${workspace.file_path}/src
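
The pipeline's source notebook, `../src/dlt_pipeline.ipynb`, is not reproduced in this excerpt. As a rough, hypothetical sketch, a DLT notebook typically defines datasets with the `dlt` decorators:

```
# Hypothetical DLT notebook cell; the actual dlt_pipeline.ipynb in this commit may differ.
import dlt
from pyspark.sql.functions import expr


@dlt.view
def taxi_raw():
    # `spark` is provided by the DLT runtime in pipeline notebooks.
    return spark.read.table("samples.nyctaxi.trips")


@dlt.table
def filtered_taxis():
    # A simple downstream table built from the view above.
    return dlt.read("taxi_raw").filter(expr("fare_amount < 30"))
```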
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# scratch

This folder is reserved for personal, exploratory notebooks.
By default these are not committed to Git, as 'scratch' is listed in .gitignore.
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "%load_ext autoreload\n",
        "%autoreload 2"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "application/vnd.databricks.v1+cell": {
          "cellMetadata": {
            "byteLimit": 2048000,
            "rowLimit": 10000
          },
          "inputWidgets": {},
          "nuid": "[UUID]",
          "showTitle": false,
          "title": ""
        }
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "sys.path.append(\"../src\")\n",
        "from my_default_python import main\n",
        "\n",
        "main.get_taxis(spark).show(10)"
      ]
    }
  ],
  "metadata": {
    "application/vnd.databricks.v1+notebook": {
      "dashboards": [],
      "language": "python",
      "notebookMetadata": {
        "pythonIndentUnit": 2
      },
      "notebookName": "ipynb-notebook",
      "widgets": {}
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.11.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
"""
setup.py configuration script describing how to build and package this project.

This file is primarily used by the setuptools library and typically should not
be executed directly. See README.md for how to deploy, test, and run
the my_default_python project.
"""

from setuptools import setup, find_packages

import sys

sys.path.append("./src")

import datetime
import my_default_python

local_version = datetime.datetime.utcnow().strftime("%Y%m%d.%H%M%S")

setup(
    name="my_default_python",
    # We use timestamp as Local version identifier (https://peps.python.org/pep-0440/#local-version-identifiers)
    # to ensure that changes to wheel package are picked up when used on all-purpose clusters
    version=my_default_python.__version__ + "+" + local_version,
    url="https://databricks.com",
    author="[USERNAME]",
    description="wheel file based on my_default_python/src",
    packages=find_packages(where="./src"),
    package_dir={"": "src"},
    entry_points={
        "packages": [
            "main=my_default_python.main:main",
        ],
    },
    install_requires=[
        # Dependencies in case the output wheel file is used as a library dependency.
        # For defining dependencies, when this package is used in Databricks, see:
        # https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
        "setuptools"
    ],
)
