7 changes: 5 additions & 2 deletions deploy/docker/cp-tools/base/nextflow/howto.md
@@ -1,21 +1,24 @@
# How to setup

## Description

This Docker includes additional component `nf-weblog-handler`.
This `handler` can utilize the `nextflow` `-with-weblog` feature to redirect events to the Cloud-Pipeline `run/{runId}/engine/tasks` API

## How to use

### Manually

To manually configure `nf-weblog-handler`, the following steps should be done:

1. Start `nf-weblog-handler` with: `/opt/nf-weblog-handler/nf-weblog-handler.sh --start -p <port [default: 8080]>`
2. Run nextflow with `-with-weblog http://localhost:<port>/nextflow/event` parameter:
```
nextflow run <path-to-nf-file> -with-weblog http://localhost:8080/nextflow/event
```
3. Now `nf-weblog-handler` should send events to the Cloud-Pipeline API, and you should be able to see expanded nextflow statistics for the run (a combined example is shown below).
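
A combined sketch of both steps, assuming the default port `8080` and a hypothetical `main.nf` script in place of `<path-to-nf-file>`, might look like this:

```
# Start the weblog handler on the default port (8080)
/opt/nf-weblog-handler/nf-weblog-handler.sh --start -p 8080

# Run the workflow, pointing the weblog feature at the handler endpoint
nextflow run main.nf -with-weblog http://localhost:8080/nextflow/event
```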

### Configure Cloud-Pipeline custom capability

It is also possible to configure this image to use this functionality automatically:

1 change: 1 addition & 0 deletions docs/README.md
@@ -14,6 +14,7 @@ The following sections are currently covered in the documentation:
- [Release notes v.0.15](md/release_notes/v.0.15/v.0.15_-_Release_notes.md)
- [Release notes v.0.16](md/release_notes/v.0.16/v.0.16_-_Release_notes.md)
- [Release notes v.0.17](md/release_notes/v.0.17/v.0.17_-_Release_notes.md)
- [Release notes v.0.20](md/release_notes/v.0.20/v.0.20_-_Release_notes.md)
- User guide
- [Table of contents](md/manual/Cloud_Pipeline_-_Manual.md)

172 changes: 172 additions & 0 deletions docs/md/installation/native/gcp/terraform/README.md
@@ -0,0 +1,172 @@
# Cloud Pipeline based on GCP GKE deployment guide

This document provides guidance on how to deploy the infrastructure using `Terraform` and install Cloud Pipeline on Google Cloud.

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Pre-created network](#pre-created-network)
- [Environment variables](#environment-variables)
- [Terraform backend](#terraform-backend)
- [Authentication and access setup](#authentication-and-access-setup)
- [SSH key](#ssh-key-jump-host-access)
- [Service account key](#service-account-key)
- [Module structure](#module-structure)
- [Usage](#usage)
- [Terraform workflow](#terraform-workflow)
- [Additional notes](#additional-notes)

## Overview

This Terraform setup provisions the core infrastructure required to deploy **Cloud Pipeline** in **Google Cloud Platform (GCP)**.

It sets up:

- **GKE cluster**
- **Filestore (NFS)**
- **Cloud SQL (Private IP)**
- **Cloud Storage bucket**
- **Artifact Registry**
- **Firewall rules**
- **Jump Host** (Compute Engine VM used to install Cloud Pipeline)

Once the infrastructure is deployed, Cloud Pipeline is installed by executing scripts from your local machine on the Jump Host via SSH.

> **_Note_**: the same machine that runs Terraform must be able to SSH into the Jump Host.
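
For example, once the Jump Host is provisioned, connectivity can be verified with a plain SSH call using the key generated below (the user name and IP address here are placeholders):

```bash
# Placeholders: replace <user> and <jump-host-ip> with the actual Jump Host login and address
ssh -i ./gcp-key <user>@<jump-host-ip> 'echo "Jump Host is reachable"'
```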

## Prerequisites

### Pre-created network

An existing **VPC** and **subnet** must be defined using data blocks in your Terraform root:

```hcl
data "google_compute_network" "shared_network" {
name = "network-xxxxxxxx"
project = "project-xxxxxxxx"
}

data "google_compute_subnetwork" "shared_subnet" {
name = "subnet-xxxxxxxx"
region = "<region>"
project = "project-xxxxxxxx"
}
```

> Replace `xxxxxxxx` and `<region>` with actual values for your network, subnet, project and region.

### Environment variables

Each environment should have a corresponding `.tfvars` file located in:

```
env/<environment>/terraform.tfvars
```

Examples:

- `env/dev/terraform.tfvars`
- `env/prod/terraform.tfvars`

Define here (a minimal sample is shown below):

- Project ID and region
- CIDR ranges
- Filestore size
- GKE settings
- Any additional flags
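
A minimal `terraform.tfvars` sketch follows; all variable names and values are illustrative only - use the names actually declared in your root module:

```hcl
# env/dev/terraform.tfvars - illustrative variable names and values
project_id        = "project-xxxxxxxx"
region            = "<region>"
subnet_cidr       = "10.10.0.0/20"
filestore_size_gb = 1024
gke_node_count    = 3
```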

### Terraform backend

Terraform state shall be stored in a Google Cloud Storage bucket.

```hcl
terraform {
  backend "gcs" {
    bucket = "gke-main-tfstate"
    prefix = "clusters/"
  }

  required_version = ">= 1.12.1"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 6.37.0"
    }
  }
}
```

> **_Note_**: it is recommended to use `terraform workspace` to isolate environments, though not strictly required.

## Authentication and access setup

### SSH key (Jump Host access)

Generate a key pair before running Terraform:

```bash
ssh-keygen -t rsa -f ./gcp-key -C <[email protected]> -b 2048
```

- `gcp-key` - private key (keep local)
- `gcp-key.pub` - public key (used in Jump Host metadata)

> **_Note_**: the key file must be named `gcp-key`.

### Service account key

Place the service account key in:

```
scripts/key.json
```

This service account must have the following roles:

| Role | Purpose |
|---|---|
| **`roles/compute.admin`** | Manage Compute Engine, firewall, VMs |
| **`roles/container.admin`** | Create/manage GKE clusters |
| **`roles/container.clusterAdmin`** | Cluster-wide control |
| **`roles/storage.admin`** | Manage Google Cloud Storage buckets |
| **`roles/iam.serviceAccountUser`** | Bind and impersonate service accounts |
| **`roles/iam.serviceAccountTokenCreator`** | Generate OAuth tokens |
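
If the roles need to be granted and the key created manually, this can be done with `gcloud`; the project ID and service account e-mail below are placeholders:

```bash
# Placeholders: replace the project ID and the service account e-mail with your own values
PROJECT_ID="project-xxxxxxxx"
SA_EMAIL="terraform-deployer@${PROJECT_ID}.iam.gserviceaccount.com"

# Grant the required roles to the service account
for ROLE in roles/compute.admin roles/container.admin roles/container.clusterAdmin \
            roles/storage.admin roles/iam.serviceAccountUser roles/iam.serviceAccountTokenCreator; do
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SA_EMAIL}" --role="${ROLE}"
done

# Create the key file in the location expected by the scripts
gcloud iam service-accounts keys create scripts/key.json --iam-account="${SA_EMAIL}"
```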

## Module structure

| Module | Description |
|---|---|
| **`gcp-network`** | Creates VPC and subnet configuration |
| **`gke`** | Deploys the Kubernetes Engine cluster |
| **`filestore`** | NFS storage for persistent data |
| **`gcp-sql`** | Cloud SQL (PostgreSQL) with private IP |
| **`gcp-bucket`** | Cloud Storage bucket |
| **`gateway-vm`** | Jump Host VM with SSH key injected |
| **`gcp-iam`** | IAM roles, bindings, service accounts |
| **`gcp-artifact-registry`** | Artifact Registry for Docker images and artifacts |

## Usage

### Terraform workflow

```bash
# Initialize Terraform
terraform init

# (Optional) Create and select workspace
terraform workspace new dev
terraform workspace select dev

# Apply using environment tfvars
terraform plan -var-file="env/dev/terraform.tfvars"
terraform apply -var-file="env/dev/terraform.tfvars"
```

Launch the described workflow for each environment you need (`dev`, `prod`, etc.).

## Additional notes

- Ensure your local machine has SSH access to the Jump Host.
- Installation scripts are automatically uploaded and executed after provisioning.
- You can disable or extend modules as needed per environment.
61 changes: 44 additions & 17 deletions docs/md/manual/03_Overview/3._Overview.md
@@ -9,35 +9,40 @@
- [Runs](#runs)
- [Settings](#settings)
- [Search](#search)
- [Billing](#billing)
- [Chatbot](#chatbot)
- [Notifications](#notifications)
- [Logout](#logout)

## User Journey in a nutshell

**Cloud Pipeline** is a cloud-based web application which allows users to solve a wide range of analytical tasks and includes:

- **Data processing**: you can create data processing pipelines and run them in the cloud in an automated way.
- **Data storing**: create your data storage, download or upload data from it or edit text files within **Cloud Pipeline** UI. File version control is supported.
- **Tool management**: you can create and deploy your own calculation environment using Docker's container concept.

This Manual is mostly around data processing lifecycle which, in a nutshell, can be described in these several steps:

1. To run a user's calculation script, it shall be registered in **Cloud Pipeline** as a **pipeline**. The script can be created in the **Cloud Pipeline** environment or uploaded from the local machine. See more details in [6. Manage Pipeline](../06_Manage_Pipeline/6._Manage_Pipeline.md).
**_Note_**: If you need to run a pipeline in different environments simultaneously or set a specific type of data, you can use a detached configuration object. See more details in [7. Manage Detached configuration](../07_Manage_Detached_configuration/7._Manage_Detached_configuration.md).
2. To store a pipeline's input and output data files, the **Data Storage** shall be defined in **Cloud Pipeline**. Learn more in [8. Manage Data Storage](../08_Manage_Data_Storage/8._Manage_Data_Storage.md).
3. Almost every pipeline requires a specific package of software to run it, which is defined in a docker image. So when a user starts a pipeline, **Cloud Pipeline** starts a new cloud instance (node) and runs a docker image on it. See more details in [9. Manage Cluster nodes](../09_Manage_Cluster_nodes/9._Manage_Cluster_nodes.md) and [10. Manage Tools](../10_Manage_Tools/10._Manage_Tools.md).
4. When the environment is set, the pipeline starts execution. A user in **Cloud Pipeline** can change and save configurations of the run. Learn more in [6.2. Launch a pipeline](../06_Manage_Pipeline/6.2._Launch_a_pipeline.md).
5. A user can monitor the status of active and completed runs and the usage of active instances in **Cloud Pipeline**. Learn more in [11. Manage Runs](../11_Manage_Runs/11._Manage_Runs.md) and [9. Manage Cluster nodes](../09_Manage_Cluster_nodes/9._Manage_Cluster_nodes.md).

**_Note_**: **Cloud Pipeline** can run a docker image on an instance without any pipeline at all if needed. In that case there will just be an instance with some installed and running software. A user can SSH to it or use it in interactive mode.

Also, **Cloud Pipeline** supports a CLI, which duplicates some of the GUI features and has extra features unavailable via the GUI, such as automating interaction with **Cloud Pipeline** while a pipeline script is running, or uploading considerable amounts of data (more than 5 Gb), etc. You can learn the basics of the CLI [here](../14_CLI/14._Command-line_interface.md).

![CP_Overview](attachments/Overview_1.png)

## GUI menu tab bar

There are several items in the main menu at the left side of the **Cloud Pipeline** window:
![CP_Overview](attachments/Overview_2.png)

> Please note, some menu items may be hidden in certain deployments or be unavailable to users due to their roles and permissions.

### Home

@@ -70,7 +75,7 @@ Learn more about **Home** tab [18. Home page](../18_Home_page/18._Home_page.md)
The tab consists of two panels:

- **"Hierarchy"** view (see the picture below, **2**) displays a hierarchical-structured list of pipelines, folders, storages and machine configurations, etc.
Use the "**Search**" field to find a **CP** object by a name.
Use the "**Search**" field to find a **Cloud Pipeline** object by a name.
"**Collapse/Expand**" button (see the picture below, **1**) at the bottom-left corner of the screen: use it to collapse or expand **"Hierarchy"** view.
- **"Details"** view (see the picture below, **3**) shows details of a selected item in hierarchy panel. Depends on a selected type of object it has a very different view. You can learn about each on respective pages of the manual.

@@ -88,24 +93,46 @@ This space provides a list of working nodes. You can get information on their usage.

### Runs

This space helps you monitor the state of your run instances. You can get parameters and logs of a specific run, stop a run, rerun a completed run. Learn more in [11. Manage Runs](../11_Manage_Runs/11._Manage_Runs.md).

### Settings

This tab opens a **Settings** window which allows:

- generate CLI installation and configuration commands to set up the CLI for **Cloud Pipeline**,
- manage system event notifications,
- manage roles and permissions.

See more details in [12. Manage Settings](../12_Manage_Settings/12._Manage_Settings.md).

### Search

This menu item opens the search form, which allows you to find any platform data and objects.
See more details in [19. Global Search](../19_Search/19._Global_search.md).

### Billing

This menu item allows you to view information on the platform's compute and storage costs in the form of various types of reports.
See more details in [Appendix D. Costs management](../Appendix_D/Appendix_D._Costs_management.md#billing-reports).

### Chatbot

This menu item opens the platform-integrated AI-powered chatbot, which enables users to perform the following tasks:

- obtain help on the use of platform features
- get assistance with data search, analysis and workflow launches

See more details in [20. Chatbot](../20_Chatbot/20._Chatbot.md).

### Notifications

All email notifications that are sent by the **Cloud Pipeline** platform to users (for various events - see the full list [here](../12_Manage_Settings/12._Manage_Settings.md#email-notifications)) are also duplicated as push notifications.
This menu item allows you to view all notifications sent to the current user right in the platform GUI.

See more details in [12.9 Email Notifications](../12_Manage_Settings/12.9._Change_email_notification.md#push-notifications).

### Logout

This is a **Logout** button which logs you out.
**_Note_**: if automatic login is configured, you will be logged in again at once.
**_Note_**: if any changes occur in the **Cloud Pipeline** application during an authorized session, the changes are applied after re-login.
Binary file modified docs/md/manual/03_Overview/attachments/Overview_2.png
@@ -98,15 +98,35 @@ To change default pipeline configuration:

### Add/delete storage rules (optional)

The **STORAGE RULES** section allows configuring which data from the pipeline output, after pipeline execution:

- will be transferred to an STS
- will be displayed in the [**Reports tab**](../11_Manage_Runs/11.6._Nextflow_runs_visualization.md#reports-tab) of the pipeline **Run details** page

To add a new rule:

1. Click the **Add new rule** button:
![CP_CreateAndConfigurePipeline](attachments/CreateAndConfigurePipeline_15.png)
2. A pop-up will appear:
![CP_CreateAndConfigurePipeline](attachments/CreateAndConfigurePipeline_63.png)
3. Specify a value in the **File mask** field. This value defines a relative path to specific data inside `$ANALYSIS_DIR`; all output data from `$ANALYSIS_DIR` matching this mask will be uploaded to the STS after the pipeline execution.
4. Tick the box **Move to STS**:
![CP_CreateAndConfigurePipeline](attachments/CreateAndConfigurePipeline_64.png)
5. If you want output data matching the **File mask** to be available in the **Reports tab** of the pipeline **Run details** page, tick the **Pipeline results** box. Please note, enabling this checkbox automatically ticks the **Move to STS** box, even if it was not ticked previously.
6. Specify a value in the **Name** field. Please note, this field is optional unless the **Pipeline results** box is ticked.
7. Click the **Create** button:
![CP_CreateAndConfigurePipeline](attachments/CreateAndConfigurePipeline_16.png)
8. The created rule will appear in the list:
![CP_CreateAndConfigurePipeline](attachments/CreateAndConfigurePipeline_65.png)

> **_Note_**: if multiple rules with different masks are present, each of them is checked one by one. If a file matches any of the rules, it will be uploaded to the bucket.

To remove an existing rule:

1. Click the **Delete** button in the storage rule's row, e.g.:
![CP_CreateAndConfigurePipeline](attachments/CreateAndConfigurePipeline_66.png)
2. Confirm the deletion:
![CP_CreateAndConfigurePipeline](attachments/CreateAndConfigurePipeline_67.png)

## Edit a pipeline info
