Analyze Dataproc Serverless Jobs in BigQuery

Pull jobs from your Google Cloud project

Update get_dataproc_batches.sh for your project (e.g. project_id, region), then run it to save your batch jobs to data.json:

./get_dataproc_batches.sh > data.json

Build the data processor and produce the analytical data

go build main.go

You should now see the main executable in the project directory. Run it to produce the analytical data:

./main --input data.json --output data_new.json
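
The README does not say what main.go does to the data. Since data_new.json is later uploaded to BigQuery as JSON, it presumably has to be newline-delimited JSON, the JSON layout BigQuery loads accept: one object per line rather than one pretty-printed array. As an illustration only, and not the repository's implementation, this Go sketch converts a JSON array into that layout.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

// toNDJSON converts a JSON array into newline-delimited JSON:
// one compact element per line.
func toNDJSON(data []byte) ([]byte, error) {
	var items []json.RawMessage
	if err := json.Unmarshal(data, &items); err != nil {
		return nil, fmt.Errorf("input is not a JSON array: %w", err)
	}
	var out bytes.Buffer
	for i, item := range items {
		// Compact strips indentation and newlines so each record fits on one line.
		if err := json.Compact(&out, item); err != nil {
			return nil, fmt.Errorf("record %d: %w", i, err)
		}
		out.WriteByte('\n')
	}
	return out.Bytes(), nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: ndjson <input.json> <output.json>")
		os.Exit(2)
	}
	in, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	out, err := toNDJSON(in)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := os.WriteFile(os.Args[2], out, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Using json.RawMessage keeps each record's fields untouched, so the sketch reshapes the file without needing to know the batch schema.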

Load data into BigQuery

Go to the BigQuery console.

Create a dataset if you don't already have one.
Create a table by uploading the file data_new.json, and let BigQuery auto-detect the schema.

Do the analysis

https://github.com/cloudymoma/dataproc_serverless_analysis/blob/main/dataproc_serverless_analysis.ipynb

Click Run in Colab Enterprise, then import the notebook into BigQuery.

Set TABLE_ID to the table you created in the previous step.

(Optional) Depending on your job configurations, you may need to adjust the notebook code.
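
If the notebook expects TABLE_ID as a fully qualified ID of the form project.dataset.table (an assumption; check the notebook), a quick way to catch typos is to split and validate it. This Go sketch does that and prints a sample query against the table. The default ID in it is hypothetical; pass your own as the first argument.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseTableID splits an ID of the assumed form project.dataset.table.
// Dataset and table are taken from the last two dot-separated parts, so a
// legacy domain-scoped project such as example.com:my-project still parses.
func parseTableID(id string) (project, dataset, table string, err error) {
	parts := strings.Split(id, ".")
	if len(parts) < 3 {
		return "", "", "", fmt.Errorf("table ID %q: want project.dataset.table", id)
	}
	for _, p := range parts {
		if p == "" {
			return "", "", "", fmt.Errorf("table ID %q has an empty part", id)
		}
	}
	n := len(parts)
	return strings.Join(parts[:n-2], "."), parts[n-2], parts[n-1], nil
}

func main() {
	// Hypothetical default; replace it with the table from the previous step.
	id := "my-project.dataproc_analysis.batches"
	if len(os.Args) > 1 {
		id = os.Args[1]
	}
	project, dataset, table, err := parseTableID(id)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("project=%s dataset=%s table=%s\n", project, dataset, table)
	// Backticks around the whole reference are the usual GoogleSQL quoting.
	fmt.Printf("SELECT COUNT(*) AS jobs FROM `%s.%s.%s`\n", project, dataset, table)
}
```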

Files in the repository:

.gitignore
dataproc_job_bq_schema.json
dataproc_serverless_analysis.ipynb
get_dataproc_batches.sh
go.mod
main.go
readme.md
