Tracking: PostgreSQL support in KFP #9813

Open
7 of 29 tasks
Tracked by #10402
zijianjoy opened this issue Aug 3, 2023 · 19 comments

Comments

@zijianjoy
Contributor

zijianjoy commented Aug 3, 2023

The request for PostgreSQL support has become the most upvoted issue in the KFP repo: #7512. This issue tracks the work on that integration.

  • KFP Backend integration

    • Define the DB config for PostgreSQL DB info
      • Basic connection config
      • Secure connection config: passfile, hostaddr, ssl mode, certificate, etc.
      • Flag to switch between the PostgreSQL driver and the MySQL driver
    • Upgrade GORM to v2 #9859
    • Syntax support of PostgreSQL DB in backend
      • Adopt a different syntax during initialization. (schema difference)
      • Adopt a different syntax during execution. (Modification of data)
      • Explore the dialect difference and develop a control mechanism to enable easy testing on the storage layer.
    • Testing
      • Unit testing for PostgreSQL behavior.
      • Functional testing for PostgreSQL behavior.
      • E2E testing for PostgreSQL behavior.
    • Cache server integration
    • KFP API server integration
    • Manifest support for PostgreSQL
  • MLMD integration
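
For illustration, the DB config items above (basic connection settings, secure connection settings, and a driver switch) could be sketched in Go roughly as follows. This is a minimal sketch and not KFP code: the `DBConfig` struct, its field names, and the `mysql`/`postgres` flag values are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// DBConfig is a hypothetical config struct for this sketch; the field names
// are illustrative and are not taken from the KFP code base.
type DBConfig struct {
	Driver      string // hypothetical switch flag: "mysql" or "postgres"
	Host        string
	Port        int
	User        string
	Password    string
	DBName      string
	SSLMode     string // PostgreSQL only, e.g. "disable", "require", "verify-full"
	SSLRootCert string // PostgreSQL only, path to a CA certificate file
}

// DSN renders a connection string for the selected driver.
// Values containing spaces would need quoting, which this sketch omits.
func (c DBConfig) DSN() (string, error) {
	switch c.Driver {
	case "mysql":
		// Format used by go-sql-driver/mysql: user:password@tcp(host:port)/dbname
		return fmt.Sprintf("%s:%s@tcp(%s:%d)/%s",
			c.User, c.Password, c.Host, c.Port, c.DBName), nil
	case "postgres":
		// libpq-style keyword/value pairs.
		parts := []string{
			"host=" + c.Host,
			fmt.Sprintf("port=%d", c.Port),
			"user=" + c.User,
			"dbname=" + c.DBName,
		}
		if c.Password != "" {
			parts = append(parts, "password="+c.Password)
		}
		if c.SSLMode != "" {
			parts = append(parts, "sslmode="+c.SSLMode)
		}
		if c.SSLRootCert != "" {
			parts = append(parts, "sslrootcert="+c.SSLRootCert)
		}
		return strings.Join(parts, " "), nil
	default:
		return "", fmt.Errorf("unsupported driver %q", c.Driver)
	}
}

func main() {
	cfg := DBConfig{
		Driver: "postgres", Host: "postgres.kubeflow", Port: 5432,
		User: "kfp", DBName: "mlpipeline", SSLMode: "require",
	}
	dsn, err := cfg.DSN()
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)
}
```

Keeping the driver choice in a single field means the rest of the backend can stay unaware of which database it talks to until the connection string is built.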

@zijianjoy
Contributor Author

cc @chensun

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on Nov 27, 2023
chensun mentioned this issue on Jan 16, 2024 (18 tasks)
github-actions bot removed the lifecycle/stale label on Feb 12, 2024

@EshaanAgg

Hi! I am Eshaan Aggarwal, an avid Open Source enthusiast from India. I am a web developer proficient in GoLang and PostgreSQL and have recently started learning about Kubernetes. I would love to contribute to this issue and hopefully join Kubeflow as a GSoC '24 mentee. Are there any pre-tests or other beginner-friendly contributions I can make to get acquainted with this project and as a proof of skill?

@UditNayak

Hello, I'm Udit. Proficient in Python, Golang, and PostgreSQL, I've recently completed a comprehensive course on Kubernetes. The skills acquired perfectly align with the requirements of this project, making it an ideal platform for me to apply and further enhance my knowledge. I'm enthusiastic about contributing to this GSoC project, especially on the specified issue. Eagerly anticipating the chance to contribute to its development!

@zijianjoy
Contributor Author

zijianjoy commented Mar 7, 2024

Hello @EshaanAgg and @UditNayak, thank you for your interest. I am assuming @rimolive will be your mentor.

As a start, I would recommend learning:

  • PostgreSQL syntax
  • GORM, which is a database syntax abstraction library: https://gorm.io/index.html
  • Setting up your own Kubernetes environment for development.

Then, I would advise approaching development in the following order:

  1. Make sure you can bring up KFP in the Kubernetes environment.
  2. Make sure you can bring up a PostgreSQL instance in the Kubernetes environment and access it manually.
  3. Change the KFP API server so that it reads the PostgreSQL connection config from a parameter or environment variable. The KFP API server should then use that config to establish a connection with the PostgreSQL instance in the same cluster.
  4. Make the corresponding GORM changes so that KFP's CRUD (create/read/update/delete) operations execute correctly against PostgreSQL. (This is a good time to write some unit tests or E2E tests.)
  5. Repeat the steps above for the cache server.
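
As a rough illustration of step 3, reading connection settings from environment variables might look like the sketch below. It uses only the Go standard library and is not KFP code: the `EXAMPLE_DB_*` variable names, the `ConnConfig` type, and the default values are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ConnConfig is a hypothetical holder for the settings read from the environment.
type ConnConfig struct {
	Driver string
	Host   string
	Port   int
}

// envOr returns the value for key, or fallback when the key is unset or empty.
func envOr(lookup func(string) (string, bool), key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// LoadConnConfig reads the hypothetical EXAMPLE_DB_* variables through lookup.
// Passing the lookup function in makes the loader easy to unit test.
func LoadConnConfig(lookup func(string) (string, bool)) (ConnConfig, error) {
	driver := envOr(lookup, "EXAMPLE_DB_DRIVER", "mysql")

	// Pick the conventional default port for the chosen database.
	var defaultPort string
	switch driver {
	case "mysql":
		defaultPort = "3306"
	case "postgres":
		defaultPort = "5432"
	default:
		return ConnConfig{}, fmt.Errorf("unsupported EXAMPLE_DB_DRIVER %q", driver)
	}

	port, err := strconv.Atoi(envOr(lookup, "EXAMPLE_DB_PORT", defaultPort))
	if err != nil {
		return ConnConfig{}, fmt.Errorf("invalid EXAMPLE_DB_PORT: %w", err)
	}

	return ConnConfig{
		Driver: driver,
		Host:   envOr(lookup, "EXAMPLE_DB_HOST", "localhost"),
		Port:   port,
	}, nil
}

func main() {
	// os.LookupEnv has the signature the loader expects.
	cfg, err := LoadConnConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

In a cluster, these variables would typically be set on the API server container, for example from a ConfigMap or Secret.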

I believe @rimolive can help further once you dive deeper into the project. But feel free to take any task you want to work on and ask questions along the way. Have fun!

@rimolive
Member

In addition to what @zijianjoy said, please join us on our Slack. We have the #gsoc-participants channel to welcome everyone interested in the GSoC projects.

@VDliveson

@zijianjoy I am a web developer from India, and I would like to work on this issue. I know Python, Go, and a bit of Kubernetes. How do I get started on this?

@Irshu786

@zijianjoy I'm currently doing ops work on Kubeflow and love the idea of changing the DB. I will be working on these:

  1. Make sure you can bring up KFP in the Kubernetes environment.
  2. Make sure you can bring up a PostgreSQL instance in the Kubernetes environment and access it manually.
  3. Change the KFP API server so that it reads the PostgreSQL connection config from a parameter or environment variable. The KFP API server should then use that config to establish a connection with the PostgreSQL instance in the same cluster.
  4. Make the corresponding GORM changes so that KFP's CRUD (create/read/update/delete) operations execute correctly against PostgreSQL. (This is a good time to write some unit tests or E2E tests.)
  5. Repeat the steps above for the cache server.

@SnehaAgg0212

@zijianjoy I am a backend developer from India and am comfortable with Python, Go, Java, Kubernetes, and SQL databases. I would like to contribute to this issue. What are the steps?

@jiduyuting

Hello @zijianjoy, I'm a junior student from China majoring in computer science and technology. I'm very interested in open source projects and want to contribute to this project while improving my skills. I have joined our school lab, which works with databases and the cloud, so I frequently work with PostgreSQL and Kubernetes. I'm following the steps you referred to above, but I can't join the #gsoc-participants channel. Will this have any impact?

@Irshu786

Irshu786 commented Mar 26, 2024 via email

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on May 26, 2024

@rimolive
Member

/lifecycle frozen

google-oss-prow bot added the lifecycle/frozen label and removed the lifecycle/stale label on May 27, 2024

@sagnik3788

@zijianjoy Is anyone working on this issue?

@rimolive
Member

@sagnik3788 This is part of the Google Summer of Code. You can find details in https://www.kubeflow.org/events/gsoc-2024/#project-9-postgresql-integration-in-kubeflow-pipelines.

@DajunZhou

When will official support for PostgreSQL be available?

@DajunZhou

Hi Kubeflow team and community,

I’ve been following this issue closely since our company, due to internal policies, cannot use MySQL, and we’re really looking forward to full PostgreSQL support in Kubeflow Pipelines. The flexibility of PostgreSQL would be a major advantage for us in deploying and managing our ML workflows.

Could the maintainers provide an update on the timeline or roadmap for officially supporting PostgreSQL? We’d love to know when we might expect this integration to be fully implemented.

Thanks for all the great work on this project!

Best regards

@arjun-thekkethil

Hi @rimolive @shivaylamba,

I'm Arjun T, a B.Tech student and aspiring contributor. I'm interested in working on this issue as part of Google Summer of Code 2025, and I'd love to help implement PostgreSQL support for Kubeflow Pipelines.

I've started exploring the codebase and familiarizing myself with the backend components. I’d appreciate any guidance on the current status of this feature, and whether there are any existing design documents or initial tasks that I can take up to start contributing effectively.

Looking forward to your suggestions and happy to collaborate!

Best regards,
Arjun T

@boydfd

boydfd commented Apr 27, 2025

Hi Kubeflow team @rimolive @zijianjoy,

Due to the requirements of a previous project, I helped investigate the possibility of adapting Kubeflow Pipelines for PostgreSQL compatibility. Through some code modifications, I successfully integrated PostgreSQL locally and was able to run the project's demo pipeline smoothly without encountering any obvious errors.

However, I might have introduced some breaking changes that could potentially cause issues with MySQL functionality (which I haven't had time to test). Therefore, I'd like to publish this code and see if anyone has the time to help integrate these changes properly and perform thorough testing.

Anyone interested can refer to this commit: boydfd@bb6fb6b

Here's a summary of the key modifications:

  1. Cache Server:
    • Added PostgreSQL configuration and database initialization code.
    • Modified the ID field generation to support PostgreSQL (as it lacks the AUTO_INCREMENT keyword found in MySQL).
  2. Backend:
    • Modified PostgreSQL database initialization code.
    • Model: All database field names in the models were changed to lowercase. This is because PostgreSQL folds unquoted identifiers to lowercase; only identifiers enclosed in double quotes keep their case.
    • Storage:
      • Replaced all instances of sq.Select/sq.Insert with s.db.GetStatementBuilder().Select/s.db.GetStatementBuilder().Insert. This change was necessary because PostgreSQL uses numbered placeholders ($1, $2, etc.) instead of the question mark (?) used by the previous library/MySQL.
      • Adjusted some queries in the job storage and run storage components to be compatible with ONLY_FULL_GROUP_BY-style behavior. In queries that use aggregate functions, PostgreSQL requires that all selected columns are either aggregated or included in the GROUP BY clause.
      • Implemented a PostgreSQL-specific version of SQLDialect and added corresponding tests.
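
To make the placeholder and identifier differences in this list concrete, here is a minimal standard-library sketch in Go. It is not the code from the commit and not KFP's actual SQLDialect: the `Dialect` interface, both implementations, the `SelectByColumns` helper, and the table and column names are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Dialect is a hypothetical interface for this sketch; it is not KFP's SQLDialect.
type Dialect interface {
	Placeholder(n int) string           // n is the 1-based parameter position
	QuoteIdentifier(name string) string // quotes a table or column name
}

type mysqlDialect struct{}

// MySQL uses the same "?" marker for every parameter.
func (mysqlDialect) Placeholder(int) string { return "?" }

// MySQL quotes identifiers with backticks.
func (mysqlDialect) QuoteIdentifier(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

type postgresDialect struct{}

// PostgreSQL uses numbered parameters: $1, $2, ...
func (postgresDialect) Placeholder(n int) string { return "$" + strconv.Itoa(n) }

// PostgreSQL quotes identifiers with double quotes; quoting preserves case,
// whereas unquoted identifiers are folded to lowercase.
func (postgresDialect) QuoteIdentifier(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// SelectByColumns builds "SELECT * FROM <table> WHERE <col> = <placeholder> AND ...".
func SelectByColumns(d Dialect, table string, cols ...string) string {
	conds := make([]string, len(cols))
	for i, c := range cols {
		conds[i] = d.QuoteIdentifier(c) + " = " + d.Placeholder(i+1)
	}
	q := "SELECT * FROM " + d.QuoteIdentifier(table)
	if len(conds) > 0 {
		q += " WHERE " + strings.Join(conds, " AND ")
	}
	return q
}

func main() {
	// Table and column names are made up for the example.
	fmt.Println(SelectByColumns(mysqlDialect{}, "runs", "id", "namespace"))
	fmt.Println(SelectByColumns(postgresDialect{}, "runs", "id", "namespace"))
}
```

Routing every query through one dialect value is what lets the same storage code emit `?` for MySQL and `$1, $2` for PostgreSQL, which is the role the shared statement builder plays in the change described above.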

The images below provide some evidence from my local testing.

[Four images from local testing]

Hope PostgreSQL support can be officially integrated soon.

Projects
No open projects
Status: P0
Development

No branches or pull requests