Upload to gcs #108

Open

wants to merge 4 commits into main

Conversation

davidotte (Collaborator)

Change Description

Try to be precise. You can additionally add comments to your PR; this can help the reviewer a lot.

PR corresponds to this server PR.

If you used new dependencies: Did you add them to requirements.txt?

Who did you ping on Mattermost to review your PR? Please ping that person again whenever you are ready for another review.

Breaking changes

If you made any breaking changes, please update the version number.
Breaking changes are totally fine; we just need to keep the users informed and the server in sync.

Does this PR break the API? If so, what is the corresponding server commit?

Does this PR break the user interface? If so, why?


Please do not mark comments/conversations as resolved unless you are the assigned reviewer. This helps maintain clarity during the review process.

CLAassistant commented May 1, 2025

CLA assistant check
All committers have signed the CLA.

@davidotte davidotte requested review from Jabb0 and noahho May 1, 2025 14:10
@@ -4,3 +4,5 @@
from pathlib import Path

CACHE_DIR = Path(__file__).parent.resolve() / ".tabpfn"

LARGE_DATASET_THRESHOLD = 500000

missing unit.
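
For example, the unit could be documented on the constant itself; a minimal sketch, assuming the threshold is meant as a cell count (matching the num_cells check below):

# Unit: number of cells, i.e. rows * (columns + 1), above which data is uploaded to GCS.
LARGE_DATASET_THRESHOLD = 500000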

num_cells = X.shape[0] * (X.shape[1] + 1)
if num_cells > LARGE_DATASET_THRESHOLD:
    # Generate Upload URLs
    response_x = cls.httpx_client.post(

Probably a good idea to hide the low-level HTTP calls and the JSON handling behind a method, ideally on an API class that abstracts away these details.
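
For illustration, a rough sketch of such a wrapper; the class and method names are hypothetical and not part of the current code:

# Hypothetical wrapper class; only illustrates the shape of the abstraction.
class ServerAPI:
    def __init__(self, httpx_client, base_url: str):
        self.httpx_client = httpx_client
        self.base_url = base_url

    def post_json(self, path: str, **kwargs) -> dict:
        """POST to the server and return the decoded JSON body."""
        response = self.httpx_client.post(f"{self.base_url}{path}", **kwargs)
        response.raise_for_status()
        return response.json()

# Call sites would then shrink to something like:
# upload_urls = api.post_json("/upload/", params={"tabpfn_systems": json.dumps(tabpfn_systems)})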

params={"tabpfn_systems": json.dumps(tabpfn_systems)},
)
num_cells = X.shape[0] * (X.shape[1] + 1)
if num_cells > LARGE_DATASET_THRESHOLD:

Consider moving this whole block into a helper method to keep the fit method clean; a possible sketch is below.
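
A minimal sketch of such a helper, assuming it lives on the same class; the helper name and the URL-request step are hypothetical:

@classmethod
def _upload_if_large(cls, serialized_data: bytes, num_cells: int) -> None:
    """Upload the serialized dataset to GCS when it exceeds the threshold."""
    if num_cells <= LARGE_DATASET_THRESHOLD:
        return
    # _request_upload_url is hypothetical; it would wrap the POST shown above.
    upload_url = cls._request_upload_url()
    cls.httpx_client.put(upload_url, content=serialized_data)

fit() would then only call this helper instead of inlining the branch.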

"file_name": "x_test_filename",
},
)
cls._validate_response(url_response, "predict")

I have concerns about the validate response method; a sketch of a stricter variant is below.

  1. It does not validate anything if the status code is 200.
  2. It just drops JSON decode errors.
  3. Seems like the version checking should be middleware and not called for every API request.
  4. It silently drops a lot of exceptions.
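
To make this concrete, a hedged sketch of a stricter variant; the error-body shape ("detail" key) is an assumption about the server contract:

@classmethod
def _validate_response(cls, response, method_name: str) -> None:
    # Raise on every non-success status instead of silently continuing.
    # (json is already imported in this module.)
    if response.status_code != 200:
        try:
            detail = response.json().get("detail", response.text)
        except json.JSONDecodeError:
            detail = response.text  # keep the raw body rather than dropping the error
        raise RuntimeError(f"{method_name} failed ({response.status_code}): {detail}")
    # Client/server version checks would move into an httpx event hook
    # (middleware) rather than being repeated for every request.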

cls._validate_response(url_response, "predict")
url_response = url_response.json()
# Upload test set to GCS
response = cls.httpx_client.put(

Hmm, is there not a GCS SDK method to do this? Maybe it could handle the upload more efficiently?
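
For comparison, a sketch of the official client-library upload; note it assumes GCP credentials on the caller's side, which the signed-URL PUT in this PR avoids, so it may not be applicable here:

# Sketch only: requires the google-cloud-storage package and client credentials.
from google.cloud import storage

def upload_via_sdk(bucket_name: str, blob_name: str, data: bytes) -> None:
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(data, content_type="text/csv")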

@@ -395,9 +438,11 @@ def run_progress():
return result

@classmethod
def _make_prediction_request(cls, test_set_uid, x_test_serialized, params):
def _make_prediction_request(
cls, test_set_uid, x_test_serialized, params, num_cells

Missing type annotations; for example:
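
A possible annotated signature; the types are inferred from how the arguments appear to be used in this PR, so treat them as assumptions:

@classmethod
def _make_prediction_request(
    cls,
    test_set_uid: str,
    x_test_serialized: bytes,
    params: dict,
    num_cells: int,
) -> dict:  # return type assumed to be the decoded JSON response
    ...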

"""
Helper function to make the prediction request to the server.
Helper function to upload test set if required and make the prediction request to the server.

Missing parameter documentation; for example:
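
The docstring could spell out the parameters; the descriptions below are assumptions inferred from the surrounding diff:

"""
Upload the test set if required and make the prediction request to the server.

Parameters
----------
test_set_uid : str
    Identifier of the test set already registered with the server.
x_test_serialized : bytes
    Test data serialized as CSV-formatted bytes.
params : dict
    Additional prediction parameters forwarded to the server.
num_cells : int
    Cell count of the test set, used to decide whether to upload it to GCS first.
"""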

@@ -269,6 +310,8 @@ def predict(

x_test_serialized = common_utils.serialize_to_csv_formatted_bytes(x_test)

num_cells = x_test.shape[0] * (x_test.shape[1] + 1)

why the +1?

@@ -395,9 +438,11 @@ def run_progress():
return result

@classmethod
def _make_prediction_request(cls, test_set_uid, x_test_serialized, params):
def _make_prediction_request(
cls, test_set_uid, x_test_serialized, params, num_cells

This setup is not ideal: there is now code duplication for uploading datasets.

It would be better if uploading were independent of predict, so the user can (see the sketch after this list):

  1. upload a dataset (untyped, i.e. not yet designated as train or test)
  2. call fit or predict on an arbitrary uploaded dataset.
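
A rough sketch of what that decoupling could look like; all names here are hypothetical and not part of the current client:

class DatasetAPI:
    @classmethod
    def upload_dataset(cls, X) -> str:
        """Upload any dataset (train or test alike) and return its server UID."""
        serialized = common_utils.serialize_to_csv_formatted_bytes(X)
        # _request_upload_url is hypothetical; it would return a signed URL plus a UID.
        upload_url, dataset_uid = cls._request_upload_url()
        cls.httpx_client.put(upload_url, content=serialized)
        return dataset_uid

# fit/predict would then accept a dataset UID instead of uploading themselves:
# uid = DatasetAPI.upload_dataset(x_test)
# ServiceClient.predict(uid, ...)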

@@ -22,7 +22,7 @@
from tqdm import tqdm

from tabpfn_client.tabpfn_common_utils import utils as common_utils
from tabpfn_client.constants import CACHE_DIR
from tabpfn_client.constants import CACHE_DIR, LARGE_DATASET_THRESHOLD
from tabpfn_client.browser_auth import BrowserAuthHandler
from tabpfn_client.tabpfn_common_utils.utils import Singleton
@Jabb0 Jabb0 May 7, 2025

The new functionality is completely untested.
Add integration test cases that can be run against a real backend to ensure everything works; a possible starting point is sketched below.

Please add instructions on how to test this.
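
A minimal integration-test sketch as a starting point; the client entry point and the pytest marker are assumptions and would need to match this repo's test setup:

import numpy as np
import pytest

from tabpfn_client.constants import LARGE_DATASET_THRESHOLD


@pytest.mark.integration  # assumed marker; requires a real backend and a valid login
def test_fit_predict_uploads_large_dataset():
    n_features = 10
    # Just above the threshold so the GCS upload path is exercised.
    n_rows = LARGE_DATASET_THRESHOLD // (n_features + 1) + 1
    X = np.random.rand(n_rows, n_features)
    y = np.random.randint(0, 2, size=n_rows)

    from tabpfn_client import TabPFNClassifier  # assumed public entry point
    clf = TabPFNClassifier()
    clf.fit(X, y)
    predictions = clf.predict(X)

    assert predictions.shape == (n_rows,)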
