Gcs database #12817
base: main
Conversation
        
          
bin/run-db-download.py (Outdated)
          
        
    REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
    S3_BASE_URL = f"https://s3-{REGION_NAME}.amazonaws.com/{BUCKET_NAME}"
    GCS_BASE_URL = f"https://storage.googleapis.com/{BUCKET_NAME}"
    DOWNLOAD_FROM_GCS = os.getenv("DOWNLOAD_FROM_GCS", 'False').lower() in ('true', '1', 't')
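For context, a minimal sketch of how a flag like this can select the active download endpoint downstream. The BASE_URL name is an assumption for illustration, not necessarily what the PR uses:

```python
import os

# Constants as in the diff above (quoting normalized to double quotes here).
BUCKET_NAME = os.getenv("AWS_DB_S3_BUCKET", "bedrock-db-dev")
REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
S3_BASE_URL = f"https://s3-{REGION_NAME}.amazonaws.com/{BUCKET_NAME}"
GCS_BASE_URL = f"https://storage.googleapis.com/{BUCKET_NAME}"
DOWNLOAD_FROM_GCS = os.getenv("DOWNLOAD_FROM_GCS", "False").lower() in ("true", "1", "t")

# Hypothetical downstream use: pick the endpoint the download code reads from.
BASE_URL = GCS_BASE_URL if DOWNLOAD_FROM_GCS else S3_BASE_URL
```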
Our tests don't check black formatting on these files, but it would be nice if they passed (wrt the single vs double quoting).
Sounds good, this line might be a little over the top too. Open to feedback on how you all like to handle boolean values as env variables.
In other parts of the code base we use everett. If we used it here, this could be written as `DOWNLOAD_FROM_GCS = config("DOWNLOAD_FROM_GCS", parser=bool, default="false")`. Not sure why it isn't already used here, though.
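For illustration, a stdlib-only sketch of what a config library's boolean parser typically does under the hood (the parse_bool and env_bool names here are hypothetical, not everett's API):

```python
import os

# Typical truthy/falsy spellings a config-style bool parser accepts.
_TRUE_VALUES = {"t", "true", "yes", "y", "1", "on"}
_FALSE_VALUES = {"f", "false", "no", "n", "0", "off"}

def parse_bool(value: str) -> bool:
    """Parse a boolean-ish string the way config libraries typically do."""
    normalized = value.strip().lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError(f"Not a recognized boolean value: {value!r}")

def env_bool(name: str, default: str = "false") -> bool:
    """Read an environment variable and coerce it to a real bool."""
    return parse_bool(os.getenv(name, default))

DOWNLOAD_FROM_GCS = env_bool("DOWNLOAD_FROM_GCS")
```

Keeping the recognized spellings in one place avoids repeating the inline `.lower() in (...)` expression at every call site.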
Oh cool, thanks @robhudson. Pushed up a change that pulls that in.
Welp, maybe that is why it looks to be failing the tests:

    Traceback (most recent call last):
      File "/app/./bin/run-db-download.py", line 10, in <module>
        from bedrock.base.config_manager import config
    ModuleNotFoundError: No module named 'bedrock'
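A sketch of the usual workaround for this kind of import error, assuming the repo root (the directory containing the bedrock package) sits one level above bin/. Prepending it to sys.path before the project import lets the module resolve when the script is run directly:

```python
import os
import sys

# Resolve the directory two levels above this script (bin/../),
# which is assumed here to contain the "bedrock" package.
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Put it at the front of the module search path so the import below works
# even when the script is executed directly rather than as part of the app.
sys.path.insert(0, ROOT_DIR)

# Now the project import can succeed:
# from bedrock.base.config_manager import config
```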
Thanks @robhudson!
        
          
bin/run-db-download.py (Outdated)
          
        
    from bedrock.base.config_manager import config  # noqa

    BUCKET_NAME = os.getenv("AWS_DB_S3_BUCKET", "bedrock-db-dev")
    REGION_NAME = os.getenv("AWS_DB_REGION", "us-west-2")
Since we have config now, we could optionally convert these (or anywhere else in this file where we use os.getenv). An example for a string value would just be:

    BUCKET_NAME = config("AWS_DB_S3_BUCKET", default="bedrock-db-dev")
Sorry to tack on so much extra.
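For illustration, a self-contained sketch of what the converted constants might look like. The `config` here is a minimal stdlib stand-in mimicking the everett-style call signature, not everett itself:

```python
import os

def config(key, default=None, parser=str):
    """Minimal stand-in for an everett-style config() call, for illustration."""
    raw = os.environ.get(key, default)
    # A real config library has richer parsers; this handles the bool case only.
    if parser is bool and isinstance(raw, str):
        return raw.strip().lower() in ("true", "1", "t", "yes", "on")
    return parser(raw)

# The os.getenv constants, rewritten in the config() style:
BUCKET_NAME = config("AWS_DB_S3_BUCKET", default="bedrock-db-dev")
REGION_NAME = config("AWS_DB_REGION", default="us-west-2")
DOWNLOAD_FROM_GCS = config("DOWNLOAD_FROM_GCS", parser=bool, default="false")
```

One upside of the config() style is that defaults are always strings, so the same value can come from an env file or the process environment without type surprises.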
No problem at all!
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##             main   #12817      +/-   ##
    ==========================================
    - Coverage   76.99%   76.88%   -0.12%
    ==========================================
      Files         144      144
      Lines        7704     7669      -35
    ==========================================
    - Hits         5932     5896      -36
    - Misses       1772     1773       +1

☔ View full report in Codecov by Sentry.
        s3.upload_file(JSON_DATA_FILE, BUCKET_NAME, JSON_DATA_FILE_NAME, ExtraArgs={"ACL": "public-read"})
    except Boto3Error:
        return f"ERROR: Failed to upload the new database info file: {db_data}"
Do we want anything around the `if not s3` line above, so that in the future, if we remove the AWS credentials, this won't short-circuit and never make it this far? Or would changes around S3 come later?
My idea was to do kind of a multi-step deploy:
- Get the code out
- Turn on GCS so it uploads to both GCS and S3, let that sit for a while
- Turn on download from GCS, let that sit
- Come back and remove the s3 references
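The staged rollout above can be sketched as flag-driven backend selection. The flag names (upload_to_gcs, s3_enabled, download_from_gcs) are assumptions mirroring the plan, not necessarily what the PR calls them:

```python
def upload_targets(upload_to_gcs: bool, s3_enabled: bool = True) -> list:
    """Return which backends an upload should write to.

    Step 2 of the plan: dual-write to S3 and GCS while both flags are on.
    Step 4: flip s3_enabled off once GCS has proven out.
    """
    targets = []
    if s3_enabled:
        targets.append("s3")
    if upload_to_gcs:
        targets.append("gcs")
    return targets

def download_source(download_from_gcs: bool) -> str:
    """Return which backend downloads read from (step 3 flips this to GCS)."""
    return "gcs" if download_from_gcs else "s3"
```

Keeping reads on S3 while dual-writing means each step is independently reversible before the final S3 removal.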
Yep, sounds good to me.
@bkochendorfer @pmac Can we get this rolled out to prod? Good to do ahead of the postgres/cms tango
Thanks for all the extra changes. 👍
@bkochendorfer Do you need one of us to merge this?
Yes, I do not have merge capabilities here.
Chatted with Brett. We're going to sit on this until there's a bit more time to focus on it soon.
I'm going to close this but feel free to re-open with an updated version :)
One-line summary
Allow uploading and downloading from Google Cloud Storage (GCS) instead of AWS S3.
Significant changes and points to review
I felt like just raising an error during the upload if there was a failure was fine, but let me know if you want me to try to figure out what to catch.
Issue / Bugzilla link
#12637
Testing
I deployed the change manually in our devgcp environment and connected to the clock pod to manually run an upload and a download.