Implement importing of extracts to database #5
base: master
Conversation
def _create_postgres_db(self):
    # create new postgresql cluster
    print_stage('Creating PostgreSQL cluster')
    db_dir = os.path.join(self.working_dir, 'pg_data')
This will do unexpected things if one starts two containers processing in parallel with the same dirs; maybe also use the tile coordinates?
good idea
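A minimal sketch of that suggestion, assuming the tile object exposes zoom, x and y attributes (the attribute names are my guess, not confirmed by this PR):

# Scope the cluster directory to the tile being processed, so parallel
# containers sharing a working dir cannot clobber each other's clusters.
db_dir = os.path.join(self.working_dir,
                      f'pg_data_{self.tile.zoom}_{self.tile.x}_{self.tile.y}')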
db_dir = os.path.join(self.working_dir, 'pg_data')
subprocess.run(['rm', '-rf', db_dir], check=True)
os.makedirs(db_dir)
subprocess.run(['pg_createcluster', PG_VERSION, 'main', '--start', '--datadir', db_dir], check=True,
We probably should edit the configuration of the database and allow more RAM use
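One way this could look (a sketch only; the concrete settings and values are placeholder assumptions, and the config path assumes the Debian pg_createcluster layout):

# Hypothetical tuning: let the import-only cluster use more RAM.
conf_path = f'/etc/postgresql/{PG_VERSION}/main/postgresql.conf'
with open(conf_path, 'a') as conf:
    conf.write('shared_buffers = 2GB\n')        # default is only 128MB
    conf.write('maintenance_work_mem = 1GB\n')  # speeds up index builds
subprocess.run(['pg_ctlcluster', PG_VERSION, 'main', 'restart'], check=True)

Since the cluster only lives for one import, generous values should be safe here.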
print_stage("Importing data with osm2pgsql") | ||
subprocess.run([ | ||
'su', 'postgres', '-c', | ||
f'osm2pgsql --slim --hstore-all -C 3000 ' |
is 3000 a good choice for RAM usage?
that's what the original script used
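For reference, -C sets osm2pgsql's node cache size in MB. If the hard-coded 3000 turns out to be a poor fit, it could be derived from the machine's physical memory instead (a sketch; the 50% share is an arbitrary assumption):

# Size the osm2pgsql node cache (in MB) from physical RAM instead of
# hard-coding 3000; uses roughly half of total memory (Linux-only sysconf keys).
total_mb = os.sysconf('SC_PHYS_PAGES') * os.sysconf('SC_PAGE_SIZE') // (1024 * 1024)
cache_mb = max(1000, total_mb // 2)
# ...then pass f'-C {cache_mb}' in the osm2pgsql command line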
    f'-P {self.db_port} '
    f'-U postgres '
    f'--number-processes {max(1, os.cpu_count() - 2)} '
    f'{os.path.join(self.working_dir, self.pbf_file_name)}'
use unchecked tables as we will drop them later on?
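If "unchecked" refers to PostgreSQL UNLOGGED tables (my reading, not confirmed in the thread), they skip write-ahead logging, which is fine for data that is dumped and dropped anyway. A sketch using the default osm2pgsql table names:

# Hypothetical: skip WAL for the throwaway import tables.
for table in ('planet_osm_point', 'planet_osm_line',
              'planet_osm_polygon', 'planet_osm_roads'):
    subprocess.run(['su', 'postgres', '-c',
                    f'psql -p {self.db_port} -d {self.db_dbname} '
                    f'-c "ALTER TABLE {table} SET UNLOGGED"'],
                   check=True)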
with open(file_path, 'wb') as f:
    subprocess.run(['su', 'postgres', '-c',
                    f'pg_dump -p {self.db_port} -d {self.db_dbname} --format custom'],
                   check=True, cwd=self.out_dir, stdout=f)
add compression?
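pg_dump's custom format already compresses at a moderate default level when built with zlib; raising it is a one-flag change that trades CPU time for a smaller dump:

with open(file_path, 'wb') as f:
    subprocess.run(['su', 'postgres', '-c',
                    f'pg_dump -p {self.db_port} -d {self.db_dbname} '
                    f'--format custom --compress 9'],  # maximum zlib level
                   check=True, cwd=self.out_dir, stdout=f)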
print_stage('Uploading PostgreSQL dump to tileserver-mapping')
file_path = os.path.join(self.working_dir, 'db.pg_dump')
self.api.upload_sql_dump(self.tile, file_path)
subprocess.run(['rsync', file_path, self.out_dir], check=True)
If it is uploaded, I would not put it into local storage; it would only waste space.
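A sketch of acting on this, gating the local copy behind a flag (keep_local_copy is a hypothetical option, not something this PR defines):

self.api.upload_sql_dump(self.tile, file_path)
# Hypothetical flag: only mirror the dump locally when explicitly asked;
# otherwise the uploaded copy is the single source of truth.
if self.keep_local_copy:
    subprocess.run(['rsync', file_path, self.out_dir], check=True)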
def _upload_dump(self):
    print_stage('Uploading PostgreSQL dump to tileserver-mapping')
    file_path = os.path.join(self.working_dir, 'db.pg_dump')
    self.api.upload_sql_dump(self.tile, file_path)
does it retry the upload?
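If the API client has no retry logic of its own, a bounded retry loop could wrap the call (a sketch; the attempt count and backoff are arbitrary choices):

import time

# Retry the upload a few times with exponential backoff before giving up.
for attempt in range(3):
    try:
        self.api.upload_sql_dump(self.tile, file_path)
        break
    except Exception:
        if attempt == 2:
            raise                  # out of attempts, propagate the error
        time.sleep(2 ** attempt)   # wait 1s, then 2s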
This depends on Map-Data/tileserver-mapping#10
I tried to reproduce the steps from https://github.com/Map-Data/regiontileserver/blob/master/import_data.sh as closely as possible, with the addition of coordinating files through tileserver-mapping.