Skip to content

fix: Only strip primary keys when they are fixed chars, not for varchars #1472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 26 commits into
base: develop
Choose a base branch
from

Conversation

sundar-mudupalli-work
Copy link
Collaborator

Description of changes

DVT correctly handles character (fixed or varchar) primary keys - specifically for Oracle, Postgres and Teradata. As noted in issue 1470, DVT does not handle fixed length chars primary keys for SQL Server and Db2. Random rows don't work well with fixed length char primary keys.

This change also deprecates the -tsp option - it is removed from the documentation and users get a warning.

Issues to be closed

Closes #1368

Checklist

  • I have followed the CONTRIBUTING Guide.
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated any relevant documentation to reflect my changes, if applicable
  • I have added unit and/or integration tests relevant to my change as needed
  • I have already checked locally that all unit tests and linting are passing (use the tests/local_check.sh script)
  • I have manually executed end-to-end testing (E2E) with the affected databases/engines

Sorry, something went wrong.

@sundar-mudupalli-work
Copy link
Collaborator Author

/gcbrun

Copy link
Contributor

@nj1973 nj1973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions


DVT correctly handles Teradata strings encoded in ISO-8859 (Latin) encoding [PR 1226](https://github.com/GoogleCloudPlatform/professional-services-data-validator/pull/1226)

## Handling NCHAR and NVARCHAR data types in MS SQL and Oracle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use "SQL Server" instead of "MS SQL". We all know what "MS SQL" means but I don't think it is an official product name.

@@ -141,6 +140,31 @@ INSERT INTO `pso_data_validator`.`dvt_char_id` VALUES
('DVT4 ', 'Row 4 '),
('DVT5 ', 'Row 5');

# BigQuery does not have a fixed char data type and blanks at the end of strings are not removed by the database.
# Other databases have a fixed char data type and these tables are to allow the comparison of those columns with BQ and each other
CREATE OR REPLACE TABLE `pso_data_validator`.`dvt_varchar_id`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have test table dvt_string_id. Do we either not need dvt_varchar_id or need to remove dvt_string_id?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the tables dvt_string_id and dvt_char_id were confusing - not clear about the purpose of each. So I created new tables with names that are pretty clear. Also the data in those tables assumed you were stripping string primary keys. The data in the dvt_varchar_id includes a row with a primary key with a trailing blank. This would have been stripped in the earlier design. Now it will not.

Yes - maybe in a few months, we can remove the tables dvt_string_id and dvt_char_id from our databases.

Thank you.

Sundar Mudupalli

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've removed the tests that use dvt_string_id so I assumed the new table was a replacement for the old one. If they serve different purposes should the old tests be left in place? If the table is redundant we can remove it from the source code but not actually drop it until your changes reach develop.

@@ -258,6 +258,32 @@ INSERT INTO pso_data_validator.dvt_char_id VALUES ('DVT4', 'Row 4');
INSERT INTO pso_data_validator.dvt_char_id VALUES ('DVT5', 'Row 5');
COMMIT;

DROP TABLE pso_data_validator.dvt_fixed_char_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for BigQuery, the new tables appear to be duplicates of existing tables.

@sundar-mudupalli-work
Copy link
Collaborator Author

/gcbrun

@sundar-mudupalli-work
Copy link
Collaborator Author

/gcbrun

@nehanene15
Copy link
Collaborator

/gcbrun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inclusion of tsp flag leads to incorrect DVT output results

None yet

4 participants