Skip to content

Inclusion of tsp flag leads to incorrect DVT output results #1368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
karcot1 opened this issue Dec 5, 2024 · 2 comments · May be fixed by #1472
Open

Inclusion of tsp flag leads to incorrect DVT output results #1368

karcot1 opened this issue Dec 5, 2024 · 2 comments · May be fixed by #1472

Comments

@karcot1
Copy link

karcot1 commented Dec 5, 2024

Overview: the -tsp flag was implemented as an optional flag due to a bug in teradatasql that adds extra whitespaces to CHAR data types, resulting in incorrect DVT results.

Example: ‘ABC ’ CHAR(10) in TD and “ABC‘ in BQ will become identical when DVT executes with -tsp

We understand this raises a risk of duplicate primary keys in the instance where a table contains two different CHAR or VARCHAR PKs, which only differ by trailing spaces.

Observation: -tsp flag applied to CHAR/VARCHAR data results in incorrect DVT output results, even after accounting for the expected duplicate rows. We see identical hashes flagged as "fail" and different hashes flagged as "success"

@helensilva14 helensilva14 added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Dec 11, 2024
@sundar-mudupalli-work
Copy link
Collaborator

Hi Kash,

Row validation is only possible when rows are unique. When rows are not unique, the source/target comparison will result in apparently incorrect output. In a separate conversation you mentioned that the underlying issue was that the -tsp flag indiscriminately truncates both char and varchar fields. If the flag only truncated char fields, things would work for your customer. Can you elaborate the specific use case(s) ? Does the customer have a composite primary key where one field is a char and another is varchar ?

Thanks.

Sundar Mudupalli

@sundar-mudupalli-work
Copy link
Collaborator

Hi,

I have created a new branch issue-1368-fixed-length-pks which only trims pk columns that are fixed length characters. I did a test between Teradata and BQ with the following tables containing the same data. Teradata tables:

CREATE SET TABLE udfs.issue1368, FALLBACK (
    ID INTEGER NOT NULL,
    PR_KEY CHAR(30) NOT NULL,
    PR_VARIABLE VARCHAR(30) NOT NULL,
    MY_VALUE CHAR(12) );
CREATE SET TABLE udfs.issue1368_1, FALLBACK (
    ID INTEGER NOT NULL,
    PR_KEY VARCHAR(30) NOT NULL,
    PR_VARIABLE VARCHAR(30) NOT NULL,
    MY_VALUE CHAR(12) );

BQ table:

CREATE TABLE data_validator_issues.issue1368 (
  ID INT64,
  PR_KEY STRING(30),
  PR_VARIABLE STRING(30),
  MY_VALUE STRING
);

The following commands ran successfully.

data-validation validate row -sc teradata -tc bq -pk=pr_key -tbls=udfs.issue1368=data_validator_issues.issue1368 -hash='*'
data-validation validate row -sc teradata -tc bq -pk=pr_key -tbls=udfs.issue1368_1=data_validator_issues.issue1368 -hash='*'

Please test this version with the customer and let me know how this works out.

Sundar Mudupalli

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants