Inclusion of tsp flag leads to incorrect DVT output results #1368

karcot1 · 2024-12-05T17:38:50Z

Overview: the -tsp flag was implemented as an optional flag due to a bug in teradatasql that adds extra whitespaces to CHAR data types, resulting in incorrect DVT results.

Example: ‘ABC ’ CHAR(10) in TD and “ABC‘ in BQ will become identical when DVT executes with -tsp

We understand this raises a risk of duplicate primary keys in the instance where a table contains two different CHAR or VARCHAR PKs, which only differ by trailing spaces.

Observation: -tsp flag applied to CHAR/VARCHAR data results in incorrect DVT output results, even after accounting for the expected duplicate rows. We see identical hashes flagged as "fail" and different hashes flagged as "success"

sundar-mudupalli-work · 2025-01-06T13:37:36Z

Hi Kash,

Row validation is only possible when rows are unique. When rows are not unique, the source/target comparison will result in apparently incorrect output. In a separate conversation you mentioned that the underlying issue was that the -tsp flag indiscriminately truncates both char and varchar fields. If the flag only truncated char fields, things would work for your customer. Can you elaborate the specific use case(s) ? Does the customer have a composite primary key where one field is a char and another is varchar ?

Thanks.

Sundar Mudupalli

sundar-mudupalli-work · 2025-01-07T02:07:23Z

Hi,

I have created a new branch issue-1368-fixed-length-pks which only trims pk columns that are fixed length characters. I did a test between Teradata and BQ with the following tables containing the same data. Teradata tables:

CREATE SET TABLE udfs.issue1368, FALLBACK (
    ID INTEGER NOT NULL,
    PR_KEY CHAR(30) NOT NULL,
    PR_VARIABLE VARCHAR(30) NOT NULL,
    MY_VALUE CHAR(12) );
CREATE SET TABLE udfs.issue1368_1, FALLBACK (
    ID INTEGER NOT NULL,
    PR_KEY VARCHAR(30) NOT NULL,
    PR_VARIABLE VARCHAR(30) NOT NULL,
    MY_VALUE CHAR(12) );

BQ table:

CREATE TABLE data_validator_issues.issue1368 (
  ID INT64,
  PR_KEY STRING(30),
  PR_VARIABLE STRING(30),
  MY_VALUE STRING
);

The following commands ran successfully.

data-validation validate row -sc teradata -tc bq -pk=pr_key -tbls=udfs.issue1368=data_validator_issues.issue1368 -hash='*'
data-validation validate row -sc teradata -tc bq -pk=pr_key -tbls=udfs.issue1368_1=data_validator_issues.issue1368 -hash='*'

Please test this version with the customer and let me know how this works out.

Sundar Mudupalli

helensilva14 added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Dec 11, 2024

nj1973 mentioned this issue Mar 11, 2025

Require framework for validations to access raw metadata #1468

Closed

This was referenced Mar 13, 2025

Random rows does not work when primary key is a fixed char column #1471

Open

fix: Only strip primary keys when they are fixed chars, not for varchars #1472

Open

helensilva14 removed the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Mar 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inclusion of tsp flag leads to incorrect DVT output results #1368

Inclusion of tsp flag leads to incorrect DVT output results #1368

karcot1 commented Dec 5, 2024

sundar-mudupalli-work commented Jan 6, 2025

sundar-mudupalli-work commented Jan 7, 2025

Inclusion of tsp flag leads to incorrect DVT output results #1368

Inclusion of tsp flag leads to incorrect DVT output results #1368

Comments

karcot1 commented Dec 5, 2024

sundar-mudupalli-work commented Jan 6, 2025

sundar-mudupalli-work commented Jan 7, 2025