github-actions[bot]
Contributor

Cherry-picked from #55382

…y evicted after compaction, causing query failures (#55382)

### What problem does this PR solve?
Problem Summary:

1. Problem background
Doris's compaction has a critical bug: after input rowsets participate in
compaction, their expiration time is calculated from the rowset's
creation time (`creation_time`) instead of the compaction completion
time.

2. Scenario
For example:
a. After compaction completes, the input rowsets should be kept for
another `tablet_rowset_stale_sweep_time_sec` before they are discarded.
b. Because the expiration is calculated from the creation time, the
rowsets are evicted immediately.
c. A query that is still executing fails with the error:
`[E-230]fail to find path in version_graph. spec_version: 0-1789 versions are already compacted`

3. Cause
a. In the current implementation, TimestampedVersion is created using
`rs->creation_time()`.
b. The eviction check is:
`rowset_creation_time <= (current_time - tablet_rowset_stale_sweep_time_sec)`
c. A rowset that was created long ago therefore passes this check and is
discarded immediately, even if it has only just participated in
compaction.
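
The cause above can be illustrated with a minimal, standalone C++ sketch. It is not the Doris implementation: `StaleRowset`, `is_expired`, `expired_by_creation_time`, `expired_by_compaction_time`, and the 300-second value of `kStaleSweepTimeSec` are hypothetical names and numbers chosen for illustration. Only the comparison itself is taken from the description.

```cpp
#include <cstdint>

// Hypothetical stand-in for the configured tablet_rowset_stale_sweep_time_sec.
// The value is illustrative only.
constexpr int64_t kStaleSweepTimeSec = 300;

// Hypothetical view of a rowset that has been an input to a compaction.
struct StaleRowset {
    int64_t creation_time_sec;           // when the rowset was originally written
    int64_t compaction_finish_time_sec;  // when the compaction that consumed it finished
};

// The eviction check quoted in the description:
//   timestamp <= (current_time - tablet_rowset_stale_sweep_time_sec)
constexpr bool is_expired(int64_t timestamp_sec, int64_t now_sec) {
    return timestamp_sec <= now_sec - kStaleSweepTimeSec;
}

// Behavior described as the bug: the check is fed the creation time, so an
// old rowset is already "expired" the moment its compaction finishes.
constexpr bool expired_by_creation_time(const StaleRowset& rs, int64_t now_sec) {
    return is_expired(rs.creation_time_sec, now_sec);
}

// Behavior the description asks for: the sweep window starts when the
// compaction completes (assumed here to be stored alongside the rowset).
constexpr bool expired_by_compaction_time(const StaleRowset& rs, int64_t now_sec) {
    return is_expired(rs.compaction_finish_time_sec, now_sec);
}
```

With a rowset created at t=1000 and compacted at t=9990, a sweep at t=10000 evicts it under the creation-time check but keeps it under the compaction-time check, so a query that started before the compaction can still resolve its version path until t=10290.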

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
@github-actions github-actions bot requested a review from morrySnow as a code owner September 12, 2025 08:27
@Thearas
Contributor

Thearas commented Sep 12, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Sep 12, 2025
@Thearas
Contributor

Thearas commented Sep 12, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32784 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9e53b2069a7b3c676e79660c3bd27712a74d8021, data reload: false

------ Round 1 ----------------------------------
q1	17619	5543	5375	5375
q2	2025	402	292	292
q3	11933	1259	761	761
q4	10217	881	463	463
q5	7682	2455	2177	2177
q6	189	172	133	133
q7	899	769	614	614
q8	9334	1470	1174	1174
q9	5253	5037	4932	4932
q10	6774	2262	1802	1802
q11	476	294	270	270
q12	327	374	214	214
q13	17760	3650	3065	3065
q14	226	236	206	206
q15	543	460	457	457
q16	414	430	383	383
q17	614	889	378	378
q18	7167	6347	6547	6347
q19	1204	954	550	550
q20	334	342	212	212
q21	3061	2200	2005	2005
q22	1042	1005	974	974
Total cold run time: 105093 ms
Total hot run time: 32784 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5514	5549	5526	5526
q2	238	319	231	231
q3	2295	2651	2362	2362
q4	1404	1795	1359	1359
q5	4421	4997	4973	4973
q6	170	168	132	132
q7	2111	2019	1827	1827
q8	2885	2823	2761	2761
q9	7311	7304	7294	7294
q10	3071	3305	2724	2724
q11	586	514	485	485
q12	689	768	635	635
q13	3381	3847	3208	3208
q14	280	291	266	266
q15	521	486	467	467
q16	440	495	427	427
q17	1254	1789	1248	1248
q18	7744	7382	7402	7382
q19	794	1195	1066	1066
q20	2023	2061	1880	1880
q21	5512	4975	4654	4654
q22	1113	1063	999	999
Total cold run time: 53757 ms
Total hot run time: 51906 ms

@doris-robot

TPC-DS: Total hot run time: 193192 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9e53b2069a7b3c676e79660c3bd27712a74d8021, data reload: false

query1	949	407	424	407
query2	6172	2027	2066	2027
query3	8685	192	198	192
query4	33636	23648	23482	23482
query5	3744	622	456	456
query6	314	203	189	189
query7	4199	502	324	324
query8	317	265	255	255
query9	9400	2640	2625	2625
query10	465	322	261	261
query11	18423	15594	15170	15170
query12	163	111	108	108
query13	1558	549	435	435
query14	10392	7069	6703	6703
query15	231	191	181	181
query16	8090	687	510	510
query17	1581	800	575	575
query18	2107	411	307	307
query19	222	184	170	170
query20	126	122	124	122
query21	207	127	104	104
query22	4539	4689	4613	4613
query23	35866	34259	33752	33752
query24	7369	2732	2730	2730
query25	468	470	434	434
query26	1040	250	176	176
query27	2045	486	362	362
query28	5528	2277	2234	2234
query29	613	551	479	479
query30	244	199	162	162
query31	1018	968	840	840
query32	84	61	59	59
query33	550	397	346	346
query34	762	888	561	561
query35	800	822	740	740
query36	1053	1059	958	958
query37	116	99	71	71
query38	4019	4011	4069	4011
query39	1551	1489	1488	1488
query40	210	135	110	110
query41	53	57	52	52
query42	123	108	109	108
query43	528	517	491	491
query44	1349	857	852	852
query45	187	180	183	180
query46	894	1081	690	690
query47	1957	1976	1933	1933
query48	420	426	353	353
query49	796	534	432	432
query50	685	712	437	437
query51	7374	7473	7227	7227
query52	111	104	96	96
query53	232	263	197	197
query54	575	569	494	494
query55	80	80	82	80
query56	281	285	266	266
query57	1214	1291	1224	1224
query58	241	219	224	219
query59	3174	3440	3198	3198
query60	293	298	275	275
query61	116	113	136	113
query62	786	756	707	707
query63	247	199	199	199
query64	4671	1026	657	657
query65	3366	3323	3331	3323
query66	1093	428	314	314
query67	16458	15853	15836	15836
query68	7986	832	566	566
query69	491	308	265	265
query70	1203	1113	1167	1113
query71	402	298	270	270
query72	5097	3897	3745	3745
query73	636	747	355	355
query74	10371	9408	9337	9337
query75	3240	3135	2720	2720
query76	3198	1128	774	774
query77	703	383	274	274
query78	10424	10371	9586	9586
query79	3717	847	593	593
query80	715	534	440	440
query81	494	265	217	217
query82	619	120	88	88
query83	169	165	149	149
query84	246	93	82	82
query85	784	350	292	292
query86	389	319	292	292
query87	4337	4346	4262	4262
query88	5220	2537	2388	2388
query89	414	346	296	296
query90	1839	194	201	194
query91	137	143	112	112
query92	62	57	53	53
query93	2547	933	564	564
query94	678	391	306	306
query95	346	285	282	282
query96	493	603	281	281
query97	3173	3318	3150	3150
query98	228	222	202	202
query99	1328	1416	1262	1262
Total cold run time: 297846 ms
Total hot run time: 193192 ms

@doris-robot

ClickBench: Total hot run time: 28.43 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9e53b2069a7b3c676e79660c3bd27712a74d8021, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.06	0.07
query4	1.62	0.10	0.10
query5	0.54	0.50	0.53
query6	1.12	0.74	0.72
query7	0.02	0.02	0.01
query8	0.05	0.03	0.03
query9	0.60	0.51	0.50
query10	0.57	0.55	0.55
query11	0.14	0.12	0.11
query12	0.14	0.10	0.10
query13	0.62	0.60	0.59
query14	0.78	0.83	0.83
query15	0.83	0.81	0.85
query16	0.37	0.41	0.40
query17	1.05	1.05	1.02
query18	0.25	0.23	0.23
query19	1.99	1.85	1.94
query20	0.02	0.01	0.02
query21	15.36	0.91	0.57
query22	0.75	0.77	0.76
query23	15.07	1.49	0.52
query24	3.78	0.38	1.19
query25	0.22	0.09	0.08
query26	0.31	0.15	0.13
query27	0.05	0.04	0.05
query28	13.56	1.02	0.43
query29	12.57	3.92	3.24
query30	0.26	0.11	0.06
query31	2.81	0.59	0.39
query32	3.23	0.53	0.47
query33	3.02	3.11	3.01
query34	16.71	5.23	4.52
query35	4.55	4.55	4.56
query36	0.64	0.50	0.47
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.03
query42	0.03	0.03	0.03
query43	0.03	0.04	0.02
Total cold run time: 104.42 s
Total hot run time: 28.43 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 60.00% (12/20) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.55% (12765/28026)
Line Coverage 36.37% (113781/312844)
Region Coverage 34.01% (65084/191391)
Branch Coverage 31.03% (34149/110056)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 90.00% (18/20) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 75.19% (20726/27564)
Line Coverage 68.26% (212869/311873)
Region Coverage 66.26% (127334/192168)
Branch Coverage 59.66% (65963/110570)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 89.47% (17/19) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 75.20% (20728/27564)
Line Coverage 68.27% (212931/311873)
Region Coverage 66.27% (127358/192168)
Branch Coverage 59.68% (65990/110570)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 89.47% (17/19) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 75.28% (20749/27564)
Line Coverage 68.41% (213350/311873)
Region Coverage 66.42% (127637/192168)
Branch Coverage 59.83% (66157/110570)

Contributor

@dataroaring dataroaring left a comment

LGTM
