
Commit 686a8f7

Merge branch 'develop' of github.com:PaddlePaddle/PaddleNLP into Pr_adapt_flex_checkpoint

2 parents: 6091951 + efd7d26

File tree: 44 files changed, 1775 additions and 290 deletions

.github/actions/check-bypass/action.yml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+name: "Check bypass"
+description: "A custom action to encapsulate PFCCLab/ci-bypass"
+inputs:
+  github-token:
+    description: "GitHub token"
+    required: true
+  workflow-name:
+    description: "Workflow name"
+    required: true
+outputs:
+  can-skip:
+    description: "Whether the workflow can be skipped."
+    value: ${{ steps.check-bypass.outputs.can-skip }}
+
+runs:
+  using: "composite"
+  steps:
+    - id: check-bypass
+      name: Check Bypass
+      env:
+        CI_TEAM_MEMBERS: '["tianshuo78520a", "swgu98", "risemeup1", "XieYunshen", "luotao1", "From00"]'
+      uses: PFCCLab/ci-bypass@v1
+      with:
+        github-token: ${{ inputs.github-token }}
+        non-pull-request-event-strategy: 'never-skipped'
+        type: 'composite'
+        composite-rule: |
+          {
+            "any": [
+              {
+                "type": "labeled",
+                "label": ["skip-ci: ${{ inputs.workflow-name }}", "skip-ci: all"],
+                "username": ${{ env.CI_TEAM_MEMBERS }}
+              },
+              {
+                "type": "commented",
+                "comment-pattern": [".*/skip-ci ${{ inputs.workflow-name }}.*", ".*/skip-ci all.*"],
+                "username": ${{ env.CI_TEAM_MEMBERS }}
+              }
+            ]
+          }
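The `comment-pattern` entries in the rule above are regular expressions matched against PR comment text. A minimal sketch of how such a comment would match (the workflow name "lint" and the use of `re.fullmatch` are illustrative assumptions; ci-bypass's exact matching semantics are not shown in this diff):

```python
import re

# Patterns as they would render for a hypothetical workflow-name "lint"
patterns = [r".*/skip-ci lint.*", r".*/skip-ci all.*"]

def matches_skip(comment: str) -> bool:
    # True if any configured pattern matches the whole comment text
    return any(re.fullmatch(p, comment) for p in patterns)

print(matches_skip("/skip-ci lint"))   # a direct skip request -> True
print(matches_skip("please review"))   # unrelated comment -> False
```

Because the patterns are wrapped in `.*`, a skip command embedded in a longer comment also matches.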

.github/workflows/approval.yml

Lines changed: 8 additions & 0 deletions
@@ -33,6 +33,14 @@ jobs:
           git checkout -b test_pr upstream/${BRANCH}
           git merge --no-edit origin_pr
           git log --pretty=oneline -10
+
+      - name: Check bypass
+        id: check-bypass
+        uses: ./.github/actions/check-bypass
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          workflow-name: approval
+
       - name: Display Required Approvers
         if: steps.check-bypass.outputs.can-skip != 'true'
         run: |
.github/workflows/check-bypass.yml

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+on:
+  workflow_call:
+    inputs:
+      workflow-name:
+        required: true
+        type: string
+    secrets:
+      github-token:
+        required: true
+    outputs:
+      can-skip:
+        description: "Whether the workflow can be skipped."
+        value: ${{ jobs.check-bypass.outputs.can-skip }}
+
+jobs:
+  check-bypass:
+    name: Check bypass
+    runs-on:
+      group: APPROVAL
+    permissions:
+      contents: read
+    env:
+      CI_TEAM_MEMBERS: '["tianshuo78520a", "swgu98", "risemeup1", "XieYunshen", "luotao1", "From00"]'
+    outputs:
+      can-skip: ${{ steps.check-bypass.outputs.can-skip }}
+    steps:
+      - name: Cleanup
+        run: |
+          rm -rf * .[^.]*
+
+      - id: check-bypass
+        name: Check Bypass
+        uses: PFCCLab/ci-bypass@v1
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          non-pull-request-event-strategy: 'never-skipped'
+          type: 'composite'
+          composite-rule: |
+            {
+              "any": [
+                {
+                  "type": "labeled",
+                  "label": ["skip-ci: ${{ inputs.workflow-name }}", "skip-ci: all"],
+                  "username": ${{ env.CI_TEAM_MEMBERS }}
+                },
+                {
+                  "type": "commented",
+                  "comment-pattern": [".*/skip-ci ${{ inputs.workflow-name }}.*", ".*/skip-ci all.*"],
+                  "username": ${{ env.CI_TEAM_MEMBERS }}
+                }
+              ]
+            }

.github/workflows/distribute-a100.yml

Lines changed: 10 additions & 0 deletions
@@ -37,8 +37,18 @@ defaults:
     shell: bash

 jobs:
+  check-bypass:
+    name: Check bypass
+    uses: ./.github/workflows/check-bypass.yml
+    with:
+      workflow-name: 'distribute-a100'
+    secrets:
+      github-token: ${{ secrets.GITHUB_TOKEN }}
+
   distribute-a100-ci:
     name: distribute-a100-ci
+    needs: check-bypass
+    if: ${{ needs.check-bypass.outputs.can-skip != 'true' }}
     runs-on:
       group: Distribute
     steps:

.github/workflows/distribute-v100.yml

Lines changed: 10 additions & 0 deletions
@@ -37,8 +37,18 @@ defaults:
     shell: bash

 jobs:
+  check-bypass:
+    name: Check bypass
+    uses: ./.github/workflows/check-bypass.yml
+    with:
+      workflow-name: 'distribute-v100'
+    secrets:
+      github-token: ${{ secrets.GITHUB_TOKEN }}
+
   distribute-v100-ci:
     name: distribute-v100-ci
+    needs: check-bypass
+    if: ${{ needs.check-bypass.outputs.can-skip != 'true' }}
     runs-on:
       group: Auto-Parallel
     steps:

.github/workflows/lint.yml

Lines changed: 10 additions & 0 deletions
@@ -13,8 +13,18 @@ env:
   TASK: PaddleNLP-CI-Lint-${{ github.event.pull_request.number }}

 jobs:
+  check-bypass:
+    name: Check bypass
+    uses: ./.github/workflows/check-bypass.yml
+    with:
+      workflow-name: 'lint'
+    secrets:
+      github-token: ${{ secrets.GITHUB_TOKEN }}
+
   Lint:
     name: Lint
+    needs: check-bypass
+    if: ${{ needs.check-bypass.outputs.can-skip != 'true' }}
     runs-on: [self-hosted, ernie-cpu]
     steps:
       - name: Run Container

llm/docs/finetune.md

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_finetune.
 3. The backbone model can be quantized to low-bit precision by setting `weight_quantize_algo`, e.g. 'weight_only_int4', 'weight_only_int8', 'nf4', or 'fp4'. See the fine-tuning parameter reference for details.
 4. Set `use_flash_attention` to True to enable FlashAttention. With FlashAttention enabled, set `flash_mask` to True to enable FlashMask.
 5. The LoRA API supports 4D parallel strategies; the parallel training strategy can be adjusted via `tensor_parallel_degree`, `pipeline_parallel_degree`, `sharding`, and `sharding_parallel_degree`, scaling up to **LoRA fine-tuning of hundred-billion-parameter models on a single machine**.
-6. Parameters such as `rslora`, `lora_plus_scale`, `pissa`, `lora_use_mixer`, and `use_mora` can be configured to enable algorithms such as rsLoRA, LoRA+, PiSSA, MosLoRA (tensor model parallelism not yet supported), and MoRA (tensor model parallelism not yet supported).
+6. Parameters such as `rslora`, `lora_plus_scale`, `pissa`, `lora_use_mixer`, `mixer_num`, and `use_mora` can be configured to enable algorithms such as rsLoRA, LoRA+, PiSSA, MosLoRA (tensor model parallelism not yet supported), LinChain (tensor model parallelism not yet supported), and MoRA (tensor model parallelism not yet supported).

 For easier downstream **compression** and **static-graph inference**, we provide a LoRA parameter merge script that merges the LoRA parameters into the backbone model and saves the corresponding weights.
 ```
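The merge the doc describes is conceptually `W' = W + scaling * (A @ B)`: the low-rank update is folded into the backbone weight once, so inference needs no extra matmul. A minimal numpy sketch under assumed shapes and scaling (illustrative values only; PaddleNLP's actual layer also handles dtype casting, quantization, and the mixer/MoRA variants):

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, r = 8, 6, 2  # illustrative dimensions, rank r much smaller than dims
W = rng.standard_normal((in_dim, out_dim))       # backbone weight
lora_A = rng.standard_normal((in_dim, r))        # down-projection
lora_B = rng.standard_normal((r, out_dim))       # up-projection
scaling = 2.0                                    # stands in for lora_alpha / r

delta_weight = lora_A @ lora_B * scaling
merged = W + delta_weight                        # merged weight, same shape as W
print(merged.shape)  # → (8, 6)
```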

llm/run_finetune.py

Lines changed: 6 additions & 0 deletions
@@ -580,6 +580,12 @@ def create_peft_model(model_args, reft_args, training_args, dtype, model_config,
             use_quick_lora=model_args.use_quick_lora,
             lora_use_mixer=model_args.lora_use_mixer,
             use_mora=model_args.use_mora,
+<<<<<<< HEAD
+            nola=model_args.nola,
+            nola_basis_num=model_args.nola_basis_num,
+=======
+            mixer_num=model_args.mixer_num,
+>>>>>>> upstream/develop
             lorapro=model_args.lorapro,
         )
         if model_args.lorapro:

llm/tools/merge_lora_params.py

Lines changed: 23 additions & 6 deletions
@@ -78,51 +78,68 @@ def weight_process(name, quant_config, lora_config, state_dict, device):
         raise ValueError(f"quant_config.weight_quantize_algo {quant_config.weight_quantize_algo} is not supported.")


+def get_mixer(mixer, mixer_num, index=0):
+    if index == mixer_num - 1:
+        return mixer[index]
+    else:
+        return mixer[index] @ get_mixer(mixer, mixer_num, index + 1)
+
+
 def lora_process(name, layer, lora_config, state_dict, device, lora_state_dict=None):
+
     target_device = device if device == "cpu" else device + ":0"

     if (name + ".weight") not in state_dict.keys():
         return

     weight = state_dict.pop(name + ".weight")
     lora_use_mixer = lora_config.lora_use_mixer
+
+    mixer_num = lora_config.mixer_num
+    mixer = {}
     use_mora = lora_config.use_mora
+
     if lora_state_dict is None:
         lora_A = state_dict.pop(name + ".lora_A")
         if not use_mora:
             lora_B = state_dict.pop(name + ".lora_B")
         if lora_use_mixer:
-            lora_AB = state_dict.pop(name + ".lora_AB")
+            for i in range(mixer_num):
+                mixer[i] = state_dict.pop(name + ".lora_mixer_" + str(i))
     else:
         lora_A = lora_state_dict.pop(name + ".lora_A")
         if not use_mora:
             lora_B = lora_state_dict.pop(name + ".lora_B")
         if lora_use_mixer:
-            lora_AB = lora_state_dict.pop(name + ".lora_AB")
+            for i in range(mixer_num):
+                mixer[i] = state_dict.pop(name + ".lora_mixer_" + str(i))
     if device != "cpu":
         weight = weight.to(target_device)
         lora_A = lora_A.to(target_device)
         if not use_mora:
             lora_B = lora_B.to(target_device)
         if lora_use_mixer:
-            lora_AB = lora_AB.to(target_device)
+            for key in mixer.keys():
+                mixer[key] = mixer[key].to(target_device)

     if device == "cpu" and weight.dtype.name == "BF16":
         weight = weight.astype("float32")
         lora_A = lora_A.astype("float32")
         if not use_mora:
             lora_B = lora_B.astype("float32")
+
         if lora_use_mixer:
-            lora_AB = lora_AB.astype(lora_config.dtype)
-            delta_weight = layer.get_delta_weight(lora_A, lora_B, lora_AB)
+            for key in mixer.keys():
+                mixer[key] = mixer[key].astype(lora_config.dtype)
+            delta_weight = layer.get_delta_weight(lora_A, lora_B, get_mixer(mixer, mixer_num))
         elif use_mora:
             delta_weight = layer.get_delta_weight(lora_A)
         else:
             delta_weight = layer.get_delta_weight(lora_A, lora_B)
         out = (weight + delta_weight).astype(lora_config.dtype)
     else:
         if lora_use_mixer:
-            delta_weight = layer.get_delta_weight(lora_A, lora_B, lora_AB)
+            delta_weight = layer.get_delta_weight(lora_A, lora_B, get_mixer(mixer, mixer_num))
         elif use_mora:
             delta_weight = layer.get_delta_weight(lora_A)
         else:
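The new `get_mixer` helper collapses the per-index mixer matrices into one matrix via recursive matrix multiplication, equivalent to the left-to-right chain product `mixer[0] @ mixer[1] @ ... @ mixer[mixer_num - 1]`. A standalone numpy sketch of the same recursion (the 4×4 shapes are illustrative):

```python
import numpy as np

def get_mixer(mixer, mixer_num, index=0):
    # Recursively compute mixer[index] @ mixer[index + 1] @ ... @ mixer[mixer_num - 1]
    if index == mixer_num - 1:
        return mixer[index]
    return mixer[index] @ get_mixer(mixer, mixer_num, index + 1)

rng = np.random.default_rng(0)
mixers = {i: rng.standard_normal((4, 4)) for i in range(3)}
chained = get_mixer(mixers, 3)
direct = mixers[0] @ mixers[1] @ mixers[2]
print(np.allclose(chained, direct))  # → True: the recursion equals the plain chain product
```

Collapsing the chain once at merge time means the merged delta weight needs no per-step mixer bookkeeping afterwards.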

ops/csrc/setup.py

Lines changed: 6 additions & 23 deletions
@@ -53,20 +53,6 @@ def run_single(func):
     p.join()


-def run_multi(func_list):
-    processes = []
-    for func in func_list:
-        processes.append(multiprocessing.Process(target=func))
-        processes.append(multiprocessing.Process(target=func))
-        processes.append(multiprocessing.Process(target=func))
-
-    for p in processes:
-        p.start()
-
-    for p in processes:
-        p.join()
-
-
 cc_flag = get_gencode_flags(compiled_all=False)
 cc = get_sm_version()

@@ -251,17 +237,14 @@ def setup_paddle_bwd_ops():
         ext_modules=CUDAExtension(
             include_dirs=paddle_includes,
             sources=sources,
+            extra_compile_args={}
         ),
     )


 if __name__ == "__main__":
-    run_multi(
-        [
-            setup_fast_ln,
-            setup_fused_ln,
-            setup_causal_conv1d,
-            setup_selective_scan,
-            setup_paddle_bwd_ops,
-        ],
-    )
+    setup_fast_ln()
+    setup_fused_ln()
+    setup_causal_conv1d()
+    setup_selective_scan()
+    setup_paddle_bwd_ops()
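This change drops the parallel `run_multi` helper and builds the extensions sequentially; the retained `run_single` (visible in the hunk header) still isolates each build in a child process. A sketch of that isolation pattern (the `fake_build` function and returning the exit code are illustrative additions, not part of the real setup.py):

```python
import multiprocessing

def run_single(func):
    # Run one build step in its own process so each extension gets a
    # clean interpreter state; mirrors the helper kept in setup.py.
    p = multiprocessing.Process(target=func)
    p.start()
    p.join()
    return p.exitcode  # illustrative: exposes success/failure to the caller

def fake_build():
    # Hypothetical stand-in for a setup_* build step.
    print("compiling extension")

if __name__ == "__main__":
    print(run_single(fake_build))  # → 0 on success
```

Running steps one at a time trades build speed for deterministic ordering and simpler failure attribution.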
