Ascend Quick Start
===================================

Last updated: 11/14/2025.

We have added support for Huawei Ascend devices in verl.

Hardware Support
-----------------------------------

- Atlas 200T A2 Box16
- Atlas 900 A2 PODc
- Atlas 800T A3

Installation
-----------------------------------

Basic Environment Setup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+--------------------------+
| software  | version                  |
+===========+==========================+
| Python    | >= 3.10, < 3.12          |
+-----------+--------------------------+
| CANN      | == 8.3.RC1               |
+-----------+--------------------------+
| torch     | == 2.7.1                 |
+-----------+--------------------------+
| torch_npu | == 2.7.1                 |
+-----------+--------------------------+

For the basic environment setup, please refer to this `documentation `_.

vllm & vllm-ascend
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To use vllm with verl, build and install vllm and vllm-ascend with the following commands. Note that the installation method differs by machine type.

.. code-block:: bash

    # vllm
    git clone -b v0.11.0 --depth 1 https://github.com/vllm-project/vllm.git
    cd vllm

    # for Atlas 200T A2 Box16
    VLLM_TARGET_DEVICE=empty pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/

    # for Atlas 900 A2 PODc or Atlas 800T A3
    VLLM_TARGET_DEVICE=empty pip install -e .

.. code-block:: bash

    # vllm-ascend
    git clone -b v0.11.0rc1 --depth 1 https://github.com/vllm-project/vllm-ascend.git
    cd vllm-ascend
    pip install -e .

Install verl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    git clone https://github.com/volcengine/verl.git
    cd verl
    pip install -r requirements-npu.txt
    pip install -e .

Building an Image from the Dockerfile
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To build an image from the Dockerfile, please refer to the `documentation `_.

Notes on Other Third-Party Libraries
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------+----------------+
| software      | version/status |
+===============+================+
| transformers  | >= v4.57.1     |
+---------------+----------------+
| triton-ascend | 3.2.0rc4       |
+---------------+----------------+
| flash_attn    | not supported  |
+---------------+----------------+
| liger-kernel  | not supported  |
+---------------+----------------+

1. Enabling flash attention through transformers (``flash_attention_2``) is supported; transformers must be version 4.57.1 or later.
2. triton-ascend must be installed for vLLM inference to work correctly.
3. Enabling flash attention acceleration via flash_attn is not supported.
4. Enabling liger-kernel is not supported.
5. On x86 servers, install the CPU build of torchvision:

   .. code-block:: bash

      pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu
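Before moving on, it can be worth confirming that torch, torch_npu, and vllm were installed consistently. The snippet below is a minimal sanity-check sketch, assuming the usual ``torch.npu`` device API registered by ``torch_npu`` is available in your environment; adapt it as needed.

.. code-block:: bash

    # Minimal sanity check (sketch): print package versions and NPU visibility.
    python3 - <<'EOF'
    import torch
    import torch_npu  # importing torch_npu registers the "npu" device with torch
    import vllm

    print("torch     :", torch.__version__)
    print("torch_npu :", torch_npu.__version__)
    print("vllm      :", vllm.__version__)
    print("NPU available:", torch.npu.is_available())
    print("NPU count    :", torch.npu.device_count())
    EOF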
Quick Start
-----------------------------------

Before using verl for real workloads, we recommend running a GRPO training trial with Qwen2.5-0.5B to verify that the environment preparation and installation are correct.

1. Download the dataset and preprocess it into parquet format so that it contains the fields required for computing the RL reward.

   .. code-block:: bash

      python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k

2. Run the training.

   .. code-block:: bash

      set -x

      export VLLM_ATTENTION_BACKEND=XFORMERS

      python3 -m verl.trainer.main_ppo \
          algorithm.adv_estimator=grpo \
          data.train_files=$HOME/data/gsm8k/train.parquet \
          data.val_files=$HOME/data/gsm8k/test.parquet \
          data.train_batch_size=128 \
          data.max_prompt_length=512 \
          data.max_response_length=128 \
          data.filter_overlong_prompts=True \
          data.truncation='error' \
          actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
          actor_rollout_ref.actor.optim.lr=5e-7 \
          actor_rollout_ref.model.use_remove_padding=False \
          actor_rollout_ref.actor.entropy_coeff=0.001 \
          actor_rollout_ref.actor.ppo_mini_batch_size=64 \
          actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=20 \
          actor_rollout_ref.actor.use_kl_loss=True \
          actor_rollout_ref.actor.kl_loss_coef=0.001 \
          actor_rollout_ref.actor.kl_loss_type=low_var_kl \
          actor_rollout_ref.model.enable_gradient_checkpointing=True \
          actor_rollout_ref.actor.fsdp_config.param_offload=False \
          actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
          actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \
          actor_rollout_ref.rollout.enable_chunked_prefill=False \
          actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
          actor_rollout_ref.rollout.name=vllm \
          actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
          actor_rollout_ref.rollout.n=5 \
          actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \
          actor_rollout_ref.ref.fsdp_config.param_offload=True \
          algorithm.kl_ctrl.kl_coef=0.001 \
          trainer.critic_warmup=0 \
          trainer.logger=console \
          trainer.project_name='verl_grpo_example_gsm8k' \
          trainer.experiment_name='qwen2_7b_function_rm' \
          trainer.n_gpus_per_node=8 \
          trainer.nnodes=1 \
          trainer.save_freq=-1 \
          trainer.test_freq=5 \
          trainer.total_epochs=1 \
          trainer.device=npu $@
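If you only want this trial run to use a subset of the NPUs on the node, one option is to limit device visibility before launching and reduce ``trainer.n_gpus_per_node`` to match. The sketch below assumes the standard CANN environment variable ``ASCEND_RT_VISIBLE_DEVICES`` is honored by torch_npu and vllm-ascend in your setup; the script name is hypothetical and stands for the step-2 command saved as a shell script, which forwards extra overrides through its trailing ``$@``.

.. code-block:: bash

    # Sketch: expose only NPUs 0-3 to the job (assumes ASCEND_RT_VISIBLE_DEVICES
    # is respected by your torch_npu / vllm-ascend installation).
    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3

    # "run_grpo_gsm8k_npu.sh" is a hypothetical file containing the step-2 command;
    # its trailing "$@" lets us override the device count from the command line.
    bash run_grpo_gsm8k_npu.sh trainer.n_gpus_per_node=4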
(Optional) Using MindSpeed as the Training Backend
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Install the MindSpeed acceleration library following the instructions in the `MindSpeed README `_.
2. Set the ``strategy`` of the verl worker models to ``megatron``, e.g. ``actor_rollout_ref.actor.strategy=megatron``.
3. Custom MindSpeed arguments can be passed in via ``override_transformer_config``; for example, to enable flash attention (FA) for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``. A combined example is sketched after this list.
4. For more feature details, see the `MindSpeed+verl documentation `_.
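As an illustration only (not a verified recipe), the overrides above can simply be appended to the quick-start command, since it forwards trailing arguments via ``$@``. The script name below is hypothetical, and whether a given model/algorithm combination works with the Megatron backend is summarized in the tables in the next section.

.. code-block:: bash

    # Sketch: switch the actor worker to the Megatron/MindSpeed backend and enable
    # flash attention for it, reusing the flags from steps 2 and 3 above.
    # "run_grpo_gsm8k_npu.sh" is a hypothetical wrapper around the step-2 command.
    # Depending on the algorithm, other workers (e.g. the critic) may also need
    # their strategy switched to megatron.
    bash run_grpo_gsm8k_npu.sh \
        actor_rollout_ref.actor.strategy=megatron \
        +actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True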
Support Status
-----------------------------------

**Table 1** RL algorithms

+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| algorithm             | model                   | actor.strategy    | rollout.name      | hardware                 |
+=======================+=========================+===================+===================+==========================+
| GRPO                  | Qwen2.5-7B-instruct     | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| GRPO                  | Qwen2.5-32B-instruct    | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| GRPO                  | Qwen2.5-VL-3B-instruct  | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| GRPO                  | Qwen2.5-VL-7B-instruct  | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| GRPO                  | Qwen2.5-VL-32B-instruct | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| GRPO                  | Qwen3-8B                | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| GRPO                  | Qwen3-32B               | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| DAPO                  | Qwen2.5-7B-instruct     | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| DAPO                  | Qwen2.5-32B             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| DAPO                  | Qwen3-8B-base           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| DAPO                  | Qwen3-14B-base          | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| DAPO                  | Qwen3-30B-A3B-base      | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| DAPO                  | Qwen3-30B-A3B           | megatron          | vllm-ascend       | Atlas 800T A3            |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| PPO                   | Qwen3-8B                | FSDP              | vllm-ascend       | Atlas 900 A2 PODc        |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+
| One_Step_Off_Policy   | Qwen3-8B                | FSDP              | vllm-ascend       | Atlas 800T A3            |
+-----------------------+-------------------------+-------------------+-------------------+--------------------------+

**Table 2** SFT algorithms

+-----------+-------------------------+-------------------+----------------------+
| algorithm | model                   | actor.strategy    | hardware             |
+===========+=========================+===================+======================+
| SFT-PEFT  | Qwen3-8B                | FSDP              | Atlas 900 A2 PODc    |
+-----------+-------------------------+-------------------+----------------------+
| ReTool-SFT| Qwen2.5-7B-instruct     | FSDP              | Atlas 900 A2 PODc    |
+-----------+-------------------------+-------------------+----------------------+

Roadmap
-----------------------------------

See the `roadmap `_ for the support progress of additional features.

Disclaimer
-----------------------------------

The Ascend support code, Dockerfile, and images provided in verl are reference examples only. If you plan to use them in a production environment, please get in touch through official channels. Thank you.