Welcome to verl's documentation!
================================================

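verl is a flexible, efficient, and production-ready RL training framework designed for post-training of large language models (LLMs). It is an open-source implementation of the `HybridFlow <https://arxiv.org/pdf/2409.19256>`_ paper.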

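verl is flexible and easy to use with:

- **Easy extension of diverse RL algorithms**: The hybrid programming model combines the strengths of the single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code.

- **Seamless integration of existing LLM infra with modular APIs**: Decoupling computation and data dependencies lets verl integrate seamlessly with existing LLM frameworks such as PyTorch FSDP, Megatron-LM, vLLM, and SGLang. Users can also readily extend it to other LLM training and inference frameworks.

- **Flexible device mapping and parallelism**: Supports placing models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.

- Ready integration with popular HuggingFace models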


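verl is fast with:

- **State-of-the-art throughput**: By seamlessly integrating existing state-of-the-art LLM training and inference frameworks, verl achieves high generation and training throughput.

- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between the training and generation phases.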

--------------------------------------------

.. _Contents:

.. toctree::
   :maxdepth: 2
   :caption: Quickstart

   start/install
   start/quickstart
   start/multinode
   start/ray_debug_tutorial
   start/more_resources
   start/agentic_rl

.. toctree::
   :maxdepth: 2
   :caption: Programming Guide

   hybrid_flow
   single_controller

.. toctree::
   :maxdepth: 1
   :caption: Data Preparation

   preparation/prepare_data
   preparation/reward_function

.. toctree::
   :maxdepth: 2
   :caption: Configurations

   examples/config

.. toctree::
   :maxdepth: 1
   :caption: PPO Example

   examples/ppo_code_architecture
   examples/gsm8k_example
   examples/multi_modal_example
   examples/skypilot_examples

.. toctree::
   :maxdepth: 1
   :caption: Algorithms

   algo/ppo.md
   algo/grpo.md
   algo/collabllm.md
   algo/dapo.md
   algo/spin.md
   algo/sppo.md
   algo/entropy.md
   algo/opo.md
   algo/baseline.md
   algo/gpg.md
   algo/rollout_corr.md
   algo/rollout_corr_math.md

.. toctree::
   :maxdepth: 1
   :caption: PPO Trainer and Workers

   workers/ray_trainer
   workers/fsdp_workers
   workers/megatron_workers
   workers/sglang_worker
   workers/model_engine

.. toctree::
   :maxdepth: 1
   :caption: Performance Tuning Guide

   perf/dpsk.md
   perf/best_practices
   perf/perf_tuning
   README_vllm0.8.md
   perf/device_tuning
   perf/verl_profiler_system.md
   perf/nsight_profiling.md

.. toctree::
   :maxdepth: 1
   :caption: Adding New Models

   advance/fsdp_extension
   advance/megatron_extension

.. toctree::
   :maxdepth: 1
   :caption: Advanced Features

   advance/checkpoint
   advance/rope
   advance/attention_implementation
   advance/ppo_lora.rst
   sglang_multiturn/multiturn.rst
   sglang_multiturn/interaction_system.rst
   advance/placement
   advance/dpo_extension
   examples/sandbox_fusion_example
   advance/rollout_trace.rst
   advance/rollout_skip.rst
   advance/one_step_off
   advance/agent_loop
   advance/reward_loop
   advance/fully_async
   data/transfer_queue.md
   advance/grafana_prometheus.md
   advance/fp8.md

.. toctree::
   :maxdepth: 1
   :caption: Hardware Support

   amd_tutorial/amd_build_dockerfile_page.rst
   amd_tutorial/amd_vllm_page.rst
   ascend_tutorial/ascend_quick_start.rst
   ascend_tutorial/ascend_consistency.rst
   ascend_tutorial/ascend_profiling_zh.rst
   ascend_tutorial/ascend_profiling_en.rst
   ascend_tutorial/dockerfile_build_guidance.rst
   ascend_tutorial/ascend_sglang_quick_start.rst

.. toctree::
   :maxdepth: 1
   :caption: API References

   api/data
   api/single_controller.rst
   api/trainer.rst
   api/utils.rst


.. toctree::
   :maxdepth: 2
   :caption: FAQ

   faq/faq

.. toctree::
   :maxdepth: 1
   :caption: Development Notes

   sglang_multiturn/sandbox_fusion.rst

Contribution
-------------

verl is free software; you can redistribute it and/or modify it under the terms of the Apache License 2.0. We welcome contributions. Join us on `GitHub <https://github.com/volcengine/verl>`_, `Slack <https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA>`_, and `Wechat <https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG>`_ for discussions.

Contributions from the community are welcome! Please check out our `project roadmap <https://github.com/volcengine/verl/issues/710>`_ and `good first issues <https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22>`_ to see where you can contribute.

Code Linting and Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We use pre-commit to help improve code quality. To initialize pre-commit, run:

.. code-block:: bash

   pip install pre-commit
   pre-commit install

To resolve CI errors locally, you can also run pre-commit manually:

.. code-block:: bash

   pre-commit run

Adding CI Tests
^^^^^^^^^^^^^^^^^^^^^^^^

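If possible, please add CI tests for your new feature:

1. Find the most relevant workflow yml file, which usually corresponds to a ``hydra`` default config (e.g. ``ppo_trainer``, ``ppo_megatron_trainer``, ``sft_trainer``, etc.).
2. Add the related path patterns to the ``paths`` section if they are not already included.
3. Minimize the workload of the test scripts (see the existing scripts for examples).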
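
As a rough illustration of step 2, the sketch below shows what a ``paths`` filter can look like, assuming the workflow files are GitHub Actions workflows triggered on pull requests. The branch name and directory patterns are hypothetical placeholders, not copied from the verl repository; use the patterns that match the files your feature touches.

.. code-block:: yaml

   # Hypothetical excerpt of a workflow yml file (assumed GitHub Actions syntax).
   on:
     pull_request:
       branches:
         - main                      # assumption: PRs target the main branch
       paths:
         - "verl/my_feature/**"      # hypothetical: source files of the new feature
         - "tests/my_feature/**"     # hypothetical: test scripts exercising it

With a filter like this, the workflow runs only when a pull request changes at least one file matching one of the listed patterns.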

We are hiring! Send us an `email <mailto:haibin.lin@bytedance.com>`_ if you are interested in internship or full-time opportunities in MLSys, LLM reasoning, or multimodal alignment.