Multi-Modal Example Architecture
================================

Last updated: 04/28/2025.

Introduction
------------

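verl now supports multi-modal training. You can launch multi-modal RL tasks with FSDP and vLLM/SGLang. Megatron support is coming soon.

Follow the steps below to quickly start a multi-modal RL task.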

Step 1: Prepare the dataset
---------------------------

.. code:: bash

    # The dataset is saved to the $HOME/data/geo3k folder
    python examples/data_preprocess/geo3k.py
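
Before moving on, it can help to confirm that the preprocessing step produced output. The sketch below is a minimal check, not part of verl: ``list_parquet_files`` is a hypothetical helper, and it assumes the script writes its results as ``*.parquet`` files directly under the output folder, which is not stated above.

.. code:: python

    from pathlib import Path


    def list_parquet_files(data_dir):
        """Return the sorted names of the Parquet files directly under data_dir."""
        data_dir = Path(data_dir).expanduser()
        # A missing folder simply means nothing has been written yet.
        if not data_dir.is_dir():
            return []
        # Assumption: the preprocessing output uses the .parquet extension.
        return sorted(p.name for p in data_dir.glob("*.parquet"))


    if __name__ == "__main__":
        files = list_parquet_files("~/data/geo3k")
        print(files if files else "no Parquet files found; re-run the preprocessing script")

An empty result suggests that the script did not finish or wrote to a different location.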

Step 2: Download the model
--------------------------

.. code:: bash

    # Download the model from Hugging Face
    python3 -c "import transformers; transformers.pipeline(model='Qwen/Qwen2.5-VL-7B-Instruct')"
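
The one-liner above downloads the weights as a side effect of building a pipeline, which also loads the model into memory. If you only want the files in the local cache, a possible alternative is ``snapshot_download`` from the ``huggingface_hub`` package. This is a sketch under that assumption, not the method used above; ``download_model`` and ``MODEL_ID`` are illustrative names.

.. code:: python

    MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"


    def download_model(repo_id=MODEL_ID):
        """Fetch all files of repo_id into the Hugging Face cache and return the local path."""
        # Imported lazily so this module loads even if huggingface_hub is not installed.
        from huggingface_hub import snapshot_download

        # Downloads the repository files without instantiating the model.
        return snapshot_download(repo_id=repo_id)

Calling ``download_model()`` requires network access and enough disk space for the 7B checkpoint. It returns the path of the cached snapshot.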

Step 3: Run GRPO training on a multi-modal model with the Geo3K dataset
------------------------------------------------------------------------

.. code:: bash

    # Run the task
    bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh