Data collection with the FSDP backend on Ascend devices
=======================================================

Last updated: 08/14/2025.

This tutorial describes how to collect profiling data on Ascend devices when training with the GRPO or DAPO algorithm on the FSDP backend.

Configuration
-------------

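Data collection is controlled at two levels:

1. **Global profiler control**: parameters in ``ppo_trainer.yaml`` control the collection mode and the steps to profile.
2. **Role profiler control**: parameters in each role's ``profiler`` field control the collection mode for that role.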

Global collection control
~~~~~~~~~~~~~~~~~~~~~~~~~

Global collection is configured with the following parameters in ``ppo_trainer.yaml``:

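-  global_profiler: controls which profiling tool is used, which steps are profiled, and where results are saved.

   -  tool: the profiling tool to use; one of nsys, npu, torch, or torch_memory.
   -  steps: a list of the steps to profile. For example, [2, 4] collects data at steps 2 and 4. If set to null, no data is collected.
   -  save_path: the directory where collected data is saved. Defaults to "outputs/profile".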
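
A minimal ``global_profiler`` section that combines the parameters above might look like the following sketch. The choice of ``npu`` as the tool and the step numbers are illustrative only.

.. code:: yaml

      global_profiler:
         tool: npu                     # use the NPU profiler
         steps: [2, 4]                 # profile steps 2 and 4 only
         save_path: "outputs/profile"  # the documented default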

Role collection control
~~~~~~~~~~~~~~~~~~~~~~~

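Each role's ``profiler`` field controls the collection mode for that role:

-  enable: whether to enable profiling for this role.
-  all_ranks: whether to collect data from all ranks.
-  ranks: the list of ranks to collect data from. If empty, no data is collected.
-  tool_config: the configuration of the profiling tool used by this role.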
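
For example, the following sketch restricts actor profiling to ranks 0 and 1. The rank numbers are illustrative, and pairing ``all_ranks: False`` with an explicit ``ranks`` list is an assumption about how the two fields interact.

.. code:: yaml

      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: False   # assumed: disable all-rank collection so ``ranks`` applies
               ranks: [0, 1]      # illustrative rank numbers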

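The parameters under each role's ``profiler.tool_config.npu`` control the behavior of the NPU profiler:

-  level: the collection level; one of level_none, level0, level1, or level2.

   -  level_none: disables all level-based collection (turns off profiler_level).
   -  level0: collects high-level application data, low-level NPU data, and operator execution details on the NPU.
   -  level1: adds CANN-layer AscendCL data and AI Core performance metrics on the NPU to what level0 collects.
   -  level2: adds CANN-layer Runtime data and AI CPU metrics to what level1 collects.

-  contents: a list of options that select what is collected, for example npu, cpu, memory, shapes, module, and stack.

   -  npu: collect device-side performance data.
   -  cpu: collect host-side performance data.
   -  memory: enable memory profiling.
   -  shapes: record tensor shapes.
   -  module: record framework-level Python call stack information.
   -  stack: record operator call stack information.

-  analysis: whether to parse the collected data automatically.
-  discrete: whether to enable discrete mode.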
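
The following sketch combines these options for the actor role: it collects at level1 with device-side, host-side, and memory data, parses the results automatically, and leaves discrete mode off. The particular values are illustrative, not recommended defaults.

.. code:: yaml

      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: True
               tool_config:
                  npu:
                     level: level1                  # level0 data plus AscendCL and AI Core metrics
                     contents: [npu, cpu, memory]   # device, host, and memory data
                     analysis: True                 # parse automatically after collection
                     discrete: False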

Examples
--------

Disabling collection
~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

      global_profiler:
         steps: null # disable profiling

End-to-end collection
~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

      global_profiler:
         steps: [1, 2, 5]
      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: True
               tool_config:
                  npu:
                     discrete: False
        # rollout & ref follow actor settings


Discrete mode collection
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

      global_profiler:
         steps: [1, 2, 5]
      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: True
               tool_config:
                  npu:
                     discrete: True
        # rollout & ref follow actor settings


Visualization
-------------

The collected data is stored in the user-defined ``save_path`` and can be visualized with the `MindStudio Insight <https://www.hiascend.com/document/detail/zh/mindstudio/80RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html>`_ tool.

If ``analysis`` is set to False, the collected data must be parsed offline after collection:

.. code:: python

    import torch_npu

    # Placeholder: replace with the parent directory of the "localhost.localdomain_<PID>_<timestamp>_ascend_pt" folder
    profiler_path = "path/to/profiling/data"

    if __name__ == "__main__":  # guard the entry point, since offline parsing may spawn worker processes
        torch_npu.profiler.profiler.analyse(profiler_path=profiler_path)
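
When several ``*_ascend_pt`` folders are spread across subdirectories of ``save_path``, locating each parent directory by hand can be tedious. The sketch below is not part of verl or torch_npu: it searches ``save_path`` recursively and runs the same ``analyse`` call once per distinct parent directory. The helper names ``find_profiler_paths`` and ``analyse_all`` are hypothetical, and the sketch assumes that every directory containing ``*_ascend_pt`` folders is a valid ``profiler_path``.

.. code:: python

    from pathlib import Path


    def find_profiler_paths(save_path):
        """Return the sorted parent directories of all *_ascend_pt folders under save_path.

        Hypothetical helper: each returned path is a candidate profiler_path.
        """
        root = Path(save_path)
        # rglob walks the tree recursively; keep directories only and
        # de-duplicate parents so each one is parsed a single time.
        parents = {p.parent for p in root.rglob("*_ascend_pt") if p.is_dir()}
        return sorted(str(p) for p in parents)


    def analyse_all(save_path):
        """Hypothetical helper: run offline parsing once per discovered parent directory."""
        # Imported lazily so find_profiler_paths works without torch_npu installed.
        import torch_npu

        for profiler_path in find_profiler_paths(save_path):
            torch_npu.profiler.profiler.analyse(profiler_path=profiler_path)

For example, ``analyse_all("outputs/profile")`` would parse every collection under the default ``save_path``; call it from a script under an ``if __name__ == "__main__":`` guard.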