Commit af408786 authored by Mashiro, committed by Zaida Zhou

[Docs] Translate model tutorial (#738)


* translate model first time

* Update docs/en/tutorials/model.md

Co-authored-by: Qian Zhao <112053249+C1rN09@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Qian Zhao <112053249+C1rN09@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
parent 7f11b5d1
# Model

## Runner and model
As mentioned in [basic dataflow](./runner.md#basic-dataflow), the dataflow between the DataLoader, model, and evaluator follows some rules. Don't remember the details? Let's review them:
```python
# Training process
for data_batch in train_dataloader:
    data_batch = model.data_preprocessor(data_batch, training=True)
    if isinstance(data_batch, dict):
        losses = model(**data_batch, mode='loss')
    elif isinstance(data_batch, (list, tuple)):
        losses = model(*data_batch, mode='loss')
    else:
        raise TypeError()

# Validation process
for data_batch in val_dataloader:
    data_batch = model.data_preprocessor(data_batch, training=False)
    if isinstance(data_batch, dict):
        outputs = model(**data_batch, mode='predict')
    elif isinstance(data_batch, (list, tuple)):
        outputs = model(*data_batch, mode='predict')
    else:
        raise TypeError()
    evaluator.process(data_samples=outputs, data_batch=data_batch)
metrics = evaluator.evaluate(len(val_dataloader.dataset))
```
In the [runner tutorial](./runner.md), we briefly described the relationship between the DataLoader, model, and evaluator, and introduced the concept of `data_preprocessor`, so you may already have a rough understanding of the model. However, during the execution of the Runner, the situation is far more complex than the above pseudo-code suggests.
In order to let you focus on the algorithm itself and ignore the complex relationships among the model, DataLoader, and evaluator, we designed [BaseModel](mmengine.model.BaseModel). In most cases, all you need to do is make your model inherit from `BaseModel` and implement `forward` as required to perform the training, validation, and testing processes.
Before continuing with this tutorial, let's pose two questions that we hope you will be able to answer after reading it:

1. When do we update the parameters of the model, and how can we update them with a custom optimization process?
2. Why is the concept of `data_preprocessor` necessary? What functions can it perform?
## Interface introduction
Usually, we define a model to implement the body of the algorithm. In MMEngine, the model is managed by the Runner and needs to implement some interfaces, such as `train_step`, `val_step`, and `test_step`. For high-level tasks like detection, classification, and segmentation, these interfaces commonly follow a standard workflow. For example, `train_step` calculates the loss and updates the parameters of the model, while `val_step`/`test_step` return the predictions used for computing metrics. Therefore, MMEngine abstracts [BaseModel](mmengine.model.BaseModel) to implement this common workflow.
Benefiting from `BaseModel`, we only need to make the model inherit from `BaseModel` and implement the `forward` function to perform the training, validation, and testing processes.
```{note}
BaseModel inherits from [BaseModule](../advanced_tutorials/initialize.md), which can be used to initialize the model parameters dynamically.
```
[**forward**](mmengine.model.BaseModel.forward): The arguments of `forward` need to match the data given by the [DataLoader](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html). If the DataLoader samples a tuple `data`, `forward` needs to accept the unpacked `*data`; if the DataLoader returns a dict `data`, `forward` needs to accept the unpacked `**data`. `forward` also accepts a `mode` parameter, which is used to select the running branch:
- `mode='loss'`: the `loss` mode is enabled during training, and `forward` returns a differentiable loss `dict`. Each key-value pair in the loss `dict` is used to log the training status and optimize the model parameters. This branch is called by `train_step`.
- `mode='predict'`: the `predict` mode is enabled during validation and testing, and `forward` returns predictions matching the arguments of [process](mmengine.evaluator.Evaluator.process). OpenMMLab repositories have stricter rules: the predictions must be a list, and each of its elements must be a [BaseDataElement](../advanced_tutorials/data_element.md). This branch is called by `val_step` and `test_step`.
- `mode='tensor'`: both the `tensor` and `predict` modes return the model's predictions; the difference is that in `tensor` mode, `forward` returns a tensor or a container of tensors that has not been processed by post-processing methods such as non-maximum suppression (NMS). You can apply your own post-processing to the results of `tensor` mode. A minimal sketch covering the three branches follows this list.
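To make the three branches concrete, here is a minimal sketch of such a `forward`. The `ToyClassifier` model, its linear head, and the `argmax` post-processing are assumptions for illustration, not MMEngine APIs:

```python
import torch.nn as nn
import torch.nn.functional as F
from mmengine.model import BaseModel


class ToyClassifier(BaseModel):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.net = nn.LazyLinear(10)

    def forward(self, imgs, labels=None, mode='tensor'):
        scores = self.net(imgs.flatten(1))
        if mode == 'loss':
            # differentiable loss dict consumed by train_step
            return {'loss': F.cross_entropy(scores, labels)}
        elif mode == 'predict':
            # post-processed predictions consumed by the evaluator
            return scores.argmax(dim=1), labels
        elif mode == 'tensor':
            # raw scores without any post-processing
            return scores
```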
[**train_step**](mmengine.model.BaseModel.train_step): Gets the loss `dict` by calling `forward` in `loss` mode. `BaseModel` implements a standard optimization process as follows:
```python
def train_step(self, data, optim_wrapper):
    # See details in the next section
    data = self.data_preprocessor(data, training=True)
    # `loss` mode returns a loss dict. Actually train_step accepts both
    # tuple and dict input, and unpacks it with * or **
    loss = self(**data, mode='loss')
    # Parse the loss dict into parsed_losses for optimization
    # and log_vars for logging
    parsed_losses, log_vars = self.parse_losses(loss)
    optim_wrapper.update_params(parsed_losses)  # update parameters
    return log_vars
```
[**val_step**](mmengine.model.BaseModel.val_step): Get the predictions by calling `forward` with `predict` mode.
```python
def val_step(self, data):
    data = self.data_preprocessor(data, training=False)
    outputs = self(**data, mode='predict')
    return outputs
```
[**test_step**](mmengine.model.BaseModel.test_step): In `BaseModel`, `test_step` is identical to `val_step`, but both can be customized in subclasses; for example, you can compute a validation loss in `val_step`, as in the sketch below.
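As a minimal sketch of that customization (the subclass name is hypothetical; dict-type data from the DataLoader is assumed, and how the extra `log_vars` get reported is left to your own hooks or logger):

```python
class ModelWithValLoss(BaseModel):  # hypothetical subclass
    def val_step(self, data):
        data = self.data_preprocessor(data, training=False)
        # reuse the `loss` branch to also compute a validation loss
        _, log_vars = self.parse_losses(self(**data, mode='loss'))
        # the `predict` branch still produces outputs for the evaluator
        outputs = self(**data, mode='predict')
        return outputs
```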
Having understood the interfaces of `BaseModel`, we can now write a more complete pseudo-code:
```python
# training
for data_batch in train_dataloader:
    loss_dict = model.train_step(data_batch)

# validation
for data_batch in val_dataloader:
    preds = model.val_step(data_batch)
    evaluator.process(data_samples=preds, data_batch=data_batch)
metrics = evaluator.evaluate(len(val_dataloader.dataset))
```
Great! Ignoring `Hook`s, the pseudo-code above almost implements the main logic of the [loop](mmengine.runner.EpochBasedTrainLoop)! Now let's go back to [15 minutes to get started with MMEngine](../get_started/15_minutes.md) to truly understand what `MMResNet50` does:
```python
import torch.nn.functional as F
import torchvision
from mmengine.model import BaseModel


class MMResNet50(BaseModel):
    def __init__(self):
        super().__init__()
        self.resnet = torchvision.models.resnet50()

    def forward(self, imgs, labels, mode):
        x = self.resnet(imgs)
        if mode == 'loss':
            return {'loss': F.cross_entropy(x, labels)}
        elif mode == 'predict':
            return x, labels

    # train_step, val_step and test_step have been implemented in BaseModel.
    # We list the equivalent code here for better understanding
    def train_step(self, data, optim_wrapper):
        data = self.data_preprocessor(data)
        loss = self(*data, mode='loss')
        parsed_losses, log_vars = self.parse_losses(loss)
        optim_wrapper.update_params(parsed_losses)
        return log_vars

    def val_step(self, data):
        data = self.data_preprocessor(data)
        outputs = self(*data, mode='predict')
        return outputs

    def test_step(self, data):
        data = self.data_preprocessor(data)
        outputs = self(*data, mode='predict')
        return outputs
```
Now you may have a deeper understanding of the dataflow and be able to answer the first question posed in [Runner and model](#runner-and-model).
`BaseModel.train_step` implements the standard optimization process; if we want a custom optimization process, we can override it in a subclass. Note, however, that we need to make sure `train_step` still returns a loss dict. A minimal sketch of such an override is shown below.
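For instance, here is a minimal sketch of an override that re-implements the standard process with the optimizer wrapper's lower-level calls; the subclass name is an assumption, and `backward`/`step`/`zero_grad` are assumed to behave like the usual `OptimWrapper` methods:

```python
class CustomOptimResNet(MMResNet50):  # hypothetical subclass
    def train_step(self, data, optim_wrapper):
        data = self.data_preprocessor(data, training=True)
        loss = self(*data, mode='loss')
        parsed_losses, log_vars = self.parse_losses(loss)
        # equivalent to optim_wrapper.update_params(parsed_losses), but each
        # stage (backward, update, zero grad) can now be customized separately
        optim_wrapper.backward(parsed_losses)
        optim_wrapper.step()
        optim_wrapper.zero_grad()
        return log_vars
```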
## DataPreprocessor
If your computer is equipped with a GPU (or other hardware that can accelerate training, such as MPS or IPU) and you run the code of the [15 minutes tutorial](../get_started/15_minutes.md), you will notice that the program runs on the GPU. But when does MMEngine move the data and the model from the CPU to the GPU?
In fact, the Runner moves the model to the specified device during construction, while the data is moved at the `self.data_preprocessor(data)` line mentioned in the code snippets of the previous section; the moved data is then passed to the model.
This makes sense, but doesn't it seem a little magical? At this point you may be wondering:
1. `MMResNet50` does not define a `data_preprocessor`, so why can it still access `data_preprocessor` and move data to the GPU?
2. Why doesn't `BaseModel` simply move the data with `data = data.to(device)`, instead of relying on a `DataPreprocessor`?
The answer to the first question is that `MMResNet50` inherits from `BaseModel`, and `super().__init__()` builds a default `data_preprocessor` for it. The equivalent implementation of the default one looks like this:
```python
class BaseDataPreprocessor(nn.Module):
    def forward(self, data, training=True):  # ignore the training parameter here
        # Suppose the data given by CIFAR10 is a tuple. Actually,
        # BaseDataPreprocessor can move various types of data
        # to the target device.
        return tuple(_data.cuda() for _data in data)
```
`BaseDataPreprocessor` will move the data to the specified device.
Before answering the second question, let's think about a few more questions:
1. Where should we perform normalization, in the [transform](../advanced_tutorials/data_transform.md) or in the model?
It sounds reasonable to put it in the transform to take advantage of the DataLoader's multi-process acceleration, or in the model to leverage the GPU. However, while we debate whether CPU normalization is faster than GPU normalization, moving the data from CPU to GPU takes far longer than either.
In fact, for a computationally light operation like normalization, the compute time is much smaller than the data-transfer time, so the transfer is the higher-priority target for optimization. If we move the data to the specified device while it is still `uint8`, before it is normalized (normalized `float` data is 4 times larger than `uint8` data), we reduce the bandwidth required and greatly improve the transfer efficiency. This "lagged" normalization is one of the main reasons why we designed the `DataPreprocessor`: the data preprocessor moves the data first and then normalizes it. A minimal sketch of this idea follows.
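The sketch below illustrates "move first, normalize later"; the class name and the CIFAR-style `(img, label)` tuple input are assumptions:

```python
import torch
import torch.nn as nn


class NormalizeDataPreprocessor(nn.Module):  # hypothetical example
    def __init__(self, mean, std):
        super().__init__()
        self.mean = torch.tensor(mean).view(1, -1, 1, 1)
        self.std = torch.tensor(std).view(1, -1, 1, 1)

    def forward(self, data, training=True):
        img, label = data
        # transfer the compact uint8 tensor first: transferring float32
        # data instead would cost 4x the bandwidth
        img = img.cuda()
        label = label.cuda()
        # normalize on the GPU after the transfer
        img = (img.float() - self.mean.to(img.device)) / self.std.to(img.device)
        return (img, label)
```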
2. How do we implement data augmentations like MixUp and Mosaic?
Although MixUp and Mosaic may look like just special data transformations that belong in the transform, both involve **fusing multiple images into one**, which is very difficult to implement there: the current transform paradigm applies enhancements to **one** image, and the dataset is not accessible from within a transform, so reading additional images is hard. However, if we implement Mosaic or MixUp based on the batch of data sampled from the DataLoader, everything becomes easy: we can access multiple images at the same time and perform the image-fusion operation effortlessly:
```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixUpDataPreprocessor(nn.Module):
    def __init__(self, num_class, alpha):
        super().__init__()
        self.alpha = alpha
        self.num_class = num_class

    def forward(self, data, training=True):
        data = tuple(_data.cuda() for _data in data)
        # Only perform MixUp in training mode
        if not training:
            return data

        img, label = data
        label = F.one_hot(label, self.num_class)  # label to one-hot
        batch_size = len(label)
        index = torch.randperm(batch_size)  # indices of the images to fuse
        lam = np.random.beta(self.alpha, self.alpha)  # fusion factor
        # MixUp of the images and labels
        img = lam * img + (1 - lam) * img[index, :]
        label = lam * label + (1 - lam) * label[index, :]
        # Since the returned label is one-hot encoded, the `forward` of the
        # model should also be adjusted.
        return (img, label)
```
Therefore, besides data transfer and normalization, another major function of the `data_preprocessor` is batch-level augmentation. The modularity of the data preprocessor also lets us freely combine algorithms with data-augmentation strategies, as the snippet below illustrates.
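For instance, assuming `MMResNet50.__init__` were extended to forward a `data_preprocessor` argument to `BaseModel.__init__` (which accepts one), switching augmentation strategies would be a one-line change:

```python
# default preprocessing: only device transfer
model = MMResNet50()
# batch-level MixUp, without touching the rest of the training code
model = MMResNet50(
    data_preprocessor=MixUpDataPreprocessor(num_class=10, alpha=0.2))
```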
3. What should we do if the data sampled from the DataLoader does not match the model's inputs? Should we modify the DataLoader or the model interface?
The answer is: neither. The ideal solution is to do the adaptation without breaking the existing interfaces between the model and the DataLoader. A custom `DataPreprocessor` can handle this as well, converting the incoming data to the format the model expects, as sketched below.
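A minimal sketch of such an adapter (all key names here are assumptions): suppose the DataLoader yields `{'image': ..., 'gt': ...}` while the model's `forward` expects `imgs` and `labels`; the preprocessor renames the fields without modifying either side:

```python
class AdapterDataPreprocessor(nn.Module):  # hypothetical example
    def forward(self, data, training=True):
        # rename the DataLoader's keys to the ones the model expects
        return {'imgs': data['image'].cuda(), 'labels': data['gt'].cuda()}
```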
By now, you should understand the rationale behind the data preprocessor and be able to confidently answer the two questions posed at the beginning of this tutorial! But you may still wonder what the `optim_wrapper` passed to `train_step` is, and how the predictions returned by `val_step` and `test_step` relate to the evaluator. You will find more details in the [evaluation tutorial](./evaluation.md) and the [optimizer wrapper tutorial](./optim_wrapper.md).