Skip to content
Snippets Groups Projects
Commit 32c2c1e3 authored by Zaida Zhou's avatar Zaida Zhou
Browse files

[Docs] Translate hook docs (#709)


* [Docs] Translate hook docs

* fix indent

* Apply suggestions from code review

Co-authored-by: default avatarQian Zhao <112053249+C1rN09@users.noreply.github.com>

Co-authored-by: default avatarQian Zhao <112053249+C1rN09@users.noreply.github.com>
parent daab8e06
No related branches found
No related tags found
No related merge requests found
# Hook
Coming soon. Please refer to [chinese documentation](https://mmengine.readthedocs.io/zh_CN/latest/tutorials/hook.html).
Hook programming is a programming pattern in which a mount point is set in one or more locations of a program. When the program runs to a mount point, all methods registered to it at runtime are automatically called. Hook programming can increase the flexibility and extensibility of the program, since users can register custom methods to the mount point to be called without modifying the code in the program.
## Built-in Hooks
MMEngine encapsules many ultilities as built-in hooks. These hooks are divided into two categories, namely default hooks and custom hooks. The former refers to those registered with the [Runner](mmengine.runner.Runner) by default, while the latter refers to those registered by the user on demand.
Each hook has a corresponding priority. At each mount point, hooks with higher priority are called earlier by the `Runner`. When sharing the same priority, the hooks are called in their registration order. The priority list is as follows.
- HIGHEST (0)
- VERY_HIGH (10)
- HIGH (30)
- ABOVE_NORMAL (40)
- NORMAL (50)
- BELOW_NORMAL (60)
- LOW (70)
- VERY_LOW (90)
- LOWEST (100)
**default hooks**
| Name | Function | Priority |
| :-----------------------------------------: | :------------------------------------------------------------------------------------------------------------------: | :---------------: |
| [RuntimeInfoHook](#runtimeinfohook) | update runtime information into message hub | VERY_HIGH (10) |
| [IterTimerHook](#itertimerhook) | Update the time spent during iteration into message hub | NORMAL (50) |
| [DistSamplerSeedHook](#distsamplerseedhook) | Ensure distributed Sampler shuffle is active | NORMAL (50) |
| [LoggerHook](#loggerhook) | Collect logs from different components of `Runner` and write them to terminal, JSON file, tensorboard and wandb .etc | BELOW_NORMAL (60) |
| [ParamSchedulerHook](#paramschedulerhook) | update some hyper-parameters of optimizer | LOW (70) |
| [CheckpointHook](#checkpointhook) | Save checkpoints periodically | VERY_LOW (90) |
**custom hooks**
| Name | Function | Priority |
| :---------------------------------: | :----------------------------------------------------------------------: | :---------: |
| [EMAHook](#emahook) | apply Exponential Moving Average (EMA) on the model during training | NORMAL (50) |
| [EmptyCacheHook](#emptycachehook) | Releases all unoccupied cached GPU memory during the process of training | NORMAL (50) |
| [SyncBuffersHook](#syncbuffershook) | Synchronize model buffers at the end of each epoch | NORMAL (50) |
```{note}
It is not recommended to modify the priority of the default hooks, as hooks with lower priority may depend on hooks with higher priority. For example, `CheckpointHook` needs to have a lower priority than ParamSchedulerHook so that the saved optimizer state is correct. Also, the priority of custom hooks defaults to `NORMAL (50)`.
```
The two types of hooks are set differently in the Runner, with the configuration of default hooks being passed to the `default_hooks` parameter of the Runner and the configuration of custom hooks being passed to the `custom_hooks` parameter, as follows.
```python
from mmengine.runner import Runner
default_hooks = dict(
runtime_info=dict(type='RuntimeInfoHook'),
timer=dict(type='IterTimerHook'),
sampler_seed=dict(type='DistSamplerSeedHook'),
logger=dict(type='LoggerHook'),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=1),
)
custom_hooks = [dict(type='EmptyCacheHook')]
runner = Runner(default_hooks=default_hooks, custom_hooks=custom_hooks, ...)
runner.train()
```
### CheckpointHook
[CheckpointHook](mmengine.hooks.CheckpointHook) saves the checkpoints at a given interval. In the case of distributed training, only the master process will save the checkpoints. The main features of `CheckpointHook` is as follows.
- Save checkpoints by interval, and support saving them by epoch or iteration
- Save the most recent checkpoints
- Save the best checkpoints
- Specify the path to save the checkpoints
For more features, please read the [CheckpointHook API documentation](mmengine.hooks.CheckpointHook).
The four features mentioned above are described below.
- Save checkpoints by interval, and support saving them by epoch or iteration
Suppose we train a total of 20 epochs and want to save the checkpoints every 5 epochs, the following configuration will help us to achieve this requirement.
```python
# the default value of by_epoch is True
default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5, by_epoch=True))
```
If you want to save checkpoints by iteration, you can set `by_epoch` to False and `interval=5` to save them every 5 iterations.
```python
default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5, by_epoch=False))
```
- Save the most recent checkpoints
If you only want to keep a certain number of checkpoints, you can set the `max_keep_ckpts` parameter. When the number of checkpoints saved exceeds `max_keep_ckpts`, the previous checkpoints will be deleted.
```python
default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5, max_keep_ckpts=2))
```
The above config shows that if a total of 20 epochs are trained, the model will be saved at epochs 5, 10, 15, and 20, but the checkpoint `epoch_5.pth` will be deleted at epoch 15, and at epoch 20 the checkpoint `epoch_10.pth` will be deleted, so that only the `epoch_15.pth` and `epoch_20.pth` will be saved.
- Save the best checkpoints
If you want to save the best checkpoints of the validation set for the training process, you can set the `save_best` parameter. If set to `'auto'`, the current checkpoint are judged to be best based on the first evaluation metric of the validation set (the evaluation metrics returned by evaluator are an ordered dictionary).
```python
default_hooks = dict(checkpoint=dict(type='CheckpointHook', save_best='auto'))
```
You can also directly specify the value of `save_best` as the evaluation metric, for example, in a classification task, you can specify `save_best='top-1'`, then the current checkpoint will be judged as best based on the value of `'top-1'`.
In addition to the `save_best` parameter, other parameters related to saving the best checkpoint are `rule`, `greater_keys` and `less_keys`, which are used to imply whether its good to have large value or not. For example, if you specify `save_best='top-1'`, you can specify `rule='greater'` to imply that the larger the value, the better the checkpoint.
- Specify the path to save the checkpoints
The checkpoints are saved in `work_dir` by default, but the path can be changed by setting `out_dir`.
```python
default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5, out_dir='/path/of/directory'))
```
[LoggerHook](mmengine.hooks.LoggerHook) collects logs from different components of `Runner` and write them to terminal, JSON file, tensorboard and wandb .etc.
If we want to output (or save) the logs every 20 iterations, we can set the `interval` parameter and configure it as follows.
```python
default_hooks = dict(logger=dict(type='LoggerHook', interval=20))
```
If you are interested in how MMEngine manages logging, you can refer to [logging](../advanced_tutorials/logging.md).
### ParamSchedulerHook
[ParamSchedulerHook](mmengine.hooks.ParamSchedulerHook) iterates through all optimizer parameter schedulers of the Runner and calls their `step` method to update the optimizer parameters in order. See [Parameter Schedulers](param_scheduler.md) for more details about what are parameter schedulers.
`ParamSchedulerHook` is registered to the Runner by default and has no configurable parameters, so there is no need to configure it.
### IterTimerHook
[IterTimerHook](mmengine.hooks.IterTimerHook) is used to record the time taken to load data and iterate once.
`IterTimerHook` is registered to the Runner by default and has no configurable parameters, so there is no need to configure it.
### DistSamplerSeedHook
[DistSamplerSeedHook](mmengine.hooks.DistSamplerSeedHook) calls the `step` method of the Sampler during distributed training to ensure that the shuffle operation takes effect.
`DistSamplerSeedHook` is registered to the Runner by default and has no configurable parameters, so there is no need to configure it.
### RuntimeInfoHook
[RuntimeInfoHook](mmengine.hooks.RuntimeInfoHook) will update the current runtime information (e.g. epoch, iter, max_epochs, max_iters, lr, metrics, etc.) to the message hub at different mount points in the Runner so that other modules without access to the Runner can obtain this information.
`RuntimeInfoHook` is registered to the Runner by default and has no configurable parameters, so there is no need to configure it.
### EMAHook
[EMAHook](mmengine.hooks.EMAHook) performs an exponential moving average operation on the model during training, with the aim of improving the robustness of the model. Note that the model generated by exponential moving average is only used for validation and testing, and does not affect training.
```python
custom_hooks = [dict(type='EMAHook')]
runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```
`EMAHook` uses [ExponentialMovingAverage](mmengine.model.ExponentialMovingAverage) by default, with optional values of [StochasticWeightAverage](mmengine.model.StochasticWeightAverage) and [MomentumAnnealingEMA](mmengine.model.MomentumAnnealingEMA). Other averaging strategies can be used by setting `ema_type`.
```python
custom_hooks = [dict(type='EMAHook', ema_type='StochasticWeightAverage')]
```
See [EMAHook API Reference](mmengine.hooks.EMAHook) for more usage.
### EmptyCacheHook
[EmptyCacheHook](mmengine.hooks.EmptyCacheHook) calls `torch.cuda.empty_cache()` to release all unoccupied cached GPU memory. The timing of releasing memory can be controlled by setting parameters like `before_epoch`, `after_iter`, and `after_epoch`, meaning before the start of each epoch, after each iteration, and after each epoch respectively.
```python
# The release operation is performed at the end of each epoch
custom_hooks = [dict(type='EmptyCacheHook', after_epoch=True)]
runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```
### SyncBuffersHook
[SyncBuffersHook](mmengine.hooks.SyncBuffersHook) synchronizes the buffer of the model at the end of each epoch during distributed training, e.g. `running_mean` and `running_var` of the BN layer.
```python
custom_hooks = [dict(type='SyncBuffersHook')]
runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```
## Customize Your Hooks
If the built-in hooks provided by MMEngine do not cover your demands, you are encouraged to customize your own hooks by simply inheriting the base [hook](mmengine.hooks.Hook) class and overriding the corresponding mount point methods.
For example, if you want to check whether the loss value is valid, i.e. not infinite, during training, you can simply override the `after_train_iter` method as below. The check will be performed after each training iteration.
```python
import torch
from mmengine.registry import HOOKS
from mmengine.hooks import Hook
@HOOKS.register_module()
class CheckInvalidLossHook(Hook):
"""Check invalid loss hook.
This hook will regularly check whether the loss is valid
during training.
Args:
interval (int): Checking interval (every k iterations).
Defaults to 50.
"""
def __init__(self, interval=50):
self.interval = interval
def after_train_iter(self, runner, batch_idx, data_batch=None, outputs=None):
"""All subclasses should override this method, if they need any
operations after each training iteration.
Args:
runner (Runner): The runner of the training process.
batch_idx (int): The index of the current batch in the train loop.
data_batch (dict or tuple or list, optional): Data from dataloader.
outputs (dict, optional): Outputs from model.
"""
if self.every_n_train_iters(runner, self.interval):
assert torch.isfinite(outputs['loss']),\
runner.logger.info('loss become infinite or NaN!')
```
We simply pass the hook config to the `custom_hooks` parameter of the Runner, which will register the hooks when the Runner is initialized.
```python
from mmengine.runner import Runner
custom_hooks = dict(
dict(type='CheckInvalidLossHook', interval=50)
)
runner = Runner(custom_hooks=custom_hooks, ...)
runner.train() # start training
```
Then the loss value are checked after iteration.
Note that the priority of the custom hook is `NORMAL (50)` by default, if you want to change the priority of the hook, then you can set the priority key in the config.
```python
custom_hooks = dict(
dict(type='CheckInvalidLossHook', interval=50, priority='ABOVE_NORMAL')
)
```
You can also set priority when defining classes.
```python
@HOOKS.register_module()
class CheckInvalidLossHook(Hook):
priority = 'ABOVE_NORMAL'
```
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment