```python
optim_wrapper = dict(
    constructor='CustomConstructor',
    type='OptimWrapper',  # Specify the type of OptimWrapper
    optimizer=dict(  # optimizer configuration
        type='AdamW',
        lr=0.0001,
        betas=(0.9, 0.999),
        weight_decay=0.05),
    paramwise_cfg={
        'decay_rate': 0.95,
        'decay_type': 'layer_wise',
        'num_layers': 6
    })
```
</div>
</td>
</tr>
</thead>
</table>
```{note}
For high-level tasks such as detection and classification, MMCV needs `optimizer_config` to build `OptimizerHook`, which is not necessary in MMEngine.
```
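
As a rough illustration of the note above, the gradient-clipping setting that MMCV passes to `OptimizerHook` can be folded into the MMEngine `optim_wrapper` config. The helper name `to_optim_wrapper` is hypothetical, and the sketch assumes that MMCV's `grad_clip` maps onto the `clip_grad` argument of `OptimWrapper`; any other `optimizer_config` keys are ignored.

```python
from typing import Optional


def to_optim_wrapper(optimizer: dict,
                     optimizer_config: Optional[dict] = None) -> dict:
    """Hypothetical helper: merge MMCV-style configs into one optim_wrapper."""
    optim_wrapper = dict(type='OptimWrapper', optimizer=dict(optimizer))
    grad_clip = (optimizer_config or {}).get('grad_clip')
    if grad_clip is not None:
        # Assumption: MMCV's `grad_clip` corresponds to `clip_grad` here.
        optim_wrapper['clip_grad'] = dict(grad_clip)
    return optim_wrapper


optim_wrapper = to_optim_wrapper(
    optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.05),
    optimizer_config=dict(grad_clip=dict(max_norm=35, norm_type=2)))
```

With `grad_clip=None`, as in the MMCV examples in this tutorial, the helper returns a wrapper config without a `clip_grad` entry.
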
The `optim_wrapper` used in this tutorial is as follows:
```python
from torch.optim import SGD
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
optim_wrapper = dict(optimizer=optimizer)
```
### Prepare hooks
**Prepare hooks in MMCV**
The commonly used hooks configuration in MMCV is as follows:
```python
# learning rate scheduler config
lr_config = dict(policy='step', step=[2, 3])
# configuration of optimizer
optimizer_config = dict(grad_clip=None)
# configuration of saving checkpoints periodically
checkpoint_config = dict(interval=1)
# save log periodically and multiple hooks can be used simultaneously
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
# register hooks to runner and those hooks will be invoked automatically
runner.register_training_hooks(
    lr_config=lr_config,
    optimizer_config=optimizer_config,
    checkpoint_config=checkpoint_config,
    log_config=log_config)
```
Among them:
- `lr_config` is used for `LrUpdaterHook`
- `optimizer_config` is used for `OptimizerHook`
- `checkpoint_config` is used for `CheckpointHook`
- `log_config` is used for `LoggerHook`

Besides the hooks mentioned above, the MMCV `Runner` builds `IterTimerHook` automatically. The MMCV `Runner` registers the training hooks after instantiating the model, whereas the MMEngine `Runner` initializes the hooks while instantiating the model.
**Prepare hooks in MMEngine**
MMEngine `Runner` takes some of the hooks commonly used in MMCV as its default hooks:
- [RuntimeInfoHook](mmengine.hooks.RuntimeInfoHook)
- [IterTimerHook](mmengine.hooks.IterTimerHook)
- [DistSamplerSeedHook](mmengine.hooks.DistSamplerSeedHook)
- [LoggerHook](mmengine.hooks.LoggerHook)
- [CheckpointHook](mmengine.hooks.CheckpointHook)
- [ParamSchedulerHook](mmengine.hooks.ParamSchedulerHook)
Compared with the MMCV example:
- `LrUpdaterHook` corresponds to `ParamSchedulerHook`; see [migrate scheduler](./param_scheduler.md) for details
- MMEngine optimizes the model in [train_step](mmengine.model.BaseModel.train_step), so `OptimizerHook` is no longer needed
- MMEngine takes `CheckpointHook` as a default hook
- MMEngine takes `LoggerHook` as a default hook
Therefore, we can achieve the same effect as the MMCV example as long as we configure the [param_scheduler](../tutorials/param_scheduler.md) correctly.
We can also register custom hooks in the MMEngine `Runner`; see the [runner tutorial](../tutorials/runner.md) and [migrate hook](./hook.md) for details.
<table class="docutils">
<thead>
<tr>
<th>Commonly used hooks in MMCV</th>
<th>Default hooks in MMEngine</th>
<tbody>
<tr>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
# Configure training hooks
# Configure LrUpdaterHook
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
# Configure OptimizerHook
optimizer_config = dict(grad_clip=None)
# Configure LoggerHook
log_config = dict(  # LoggerHook
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# Configure CheckpointHook
checkpoint_config = dict(interval=1)  # CheckpointHook
```
</div>
</td>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
# Configure parameter scheduler
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    dict(
        type='MultiStepLR',
        begin=0,
        end=12,
        by_epoch=True,
        milestones=[8, 11],
        gamma=0.1)
]
# Configure default hooks
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=1),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='DetVisualizationHook'))
```
</div>
</td>
</tr>
</thead>
</table>
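
The correspondence between the two columns above can be sketched as a small conversion function. This is only an illustration under stated assumptions: `step_lr_to_param_scheduler` is a hypothetical name, it handles only `policy='step'` with an optional linear warmup, and it assumes a decay factor of `0.1` when `lr_config` does not set `gamma`.

```python
def step_lr_to_param_scheduler(lr_config: dict, max_epochs: int) -> list:
    """Hypothetical helper covering only policy='step' with linear warmup."""
    if lr_config.get('policy') != 'step':
        raise ValueError('this sketch only handles the step policy')
    schedulers = []
    if lr_config.get('warmup') == 'linear':
        # Warmup is counted in iterations, hence by_epoch=False.
        schedulers.append(
            dict(
                type='LinearLR',
                start_factor=lr_config['warmup_ratio'],
                by_epoch=False,
                begin=0,
                end=lr_config['warmup_iters']))
    schedulers.append(
        dict(
            type='MultiStepLR',
            begin=0,
            end=max_epochs,
            by_epoch=True,
            milestones=list(lr_config['step']),
            gamma=lr_config.get('gamma', 0.1)))  # assumed default
    return schedulers


param_scheduler = step_lr_to_param_scheduler(
    dict(policy='step', warmup='linear', warmup_iters=500,
         warmup_ratio=0.001, step=[8, 11]),
    max_epochs=12)
```

Called with the MMCV `lr_config` from the left column and `max_epochs=12`, the function reproduces the two scheduler entries shown in the right column.
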
The parameter scheduler used in this tutorial is as follows:
```python
param_scheduler = dict(type='MultiStepLR', milestones=[2, 3], gamma=0.1)
```
### Prepare testing/validation components
MMCV implements validation through `EvalHook`, which is not covered in detail here. Because validation is a common part of training, MMEngine abstracts it into two independent modules: [Evaluator](../tutorials/evaluation.md) and [ValLoop](../tutorials/runner.md). We can customize the metric or the validation process by defining a new [loop](mmengine.runner.ValLoop) or a new [metric](mmengine.evaluator.BaseMetric).
```python
import torch
from mmengine.evaluator import BaseMetric
from mmengine.registry import METRICS


@METRICS.register_module(force=True)
class ToyAccuracyMetric(BaseMetric):

    def process(self, label, pred) -> None:
        self.results.append((label[1], pred, len(label[1])))

    def compute_metrics(self, results: list) -> dict:
        num_sample = 0
        acc = 0
        for label, pred, batch_size in results:
            acc += (label == torch.stack(pred)).sum()
            num_sample += batch_size
        return dict(Accuracy=acc / num_sample)
```
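
To see how the two methods cooperate, the accumulate-then-reduce pattern can be imitated without any dependencies. The class below is a hypothetical stand-in, not an MMEngine class: `process` is called once per batch and stores partial results, and `compute_metrics` reduces them once at the end.

```python
class ToyAccuracy:
    """Dependency-free stand-in mimicking the process/compute_metrics split."""

    def __init__(self) -> None:
        self.results = []

    def process(self, labels: list, preds: list) -> None:
        # Called once per batch: keep only what the final reduction needs.
        correct = sum(int(label == pred) for label, pred in zip(labels, preds))
        self.results.append((correct, len(labels)))

    def compute_metrics(self) -> dict:
        # Called once after every batch has been processed.
        correct = sum(c for c, _ in self.results)
        total = sum(n for _, n in self.results)
        return dict(Accuracy=correct / total)


metric = ToyAccuracy()
metric.process(labels=[0, 1, 1], preds=[0, 1, 0])  # 2 of 3 correct
metric.process(labels=[1], preds=[1])  # 1 of 1 correct
print(metric.compute_metrics())  # prints {'Accuracy': 0.75}
```
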
After defining the metric, we should also configure the evaluator and loop for `Runner`. The example used in this tutorial is as follows:
```python
val_evaluator = dict(type='ToyAccuracyMetric')
val_cfg = dict(type='ValLoop')
```
<table class="docutils">
<thead>
<tr>
<th>Configure validation in MMCV</th>
<th>Configure validation in MMEngine</th>
<tbody>
<tr>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
eval_cfg = cfg.get('evaluation', {})
eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
eval_hook = DistEvalHook if distributed else EvalHook
runner.register_hook(
    eval_hook(val_dataloader, **eval_cfg), priority='LOW')
```
</div>
</td>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
val_dataloader = val_dataloader
val_evaluator = dict(type='ToyAccuracyMetric')
val_cfg = dict(type='ValLoop')
```
</div>
</td>
</tr>
</thead>
</table>
### Build Runner
**Building Runner in MMCV**
```python
runner = EpochBasedRunner(
    model=model,
    optimizer=optimizer,
    work_dir=work_dir,
    logger=logger,
    max_epochs=4
)
```
**Building Runner in MMEngine**
The `EpochBasedRunner` type and the `max_epochs` argument in MMCV are moved to `train_cfg` in MMEngine. All parameters configurable in `train_cfg` are listed below:
- `by_epoch`: `True` is equivalent to `EpochBasedRunner`; `False` is equivalent to `IterBasedRunner`
- `max_epochs`/`max_iters`: equivalent to `max_epochs` and `max_iters` in MMCV
- `val_interval`: equivalent to `interval` in MMCV
```python
from mmengine.runner import Runner
runner = Runner(
    model=model,  # model to be optimized
    work_dir='./work_dir',  # working directory
    randomness=randomness,  # random seed
    env_cfg=env_cfg,  # environment config
    launcher='none',  # launcher for distributed training
    optim_wrapper=optim_wrapper,  # configure optimizer wrapper
    param_scheduler=param_scheduler,  # configure parameter scheduler
    train_dataloader=train_dataloader,  # configure train dataloader
    train_cfg=dict(by_epoch=True, max_epochs=4, val_interval=1),  # Configure training loop
    val_dataloader=val_dataloader,  # Configure validation dataloader
    val_evaluator=val_evaluator,  # Configure evaluator and metrics
    val_cfg=val_cfg)  # Configure validation loop
```
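
The mapping from an MMCV `runner` config to `train_cfg` described above can be sketched as follows. The helper `to_train_cfg` and its `eval_interval` argument are hypothetical names used only for illustration; the `by_epoch` test follows the same rule as the MMCV validation snippet earlier in this tutorial.

```python
def to_train_cfg(runner_cfg: dict, eval_interval: int = 1) -> dict:
    """Hypothetical helper: map an MMCV `runner` config to `train_cfg`."""
    by_epoch = runner_cfg['type'] != 'IterBasedRunner'
    train_cfg = dict(by_epoch=by_epoch, val_interval=eval_interval)
    if by_epoch:
        train_cfg['max_epochs'] = runner_cfg['max_epochs']
    else:
        train_cfg['max_iters'] = runner_cfg['max_iters']
    return train_cfg


train_cfg = to_train_cfg(dict(type='EpochBasedRunner', max_epochs=4))
```

For the MMCV runner built above, this yields the same `train_cfg` that is passed to the MMEngine `Runner`: `by_epoch=True`, `max_epochs=4` and `val_interval=1`.
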
### Load checkpoint
**Loading checkpoint in MMCV**
```python
if cfg.resume_from:
    runner.resume(cfg.resume_from)
elif cfg.load_from:
    runner.load_checkpoint(cfg.load_from)
```
**Loading checkpoint in MMEngine**
```python
runner = Runner(
    ...
    load_from='/path/to/checkpoint',
    resume=True
)
```
<table class="docutils">
<thead>
<tr>
<th>Configuration of loading checkpoint in MMCV</th>
<th>Configuration of loading checkpoint in MMEngine</th>
<tbody>
<tr>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
load_from = 'path/to/ckpt'
```
</div>
</td>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
load_from = 'path/to/ckpt'
resume = False
```
</div>
</td>
</tr>
<tr>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
resume_from = 'path/to/ckpt'
```
</div>
</td>
<td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">
```python
load_from = 'path/to/ckpt'
resume = True
```
</div>
</td>
</tr>
</thead>
</table>
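
The table above can also be read as a small translation rule. The function below is a hypothetical sketch (the name `to_mmengine_ckpt_cfg` is not part of either library); it gives `resume_from` precedence over `load_from`, matching the `if`/`elif` order of the MMCV snippet.

```python
def to_mmengine_ckpt_cfg(mmcv_cfg: dict) -> dict:
    """Hypothetical helper: translate MMCV checkpoint fields to MMEngine's."""
    resume_from = mmcv_cfg.get('resume_from')
    load_from = mmcv_cfg.get('load_from')
    if resume_from:
        # Resuming restores the training state, so `resume` is switched on.
        return dict(load_from=resume_from, resume=True)
    if load_from:
        # Loading only initializes weights from the checkpoint.
        return dict(load_from=load_from, resume=False)
    return dict(load_from=None, resume=False)


ckpt_cfg = to_mmengine_ckpt_cfg(dict(resume_from='path/to/ckpt'))
```
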
### Training process
**Training process in MMCV**
First resume or load a checkpoint, then start training.
```python
if cfg.resume_from:
    runner.resume(cfg.resume_from)
elif cfg.load_from:
    runner.load_checkpoint(cfg.load_from)
runner.run(data_loaders, cfg.workflow)
```
**Training process in MMEngine**
The process described above is completed by `Runner.__init__` and `Runner.train`:
```python
runner.train()
```
### Testing process
Since the MMCV `Runner` does not integrate testing, we need to implement the test scripts ourselves.
For MMEngine Runner, as long as we have configured the `test_dataloader`, `test_cfg` and `test_evaluator` for the `Runner`, we can call `Runner.test` to start the testing process.
**`work_dir` is the same as in training**
```python
runner = Runner(
    model=model,
    work_dir='./work_dir',
    randomness=randomness,
    env_cfg=env_cfg,
    launcher='none',
    optim_wrapper=optim_wrapper,
    train_dataloader=train_dataloader,
    train_cfg=dict(by_epoch=True, max_epochs=5, val_interval=1),
    val_dataloader=val_dataloader,
    val_evaluator=val_evaluator,
    val_cfg=val_cfg,
    test_dataloader=val_dataloader,
    test_evaluator=val_evaluator,
    test_cfg=dict(type='TestLoop'),
)
runner.test()
```
**`work_dir` is different from training; configure `load_from` manually**
```python
runner = Runner(
    model=model,
    work_dir='./test_work_dir',
    load_from='./work_dir/epoch_5.pth',  # set load_from additionally
    randomness=randomness,
    env_cfg=env_cfg,
    launcher='none',
    optim_wrapper=optim_wrapper,
    train_dataloader=train_dataloader,
    train_cfg=dict(by_epoch=True, max_epochs=5, val_interval=1),
    val_dataloader=val_dataloader,
    val_evaluator=val_evaluator,
    val_cfg=val_cfg,
    test_dataloader=val_dataloader,
    test_evaluator=val_evaluator,
    test_cfg=dict(type='TestLoop'),
)
runner.test()
```
### Customize training process
To customize the training or validation process in MMCV, we need to override `Runner.train` or `Runner.val` in a custom `Runner`. Taking `Runner.train` as an example, suppose we need to train on the same batch twice in each iteration. We can override `Runner.train` like this:
```python
import time

from mmcv.runner import EpochBasedRunner


class CustomRunner(EpochBasedRunner):

    def train(self, data_loader, **kwargs):
        self.model.train()
        self.mode = 'train'
        self.data_loader = data_loader
        self._max_iters = self._max_epochs * len(self.data_loader)
        self.call_hook('before_train_epoch')
        time.sleep(2)  # Prevent possible deadlock during epoch transition
        for i, data_batch in enumerate(self.data_loader):
            self.data_batch = data_batch
            self._inner_iter = i
            # Run the same batch twice per iteration
            for _ in range(2):
                self.call_hook('before_train_iter')
                self.run_iter(data_batch, train_mode=True, **kwargs)
                self.call_hook('after_train_iter')
            del self.data_batch
            self._iter += 1
        self.call_hook('after_train_epoch')
        self._epoch += 1
```
In MMEngine, we customize a train loop instead:
```python
from mmengine.registry import LOOPS
from mmengine.runner import EpochBasedTrainLoop


@LOOPS.register_module()
class CustomEpochBasedTrainLoop(EpochBasedTrainLoop):

    def run_iter(self, idx, data_batch) -> None:
        for _ in range(2):
            super().run_iter(idx, data_batch)
```
Then we need to set `type` to `CustomEpochBasedTrainLoop` in `train_cfg`. Note that `by_epoch` and `type` cannot be configured at the same time. Once `by_epoch` is configured, the type of the training loop will be inferred as `EpochBasedTrainLoop`.
```python
runner = Runner(
    model=model,
    work_dir='./test_work_dir',
    randomness=randomness,
    env_cfg=env_cfg,
    launcher='none',
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_dataloader=train_dataloader,
    train_cfg=dict(
        type='CustomEpochBasedTrainLoop',
        max_epochs=5,
        val_interval=1),
    val_dataloader=val_dataloader,
    val_evaluator=val_evaluator,
    val_cfg=val_cfg,
    test_dataloader=val_dataloader,
    test_evaluator=val_evaluator,
    test_cfg=dict(type='TestLoop'),
)
runner.train()
```
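
The constraint on `type` and `by_epoch` mentioned earlier can be sketched as a simple check. This is not MMEngine's implementation: `infer_loop_type` is a hypothetical function, the exception type is an assumption, and the mapping of `by_epoch=False` to `IterBasedTrainLoop` follows the `IterBasedRunner` equivalence listed in the `train_cfg` parameters.

```python
def infer_loop_type(train_cfg: dict) -> str:
    """Hypothetical check for the `type` / `by_epoch` rule of `train_cfg`."""
    if 'type' in train_cfg and 'by_epoch' in train_cfg:
        raise ValueError('set either `type` or `by_epoch`, not both')
    if 'type' in train_cfg:
        # An explicit type selects the (possibly custom) loop directly.
        return train_cfg['type']
    if 'by_epoch' not in train_cfg:
        raise ValueError('one of `type` or `by_epoch` is required')
    # Without an explicit type, by_epoch selects a built-in loop.
    return 'EpochBasedTrainLoop' if train_cfg['by_epoch'] else 'IterBasedTrainLoop'


loop_type = infer_loop_type(
    dict(type='CustomEpochBasedTrainLoop', max_epochs=5, val_interval=1))
```
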
For more complicated migration needs of `Runner`, you can refer to the [runner tutorials](../tutorials/runner.md) and [runner design](../design/runner.md).