Model compression falls into two broad categories: connection pruning, which removes unimportant structures from an already-trained model, and weight sparsification, which sets unimportant weights to zero during training so that the weight distribution becomes sparser. The best-known weight sparsification work is Han et al.'s 2015 Deep Compression (https://arxiv.org/abs/1510.00149), which combines pruning, weight sharing with quantization, and encoding for model compression.
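As a concrete illustration of magnitude-based weight sparsification, the sketch below zeroes every weight whose absolute value falls below a threshold. The sparsify_ helper and the threshold value are hypothetical, for illustration only:

import torch
import torch.nn as nn

def sparsify_(module: nn.Module, threshold: float = 0.05) -> None:
    # Zero every parameter entry with |w| < threshold, in place.
    with torch.no_grad():
        for p in module.parameters():
            p.mul_((p.abs() >= threshold).float())

layer = nn.Conv2d(3, 16, 3)
sparsify_(layer)
print(f'sparsity: {(layer.weight == 0).float().mean().item():.2%}')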
Weight sparsification in YOLOv5
YOLOv5's approach to weight sparsification is to call the prune function, zeroing out 30% of the weights:
# Initialize/load model and set device
training = model is not None
if training:  # called by train.py
    device, pt, jit, engine = next(model.parameters()).device, True, False, False  # get model device, PyTorch model
    half &= device.type != 'cpu'  # half precision only supported on CUDA
    model.half() if half else model.float()

from utils.torch_utils import prune
prune(model, 0.3)
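The prune helper lives in utils/torch_utils.py and is essentially a thin wrapper around PyTorch's built-in torch.nn.utils.prune module. A minimal sketch of such a helper follows; the body is an approximation, not a verbatim copy of the YOLOv5 source:

import torch.nn as nn
import torch.nn.utils.prune as prune_utils

def prune(model: nn.Module, amount: float = 0.3) -> None:
    # L1-unstructured pruning: zero the smallest `amount` fraction
    # of each Conv2d weight tensor, then make the zeros permanent.
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            prune_utils.l1_unstructured(m, name='weight', amount=amount)
            prune_utils.remove(m, 'weight')  # bake the pruning mask into m.weight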
Looking at the sparsification results, we can see that after pruning the model's sparsity is 30%, meaning 30% of the weight parameters in the nn.Conv2d layers are equal to zero. Inference time is essentially unchanged, while the model's AP and AR scores drop slightly.
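One way to verify the 30% figure is to count the zero entries in the Conv2d weights. The sparsity helper below is a minimal sketch (YOLOv5 ships a utility of the same name; treat this body as an assumption):

import torch.nn as nn

def sparsity(model: nn.Module) -> float:
    # Fraction of Conv2d weight entries that are exactly zero.
    zeros, total = 0, 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            zeros += int((m.weight == 0).sum())
            total += m.weight.numel()
    return zeros / total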
val: data=./data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True
YOLOv5 v5.0-267-g6a3ee7c torch 1.9.0+cu102 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Fusing layers...
Model Summary: 476 layers, 87730285 parameters, 0 gradients
val: Scanning '../datasets/coco/val2017' images and labels...4952 found, 48 missing, 0 empty, 0 corrupted: 100% 5000/5000 [00:01<00:00, 2846.03it/s]
val: New cache created: ../datasets/coco/val2017.cache
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [02:30<00:00,  1.05it/s]
                 all       5000      36335      0.746      0.626       0.68       0.49
Speed: 0.1ms pre-process, 22.4ms inference, 1.4ms NMS per image at shape (32, 3, 640, 640)  # <--- baseline speed

evaluating pycocotools mAP... saving runs/val/exp/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.504  # <--- baseline mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.688
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.546
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.351
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.382
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.681  # <--- baseline mAR
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.524
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.735
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.826
The paper Dynamic Network Surgery for Efficient DNNs (https://arxiv.org/abs/1608.04493) introduces a dynamic pruning method built from two operations: pruning and splicing. Pruning removes weights judged unimportant, but it is often hard to tell which weights actually matter, so a splicing step is added that restores important weights that were wrongly pruned, repairing the important structure. The algorithm interleaves pruning with splicing and runs compression concurrently with training; by introducing the splicing (grafting) operation it avoids the performance loss caused by erroneous pruning and, in practice, comes closer to the theoretical limit of network compression.
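Below is a minimal sketch of one pruning-and-splicing mask update in the spirit of Dynamic Network Surgery. The surgery_step name and the two magnitude thresholds a < b are hypothetical hyperparameters, and the paper maintains one such mask per layer:

import torch

def surgery_step(weight: torch.Tensor, mask: torch.Tensor,
                 a: float, b: float) -> torch.Tensor:
    # Pruning: entries with |w| < a are masked out (mask -> 0).
    # Splicing: entries with |w| > b are recovered (mask -> 1).
    # Entries with a <= |w| <= b keep their current mask.
    w = weight.abs()
    mask = torch.where(w < a, torch.zeros_like(mask), mask)
    mask = torch.where(w > b, torch.ones_like(mask), mask)
    return mask

During training the forward pass uses the masked weight (weight * mask) while gradients keep updating the dense weight tensor, so an entry that was pruned by mistake can grow back above b and be spliced back in at a later iteration.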