[pytorch] RuntimeError: CUDA out of memory when training a VGG model


Error 1: RuntimeError: CUDA out of memory

RuntimeError: CUDA out of memory

Code that raised the error:

vgg16_model.eval()
for j, data in enumerate(valid_loader):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)

    # inputs carry an extra crops dimension (e.g. from TenCrop): (batch, ncrops, C, H, W)
    bs, ncrops, c, h, w = inputs.size()
    outputs = vgg16_model(inputs.view(-1, c, h, w))
    outputs_avg = outputs.view(bs, ncrops, -1).mean(1)  # average the predictions over the crops

    loss = criterion(outputs_avg, labels)

    _, predicted = torch.max(outputs_avg.data, 1)
    total_val += labels.size(0)
    correct_val += (predicted == labels).squeeze().cpu().sum().numpy()

    loss_val += loss.item()

loss_val_mean = loss_val / len(valid_loader)
valid_curve.append(loss_val_mean)
print("Valid:\t Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
    epoch, MAX_EPOCH, j+1, len(valid_loader), loss_val_mean, correct_val / total_val))
vgg16_model.train()
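To confirm that it is the validation pass that exhausts GPU memory, the allocator statistics PyTorch exposes can be printed between steps. A minimal diagnostic sketch, independent of the code above (run it on the same CUDA device, for example right before and right after the validation loop):

import torch

# Human-readable summary of the CUDA caching allocator for the current device.
print(torch.cuda.memory_summary())

# Or watch the two key counters directly (values are in bytes).
print("currently allocated:", torch.cuda.memory_allocated())
print("peak allocated:     ", torch.cuda.max_memory_allocated())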

Fixed code: wrap the validation loop in with torch.no_grad():

'''
Add with torch.no_grad() to state explicitly that no gradient computation is needed here.
Even though model.eval() is called for the validation set, PyTorch still builds the autograd
graph during validation, so the activations it keeps eventually exhaust GPU memory.
'''
        if (epoch+1) % val_interval == 0:

            correct_val = 0.
            total_val = 0.
            loss_val = 0.
            vgg16_model.eval()
            with torch.no_grad():  # no autograd graph is built, so activations are freed immediately
                for j, data in enumerate(valid_loader):
                    inputs, labels = data
                    inputs, labels = inputs.to(device), labels.to(device)

                    # inputs carry an extra crops dimension (e.g. from TenCrop): (batch, ncrops, C, H, W)
                    bs, ncrops, c, h, w = inputs.size()
                    outputs = vgg16_model(inputs.view(-1, c, h, w))
                    outputs_avg = outputs.view(bs, ncrops, -1).mean(1)  # average the predictions over the crops

                    loss = criterion(outputs_avg, labels)

                    _, predicted = torch.max(outputs_avg.data, 1)
                    total_val += labels.size(0)
                    correct_val += (predicted == labels).squeeze().cpu().sum().numpy()

                    loss_val += loss.item()

                loss_val_mean = loss_val / len(valid_loader)
                valid_curve.append(loss_val_mean)
                print("Valid:\t Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
                    epoch, MAX_EPOCH, j+1, len(valid_loader), loss_val_mean, correct_val / total_val))
            vgg16_model.train()
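A self-contained toy comparison (not the VGG pipeline above; the model and sizes are made up for illustration) of why the wrapper matters: with autograd enabled, every forward pass keeps intermediate activations alive for a potential backward pass, so the peak memory of the same forward is noticeably higher than under torch.no_grad():

import torch
import torch.nn as nn

device = "cuda"
# Small made-up MLP; any model shows the same effect.
model = nn.Sequential(
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 10),
).to(device)
x = torch.randn(1024, 4096, device=device)

def peak_forward_mib(use_no_grad):
    torch.cuda.reset_peak_memory_stats()
    ctx = torch.no_grad() if use_no_grad else torch.enable_grad()
    with ctx:
        y = model(x)  # activations are saved for backward only when grad is enabled
    return torch.cuda.max_memory_allocated() / 2**20

print(f"peak with    no_grad: {peak_forward_mib(True):7.1f} MiB")
print(f"peak without no_grad: {peak_forward_mib(False):7.1f} MiB")

In the validation code above the effect is amplified by the crop reshaping: the batch actually fed to VGG is bs * ncrops images per step, so the saved activations alone can exceed the card's memory.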
Error 2: after applying the fix above, the following errors appeared:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Loss.cu:247: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.

Code before the fix:

            # backward
            optimizer.zero_grad()
            loss = criterion(outputs, labels)
            loss.backward()

            # update weights
            optimizer.step()

Fixed code: add loss = loss.requires_grad_() before the backward pass:

                # backward
                optimizer.zero_grad()
                loss = criterion(outputs, labels)
                loss = loss.requires_grad_()  # force the loss to require grad so backward() does not raise
                loss.backward()

                # update weights
                optimizer.step()
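A tiny standalone reproduction (toy tensors, nothing to do with the VGG code) of what the first message means and what the workaround does: backward() fails on a tensor that has no grad_fn, and requires_grad_() lets the call go through again. In a normal training loop the loss already requires grad as long as the forward pass runs outside torch.no_grad() and the model parameters have requires_grad=True.

import torch

# Built from tensors that do not require grad, so this loss has no grad_fn:
loss = (torch.randn(4) ** 2).mean()

try:
    loss.backward()
except RuntimeError as e:
    # "element 0 of tensors does not require grad and does not have a grad_fn"
    print(e)

# The workaround used above: mark the loss itself as requiring grad.
loss = loss.requires_grad_()
loss.backward()  # no longer raises, but no gradients flow back to any model parameters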
