
PyTorch Image Recognition

Posted on 2024-10-27

Learning the PyTorch Framework: Image Classification

Import the required libraries and modules

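import torch  # PyTorch's core library: tensor computation, automatic differentiation, and more
from torchvision import models, transforms  # PyTorch's computer-vision library: pretrained models, image transforms, and common datasets
from torchvision.models import ResNet101_Weights  # pretrained weights for the model used here, plus its class labels
from PIL import Image  # image loading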

Load the pretrained ResNet-101 model


# models.resnet101: the ResNet-101 model, a deep residual network with 101 layers
# weights=ResNet101_Weights.IMAGENET1K_V1: load the weights pretrained on ImageNet-1k
resnet = models.resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)

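A deep residual network such as ResNet-101 is built from "residual blocks". Each block has a shortcut (skip) connection that adds the block's input directly to its output, which gives gradients a direct path through the network and alleviates the vanishing-gradient and degradation problems that make very deep plain networks hard to train. ResNet-101 has roughly 44.5 million parameters, enough capacity to learn complex image features.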
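To make the shortcut connection concrete, here is a minimal sketch of a residual block in PyTorch. It is a simplified "basic" block (two 3×3 convolutions) written for illustration only: the class name `BasicResidualBlock` is hypothetical, and ResNet-101 itself stacks three-layer bottleneck blocks (1×1, 3×3, 1×1 convolutions) rather than this exact module.

```python
import torch
from torch import nn


class BasicResidualBlock(nn.Module):
    """Hypothetical simplified residual block: out = ReLU(F(x) + x)."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        # F(x): two 3x3 convolutions that keep both the spatial size and the channel count
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x  # the shortcut branch carries x through unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # add the shortcut, then apply the activation


block = BasicResidualBlock(channels=64)
x = torch.randn(2, 64, 56, 56)
y = block(x)
print(y.shape)  # torch.Size([2, 64, 56, 56])
```

Because the shortcut adds `x` unchanged, the derivative of the block's output with respect to `x` always contains an identity term, which is the usual intuition for why very deep stacks of these blocks remain trainable.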

Image preprocessing

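# transforms.Compose: chain several image operations into one sequential pipeline.
# Resize(256): scale the image so that its shorter side is 256 pixels, keeping the aspect ratio.
# CenterCrop(224): cut out the central 224×224 region, the standard input size for ResNet.
# ToTensor(): convert the PIL image to a (C, H, W) float tensor and scale pixel values to [0, 1].
# Normalize(mean, std): standardize each channel with the ImageNet statistics used during training: subtract the channel mean, then divide by the channel standard deviation.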
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
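As a worked example of the geometry, the sketch below computes the size produced by `Resize(256)` and the box cut out by `CenterCrop(224)`. It is plain Python for illustration: the helper names are hypothetical, and the rounding (truncation for the longer side, `round` for the crop offsets) is an assumption about torchvision's behavior, so a given torchvision version may differ by a pixel.

```python
def resized_size(width: int, height: int, size: int = 256) -> tuple[int, int]:
    """(width, height) after Resize(size) with an int size: shorter side -> size."""
    if width <= height:
        return size, int(size * height / width)  # longer side keeps the aspect ratio
    return int(size * width / height), size


def center_crop_box(width: int, height: int, crop: int = 224) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered crop x crop window."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop


w, h = resized_size(640, 480)   # a landscape 640x480 photo
print((w, h))                   # (341, 256)
print(center_crop_box(w, h))    # (58, 16, 282, 240)
```

So a 640×480 photo is first shrunk to 341×256, and the crop then keeps only the central 224×224 window, discarding the left and right edges.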

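Deep learning models are sensitive to the scale of their inputs, so input data is usually normalized. Normalizing with the same statistics that were used during training keeps the input distribution consistent with what the model saw, which is what allows the pretrained weights to produce reliable predictions.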
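The per-channel arithmetic of `ToTensor()` followed by `Normalize(mean, std)` can be traced by hand for a single pixel. This plain-Python sketch (the function names are hypothetical) uses the ImageNet mean and std from the pipeline above; a pure white pixel ends up roughly 2.2 to 2.6 standard deviations above the ImageNet mean on every channel.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """ToTensor + Normalize for one (R, G, B) pixel given as 0..255 integers."""
    scaled = [c / 255.0 for c in rgb]  # ToTensor: 0..255 -> 0.0..1.0
    # Normalize: (value - mean) / std, channel by channel
    return [(v - m) / s for v, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


def denormalize_pixel(values):
    """Inverse mapping, e.g. for turning a normalized tensor back into a viewable image."""
    return [round((v * s + m) * 255) for v, m, s in zip(values, IMAGENET_MEAN, IMAGENET_STD)]


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
```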

Load and preprocess the image

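# Image.open: open the image file; convert("RGB") guarantees 3 channels even for grayscale or RGBA files.
# preprocess(img): apply the preprocessing pipeline defined in the previous step.
# torch.unsqueeze(img_t, 0): add a batch dimension so the tensor has the shape the model expects, (1, 3, 224, 224).
img = Image.open("./Resnet/Image/image1.jpg").convert("RGB")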
img_t = preprocess(img)
batch_t = torch.unsqueeze(img_t, 0)

A convolutional neural network expects an input tensor of shape (batch_size, channels, height, width). Here, unsqueeze inserts the batch dimension at position 0.
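A quick way to check these shapes is to use a random tensor as a stand-in for `preprocess(img)` (an assumption for illustration, so no image file is needed). The sketch also shows two equivalent spellings of `unsqueeze(0)` and how several images would be batched with `torch.stack`.

```python
import torch

img_t = torch.rand(3, 224, 224)       # stand-in for preprocess(img): (C, H, W)
batch_t = torch.unsqueeze(img_t, 0)   # (1, 3, 224, 224)
same = img_t.unsqueeze(0)             # method form of the same call
also_same = img_t[None]               # indexing with None also inserts a new leading axis
stacked = torch.stack([img_t, img_t]) # two images batched together: (2, 3, 224, 224)

print(batch_t.shape)   # torch.Size([1, 3, 224, 224])
print(stacked.shape)   # torch.Size([2, 3, 224, 224])
```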

Set the model mode

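# eval(): switch the model to evaluation mode: dropout is disabled and batch normalization uses its stored running statistics instead of per-batch statistics, so predictions are deterministic.
resnet.eval()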

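Some layers, such as dropout and batch normalization, behave differently during training and evaluation; eval() puts them into their fixed inference behavior so that predictions are consistent.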
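The difference between the two modes can be observed directly on standalone layers. This sketch uses `nn.Dropout` and a freshly initialized `nn.BatchNorm1d` as stand-ins for the corresponding layers inside ResNet; calling `resnet.eval()` sets the same `training = False` flag on every submodule.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.ones(8)

dropout = nn.Dropout(p=0.5)
dropout.train()            # training mode (the default for new modules)
train_out = dropout(x)     # random elements zeroed, survivors scaled by 1 / (1 - p) = 2
dropout.eval()             # evaluation mode
eval_out = dropout(x)      # dropout is now the identity

bn = nn.BatchNorm1d(4)
bn.eval()                  # uses running_mean (initially 0) and running_var (initially 1), not batch statistics
batch = torch.tensor([[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]])
bn_out = bn(batch)         # approximately equal to batch for these initial running statistics

print(train_out)
print(eval_out)
print(dropout.training, bn.training)  # False False
```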

Get the model output and compute the predicted class

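# torch.no_grad(): this is inference only, so skip building the autograd graph (saves memory and time).
# resnet(batch_t): pass the preprocessed image tensor through the model to get the output vector out.
# torch.max(out, 1): along dimension 1 (the class dimension), find the largest score in each row; its index is the predicted class index.
with torch.no_grad():
    out = resnet(batch_t)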
_, index = torch.max(out, 1)

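ResNet outputs a tensor of shape (1, 1000): one raw, unnormalized score (a logit) for each ImageNet class. These scores are not yet probabilities; torch.max simply picks out the class the model considers most likely.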
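The same two steps can be tried on a tiny stand-in model (a hypothetical `Flatten` + `Linear` network that produces 1000 logits, so nothing has to be downloaded). The sketch shows that `torch.max(out, 1)` returns both the largest logit and its index, that this index matches `argmax`, and that a forward pass under `torch.no_grad()` leaves no autograd graph attached to the output.

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in for resnet: flattens a tiny 3x8x8 "image" and maps it to 1000 logits
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 1000))
model.eval()
batch_t = torch.rand(1, 3, 8, 8)

with torch.no_grad():              # inference only: no autograd graph is recorded
    out = model(batch_t)           # shape (1, 1000): one raw logit per class

values, index = torch.max(out, 1)  # per row: the largest logit and the index where it occurs
print(out.shape)                   # torch.Size([1, 1000])
print(index.item(), values.item())
print(out.requires_grad)           # False
```

Since softmax is monotonic, applying it first would not change which index wins; it only rescales the scores.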

Use softmax to get class percentages


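# softmax: turn the model's raw class scores into probabilities that sum to 1, i.e. the relative likelihood of each class.
# [0] * 100: take the first (and only) image in the batch and convert its probabilities to percentages.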
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100

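Softmax converts a vector of scores into a probability distribution, the standard output for multi-class classification; after multiplying by 100, the values sum to 100%.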
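Softmax is short enough to write out by hand. This plain-Python sketch (the helper name is hypothetical) computes p_i = exp(z_i) / sum_j exp(z_j), scaled to percentages. Subtracting the largest logit first is a standard trick that prevents overflow in exp() without changing the result.

```python
import math


def softmax_percent(logits):
    """Softmax over a list of logits, returned as percentages that sum to 100."""
    m = max(logits)                           # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


logits = [2.0, 1.0, 0.1]
pct = softmax_percent(logits)
print([round(p, 2) for p in pct])             # [65.9, 24.24, 9.86]

# Adding a constant to every logit leaves the result unchanged (and does not overflow)
print([round(p, 2) for p in softmax_percent([z + 1000 for z in logits])])
```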

Get the ImageNet labels and print the predicted label and percentage

# ResNet101_Weights.IMAGENET1K_V1.meta["categories"]: the ImageNet class labels.
# predicted_label: look up the label of the predicted class by its index.
# print(...): print the predicted label and its probability as a percentage.
labels = ResNet101_Weights.IMAGENET1K_V1.meta["categories"]
predicted_label = labels[index[0]]
print(f"Predicted label: {predicted_label}, Confidence: {percentage[index[0]].item():.2f}%")

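The meta dictionary of ResNet101_Weights.IMAGENET1K_V1 contains a "categories" list with the names of the 1000 ImageNet classes, so the class index output by the model maps directly to a human-readable label.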

Top-5 labels by probability
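Listing the five most likely classes rather than only the top one can be done with `torch.topk`, which returns the k largest values and their indices in descending order. This is a sketch, not necessarily the author's original code: the helper `top_k` and the six-class demo labels are hypothetical so that it runs on its own; in the full script you would pass the real `percentage` tensor and `labels` list with `k=5`.

```python
import torch


def top_k(percentage: torch.Tensor, labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Return the k most likely (label, percent) pairs, highest first."""
    values, indices = torch.topk(percentage, k)  # sorted in descending order by default
    return [(labels[i], v) for i, v in zip(indices.tolist(), values.tolist())]


# Demo on synthetic scores and hypothetical labels
scores = torch.tensor([0.5, 3.0, 1.0, 2.0, -1.0, 0.0])
percentage = torch.nn.functional.softmax(scores, dim=0) * 100
labels = ["a", "b", "c", "d", "e", "f"]
for name, pct in top_k(percentage, labels, k=3):
    print(f"{name}: {pct:.2f}%")
```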

Complete code

import torch
from torchvision import models, transforms
from torchvision.models import ResNet101_Weights
from PIL import Image

# Load the pretrained ResNet-101 model
resnet = models.resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)

# Image preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Load and preprocess the image (convert to RGB so it always has 3 channels)
img = Image.open("./Resnet/Image/image10.jpg").convert("RGB")
img_t = preprocess(img)
batch_t = torch.unsqueeze(img_t, 0)

# Switch the model to evaluation mode
resnet.eval()

# Get the model output (no gradients are needed for inference)
with torch.no_grad():
    out = resnet(batch_t)
_, index = torch.max(out, 1)

# Use softmax to get class percentages
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100

# Get the ImageNet labels
labels = ResNet101_Weights.IMAGENET1K_V1.meta["categories"]

# Print the predicted label and its percentage
predicted_label = labels[index[0]]
print(f"Predicted label: {predicted_label}, Confidence: {percentage[index[0]].item():.2f}%")
