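While setting up PyTorch Geometric I ran into some problems: both the commonly suggested dependency install commands and the commands from the official site failed. With a classmate's help I switched to a different package source, and the installation went through:
(run in Anaconda Prompt)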
pip install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.9.0+cpu.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.9.0+cpu.html
pip install torch-cluster -f https://data.pyg.org/whl/torch-1.9.0+cpu.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-1.9.0+cpu.html
pip install torch-geometric
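This also made me realize how important it is to set up a virtual environment, and I will pay more attention to this as I continue studying machine learning.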
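The detail that is easy to get wrong in the commands above is that the `-f` wheel index has to match the installed torch version and build (here `1.9.0+cpu`). A minimal sketch that generates the same commands from a version string follows. The helper name `pyg_install_commands` is hypothetical, and the assumption that the URL pattern carries over to other versions or CUDA builds is mine, not something verified here.

```python
def pyg_install_commands(torch_version="1.9.0", variant="cpu"):
    """Build the pip commands for the PyG extension packages.

    The URL pattern is copied from the commands in this post; whether it
    holds for other torch versions or builds is an assumption.
    """
    index = f"https://data.pyg.org/whl/torch-{torch_version}+{variant}.html"
    extensions = [
        "torch-scatter",
        "torch-sparse",
        "torch-cluster",
        "torch-spline-conv",
    ]
    commands = [f"pip install {pkg} -f {index}" for pkg in extensions]
    # torch-geometric itself is installed from the default package index.
    commands.append("pip install torch-geometric")
    return commands


commands = pyg_install_commands()
for line in commands:
    print(line)
```

Running these inside a dedicated conda environment keeps a mismatched torch build from affecting other projects.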
1. Basic Principles of Graph Convolution
2. PyTorch Geometric (PyG)
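PyTorch Geometric already ships with many common benchmark datasets, including:
- Cora: a graph dataset built from the citation relationships between scientific papers. The 2708 papers fall into 7 classes: Case_Based, Genetic_Algorithms, Neural_Networks, Probabilistic_Methods, Reinforcement_Learning, Rule_Learning, and Theory;
- Citeseer: a paper citation dataset. The 3312 papers fall into 6 classes: Agents, AI, DB, IR, ML, and HCI;
- Pubmed: a dataset drawn from PubMed, the biomedical literature search and abstract database.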
import torch
import torch.nn.functional as F
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree

# Dataset: Cora is downloaded to /tmp/Cora on first use.
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='/tmp/Cora', name='Cora')


class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        # 'add' aggregation: sum the messages arriving at each node.
        super(GCNConv, self).__init__(aggr='add')
        self.lin = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index):
        # Step 1: Add self-loops to the adjacency matrix.
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        # Step 2: Linearly transform the node feature matrix.
        x = self.lin(x)

        # Steps 3-5: Start propagating messages.
        return self.propagate(edge_index, size=(x.size(0), x.size(0)), x=x)

    def message(self, x_j, edge_index, size):
        # Step 3: Normalize node features by 1 / sqrt(deg(i) * deg(j)).
        row, col = edge_index
        deg = degree(row, size[0], dtype=x_j.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]
        return norm.view(-1, 1) * x_j

    def update(self, aggr_out):
        # Step 5: Return new node embeddings (the aggregated messages).
        return aggr_out


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Train on the nodes selected by train_mask.
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Evaluate on the nodes selected by test_mask.
model.eval()
_, pred = model(data).max(dim=1)
correct = float(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
acc = correct / data.test_mask.sum().item()
print('Accuracy: {:.4f}'.format(acc))
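To make the normalization in the `message` step and the sum in the aggregation step more concrete, here is a small pure-Python sketch on a 3-node path graph with one scalar feature per node. It skips the linear layer, and the function names (`with_self_loops`, `gcn_norm`, `propagate_sum`) are hypothetical helpers for illustration, not PyG APIs.

```python
import math


def with_self_loops(rows, cols, num_nodes):
    # Append one (i, i) edge per node, like Step 1 in the layer above.
    loops = list(range(num_nodes))
    return rows + loops, cols + loops


def gcn_norm(rows, cols, num_nodes):
    # Degree is counted from the source index, as in the layer above.
    deg = [0.0] * num_nodes
    for r in rows:
        deg[r] += 1.0
    # One coefficient per edge: 1 / sqrt(deg[source] * deg[target]).
    return [1.0 / math.sqrt(deg[r] * deg[c]) for r, c in zip(rows, cols)]


def propagate_sum(rows, cols, norm, x):
    # Each edge sends norm * x[source] to its target; targets sum the messages.
    out = [0.0] * len(x)
    for r, c, w in zip(rows, cols, norm):
        out[c] += w * x[r]
    return out


# Undirected path 0 - 1 - 2, stored as directed edges in both directions.
rows = [0, 1, 1, 2]
cols = [1, 0, 2, 1]
x = [1.0, 2.0, 3.0]

rows, cols = with_self_loops(rows, cols, num_nodes=3)
norm = gcn_norm(rows, cols, num_nodes=3)
out = propagate_sum(rows, cols, norm, x)
print(out)
```

After the self-loops are added, the end nodes have degree 2 and the middle node has degree 3, so an edge between an end node and the middle node is weighted by 1/sqrt(6).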
The dataset has to be downloaded from GitHub, and because my connection was unstable the download failed with a timeout error. The fix is in planetoid.py, line 48:
url = 'https://github.com/kimiyoung/planetoid/raw/master/data'
Change it to: url = 'https://gitee.com/jiajiewu/planetoid/raw/master/data'
(Gitee is easier to reach from mainland China)
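An alternative that avoids editing the installed library is to fetch the raw files yourself and place them where `Planetoid` looks for them, so that its own download step is skipped. The sketch below uses only the standard library. The raw file names (`ind.cora.x` and so on) and the `<root>/<name>/raw` layout reflect my understanding of how `Planetoid` stores its data and should be checked against the installed version. `plan_downloads` and `fetch_missing` are hypothetical helper names.

```python
import os
import urllib.request

# Mirror URL taken from the fix described above.
MIRROR_URL = "https://gitee.com/jiajiewu/planetoid/raw/master/data"

# Assumed suffixes of the raw Planetoid files (verify against your PyG version).
RAW_SUFFIXES = ["x", "tx", "allx", "y", "ty", "ally", "graph", "test.index"]


def plan_downloads(root, name, base_url=MIRROR_URL):
    """Return (url, destination_path) pairs for one Planetoid dataset."""
    raw_dir = os.path.join(root, name, "raw")  # assumed directory layout
    plan = []
    for suffix in RAW_SUFFIXES:
        filename = f"ind.{name.lower()}.{suffix}"
        plan.append((f"{base_url}/{filename}", os.path.join(raw_dir, filename)))
    return plan


def fetch_missing(plan, timeout=30):
    """Download every file in the plan that is not on disk yet."""
    for url, dest in plan:
        if os.path.exists(dest):
            continue
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = response.read()
        # Write only after the full download succeeded, so a failed
        # request does not leave an empty file behind.
        with open(dest, "wb") as handle:
            handle.write(payload)


plan = plan_downloads("/tmp/Cora", "Cora")
for url, dest in plan:
    print(url, "->", dest)
```

Calling `fetch_missing(plan)` before `Planetoid(root='/tmp/Cora', name='Cora')` should then let the dataset load from the local files, provided the assumed names and layout match.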
The final output of the run was: