学习来源:日撸 Java 三百行(51-60天,kNN 与 NB)_闵帆的博客-CSDN博客
- 基于 M-distance 的推荐
- 一、算法概述
- 二、算法特点
- 三、算法图解
- 四、代码分析
- 1. 数据集分析
- 2. 完整代码
- 3. 运行截图
- 总结
所谓 M-distance, 就是根据平均分来计算两个用户 (或项目) 之间的距离.
令项目的 j j j 的平均分为 x . j x_{.j} x.j
采用 item-based recommendation, 则第 j j j 个项目关于第 i i i 个用户的邻居项目集合为
N i j = { 1 ≤ j ′ ≤ m ∣ j ′ ≠ j , p i j ′ ≠ 0 , ∣ r . j ‾ − r . j ′ ‾ ∣ < δ } N_{ij} = \{ 1 \le {j}' \le m | {j}' \neq j, p_{i{j}'} \neq 0, \left | \overline {r_{.j}} - \overline {r_{.{j}'}} \right | < \delta \} Nij={1≤j′≤m∣j′=j,pij′=0,∣r.j−r.j′∣<δ}
第 i i i 个用户对 j j j 个项目的评分预测为
p i j = ∑ j ′ ∈ N i j r i j ′ ∣ N i j ∣ p_{ij} = \frac{\textstyle \sum_{{j}' \in N_{ij}}r_{i{j}'}} {\left | N_{ij} \right |} pij=∣Nij∣∑j′∈Nijrij′
二、算法特点邻居不用 k k k 控制. 距离小于 radius ( 即 δ \delta δ ) 的都是邻居. 使用 M-distance 时, 这种方式效果更好.
使用 leave-one-out 的测试方式, 很高效的算法才能使用这种方式. 这种算法的特点就是只用一个做为测试, 其他全部用于训练.
三、算法图解u i ( i = 0 , 1 , 2 , 3 , 4 ) u_i (i=0,1,2,3,4) ui(i=0,1,2,3,4) 表示不同用户, m i ( i = 0 , 1 , 2 , 3 , 4 ) m_i (i=0,1,2,3,4) mi(i=0,1,2,3,4) 表示不同电影.
表格中的数字表示不同用户对不同电影的评分.
n u m num num 表示对于某一个电影已有数据的个数, s u m sum sum 表示对某个电影已有数据的总和.
r ‾ \overline{r} r 表示平均评分, 即 ( s u m n u m \frac{sum}{num} numsum)
图中 0 占大多数, 表示还未能看过电影. 我们就是需要对这些数据进行预测.
在预测时只取差值小于 δ \delta δ 的那一列数据. 如果列中对需推荐用户的值为 0 则不带入计算.
四、代码分析 1. 数据集分析评分表 (用户, 项目, 评分) 的压缩方式给出
前几行数据为:
0,0,5
0,1,3
0,2,4
0,3,3
0,4,3
0,5,5
0,6,4
…
1,0,4
1,9,2
1,12,4
其中, “0,2,4” 表示用户 0 对项目 2 的评分为 4. 用户 1 对项目 1、2 等的评分没有, 表示没看过该电影. 在用户数、项目数很多时, 必须使用压缩存储.
2. 完整代码package knn;
import java.io.*;
/**
* Recommendation with M-distance.
*
* @author Shihuai Wen Email: shihuaiwen@outlook.com.
*/
public class MBR {
/**
* Default rating for 1-5 points.
*/
public static final double DEFAULT_RATING = 3.0;
/**
* The total number of users.
*/
private int numUsers;
/**
* The total number of items.
*/
private int numItems;
/**
* The total number of ratings (non-zero values)
*/
private int numRatings;
/**
* The predictions.
*/
private double[] predictions;
/**
* Compressed rating matrix. User-item-rating triples.
*/
private int[][] compressedRatingMatrix;
/**
* The degree of users (how many item he has rated).
*/
private int[] userDegrees;
/**
* The average rating of the current user.
*/
private double[] userAverageRatings;
/**
* The degree of users (how many item he has rated).
*/
private int[] itemDegrees;
/**
* The average rating of the current item.
*/
private double[] itemAverageRatings;
/**
* The first user start from 0. Let the first user has x ratings, the second
* user will start from x.
*/
private int[] userStartingIndices;
/**
* Number of non-neighbor objects.
*/
private int numNonNeighbors;
/**
* The radius (delta) for determining the neighborhood.
*/
private double radius;
/**
* ************************
* Construct the rating matrix.
*
* @param paraFilename the rating filename.
* @param paraNumUsers number of users
* @param paraNumItems number of items
* @param paraNumRatings number of ratings
* ************************
*/
public MBR(String paraFilename, int paraNumUsers, int paraNumItems, int paraNumRatings) throws Exception {
// Step 1. Initialize these arrays
numItems = paraNumItems;
numUsers = paraNumUsers;
numRatings = paraNumRatings;
userDegrees = new int[numUsers];
userStartingIndices = new int[numUsers + 1];
userAverageRatings = new double[numUsers];
itemDegrees = new int[numItems];
compressedRatingMatrix = new int[numRatings][3];
itemAverageRatings = new double[numItems];
predictions = new double[numRatings];
System.out.println("Reading " + paraFilename);
// Step 2. Read the data file.
File tempFile = new File(paraFilename);
if (!tempFile.exists()) {
System.out.println("File " + paraFilename + " does not exists.");
System.exit(0);
} // Of if
BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
String tempString;
String[] tempStrArray;
int tempIndex = 0;
userStartingIndices[0] = 0;
userStartingIndices[numUsers] = numRatings;
while ((tempString = tempBufReader.readLine()) != null) {
// Each line has three values
tempStrArray = tempString.split(",");
compressedRatingMatrix[tempIndex][0] = Integer.parseInt(tempStrArray[0]);
compressedRatingMatrix[tempIndex][1] = Integer.parseInt(tempStrArray[1]);
compressedRatingMatrix[tempIndex][2] = Integer.parseInt(tempStrArray[2]);
userDegrees[compressedRatingMatrix[tempIndex][0]]++;
itemDegrees[compressedRatingMatrix[tempIndex][1]]++;
if (tempIndex > 0) {
// Starting to read the data of a new user.
if (compressedRatingMatrix[tempIndex][0] != compressedRatingMatrix[tempIndex - 1][0]) {
userStartingIndices[compressedRatingMatrix[tempIndex][0]] = tempIndex;
} // Of if
} // Of if
tempIndex++;
} // Of while
tempBufReader.close();
double[] tempUserTotalScore = new double[numUsers];
double[] tempItemTotalScore = new double[numItems];
for (int i = 0; i < numRatings; i++) {
tempUserTotalScore[compressedRatingMatrix[i][0]] += compressedRatingMatrix[i][2];
tempItemTotalScore[compressedRatingMatrix[i][1]] += compressedRatingMatrix[i][2];
} // Of for i
for (int i = 0; i < numUsers; i++) {
userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
} // Of for i
for (int i = 0; i < numItems; i++) {
itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
} // Of for i
}// Of the first constructor
/**
* ************************
* Set the radius (delta).
*
* @param paraRadius The given radius.
* ************************
*/
public void setRadius(double paraRadius) {
if (paraRadius > 0) {
radius = paraRadius;
} else {
radius = 0.1;
} // Of if
}// Of setRadius
/**
* ************************
* Leave-one-out prediction. The predicted values are stored in predictions.
*
* @see #predictions
* ************************
*/
public void leaveOneOutPrediction() {
double tempItemAverageRating;
// Make each line of the code shorter.
int tempUser, tempItem, tempRating;
System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);
numNonNeighbors = 0;
for (int i = 0; i < numRatings; i++) {
tempUser = compressedRatingMatrix[i][0];
tempItem = compressedRatingMatrix[i][1];
tempRating = compressedRatingMatrix[i][2];
// Step 1. Recompute average rating of the current item.
tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
/ (itemDegrees[tempItem] - 1);
// Step 2. Recompute neighbors, at the same time obtain the ratings
// Of neighbors.
int tempNeighbors = 0;
double tempTotal = 0;
int tempComparedItem;
for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
tempComparedItem = compressedRatingMatrix[j][1];
if (tempItem == tempComparedItem) {
continue;// Ignore itself.
} // Of if
if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
tempTotal += compressedRatingMatrix[j][2];
tempNeighbors++;
} // Of if
} // Of for j
// Step 3. Predict as the average value of neighbors.
if (tempNeighbors > 0) {
predictions[i] = tempTotal / tempNeighbors;
} else {
predictions[i] = DEFAULT_RATING;
numNonNeighbors++;
} // Of if
} // Of for i
}// Of leaveOneOutPrediction
/**
* ************************
* Compute the MAE based on the deviation of each leave-one-out.
*
* @author Shihuai Wen
* ************************
*/
public double computeMAE() {
double tempTotalError = 0;
for (int i = 0; i < predictions.length; i++) {
tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
} // Of for i
return tempTotalError / predictions.length;
}// Of computeMAE
/**
* ************************
* Compute the MAE based on the deviation of each leave-one-out.
*
* @author Shihuai Wen
* ************************
*/
public double computeRSME() {
double tempTotalError = 0;
for (int i = 0; i < predictions.length; i++) {
tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
* (predictions[i] - compressedRatingMatrix[i][2]);
} // Of for i
double tempAverage = tempTotalError / predictions.length;
return Math.sqrt(tempAverage);
}// Of computeRSME
/**
* ************************
* The entrance of the program.
*
* @param args Not used now.
* ************************
*/
public static void main(String[] args) {
try {
MBR tempRecommender = new MBR("D:/Work/sampledata/movielens-943u1682m.txt", 943, 1682, 100000);
for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
tempRecommender.setRadius(tempRadius);
tempRecommender.leaveOneOutPrediction();
double tempMAE = tempRecommender.computeMAE();
double tempRSME = tempRecommender.computeRSME();
System.out.println("Radius = " + tempRadius + ", MAE = " + tempMAE + ", RSME = " + tempRSME
+ ", numNonNeighbors = " + tempRecommender.numNonNeighbors);
} // Of for tempRadius
} catch (Exception ee) {
System.out.println("Error occurred in main \r\n" + ee);
} // Of try
}// Of main
}// Of class MBR
3. 运行截图
总结
大道至简, 越是简单的算法可能会解决更多的问题.
不过代码中的数组太过于多, 阅读起来存在一定的障碍.
运行结果中出现了 0.1 + 0.2 = 0.30000000000000004 , 浮点数的奥秘.
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)