CMAR代码学习1 生成FP-tree挖掘规则https://blog.csdn.net/woshiwu6666/article/details/124717377
有一个java语法的知识点,通过赋值给属性,会自动调用toString方法
要特别注意的是,下面这个例子当中,要写成return "Animal [i=" + this.i + ", o=" + this.o +",kafang=" +(this.kafang=(NUMBER_INSTANCES/3))+"]";有this指针toString()方法自动被调用,并且有值更新的时候toString()里的值也会默认更新。如果写成,return "Animal [i=" + i + ", o=" + o +",kafang=" +(kafang=(NUMBER_INSTANCES/3))+"]";则toString()方法不能被自动调用,要使用System.out.println();才能被调用
public class Animal {
int i=1;
String o="hello";
double kafang=0;
static double NUMBER_INSTANCES=0;
public void printType() {
System.out.println("Cat");
}
@Override
public String toString() {
return "Animal [i=" + this.i + ", o=" + this.o +",kafang=" +(this.kafang=(NUMBER_INSTANCES/3))+"]";
}
public static void main(String[] args) {
Animal dog=new Animal();
//通过赋值,会调用toString方法
dog.i=2;
System.out.println(dog);
//更新值,toString方法也会更新
// dog.NUMBER_INSTANCES=9;
//通过赋值,会调用toString方法,自动计算了卡方值
Animal.NUMBER_INSTANCES=9;
// dog.i=2;
System.out.println("kafang="+dog.kafang);
System.out.println(dog);
}
}
卡方值的计算
AlgoCMAR.java
生成这个类,new ClassifierCMAR(rules, dataset, delta);,利用该类计算卡方值并利用CR-tree剪枝
/**
* Train a classifier
* @param dataset a training dataset
* @return a rule classifier
* @throws Exception if an error occurs
*/
@Override
public ClassifierCMAR train(Dataset dataset){
// Apply a modified FPGrowth algorithm to obtain the rules
FPGrowthForCMAR fpgrowth = new FPGrowthForCMAR(dataset, minSup, minConf);
List rules = fpgrowth.run();
// Return a classifier that is created using these rules
return new ClassifierCMAR(rules, dataset, delta);
}
ClassifierCMAR.java
NUMBER_INSTANCES表示数据的总条数
/**
* Constructor
*
* @param rules forming the current classifier
* @param training dataset used while training classifier
* @param delta delta
*
* @throws Exception
*/
public ClassifierCMAR(List rules, Dataset training, int delta) {
super("CMAR");
//传了一个数进去,就可以计算卡方值
RuleCMAR.NUMBER_INSTANCES = training.getInstances().size();
// System.out.println(1);
//一共有多少个属性值
CRTree.NUMBER_SINGLETONS = training.getDistinctItemsCount();
CRTree crTree = new CRTree(training, delta);
for (Rule rule : rules) {
crTree.insert(rule);
}
//它为什么把那些size并且优先级高的都去掉了
crTree.pruneUsingCover();
this.rules = crTree.getRules();
// System.out.println(1);
}
Rule.java
属性有更新,会自动调用toString()方法
/*
* (non-Javadoc)
*
* @see java.lang.Object#toString()
*/
public String toString() {
// System.out.println(1);
return this.antecedent.toString() + " -> " + this.getKlass() + getMeasuresToString();
}
public abstract String getMeasuresToString();
RuleCMAR.java
@Override
public String getMeasuresToString() {
return " #SUP: " + getSupportRule() + " #CONF: " + getConfidence() + " #CHISQUARE: " + this.getChiSquare();
}
计算卡方值
假设有这么一条规则,计算它的卡方值
- 生成实际表格与期望表格
2.利用公式计算,得到结果
调用getChiSquare()函数进行计算
以下面的两条规则为例,解释SupportKlass,SupportAntecedent,SupportRule以及CONF
SupportKlass: 分类为class=12的数量
SupportAntecedent:条件部分的出现次数
SupportRule: 这些条件判断为12的次数
CONF: SupportRule/SupportAntecedent
[5] -> 12 #SupportRule: 1#SupportAntecedent: 1#SupportKlass: 6 #CONF: 1.0
[5, 8] -> 12 #SupportRule: 1#SupportAntecedent: 1#SupportKlass: 6 #CONF: 1.0
/**
* Get the chi square of this rule
*
* @return chi square of this rule
*/
public double getChiSquare() {
double[] observedValues = new double[4];
double[] expectedValues = new double[4];
// Calculate observed and expected values
observedValues[0] = this.supportRule;
observedValues[1] = this.supportAntecedent - this.supportRule;
observedValues[2] = this.supportKlass - this.supportRule;
observedValues[3] = NUMBER_INSTANCES - this.supportAntecedent - this.supportKlass + this.supportRule;
// Calculate additional support values
double supNotAntecedent = NUMBER_INSTANCES - this.supportAntecedent;
double supNotConsequent = NUMBER_INSTANCES - this.supportKlass;
// Expected values
expectedValues[0] = (this.supportKlass * this.supportAntecedent) / NUMBER_INSTANCES;
expectedValues[1] = (supNotConsequent * this.supportAntecedent) / NUMBER_INSTANCES;
expectedValues[2] = (this.supportKlass * supNotAntecedent) / NUMBER_INSTANCES;
expectedValues[3] = (supNotConsequent * supNotAntecedent) / NUMBER_INSTANCES;
double sumChiSquaredValues = 0.0;
for (int index = 0; index < observedValues.length; index++) {
double chiValue = Math.pow((observedValues[index] - expectedValues[index]), 2.0) / expectedValues[index];
sumChiSquaredValues = sumChiSquaredValues + chiValue;
}
return sumChiSquaredValues;
}
下面展示计算的输出
[5] -> 12 #SupportRule: 1#SupportAntecedent: 1#SupportKlass: 6 #CONF: 1.0
observedValues[0]1.0
observedValues[1]0.0
observedValues[2]5.0
observedValues[3]3.0
supNotAntecedent8.0
supNotConsequent3.0
expectedValues[0]0.6666666666666666
expectedValues[1]0.3333333333333333
expectedValues[2]5.333333333333333
expectedValues[3]2.6666666666666665
chiValue0.1666666666666667
chiValue0.3333333333333333
chiValue0.020833333333333297
chiValue0.041666666666666706
sumChiSquaredValues0.5625
[5, 8] -> 12 #SupportRule: 1#SupportAntecedent: 1#SupportKlass: 6 #CONF: 1.0
observedValues[0]1.0
observedValues[1]0.0
observedValues[2]5.0
observedValues[3]3.0
supNotAntecedent8.0
supNotConsequent3.0
expectedValues[0]0.6666666666666666
expectedValues[1]0.3333333333333333
expectedValues[2]5.333333333333333
expectedValues[3]2.6666666666666665
chiValue0.1666666666666667
chiValue0.3333333333333333
chiValue0.020833333333333297
chiValue0.041666666666666706
sumChiSquaredValues0.5625
Yes表示是5,No表示非
判断为12 | 判断为11 | total | |
[5]Yes | 1 | 0=1-0 | 1 |
[5]No | 5=6-1 | 3=9-5-1 | 8 |
total | 6 | 3 | 9 |
判断为12 | 判断为11 | total | |
[5, 8]Yes | 1 | 0=1-0 | 1 |
[5, 8]No | 5=6-1 | 3=9-5-1 | 8 |
total | 6 | 3 | 9 |
下一 步是创建CR-Tree
- 为CR-Tree赋值
- 调用 crTree.insert( )方法插入规则
- 调用crTree.pruneUsingCover( )过滤掉不符合规则的方法
- 得到过滤后的规则crTree.getRules( )
minCover:是database coverage的参数
第一步为CR-tree赋值
CRTree(Dataset dataset, int delta) {
minCover = delta;
this.dataset = dataset;
}
第二步调用insert()方法
/**
* Insert a rule in the tree
*
* @param baseRule a rule to be inserted into the CRTree
*/
protected void insert(Rule baseRule) {
RuleCMAR rule = (RuleCMAR) baseRule;
// If the rule fails the Chi-Squared test
if (rule.getChiSquare() <= THRESHOLD_CHI_SQUARE) {
// We dont add the rule
return;
}
// Otherwise, we create a new node for this rule
CRNode newNode = new CRNode(rule);
// If it is the first rule, it will be added as the root of the tree.
//添加到树的rootnode当中
if (rootNode == null) {
rootNode = newNode;
return;
}
// If more general rule with higher ranking exists, current rule will be
// discarded
//如果两项都满足就discarded,只要有一项不满足就可以添加
if (isMoreGeneralNode(newNode)) {
return;
}
// Add current node as the first node
//和这颗树的首节点比较,如果比它优先级高就添加它到树的下一个节点当中然后返回
//如果比这颗树的头节点大
if (newNode.rule.isGreater(rootNode.rule)) {
newNode.next = rootNode;
rootNode = newNode;
return;
}
// Search for the position where
// the rule should be inserted in terms of rank
//则要找到合适的位置的位置添加
CRNode currentNode = rootNode;
CRNode nextNode = rootNode.next;
while (nextNode != null) {
if (newNode.rule.isGreater(nextNode.rule)) {
currentNode.next = newNode;
newNode.next = nextNode;
return;
}
currentNode = nextNode;
nextNode = nextNode.next;
}
// Add new node at the very end
//整棵树里都没有比它小的就添加到最后,最小的放到最后
currentNode.next = newNode;
}
首先,有一个THRESHOLD_CHI_SQUARE 的筛选,默认阈值1.6424。不满足阈值,则不放进这颗树中
/**
* Critical threshold for 25% "significance" level (assuming "degree of freedom"
* equivalent to 1).
*/
private static final double THRESHOLD_20 = 1.6424;
/**
* Critical threshold value for CHI_SQUARE
*/
private static final double THRESHOLD_CHI_SQUARE = THRESHOLD_20; // Default
否则放进这颗树中,这一棵单路径的树,其数据结构如下所示
CRNode.java
/**
* Class representing a CRTree node, as used by the CMAR algorithm.
*
* @see AlgoCMAR
*/
public class CRNode {
/**
* Rule stored in the current node
*/
RuleCMAR rule = null;
/**
* Link to the next node
*/
CRNode next = null;
/**
* Constructor
*
* @param rule to be stored in the node
*/
CRNode(RuleCMAR rule) {
this.rule = rule;
}
}
接着判断规则是不是general和high confidence的,用general和high confidence的去覆盖specific和low confidence的
/**
* Checks whether there are a more general rule, with higher ranking in the
* current tree
*
* @param ruleNode a node to be inserted
* @return true if more general rule exists in the tree
*/
private boolean isMoreGeneralNode(CRNode ruleNode) {
CRNode currentNode = rootNode;
// Search in tree by follwing the "next" links between nodes
while (currentNode != null) {
//如果这两个都满足就discard
//如果size更小,并且优先级更高,就discard,不用放到树当中,因为都满足的不需要修剪
//只满足一个的才需要修剪
if (ruleNode.rule.isMoreGeneral(currentNode.rule) && ruleNode.rule.isGreater(currentNode.rule))
{
// System.out.println(1);
// System.out.println(ruleNode.rule);
return true;
}
currentNode = currentNode.next;
}
return false;
}
如何算是more generalisMoreGeneralNode(newNode),原文中是怎么描述的
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)