How to Convert Data to the LIBSVM File Format

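Use FormatDataLibsvm.xls. This tool is simply an Excel workbook. First set Office's macro security level to Medium or Low. When you open the workbook a dialog pops up; choose "Enable Macros" and ignore the other options, then copy your data into the workbook.

1. Open FormatDataLibsvm.xls and paste your data into Sheet1, starting at the top-left cell.

2. Go to "Tools" --> "Macro" --> "Run", select the FormatDatatoLibsvm entry in the list, and run it. This converts the data. The result, however, still lives in an .xls workbook, which LIBSVM cannot read as a training file, so one more step is needed.

3. Copy the converted data into Notepad and save it as a text file. Note that when you call LIBSVM you must type the file name on the command line with its .txt extension.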

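The input data format is:

condition_attribute_a condition_attribute_b ... decision_attribute

7 5 ... 2

4 2 ... 1

The output data format is:

decision_attribute condition_attribute_a condition_attribute_b ...

2 1:7 2:5 ...

1 1:4 2:2 ...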
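The same reshaping can be sketched in a few lines of Python. This is only an illustration of the format change described above, not the code of the Excel macro; the function name rows_to_libsvm is made up for this example, and it assumes the decision attribute is the last value in every row.

```python
def rows_to_libsvm(rows):
    """Convert rows shaped [attr_1, ..., attr_n, label] into LIBSVM lines."""
    lines = []
    for row in rows:
        *attributes, label = row  # the decision attribute is the last column
        # LIBSVM feature indices start at 1
        pairs = ["{}:{}".format(i, v) for i, v in enumerate(attributes, start=1)]
        lines.append(" ".join([str(label)] + pairs))
    return lines


if __name__ == "__main__":
    for line in rows_to_libsvm([[7, 5, 2], [4, 2, 1]]):
        print(line)
```

With the two sample rows from the input format, this prints the two lines shown in the output format (restricted to two attributes).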

P.S. In step 2, the macro list contains a second entry, FormatDatafromLibsvm. It converts LIBSVM-format data back into the layout you originally pasted into Excel in step 1.
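The reverse direction can be sketched in Python as well. This is an assumption-laden illustration rather than the macro's actual code: the function name libsvm_line_to_row and the n_features argument are invented here, and features missing from a line are treated as zero, which is how LIBSVM's sparse format is normally read.

```python
def libsvm_line_to_row(line, n_features):
    """Parse one LIBSVM line into [attr_1, ..., attr_n, label]."""
    tokens = line.split()
    label = float(tokens[0])
    attributes = [0.0] * n_features  # features absent from the line are zero
    for token in tokens[1:]:
        index, value = token.split(":")
        attributes[int(index) - 1] = float(value)  # LIBSVM indices start at 1
    return attributes + [label]


if __name__ == "__main__":
    print(libsvm_line_to_row("2 1:7 2:5", n_features=2))
```

The number of features has to be supplied by the caller because a single sparse line does not say how many columns the full table has.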

Alternatively, write your own MATLAB program that converts the data format you normally use into the format LIBSVM reads directly.

A conversion function, write2libsvm, follows:

function write2libsvm
% Convert a data matrix into the format LIBSVM expects.
% The source data is stored as:
%   [label attr1 attr2 ...]
% The output file uses the LIBSVM format:
%   [label 1:attr1 2:attr2 3:attr3 ...]

% Ask for the MAT-file that holds the data matrix
[filename, pathname] = uigetfile({'*.mat', 'Data files (*.mat)'; ...
    '*.*', 'All files (*.*)'}, 'Select data file');
try
    S = load([pathname filename]);
    fieldName = fieldnames(S);
    B = S.(fieldName{1});   % use the first variable stored in the MAT-file
    [m, n] = size(B);
    % Ask where to save the converted text file
    [filename, pathname] = uiputfile({'*.txt;*.dat', 'Data files (*.txt, *.dat)'; ...
        '*.*', 'All files (*.*)'}, 'Save data file');
    fid = fopen([pathname filename], 'w');
    if fid ~= -1
        for k = 1:m
            fprintf(fid, '%d', B(k,1));                 % label
            for kk = 2:n
                fprintf(fid, ' %d:%g', kk-1, B(k,kk));  % index:value
            end
            fprintf(fid, '\n');
        end
        fclose(fid);
    else
        msgbox('Could not save the file!');
    end
catch
    msgbox('An error occurred while saving the file!', 'Error', 'error');
end
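For readers without MATLAB, the core of the function (without the file dialogs) can be sketched in Python. This is an illustrative equivalent under stated assumptions, not a port verified against the MATLAB version: write_libsvm is a name chosen for this example, and each row is assumed to be [label, attr_1, ..., attr_n], matching the layout the MATLAB comments describe.

```python
import os
import tempfile


def write_libsvm(matrix, path):
    """Write rows shaped [label, attr_1, ..., attr_n] to a LIBSVM text file."""
    with open(path, "w") as handle:
        for row in matrix:
            label, attributes = row[0], row[1:]
            # "g" formatting keeps integers short and still handles fractions
            pairs = ["{}:{:g}".format(i, v) for i, v in enumerate(attributes, start=1)]
            handle.write("{:g} {}\n".format(label, " ".join(pairs)))


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.gettempdir(), "write_libsvm_demo.txt")
    write_libsvm([[2, 7, 5], [1, 4, 2.5]], demo_path)
    with open(demo_path) as handle:
        print(handle.read())
```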

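I. Downloading and using the libsvm package:

LIBSVM is a simple, easy-to-use, fast and effective software package for SVM pattern recognition and regression, developed by Professor Chih-Jen Lin of National Taiwan University and colleagues. It provides precompiled executables for Windows as well as the source code, which makes it easy to modify.

1. Unpack the package on drive C, for example to C:\libsvm-3.18

2. The scripts grid.py and easy.py that ship with libsvm need the plotting tool gnuplot. Download it from its official website and unpack it on drive C as well.

3. Go to the C:\libsvm-3.18\tools directory, open grid.py and easy.py in a text editor (Notepad or edit will do), find the entry that sets the gnuplot path, change it to match your installation, and save.

4. Connecting Python to libsvm (see "SVM Study Notes (2): Using LIBSVM in Python")

a. Open IDLE (Python GUI) and enter

>>> import sys

>>> sys.version

If your Python is 32-bit, you will see a string like this:

'2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]'

In that case setting up the LIBSVM Python interface is very simple. Find the dynamic-link library libsvm.dll in the windows folder of the libsvm directory and copy it to a system directory such as C:\WINDOWS\system32\. libsvm can then be used from Python.

b. If your Python is 64-bit, consult the reference mentioned above.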
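Reading the bitness out of the sys.version string works, but it can also be checked programmatically. The sketch below uses only the standard library; the helper name python_bitness is made up for this example.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(sys.version)
    print("{}-bit Python".format(python_bitness()))
```

The bitness of libsvm.dll generally has to match the bitness of the interpreter that loads it, which is why the check matters.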

5. Run a small example

import os
os.chdir(r'C:\libsvm-3.18\python')  # adjust to your actual path
from svmutil import *
y, x = svm_read_problem('../heart_scale')  # read the bundled dataset
m = svm_train(y[:200], x[:200], '-c 4')
p_label, p_acc, p_val = svm_predict(y[200:], x[200:], m)

# Output like the following suggests the installation is working

optimization finished, #iter = 257

nu = 0.351161

obj = -225.628984, rho = 0.636110

nSV = 91, nBSV = 49

Total nSV = 91

Accuracy = 84.2857% (59/70) (classification)

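II. A few simple examples

Download the experimental datasets and copy them to C:\libsvm-3.18\windows (other files in that folder are needed later, so this is convenient; absolute paths work too).

Create a .py file containing the following code:

Example 1:

import os
os.chdir(r'C:\libsvm-3.18\windows')  # set the working directory
from svmutil import *
y, x = svm_read_problem('train.1.txt')  # read the training data
yt, xt = svm_read_problem('test.1.txt')  # read the test data
m = svm_train(y, x)  # train
svm_predict(yt, xt, m)  # test

Running the code above gives: Accuracy = 66.925% (2677/4000) (classification)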

Commonly used functions

svm_train() : train an SVM model

svm_predict() : predict testing data

svm_read_problem() : read the data from a LIBSVM-format file.

svm_load_model() : load a LIBSVM model.

svm_save_model() : save model to a file.

evaluations() : evaluate prediction results.
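To make the "Accuracy = ...% (correct/total)" lines in this article concrete, here is a small sketch of how accuracy and mean squared error can be computed from true and predicted labels. It is a stand-alone illustration and not the library's implementation; accuracy_and_mse is a name invented for this example.

```python
def accuracy_and_mse(true_labels, predicted_labels):
    """Return (accuracy in percent, mean squared error) for two label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    total = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == p)
    squared_error = sum((p - t) ** 2 for t, p in pairs)
    return 100.0 * correct / total, squared_error / float(total)


if __name__ == "__main__":
    # 59 correct predictions out of 70, as in the heart_scale example
    truth = [1] * 70
    predicted = [1] * 59 + [-1] * 11
    print(accuracy_and_mse(truth, predicted))
```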

- Function: svm_train

There are three ways to call svm_train()

>>>model = svm_train(y, x [, 'training_options'])

>>>model = svm_train(prob [, 'training_options'])

>>>model = svm_train(prob, param)

Parameter settings (described in detail in the README file):

Usage: svm-train [options] training_set_file [model_file]

options:

-s svm_type : set type of SVM (default 0)  # which kind of SVM to train

0 -- C-SVC (multi-class classification)

1 -- nu-SVC (multi-class classification)

2 -- one-class SVM

3 -- epsilon-SVR (regression)

4 -- nu-SVR (regression)

-t kernel_type : set type of kernel function (default 2)  # which kernel function to use

0 -- linear: u'*v

1 -- polynomial: (gamma*u'*v + coef0)^degree

2 -- radial basis function: exp(-gamma*|u-v|^2)

3 -- sigmoid: tanh(gamma*u'*v + coef0)

4 -- precomputed kernel (kernel values in training_set_file)

-d degree : set degree in kernel function (default 3)

-g gamma : set gamma in kernel function (default 1/num_features)

-r coef0 : set coef0 in kernel function (default 0)

-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)

-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)

-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)

-m cachesize : set cache memory size in MB (default 100)

-e epsilon : set tolerance of termination criterion (default 0.001)

-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)

-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)

-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)

-v n: n-fold cross validation mode

-q : quiet mode (no outputs)
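The kernel formulas listed under -t can be written out directly. The sketch below evaluates them on plain Python lists to show what gamma, coef0 and degree do; it is an illustration of the formulas only, not LIBSVM's internal code, and all function names are chosen for this example.

```python
import math


def dot(u, v):
    """Inner product u'*v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


if __name__ == "__main__":
    u, v = [1.0, 2.0], [3.0, 4.0]
    print(linear_kernel(u, v))
    print(polynomial_kernel(u, v, gamma=0.5, coef0=1.0, degree=2))
    print(rbf_kernel(u, v, gamma=0.5))
    print(sigmoid_kernel(u, v, gamma=0.5))
```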

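III. Improving prediction accuracy:

Following a fixed procedure can improve prediction accuracy (reference 2 describes it in detail):

a. Convert the data to the format libsvm accepts (the downloaded datasets show what it looks like).

b. Apply simple scaling to the data.

c. Use the RBF kernel and find the best parameters C and gamma by cross-validation.

d. Train on the whole training set with the best C and gamma.

e. Test.

Back to example 1:

1. In a cmd window, run the svm-scale tool on the existing data to scale it, producing the scaled data file train.1.scale.txt

Parameters:

-l lower limit after scaling

-u upper limit after scaling

-s file in which to save the scaling parameters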
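What the scaling step does to each feature can be sketched as a linear map from the feature's observed [min, max] range onto [lower, upper]. The code below is a minimal illustration of that idea and not the svm-scale source; fit_ranges and scale_rows are names made up here, and sending constant columns to the lower limit is an arbitrary choice of this sketch.

```python
def fit_ranges(rows):
    """Record (min, max) of every feature column in the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature from its [min, max] range to [lower, upper]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                new_row.append(lower)  # constant column: arbitrary choice here
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / float(hi - lo))
        scaled.append(new_row)
    return scaled


if __name__ == "__main__":
    train = [[7, 5], [4, 2]]
    ranges = fit_ranges(train)          # learn the ranges on the training data
    print(scale_rows(train, ranges))
    print(scale_rows([[5.5, 2]], ranges))  # reuse the same ranges for test data
```

The ranges learned on the training data are reused for the test data so that both sets go through the same transformation; this is the purpose of saving the scaling parameters to a file.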

2. Run the following code

import os
os.chdir(r'C:\libsvm-3.18\windows')  # set the working directory
from svmutil import *
y, x = svm_read_problem('train.1.scale.txt')  # read the training data
yt, xt = svm_read_problem('test.1.scale.txt')  # read the test data
m = svm_train(y, x)  # train
svm_predict(yt, xt, m)  # test

The accuracy is Accuracy = 95.6% (3824/4000) (classification).

Simple scaling alone raises the prediction accuracy substantially.

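3. Choosing the best parameters raises the accuracy again (copy grid.py from the tools folder into 'C:\libsvm-3.18\windows' first):

import os
os.chdir(r'C:\libsvm-3.18\windows')  # set the working directory
from svmutil import *
from grid import *
rate, param = find_parameters('train.1.scale.txt', '-log2c -3,3,1 -log2g -3,3,1')
y, x = svm_read_problem('train.1.scale.txt')  # read the training data
yt, xt = svm_read_problem('test.1.scale.txt')  # read the test data
m = svm_train(y, x, '-c 2 -g 4')  # train with the chosen parameters
p_label, p_acc, p_vals = svm_predict(yt, xt, m)  # test

In this program, the find_parameters function searches for good parameters for the training data. The -log2c and -log2g options set the search ranges for C and gamma. The search is exponential in base 2: -log2c -3,3,1 tries C = 2^-3, 2^-2, 2^-1, ... up to 2^3.

Once good parameters are found, pass them as options when training.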
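The base-2 search grid can be written out explicitly. The sketch below only enumerates the candidate values and formats a chosen pair as a training option string; it does not run cross-validation and is not taken from grid.py. The names log2_grid and options_from_param are invented for this example, the step is assumed to be positive, and the dictionary shape {'c': ..., 'g': ...} is an assumption about how the chosen parameters are held.

```python
from itertools import product


def log2_grid(begin, end, step):
    """Return 2**begin, 2**(begin+step), ..., up to and including 2**end."""
    values = []
    exponent = begin
    while exponent <= end:  # assumes step > 0
        values.append(2.0 ** exponent)
        exponent += step
    return values


def options_from_param(param):
    """Format a dict like {'c': 2.0, 'g': 4.0} as an svm_train option string."""
    return "-c {c} -g {g}".format(**param)


if __name__ == "__main__":
    c_values = log2_grid(-3, 3, 1)
    g_values = log2_grid(-3, 3, 1)
    candidates = list(product(c_values, g_values))  # every (C, gamma) pair
    print(c_values)
    print(len(candidates))
    print(options_from_param({"c": 2.0, "g": 4.0}))
```

With the ranges used above, each parameter takes 7 values, so the search evaluates 49 (C, gamma) pairs.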

Readers can also try datasets 2 and 3 themselves.

I. Download libsvm

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

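Download libsvm-3.12.zip from the libsvm website and unpack it anywhere. The MATLAB toolbox directory is a good choice, for example C:\Program Files\MATLAB\R2011a\toolbox\libsvm-3.12.

II. Configure the compiler

Open MATLAB, change to the C:\Program Files\MATLAB\R2011a\toolbox\libsvm-3.12\matlab directory, and enter the following command:

mex -setup

The following prompt appears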

Please choose your compiler for building MEX-files:

Would you like mex to locate installed compilers [y]/n? n   % enter n here to choose the compiler yourself

The following options appear (they vary from machine to machine)

Select a compiler:

[1] Intel C++ 11.1 (with Microsoft Visual C++ 2008 SP1 linker)

[2] Intel Visual Fortran 11.1 (with Microsoft Visual C++ 2008 SP1 linker)

[3] Intel Visual Fortran 11.1 (with Microsoft Visual C++ 2008 Shell linker)

[4] Lcc-win32 C 2.4.1

[5] Microsoft Visual C++ 6.0

[6] Microsoft Visual C++ 2005 SP1

[7] Microsoft Visual C++ 2008 SP1

[8] Microsoft Visual C++ 2010

[9] Microsoft Visual C++ 2010 Express

[10] Open WATCOM C++

[0] None

Compiler: 8   % other compilers work too; the following prompt then appears

Your machine has a Microsoft Visual C++ 2010 compiler located at

C:\Program Files\Microsoft Visual Studio 10.0. Do you want to use this compiler [y]/n?

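This is the compiler's default location. Enter y if it is correct, or n to change the path.

After you enter y, you are asked to confirm once more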

Please verify your choices:

Compiler: Microsoft Visual C++ 2010

Location: C:\Program Files\Microsoft Visual Studio 10.0

Are these correct [y]/n? y

The compiler is now configured

Trying to update options file: C:\Documents and Settings\zhangduokun\Application Data\MathWorks\MATLAB\R2011a\mexopts.bat

From template: C:\PROGRA~1\MATLAB\R2011a\bin\win32\mexopts\msvc100opts.bat

Done . . .

III. Compile

Enter the command

>> make

% compilation finished

This produces svmtrain.mexw32, svmpredict.mexw32, libsvmread.mexw32 and libsvmwrite.mexw32 (on MATLAB 7.1 and earlier the corresponding files are svmtrain.dll, svmpredict.dll and read_sparse.dll; this was not tested). Then, in MATLAB's File -> Set Path dialog, use Add with Subfolders (or simply Add Folder) to add C:\Program Files\MATLAB\R2011a\toolbox\libsvm-3.12\matlab to the path, so that the libsvm functions can be called from any directory.

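IV. Test

To check that the interface between libsvm and MATLAB is configured, run the following commands in MATLAB:

>> load heart_scale

After this step, heart_scale_inst and heart_scale_label appear in the Workspace, which means the data loaded correctly.

>> model = svmtrain(heart_scale_label, heart_scale_inst, '-c 1 -g 0.07')

>> [predict_label, accuracy, dec_values] = svmpredict(heart_scale_label, heart_scale_inst, model)

Accuracy = 86.6667% (234/270) (classification)   % done

If this runs without errors and produces the struct model (which stores all support vectors and their coefficients), the interface between libsvm and MATLAB is fully configured.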

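Appendix:

You may not have heart_scale.mat (the official distribution no longer includes it; the file it ships is in the format used by the C++ tools, so load fails with an error: heart_scale must be same as previous lines).

In that case, convert the data with the provided function libsvmread().

Usage: [label_vector, instance_matrix] = libsvmread('filename')

To keep the names used in the official examples, call [heart_scale_label, heart_scale_inst] = libsvmread('heart_scale')

Because heart_scale is in the top-level libsvm directory rather than in the matlab subdirectory, calling libsvmread directly fails. Either change the current directory or use [heart_scale_label, heart_scale_inst] = libsvmread('../heart_scale'), where ../ refers to the parent directory.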

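Notes:

1. MATLAB ships with the C compiler Lcc-win32 C, but libsvm is originally implemented in C++ and therefore needs a C++ compiler. That is why we do not use MATLAB's default compiler and choose a C++ compiler instead.

MATLAB supports only a limited set of compilers; check the list of supported compilers for your MATLAB version.

2. Very old MATLAB versions such as MATLAB 7.0 cannot use Visual Studio as the compiler; only VC++ 6.0 works.

3. .mexw32 files are compiled binaries, so they look like garbage when opened in an editor, and the functions carry no built-in help text.

For example, help svmpredict reports an error: svmpredict not found

The README file in libsvm-3.12\matlab is the actual help document.

help svmtrain does print help text, but that text belongs to the svmtrain function bundled with MATLAB, which is less convenient than the one in the libsvm toolbox.

4. The newer libsvm 3.12 release already contains precompiled binaries in libsvm-3.12\windows. You can add libsvm-3.12\windows to the MATLAB path and use them directly without compiling. Compiling yourself is still preferable, because differences between build environments can cause small unpredictable problems, and your own build is under your control.

5. Test datasets: the libsvm website provides many datasets.

The heart_scale dataset used for testing is in the C++ version of the format (class_label 1:first_attribute 2:second_attribute ...). libsvmread converts it to the MATLAB version (the two differ in how the class labels are stored).

[label_vector, instance_matrix] = libsvmread('C++-format dataset')   % returns the class labels and the attribute matrix, which can then be used for training: model = svmtrain(label_vector, instance_matrix)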

>>load heart_scale

>>model = svmtrain(heart_scale_label,heart_scale_inst)

optimization finished, #iter = 162

nu = 0.431029

obj = -100.877288, rho = 0.424462

nSV = 132, nBSV = 107

Total nSV = 132

>>[predict_label,accuracy] = svmpredict(heart_scale_label,heart_scale_inst,model)

Accuracy = 86.6667% (234/270) (classification)

6. References

libsvm download: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Video: http://v.youku.com/v_showMini/id_XMjc2NTY3MzYw_ft_131.html (it has a few small problems, mentioned later)

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/yw/8149235.html
