Contents
I. openSMILE
II. Calling SMILEapi
1. Create an instance
2. Initialize
3. Set the callback
4. Write audio data
5. Configuration file changes
III. Full code
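I recently tried using openSMILE to extract audio features. These notes record what I found in various references and in the documentation.
I. openSMILE
openSMILE: official website
GitHub project: https://github.com/audeering/opensmile
Documentation: openSMILE documentation
After downloading the source, build it. On Windows, generate the project files with CMake, compile, and then install.
After installation, the include folder holds the header files, the lib folder holds SMILEapi.lib (the link library), and the bin folder holds SMILEapi.dll (the dynamic library).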
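II. Calling SMILEapi
Open the header SMILEapi.h and you will find only a handful of functions, because the details of feature extraction are all specified in the configuration file.
Call sequence
1. Create an instance
    smileobj_t* m_pSmileObj = smile_new();
2. Initialize
Initialization mainly consists of loading the configuration file:
    std::string configfile = "./config/MFCC12_E_D_A_.conf";
    smileres_t ret = smile_initialize(m_pSmileObj, configfile.c_str(), 0, nullptr);
    if (ret == SMILE_SUCCESS)
    {
        std::cout << "smile init succeed" << std::endl;
    }
    else
    {
        std::cout << "smile_init failed: " << ret << std::endl;
    }
The configuration file that configfile points to needs a few modifications (see step 5) before the rest of the call sequence fits together.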
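3. Set the callback
    bool external_sink_callback(const float* data, long vectorSize, void* param)
    {
        std::cout << "vectorSize: " << …
When registering the callback, two arguments matter most: the callback function itself and the name of the sink component, "externalSink" here.
Once the callback is set, whenever the processing pipeline built from the configuration file reaches "externalSink", the data is passed to the callback function.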
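A callback with the signature shown above usually has to do more than print the vector size. The following is a minimal sketch, not the author's code, of one way to collect each delivered feature vector through the param pointer. FeatureCollector and collect_features are hypothetical names introduced for illustration, and the registration call in the trailing comment is an assumption based on my reading of SMILEapi.h. The meaning of the boolean return value is also assumed here: true is returned when the vector was stored.

```cpp
#include <vector>

// Hypothetical container: one inner vector per feature frame delivered by the sink.
struct FeatureCollector {
    std::vector<std::vector<float>> frames;
};

// Same signature as external_sink_callback above.
// `param` is expected to point at a FeatureCollector owned by the caller.
bool collect_features(const float* data, long vectorSize, void* param) {
    auto* collector = static_cast<FeatureCollector*>(param);
    if (collector == nullptr || data == nullptr || vectorSize <= 0) {
        return false;
    }
    // Copy the values: this sketch assumes the pointer is only valid
    // for the duration of the call.
    collector->frames.emplace_back(data, data + vectorSize);
    return true;
}

// Assumed registration, to be checked against SMILEapi.h:
//   FeatureCollector collector;
//   smile_extsink_set_data_callback(m_pSmileObj, "externalSink",
//                                   collect_features, &collector);
```

Copying inside the callback keeps the callback short and leaves any heavier processing to the code that owns the collector.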
4. Write audio data
    int ret = smile_extaudiosource_write_data(m_pSmileObj, "externalSource", SrcData, length);
Here, too, the component name passed in matters: "externalSource" must be created as a component in the configuration file.
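Audio usually arrives as one long buffer, while the write call takes one chunk at a time. The sketch below shows one possible way to split a buffer of 16-bit mono samples into blocks and hand each block to a writer. feed_in_blocks is a hypothetical helper, not part of SMILEapi, and it assumes that the length argument of smile_extaudiosource_write_data is a size in bytes, which should be confirmed in SMILEapi.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Splits `sampleCount` 16-bit mono samples into blocks of at most
// `blockSamples` samples and calls `write(ptr, lengthInBytes)` per block.
// Returns the number of blocks written; stops early if `write` returns false.
template <typename Writer>
std::size_t feed_in_blocks(const std::int16_t* samples, std::size_t sampleCount,
                           std::size_t blockSamples, Writer write) {
    if (samples == nullptr || blockSamples == 0) {
        return 0;
    }
    std::size_t blocks = 0;
    for (std::size_t offset = 0; offset < sampleCount; offset += blockSamples) {
        const std::size_t n = std::min(blockSamples, sampleCount - offset);
        const int lengthInBytes = static_cast<int>(n * sizeof(std::int16_t));
        if (!write(samples + offset, lengthInBytes)) {
            break;
        }
        ++blocks;
    }
    return blocks;
}
```

In real use the writer would be a small lambda wrapping the call above, for example one that returns smile_extaudiosource_write_data(m_pSmileObj, "externalSource", ptr, lengthInBytes) == SMILE_SUCCESS. Passing the writer in as a parameter keeps the chunking logic independent of the library.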
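5. Configuration file changes
The configuration file is derived from config/mfcc/MFCC12_E_D_A.conf.
a) In the component instance list, add the external source component cExternalAudioSource:
    [componentInstances:cComponentManager]
    instance[externalSource].type=cExternalAudioSource
b) In the component configuration part, add a section for cExternalAudioSource:
    /////////////////////////////////////////////////////////////////////
    ///////////////////   component configuration  /////////////////////
    /////////////////////////////////////////////////////////////////////
    ; the following sections configure the components listed above
    ; a help on configuration parameters can be obtained with
    ;  SMILExtract -H
    ; or
    ;  SMILExtract -H configTypeName (= componentTypeName)
    /////////////////////////////////////////////////////////////////////
    [externalSource:cExternalAudioSource]
    writer.dmLevel=wave
    blocksize = 8000
    blocksize_sec =0.50
    sampleRate = 8000
    channels = 1
    nBits = 16
    nBPS = 0
    fieldName = pcm
The full list of cExternalAudioSource options is in the official documentation. They cover the audio sample rate, channel count, bit depth, and similar parameters.
c) In the component instance list of the data output part, add instance[externalSink].type = cExternalSink:
    /////////////////////////////////////////////////////////////////////
    //////////////////// data output configuration /////////////////////
    /////////////////////////////////////////////////////////////////////
    [componentInstances:cComponentManager]
    instance[audspec_lldconcat].type=cVectorConcat
    instance[externalSink].type = cExternalSink
d) Add the section [externalSink:cExternalSink]. The data on the level named by its reader.dmLevel (here lld) is what gets passed to the callback function.
    [externalSink:cExternalSink]
    reader.dmLevel = lld
The complete modified configuration file, ./config/MFCC12_E_D_A_.conf, is as follows: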
    /////////////////////////////////////////////////////////////////////
    ///////// > openSMILE configuration file to extract MFCC features <
    /////////   HTK target kind: MFCC_E_D_A, numCeps=12
    /////////
    ///////// * written 2009 by Florian Eyben *
    /////////
    ///////// (c) audEERING UG (haftungsbeschränkt),
    /////////     All rights reserved.
    /////////////////////////////////////////////////////////////////////

    /////////////////////////////////////////////////////////////////////
    ;
    ; This section is always required in openSMILE configuration files
    ;   it configures the componentManager and gives a list of all components which are to be loaded
    ; The order in which the components are listed should match
    ;   the order of the data flow for most efficient processing
    ;
    /////////////////////////////////////////////////////////////////////
    [componentInstances:cComponentManager]
    instance[dataMemory].type=cDataMemory
    [componentInstances:cComponentManager]
    instance[externalSource].type=cExternalAudioSource
    ; audio framer
    instance[frame].type=cFramer
    ; speech pre-emphasis (on a per frame basis as HTK does it)
    instance[pe].type=cVectorPreemphasis
    ; apply a window function to pre-emphasised frames
    instance[win].type=cWindower
    ; transform to the frequency domain using FFT
    instance[fft].type=cTransformFFT
    ; compute magnitude of the complex fft from the previous component
    instance[fftmag].type=cFFTmagphase
    ; compute Mel-bands from magnitude spectrum
    instance[melspec].type=cMelspec
    ; compute MFCC from Mel-band spectrum
    instance[mfcc].type=cMfcc
    ; compute log-energy from raw signal frames
    ; (not windowed, not pre-emphasised: that's the way HTK does it)
    instance[energy].type=cEnergy
    ; concat mfcc and energy, so we can compute delta and acceleration
    ; coefficients of both features at the same time
    instance[cat].type=cVectorConcat
    ; compute delta coefficients from mfcc and energy
    instance[delta].type=cDeltaRegression
    ; compute acceleration coefficients from delta coefficients of mfcc and energy
    instance[accel].type=cDeltaRegression
    ; run single threaded (nThreads=1)
    ; NOTE: a single thread is more efficient for processing small files, since multi-threaded processing involves more
    ;       overhead during startup, which will make the system slower in the end
    nThreads=1
    ; do not show any internal dataMemory level settings
    ; (if you want to see them set the value to 1, 2, 3, or 4, depending on the amount of detail you wish)
    printLevelStats=3

    /////////////////////////////////////////////////////////////////////
    ///////////////////   component configuration  /////////////////////
    /////////////////////////////////////////////////////////////////////
    ; the following sections configure the components listed above
    ; a help on configuration parameters can be obtained with
    ;  SMILExtract -H
    ; or
    ;  SMILExtract -H configTypeName (= componentTypeName)
    /////////////////////////////////////////////////////////////////////

    [externalSource:cExternalAudioSource]
    writer.dmLevel=wave
    blocksize = 8000
    blocksize_sec =0.50
    sampleRate = 8000
    channels = 1
    nBits = 16
    nBPS = 0
    fieldName = pcm

    [frame:cFramer]
    reader.dmLevel=wave
    writer.dmLevel=frames
    noPostEOIprocessing = 1
    copyInputName = 1
    frameSize = 0.04
    frameStep = 0.02
    frameMode = fixed
    frameCenterSpecial = left

    [pe:cVectorPreemphasis]
    reader.dmLevel=frames
    writer.dmLevel=framespe
    k = 0.97
    de = 0

    [win:cWindower]
    reader.dmLevel=framespe
    writer.dmLevel=winframes
    copyInputName = 1
    processArrayFields = 1
    ; hamming window
    winFunc = ham
    ; no gain, no offset
    gain = 1.0
    offset = 0

    [fft:cTransformFFT]
    reader.dmLevel=winframes
    writer.dmLevel=fft
    copyInputName = 1
    processArrayFields = 1
    inverse = 0
    ; for compatibility with 2.2.0 and older versions
    zeroPadSymmetric = 0

    [fftmag:cFFTmagphase]
    reader.dmLevel=fft
    writer.dmLevel=fftmag
    copyInputName = 1
    processArrayFields = 1
    inverse = 0
    magnitude = 1
    phase = 0

    [melspec:cMelspec]
    reader.dmLevel=fftmag
    writer.dmLevel=melspec
    copyInputName = 1
    processArrayFields = 1
    ; htk compatible sample value scaling
    htkcompatible = 1
    nBands = 26
    ; use power spectrum instead of magnitude spectrum
    usePower = 1
    lofreq = 0
    hifreq = 8000
    specScale = mel
    inverse = 0

    [mfcc:cMfcc]
    reader.dmLevel=melspec
    writer.dmLevel=mfcc
    copyInputName = 1
    processArrayFields = 1
    firstMfcc = 1
    lastMfcc = 12
    cepLifter = 22.0
    htkcompatible = 1

    [energy:cEnergy]
    reader.dmLevel=frames
    writer.dmLevel=energy
    nameAppend = energy
    copyInputName = 1
    processArrayFields = 0
    htkcompatible=1
    rms = 0
    log = 1

    [cat:cVectorConcat]
    reader.dmLevel=mfcc;energy
    writer.dmLevel=ft0
    copyInputName = 1
    processArrayFields = 0

    [delta:cDeltaRegression]
    reader.dmLevel=ft0
    writer.dmLevel=ft0de
    nameAppend = de
    copyInputName = 1
    noPostEOIprocessing = 0
    deltawin=2
    blocksize=1

    [accel:cDeltaRegression]
    reader.dmLevel=ft0de
    writer.dmLevel=ft0dede
    nameAppend = de
    copyInputName = 1
    noPostEOIprocessing = 0
    deltawin=2
    blocksize=1

    /////////////////////////////////////////////////////////////////////
    //////////////////// data output configuration /////////////////////
    /////////////////////////////////////////////////////////////////////
    [componentInstances:cComponentManager]
    instance[audspec_lldconcat].type=cVectorConcat
    instance[externalSink].type = cExternalSink

    [audspec_lldconcat:cVectorConcat]
    reader.dmLevel = ft0;ft0de;ft0dede
    writer.dmLevel = lld
    includeSingleElementFields = 1

    [externalSink:cExternalSink]
    reader.dmLevel = lld
III. Full code
The code and a test program are packaged as a VS2015 project, available here:
opensmileTest.rar (machine learning documentation resource, CSDN download)
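As a quick cross-check between the configuration in section II.5 and the vectorSize seen in the callback, the arithmetic implied by the config values can be sketched as follows. The constants are copied from the configuration file; the function names are hypothetical. The frame count is plain fixed-step framing arithmetic and is only an approximation of what openSMILE actually emits, since end-of-input handling and the delta windows may change the exact number.

```cpp
#include <cstddef>

// Values copied from the configuration file in section II.5.
constexpr int kSampleRate = 8000;       // [externalSource] sampleRate
constexpr double kFrameSizeSec = 0.04;  // [frame] frameSize
constexpr double kFrameStepSec = 0.02;  // [frame] frameStep
constexpr int kNumMfcc = 12;            // [mfcc] firstMfcc = 1 .. lastMfcc = 12
constexpr int kNumEnergy = 1;           // [energy] one log-energy value per frame

// ft0 (static) + ft0de (delta) + ft0dede (acceleration), concatenated into lld.
constexpr int expected_vector_size() {
    return (kNumMfcc + kNumEnergy) * 3;
}

// Frame length and frame step in samples, rounded to the nearest sample.
constexpr std::size_t samples_per_frame() {
    return static_cast<std::size_t>(kSampleRate * kFrameSizeSec + 0.5);
}
constexpr std::size_t samples_per_step() {
    return static_cast<std::size_t>(kSampleRate * kFrameStepSec + 0.5);
}

// Number of complete frames that fit into `totalSamples` samples.
constexpr std::size_t full_frames(std::size_t totalSamples) {
    return totalSamples < samples_per_frame()
               ? 0
               : (totalSamples - samples_per_frame()) / samples_per_step() + 1;
}
```

With these values the expected vector size is 39 (13 static features plus their delta and acceleration coefficients), a frame is 320 samples long, and the step is 160 samples. If the callback reports a different vectorSize, the sink is probably reading a different level than lld.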