Elasticsearch在Basic授权中支持以SQL语句的形式检索文档,SQL语句在执行时会被翻译为DSL执行。从语法的角度来看,Elastisearch中的SQL语句与RDBMS中的SQL语句基本一致,所以对于有数据库编程基础的人来说大大降低了使用 Elaticsearch的学习成本。
Elasticsearch提供了多种执行SQL语句的方法,可使用类似_search样的REST接口执行也可以通过命令行执行。它甚至还提供了JDBC和ODBC驱动来执行SQL语句,但JDBC和ODBC属于Platinum(白金版)授权需要付费,所以这里只介绍_sql接口。
sql接口在早期版本中,Elasticsearch执行SQL的REST接口为_xpack/sql,但在版本7以后这个接口已经被废止而推荐使用_sql接口。
例如:
POST _sql?format=txt { "query": """ select DestCountry, OriginCountry,AvgTicketPrice from kibana_sample_data_flights where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc limit 3 """ } 输出结果: DestCountry | OriginCountry | AvgTicketPrice ---------------+---------------+------------------ US |PR |1199.109130859375 JP |IN |1196.7706298828125 AR |CO |1195.72509765625
在示例中,_sql接口通过query参数接收SQL语句,而SQL语句也包含有select、from、where、order by、limit等子句。_sql接口的URL请求参数format定义了返回结果格式。比如在示例中定义了返回结果格式为txt。除了txt以外,_sqI接口还支持csv、json、tsv、yaml等等格式。示例中的请求会将所有航空公同为Kibana Airines的航班文档检索出来,并以文本表格的形式返回。
对于总量比较大的SOL查询,sql接口还支持以游标的形式实现分页。当_sql接口的请求参数中添加了fetch_size参数,_sql接口在返回结来时就会根据fetch_size参数设置的大小返回相应的条数,并在返回结果中添加游标标识。具体来说,当请求_sql接口时设置的forma为json时,返回结果中会包含cursor属性;而其他情况下则会在响应中添加Cursor报头。
例如还是执行示例中的SOL ,但是加入分页支持:
POST _sql?format=json { "query": """ select DestCountry, OriginCountry,AvgTicketPrice from kibana_sample_data_flights where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc """, "fetch_size": 3 } 输出结果: { "columns" : [ { "name" : "DestCountry", "type" : "keyword" }, { "name" : "OriginCountry", "type" : "keyword" }, { "name" : "AvgTicketPrice", "type" : "float" } ], "rows" : [ [ "US", "PR", 1199.109130859375 ], [ "JP", "IN", 1196.7706298828125 ], [ "AR", "CO", 1195.72509765625 ] ], "cursor" : "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBUl9xNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc=" }
在示例的请求中,将format设置为json,这样在返回结果中就能直接看到cursor值,当需要请求下一页的数据时,只需传递cursor值即可进行查询:
POST _sql?format=json { "cursor": "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc=" } 输出结果: { "rows" : [ [ "IT", "CA", 1195.3363037109375 ], [ "KR", "NL", 1194.945556640625 ], [ "JP", "CO", 1194.386962890625 ] ], "cursor" : "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc=" }
反复执行请求,Elasticsearch就会将第一次请求的全部内容以每次10个的数量全部迭代出来。在请求完所有数据后,应该使用_sql/close接口将游标关闭以释放资源。
POST _sql/close?format=json { "cursor": "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc=" }
除了fetch_size以外还有些可以在_sql接口请求体中使用的参数,如下:
- query:需要执行的SQL语句,必须要设置的参数
- fetch_size:默认1000,每次返回的行数
- filter:默认none,使用DSL设置过滤器
- request_timeout:默认90s,请求超时时间
- page_timeout:默认45s,分页超时时间
- tume_zone:默认Z,时区
- field_multi_value_leniency:默认false,如果一个字段返回多个值时是否忽略
在这些参数中,fiter可以使用DSL对文档做过滤,支持DSL中介绍的所有查询条件。query中的SQL语句在翻译为DSL后,会与filter中的DSL查询语句共同组合到bool查询中。其中SQL语句生成的DSL将出现在must子句,而filter中的DSL则出现在filter子句中。来想要查看SQL语句翻译后的DSL可以使用_sql/translate执行相同的请求,在返回结果中就可以看到翻译后的DSL了。
post _sql/translate { "query": """ select DestCountry, OriginCountry,AvgTicketPrice from kibana_sample_data_flights where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc """, "fetch_size": 3 } 输出结果: { "size" : 3, "query" : { "term" : { "Carrier" : { "value" : "Kibana Airlines", "boost" : 1.0 } } }, "_source" : { "includes" : [ "AvgTicketPrice" ], "excludes" : [ ] }, "docvalue_fields" : [ { "field" : "DestCountry" }, { "field" : "OriginCountry" } ], "sort" : [ { "AvgTicketPrice" : { "order" : "desc", "missing" : "_first", "unmapped_type" : "float" } } ] }SQL语法
Elasticsearch支持传统关系型数据库SQL语句中的查询语句,但并不支持DML、DCL句。换句话说,它只支持SELECt语句,不支持INSERT、UPDATE、DELETE语句。SELECT语句以外,Elaticsarch还支持DESCRIBE和SHOW语句。
SELECT语句SELECT语句用于查询文档,基本语法格式如下:
SELECT select_expr, [ FROM table_name」 [WHERe condtion] [ GROUP BY grouping_element] [ HAVINg condition J [ ORDER BY expression [ASC|DESC] [ LIMIT[ count]]
通过示例可以看出,Elastiesearch的SELECt语句跟普通SQL几乎没有什么区别,支持SELECT、FROM、WHERe、 GROUP BY、HAVINg、ORDER BY及LIMTT子句。
SELECt子句中可以使用星号或文档字段名称列表,FROM子句则指定要检索的索引名称,而WHERe子句则设定了检索的条件。一般的SQL查询使用这三个子句就足够了,而GROUP BY和HAVINg子句则用于分组,ORDER BY子句用于排
序,而LIMIT一般则可以用于分页。和传统SQL语句非常接近。
DESCRIBE语句用于查看一个索引的基础信息,在返回结果中一般会包含column、type、mapping三个列,分别对应文档的字段名称、传统数据库类型及文档字段中的类型。
例如要查看索引的基本信息:
POST _sql?format=txt { "query": "describe kibana_sample_data_flights" } 输出结果: column | type | mapping ------------------+---------------+--------------- AvgTicketPrice |REAL |float Cancelled |BOOLEAN |boolean Carrier |VARCHAR |keyword Dest |VARCHAR |keyword DestAirportID |VARCHAR |keyword DestCityName |VARCHAR |keyword DestCountry |VARCHAR |keyword DestLocation |GEOMETRY |geo_point DestRegion |VARCHAR |keyword DestWeather |VARCHAR |keyword DistanceKilometers|REAL |float DistanceMiles |REAL |float FlightDelay |BOOLEAN |boolean FlightDelayMin |INTEGER |integer FlightDelayType |VARCHAR |keyword FlightNum |VARCHAR |keyword FlightTimeHour |VARCHAR |keyword FlightTimeMin |REAL |float Origin |VARCHAR |keyword OriginAirportID |VARCHAR |keyword OriginCityName |VARCHAR |keyword OriginCountry |VARCHAR |keyword OriginLocation |GEOMETRY |geo_point OriginRegion |VARCHAR |keyword OriginWeather |VARCHAR |keyword dayOfWeek |INTEGER |integer timestamp |TIMESTAMP |datetimeSHOW语句
SHOW语句包括三种形式,即SHOW COLUMNS、SHOW FUNCTIONS和SHOW TABLES。
SHOW COLUMNS用于查看一个索引中的字段情况,它的作用与DESCRIBE语句完全一样,甚至连返回结果都是一样的。
POST _sql?format=txt { "query": "show columns in kibana_sample_data_flights" } 输出结果: column | type | mapping ------------------+---------------+--------------- AvgTicketPrice |REAL |float Cancelled |BOOLEAN |boolean Carrier |VARCHAR |keyword Dest |VARCHAR |keyword DestAirportID |VARCHAR |keyword DestCityName |VARCHAR |keyword DestCountry |VARCHAR |keyword DestLocation |GEOMETRY |geo_point DestRegion |VARCHAR |keyword DestWeather |VARCHAR |keyword DistanceKilometers|REAL |float DistanceMiles |REAL |float FlightDelay |BOOLEAN |boolean FlightDelayMin |INTEGER |integer FlightDelayType |VARCHAR |keyword FlightNum |VARCHAR |keyword FlightTimeHour |VARCHAR |keyword FlightTimeMin |REAL |float Origin |VARCHAR |keyword OriginAirportID |VARCHAR |keyword OriginCityName |VARCHAR |keyword OriginCountry |VARCHAR |keyword OriginLocation |GEOMETRY |geo_point OriginRegion |VARCHAR |keyword OriginWeather |VARCHAR |keyword dayOfWeek |INTEGER |integer timestamp |TIMESTAMP |datetime
SHOW FUNCTIONS用于返回在Elastiesearch SQL中支持的所有函数,返回结果中包括MIN、MAX、COUNT等常用的聚集函数。
POST _sql?format=txt { "query": "show functions" } 输出结果: name | type -----------------+--------------- AVG |AGGREGATE COUNT |AGGREGATE FIRST |AGGREGATE FIRST_VALUE |AGGREGATE LAST |AGGREGATE LAST_VALUE |AGGREGATE MAX |AGGREGATE MIN |AGGREGATE SUM |AGGREGATE KURTOSIS |AGGREGATE MAD |AGGREGATE PERCENTILE |AGGREGATE PERCENTILE_RANK |AGGREGATE SKEWNESS |AGGREGATE STDDEV_POP |AGGREGATE SUM_OF_SQUARES |AGGREGATE VAR_POP |AGGREGATE HISTOGRAM |GROUPING CASE |ConDITIONAL COALESCE |ConDITIONAL GREATEST |ConDITIONAL IFNULL |ConDITIONAL IIF |ConDITIONAL ISNULL |ConDITIONAL LEAST |ConDITIONAL NULLIF |ConDITIONAL NVL |ConDITIONAL CURDATE |SCALAR CURRENT_DATE |SCALAR CURRENT_TIME |SCALAR CURRENT_TIMESTAMP|SCALAR CURTIME |SCALAR DAY |SCALAR DAYNAME |SCALAR DAYOFMonTH |SCALAR DAYOFWEEK |SCALAR DAYOFYEAR |SCALAR DAY_NAME |SCALAR DAY_OF_MonTH |SCALAR DAY_OF_WEEK |SCALAR DAY_OF_YEAR |SCALAR DOM |SCALAR DOW |SCALAR DOY |SCALAR HOUR |SCALAR HOUR_OF_DAY |SCALAR IDOW |SCALAR ISODAYOFWEEK |SCALAR ISODOW |SCALAR ISOWEEK |SCALAR ISOWEEKOFYEAR |SCALAR ISO_DAY_OF_WEEK |SCALAR ISO_WEEK_OF_YEAR |SCALAR IW |SCALAR IWOY |SCALAR MINUTE |SCALAR MINUTE_OF_DAY |SCALAR MINUTE_OF_HOUR |SCALAR MonTH |SCALAR MonTHNAME |SCALAR MONTH_NAME |SCALAR MONTH_OF_YEAR |SCALAR NOW |SCALAR QUARTER |SCALAR SECOND |SCALAR SECOND_OF_MINUTE |SCALAR TODAY |SCALAR WEEK |SCALAR WEEK_OF_YEAR |SCALAR YEAR |SCALAR ABS |SCALAR ACOS |SCALAR ASIN |SCALAR ATAN |SCALAR ATAN2 |SCALAR CBRT |SCALAR CEIL |SCALAR CEILING |SCALAR COS |SCALAR COSH |SCALAR COT |SCALAR DEGREES |SCALAR E |SCALAR EXP |SCALAR EXPM1 |SCALAR FLOOR |SCALAR LOG |SCALAR LOG10 |SCALAR MOD |SCALAR PI |SCALAR POWER |SCALAR RADIANS |SCALAR RAND |SCALAR RANDOM |SCALAR ROUND |SCALAR SIGN |SCALAR SIGNUM |SCALAR SIN |SCALAR SINH |SCALAR SQRT |SCALAR TAN |SCALAR TRUNCATE |SCALAR ASCII |SCALAR BIT_LENGTH |SCALAR CHAR |SCALAR CHARACTER_LENGTH |SCALAR CHAR_LENGTH |SCALAR ConCAT |SCALAR INSERT |SCALAR LCASE |SCALAR LEFT |SCALAR LENGTH |SCALAR LOCATE |SCALAR LTRIM |SCALAR OCTET_LENGTH |SCALAR POSITION |SCALAR REPEAT |SCALAR REPLACe |SCALAR RIGHT |SCALAR RTRIM |SCALAR SPACE |SCALAR SUBSTRING |SCALAR UCASE |SCALAR CAST |SCALAR ConVERT |SCALAR DATAbase |SCALAR USER |SCALAR ST_ASTEXT |SCALAR ST_ASWKT |SCALAR ST_DISTANCE |SCALAR ST_GEOMETRYTYPE |SCALAR ST_GEOMFROMTEXT |SCALAR ST_WKTTOSQL |SCALAR ST_X |SCALAR ST_Y |SCALAR ST_Z |SCALAR SCORE |SCORE
最后,SHOW TABLES用看Elaticecearch中所有的索引。
POST _sql?format=txt { "query": "show tables" } 输出结果: name | type | kind --------------------------+---------------+--------------- .apm-agent-configuration |base TABLE |INDEX .kibana |VIEW |ALIAS .kibana_1 |base TABLE |INDEX .kibana_task_manager |VIEW |ALIAS .kibana_task_manager_1 |base TABLE |INDEX articles |base TABLE |INDEX colleges |base TABLE |INDEX employees |base TABLE |INDEX employees1 |base TABLE |INDEX kibana_sample_data_flights|base TABLE |INDEX kibana_sample_data_logs |base TABLE |INDEX
这三种形式都支持使用LIKE子句过滤返回结果,LIKE子句在用法上与SQL语句中的LIKE类似。
例如,show tables like 'a%'将只返回以a开头的索引。
POST _sql?format=txt { "query": "show tables like 'a%'" } 输出结果: name | type | kind ---------------+---------------+--------------- articles |base TABLE |INDEX*** 作符与函数
Elasticsearch SQL中支持的 *** 作符与函数有100多种,这些 *** 作符大多与普通SQL语言一致,所以这里只介绍一些与普通 SQL语句不一样的地方。
先来看一下比较 *** 作符。一般等于比较在SQL中使用等号“=”,这在ElasticsearchSQL中也成立。但是Elasticseareh SQL 还引人了另一个等号比较“< = >”,这种等号可以在左值为null时不出现异常。
LIKE *** 作符,在LIKE子句中可以使用%代表任意多个字符,而使用下划线_代表单个字符。Elasticsearch SQL不仅支持 LIKE子句,还支持通过RLIKE子句以正则表达式的形式做匹配,这大大扩展了SQL语句模糊匹配的能力。
尽管使用LIKE和RLIKE可以实现模糊匹配,但它离全文检索还差得很远。SQL语句的WHERe子句一般都是使用字段整体值做比较,而没有使用词项做匹配的能力。为此Elasticsearch SQL提供了MATCH和QUERY两个函数,以实现在SQL做全文检索。
例如下面的两个请求分别使用match和query函数,它们的作用都是检索DestCounty字段为CN的文档:
POST _sql?format=txt { "query": """ select DestCountry, OriginCountry,AvgTicketPrice,score() from kibana_sample_data_flights where match(DestCountry,'CN') limit 3 """ } 输出结果: DestCountry | OriginCountry | AvgTicketPrice | SCORE() ---------------+---------------+-----------------+--------------- CN |MX |730.041748046875 |2.4774308 CN |MX |922.4990844726562|2.4774308 CN |RU |277.4297180175781|2.4774308
POST _sql?format=txt { "query": """ select DestCountry, OriginCountry,AvgTicketPrice,score() from kibana_sample_data_flights where query('DestCountry:CN') limit 3 """ } 输出结果: DestCountry | OriginCountry | AvgTicketPrice | SCORE() ---------------+---------------+-----------------+--------------- CN |MX |730.041748046875 |2.4774308 CN |MX |922.4990844726562|2.4774308 CN |RU |277.4297180175781|2.4774308
在示例中的两个请求的selet子句中都使用了SCORE函数,它的作用是获取检索的相关度评分值。
Elasticsearch SQL支持传统SQL中的聚集函数,这包括MAX、MIN、AVG、COUNT、SUM等。同时,它还支持一些 Elasticsearch特有的聚集函数,这些聚集函数与Elasticsearch聚集查询相对应。这包括FIRST/FIRST_ VALUE和 LAST/LAST_VALUE,可用于查看某个字段首个和最后一个非空值;PERCENTILE和PERCENTILE RANK用于百分位聚集,KURTOSIS、SKEWNESS、STDDEV_ POP、SUM_OF SQUARES和VAR_ POP可用于运算其他统计聚集。除了以上这些函数和 *** 作符,Elasticsearch SQL还定义了一组用于日期、数值以及字符串运算的函数。
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)