Elasticsearch Study Notes 4: Jottings

Elasticsearch Study Notes 4: Writing Data

Cluster

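The client picks a node and sends the request to it; that node becomes the coordinating node.
The coordinating node routes the document and forwards the request to the node that holds the corresponding primary shard.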

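Routing algorithm: shard_index = hash(id) % number_of_primary_shards (strictly, the hash is computed over the _routing value, which defaults to the document _id).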
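The routing formula can be sketched as follows. This is a minimal illustration, assuming zlib.crc32 as a deterministic stand-in hash (Elasticsearch itself uses Murmur3); the function name route is hypothetical.

```python
import zlib


def route(doc_id: str, number_of_primary_shards: int) -> int:
    """Return the primary shard index for doc_id.

    shard_index = hash(id) % number_of_primary_shards
    Assumption: zlib.crc32 is only a stand-in; Elasticsearch hashes the
    _routing value (the _id by default) with Murmur3.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return h % number_of_primary_shards


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(5)]
    for n in (3, 5):
        # For a fixed n the mapping never changes; changing n moves many ids.
        print(n, {d: route(d, n) for d in ids})
```

Because the shard depends on number_of_primary_shards, changing that number would send existing ids to different shards. This is why the number of primary shards is fixed when an index is created.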

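The primary shard on that node processes the request, then replicates the data to the nodes holding the replica shards.
Once the coordinating node sees that the primary shard and all replica shards have finished, it returns the response to the client.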
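The whole write flow above (route, primary, replicas, response) can be sketched in memory. This is a simplified model, not Elasticsearch code: ShardCopy, ReplicationGroup, and coordinate_index are hypothetical names, crc32 stands in for the real hash, and network calls and failures are left out.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """One copy of a shard (primary or replica) that stores documents in memory."""
    name: str
    docs: dict = field(default_factory=dict)

    def index(self, doc_id: str, source: dict) -> None:
        self.docs[doc_id] = source


@dataclass
class ReplicationGroup:
    """A primary shard and its replicas, each assumed to live on a different node."""
    primary: ShardCopy
    replicas: list = field(default_factory=list)

    def primary_execute(self, doc_id: str, source: dict) -> int:
        # Runs on the primary's node: apply locally, then forward to every replica.
        self.primary.index(doc_id, source)
        for replica in self.replicas:
            replica.index(doc_id, source)
        return 1 + len(self.replicas)  # number of copies that applied the write


def coordinate_index(groups: list, doc_id: str, source: dict) -> dict:
    """Runs on the coordinating node (whichever node the client contacted)."""
    # Route: shard_index = hash(id) % number_of_primary_shards (crc32 as stand-in).
    shard_index = zlib.crc32(doc_id.encode("utf-8")) % len(groups)
    # Forward to the primary and wait until the primary and all replicas are done.
    copies = groups[shard_index].primary_execute(doc_id, source)
    # Only now is the client answered.
    return {"_id": doc_id, "shard": shard_index, "copies": copies}


if __name__ == "__main__":
    groups = [ReplicationGroup(ShardCopy(f"p{i}"), [ShardCopy(f"r{i}")]) for i in range(3)]
    print(coordinate_index(groups, "doc-1", {"msg": "hello"}))
```

Note that replication is driven by the primary, not by the coordinating node; the coordinating node only waits for the primary's answer.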

Node

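A write first goes into an in-memory buffer, where the data cannot be searched yet; at the same time it is appended to the translog file.
When the buffer is nearly full, or after a set interval (1 second by default, controlled by index.refresh_interval), the buffer contents are refreshed into a new segment file.
This is why Elasticsearch is near real-time (NRT):
as soon as the new segment file's data is in the OS cache, it can serve searches, without waiting for it to be written to disk.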
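The buffer/translog/refresh behaviour described above can be sketched as a toy model. The class ShardWritePath and its fields are hypothetical; real segments are Lucene files, not Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class ShardWritePath:
    """Toy model of one shard's write path on a node (all names hypothetical)."""
    buffer: list = field(default_factory=list)    # in-memory indexing buffer
    translog: list = field(default_factory=list)  # append-only operation log
    segments: list = field(default_factory=list)  # refreshed segments in the OS cache

    def index(self, doc: str) -> None:
        # Every write goes into the buffer and the translog together.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        # Refresh: write the buffer out as a new segment and empty the buffer.
        # The translog is NOT cleared here; only a flush does that.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def search(self, term: str) -> list:
        # Searches only see segments, never the buffer.
        return [doc for seg in self.segments for doc in seg if term in doc]


if __name__ == "__main__":
    shard = ShardWritePath()
    shard.index("hello world")
    print(shard.search("hello"))  # [] : still only in the buffer
    shard.refresh()
    print(shard.search("hello"))  # ['hello world']
```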

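When the translog grows to a certain size, a commit operation is triggered (by default a commit also runs automatically every 30 minutes).
The commit operation: 1. write a commit point; 2. fsync the data in the OS cache to disk; 3. clear the translog file.
The whole commit process is called a flush operation. A flush can also be triggered manually (the _flush API).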
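The three commit steps, and why the translog matters, can be sketched in a toy model. Assumptions: Shard, crash_and_recover, and the integer commit_point are hypothetical simplifications; the steps follow the order listed above, and a crash is modelled as losing the buffer and the OS cache.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy model of refresh, flush (commit), and translog replay (hypothetical names)."""
    buffer: list = field(default_factory=list)           # in-memory buffer
    translog: list = field(default_factory=list)         # operations since the last flush
    cached_segments: list = field(default_factory=list)  # refreshed, only in the OS cache
    disk_segments: list = field(default_factory=list)    # fsynced to disk
    commit_point: int = 0                                 # how many segments the last commit covers

    def index(self, doc: str) -> None:
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self) -> None:
        self.refresh()  # buffered documents first become a segment
        # 1. Write a commit point recording which segments belong to this commit.
        self.commit_point = len(self.disk_segments) + len(self.cached_segments)
        # 2. fsync: force the segments sitting in the OS cache onto disk.
        self.disk_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        # 3. These operations are now durable, so the translog is cleared.
        self.translog.clear()

    def crash_and_recover(self) -> None:
        # A crash loses the buffer and anything that was only in the OS cache...
        self.buffer.clear()
        self.cached_segments.clear()
        # ...so recovery replays the translog on top of the on-disk segments.
        replay, self.translog = self.translog, []
        for doc in replay:
            self.index(doc)
        self.refresh()

    def all_docs(self) -> list:
        return [d for seg in self.disk_segments + self.cached_segments for d in seg]
```

Usage: index "a", flush, index "b", refresh, then crash. "a" survives because it was fsynced; "b" survives only because it is still in the translog.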

Segment files keep accumulating, so Elasticsearch periodically merges them into larger segments.
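A segment merge can be sketched as below. This is a simplified stand-in, not Lucene's actual merge policy (Elasticsearch relies on Lucene's tiered merge policy): here the two smallest segments are merged repeatedly until at most max_segments remain, and documents marked as deleted are dropped when their segment is rewritten. The function name merge_segments is hypothetical.

```python
def merge_segments(segments: list, deleted: set, max_segments: int) -> list:
    """Merge the two smallest segments until at most max_segments remain.

    Simplified stand-in for a real merge policy; deleted documents are
    physically removed from every segment that gets rewritten by a merge.
    """
    max_segments = max(1, max_segments)
    segs = [list(s) for s in segments]
    while len(segs) > max_segments:
        segs.sort(key=len)  # smallest segments first
        a, b = segs.pop(0), segs.pop(0)
        merged = [d for d in a + b if d not in deleted]
        segs.append(merged)
    return segs


if __name__ == "__main__":
    print(merge_segments([["a"], ["b"], ["c", "d"], ["e"]], {"b"}, 2))
```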

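Algorithm Referenced
PacificA
PacificA is a distributed consistency algorithm for log-replication systems proposed by Microsoft Research Asia; the paper was published in 2008 (PacificA paper). The official Elasticsearch documentation states explicitly that its replication model is based on this algorithm:
https://github.com/elastic/elasticsearch/blob/master/docs/reference/docs/data-replication.asciidoc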

Elasticsearch’s data replication model is based on the primary-backup model and is described very well in the PacificA paper of Microsoft Research. That model is based on having a single copy from the replication group that acts as the primary shard. The other copies are called replica shards. The primary serves as the main entry point for all indexing operations. It is in charge of validating them and making sure they are correct. Once an index operation has been accepted by the primary, the primary is also responsible for replicating the operation to the other copies.

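Characteristics of the algorithm:

Strong consistency.
A single primary replicates data to multiple secondaries.
A separate consistency component maintains the configuration.
Writes can still proceed when only a minority of replicas is available.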
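The last two characteristics can be illustrated with a heavily simplified sketch. Assumptions: ConfigurationManager and Cluster are hypothetical names, networking is simulated, and the real protocol (leases, the primary waiting for the configuration manager to approve the change before committing) is omitted. In Elasticsearch, the master node plays a similar role by tracking the set of in-sync shard copies.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationManager:
    """Hypothetical stand-in for PacificA's configuration manager.

    It records which copies belong to the replication group.
    """
    primary: str
    secondaries: set = field(default_factory=set)

    def remove_secondary(self, name: str) -> None:
        self.secondaries.discard(name)


@dataclass
class Cluster:
    config: ConfigurationManager
    logs: dict = field(default_factory=dict)  # copy name -> applied operations
    down: set = field(default_factory=set)    # copies that cannot be reached

    def send(self, name: str, op: str) -> bool:
        # Simulated network call: unreachable copies fail.
        if name in self.down:
            return False
        self.logs.setdefault(name, []).append(op)
        return True

    def write(self, op: str) -> bool:
        """The primary applies op, replicates it to every secondary in the
        configuration, and removes any secondary that fails, so the write
        still commits with whichever secondaries remain."""
        if not self.send(self.config.primary, op):
            return False
        for name in sorted(self.config.secondaries):  # sorted() copies, so removal is safe
            if not self.send(name, op):
                self.config.remove_secondary(name)
        return True
```

With three secondaries of which two are down, the write still succeeds. Every copy left in the configuration holds exactly the primary's operations, which is the invariant behind the strong-consistency claim.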

Feel free to share; when reposting, please credit the source: 内存溢出 (outofmemory.cn)

Original article: http://outofmemory.cn/zaji/5699824.html
