Kafka for Beginners 000: A Transcript of the Intro Video on the QuickStart Page


The QuickStart on the official Kafka site ------ the video is in there.

Hi, I’m Tim Berglund with Confluent. I’d like to tell you what Apache Kafka is. But first, I wanna start with some background. For a long time now, we have written programs that store information in databases. Now, what databases encouraged us to do is to think of the world in terms of things, things like, I don’t know, users, and maybe a thermostat, that’s a thermometer, but you get the idea. Maybe a physical thing like a train, let’s say a train. There are things in the world, and the database encourages thinking in those terms, and those things have some state, so we take that state and we store it in the database. This has worked well for decades.

But now some people are finding that it is better, rather than thinking of things first, to think of events first. Now events have some state too, right? An event has a description of what happened with it. But the primary idea is that the event is an indication in time that the thing took place. Now it’s a little bit cumbersome to store events in databases; instead, we use a structure called a log. And a log is just an ordered sequence of these events: an event happens and we write it into a log, a little bit of state, a little bit of description of what happened. And that says, hey, that event happened at that time. As you can see, logs are really easy to think about. They’re also easy to build at scale, which historically has not quite been true of databases, which have been a little cumbersome in one way or another to build at scale.

Apache Kafka is a system for managing these logs. Using a fairly standard historical term, it calls them topics; this is a topic. A topic is just an ordered collection of events that are stored in a durable way, durable meaning that they’re written to disk, and they’re replicated, so they’re stored on more than one disk, on more than one server, somewhere, wherever that infrastructure runs, so that there’s no one hardware failure that can make that data go away.

Topics can store data for a short period of time, like a few hours, or days, or years, or hundreds of years, or indefinitely. Topics can also be relatively small, or they can be enormous. There’s nothing about the economics of Kafka that says that topics have to be large in order for it to make sense, and there’s nothing about the architecture of Kafka that says that they have to stay small. So they can be small, they can be big, they can remember data forever, they can remember data just for a little while. But they are a persistent record of events. Each one of those events represents a thing happening in the business, like, remember a user? Maybe a user updates her shipping address. Or a train unloads cargo. Or a thermostat reports that the temperature has gone from comfy to “is it getting hot in here?” Each one of those things can be an event stored in a topic, and Kafka encourages you to think of events first, and things second.

Now back when databases ruled the world, it was kind of a trend to build one large program, we’ll just build this gigantic program here that uses one big database all by itself. And it was customary for a number of reasons to do this, but these things grew to a point where they were difficult to change, and also difficult to think about. They got too big for any one developer to fit that whole program in his or her head at the same time. And if you’ve lived like this, you know that that’s true.
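
As a concrete illustration of writing one of those events into a topic, here is a minimal sketch using the standard Java producer client. The broker address, topic name, key, and JSON payload are illustrative assumptions; the video doesn’t show any code.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ShippingAddressProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The producer is Closeable; closing it flushes any buffered records.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One event: "user 42 updated her shipping address". Keying by user id
            // keeps all of one user's events in order on the same partition.
            producer.send(new ProducerRecord<>(
                    "user-shipping-updates",   // hypothetical topic name
                    "user-42",                 // key
                    "{\"event\":\"address_changed\",\"address\":\"221B Baker St\"}"));
        }
    }
}
```

The topic itself would be created ahead of time with whatever replication factor and retention period the use case calls for, which is where the durability and the hours-to-forever retention described above come from.
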
Now the trend is to write lots and lots of small programs, each one of which is small enough to fit in your head and think about and version and change and evolve all on its own. And these things can talk to each other through Kafka topics. So each one of these services can consume a message from a Kafka topic, do whatever its computation is that goes on in there, and then produce that message off to another Kafka topic that lives over here. So that output is now durably, and maybe even permanently, recorded for other services and other concerns in the system to process.

So with all this data living in these persistent real-time streams, and I’ve drawn two of them now, but imagine there are dozens or hundreds more in a large system, it’s now possible to build new services that perform real-time analysis of that data. So I can stand up some other service over here that does some kind of gauge, some sort of real-time analytics dashboard. And that is just consuming messages from this topic here. That’s in contrast to the way it used to be, where you ran a batch process overnight. Yesterday is a long time ago for some businesses now; you might want that insight to be instant, or as close to instant as it could possibly be. And with data in these topics as events that get processed as soon as they happen, it’s now fairly straightforward to build these services that can do that analysis in real time.

So you’ve got events, you’ve got topics, you’ve got all these little services talking to each other through topics, you’ve got real-time analytics. I think if you have those four things in your head, you’ve got a decent idea of kind of the minimum viable understanding not only of what Kafka is, which is this distributed log thing, but also of the kinds of software architectures that Kafka tends to give rise to. When people start building systems on it, this is what happens.

Once a company starts using Kafka, it tends to have this viral effect, right? We’ve got these persistent distributed logs that are records of the things that have happened, we’ve got things talking through them, but there are other systems. I mean, what’s this, there’s this database, there’s probably gonna be, you know, another database out there that was built before Kafka came along, and you wanna integrate these systems. There could be other systems entirely: maybe there’s a search cluster, maybe you use some SaaS product to help your salespeople organize their efforts. All these systems in the business, and their data isn’t in Kafka. Well, Kafka Connect is a tool that helps get that data in, and back out. When there are all these other systems in the world, you wanna collect data. So changes happen in a database, and you wanna collect that data and get it written into a topic like that. And now I can stand up some new service that consumes that data and does whatever its computation is on it, now that it’s in a Kafka topic; that’s the whole point. Connect gets that data in, then that service produces some result, which goes to a new topic over here. And Connect is the piece that moves it to whatever that external legacy system is here. So Kafka Connect is this process that does this inputting and this outputting, and it’s also an ecosystem of connectors.
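
The consume-compute-produce loop those services run is, at its simplest, just the plain Java consumer and producer clients wired together. Here is a minimal sketch; the topic names, group id, and the 30-degree threshold are made-up assumptions, not something from the video.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ThermostatAlertService {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        cProps.put("group.id", "thermostat-alerts");       // hypothetical consumer group
        cProps.put("key.deserializer", StringDeserializer.class.getName());
        cProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("key.serializer", StringSerializer.class.getName());
        pProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(List.of("thermostat-readings")); // hypothetical input topic
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> reading : batch) {
                    // The service's "computation": flag readings over 30 degrees, and
                    // durably record the result for downstream services to process.
                    if (Double.parseDouble(reading.value()) > 30.0) {
                        producer.send(new ProducerRecord<>(
                                "temperature-alerts", reading.key(), reading.value()));
                    }
                }
            }
        }
    }
}
```

A real-time dashboard like the one described above would just be another consumer of the output topic; nothing about the input service has to know or care that it exists.
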
There are dozens, even hundreds of connectors out there in the world. Some of them are open source, some of them are commercial, some of them are in between, but they’re these little pluggable modules that you can deploy to get this integration done in a declarative way. You deploy them, you configure them, you don’t write code to do this reading from the database, this writing to whatever that external system is. Those modules already exist, the code’s already written; you just deploy them and Connect does that integration to those external systems.

And let’s think about the work that these things do, these services, these boxes I’m drawing. They have some life of their own, they’re programs, right? But they’re gonna process messages from topics, and they’re gonna have some computation that they wanna do over those messages. And it’s amazing, there’s really just a few things that people end up doing. Like, say, you have messages, these green messages, you wanna group all those up and add some field, like come up with the total weight of all the train cars that passed a certain point or something, but only a certain kind of car, only the green kinds of cars. And then you’ve got these other, say, you’ve got these orange ones here. So right away we see that we’re gonna have to go through those messages, we’re gonna have to group by some key, and then we’ll take the group and run some aggregation over it, or maybe count them or something like that. Maybe you want to filter. Maybe I’ve got this topic, and, let’s see, make some room for some other topic over here that’s got some other kind of data. And I wanna take all the messages here and somehow link them with messages in this topic, and enrich: when I see this, this message happened here, I wanna go enrich it with the data that’s in this other topic. These are common things. If it’s the first time you’ve thought about it, that might seem unusual, but those things, grouping, aggregating, filtering, enrichment. Enrichment, by the way, goes by another name in databases: that’s a join, right? These are the things that these services are going to do.

They’re simple in principle to think about and to sketch, but to actually write the code to make all that happen takes some work, and that’s not work you wanna do. So Kafka, again, in the box, just like it has Connect for doing data integration, it has an API called Kafka Streams. That’s a Java API that handles all of the framework and infrastructure and kind of undifferentiated stuff you’d have to build to get that work done. So you can use that as a Java API in your services, and get all that done in a scalable and fault-tolerant way, just like we expect modern applications to be able to do, and that’s not framework code you have to write; you just get to use it because you’re using Kafka. Now if you’re a developer and you wanna learn more, you know, the thing to do is to start writing code. Check those things out, let us know if you have any questions, and I hope we hear from you soon.
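
And if you do wanna start writing code, the grouping-and-aggregating example from the talk (total weight of just the green train cars) comes out quite short in Kafka Streams. A minimal sketch; the topic names, and the assumption that each record’s key is the car color and its value is the weight as a string, are made up for illustration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class GreenCarWeightApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "green-car-weight"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Hypothetical input topic: key = car color, value = car weight as a string.
        KStream<String, String> cars = builder.stream("train-car-events");

        KTable<String, Long> totalWeight = cars
                .filter((color, weight) -> "green".equals(color))          // filtering
                .mapValues(Long::parseLong)
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long())) // grouping by key
                .reduce(Long::sum);                                       // aggregating

        // Every update to the running total is emitted to an output topic.
        totalWeight.toStream()
                .to("green-car-total-weight", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Enrichment, the join, has the same shape: a join between this stream and a KTable built from the other topic, with Streams handling the partitioning, state stores, and fault tolerance the talk alludes to.
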

