Jason Gustafson, Confluent: Revisiting Exactly Once Semantics (EOS) | Bay Area Apache Kafka® Meetup

Video Statistics and Information

Captions
Okay, so our second talk is about exactly once. Exactly-once is a feature that was developed and contributed to Apache Kafka in 0.11, which is about two years ago, and two years in open-source software development is actually a pretty long time: imagine, right now we already have the Kafka 2.3 release, and exactly-once was introduced in 0.11. But it has never been a done deal from an engineering point of view. Since its introduction, more and more people, especially users of Kafka Streams and Kafka Connect, have been enabling exactly-once, and the community has been investigating different issues since we first introduced the feature. We want to improve it, especially the performance and the scalability of enabling exactly-once. So in this second talk we'll have Jason and Boyang talking about the lessons learned from developing the exactly-once feature and the ongoing work that we're doing with the community, which we hope to get into the next release of Apache Kafka. With no further delay, we have Jason and Boyang.

Thank you. Okay, so here's an overview of our talk. Basically, we're going to talk about what exactly-once actually is, what it means, and how it works under the covers, and then we're going to go through some of the problems that we've had and how we're planning to fix them. I'm going to cover the first half and explain how it works, and then Boyang is going to jump in and explain some of the ongoing work, since he's actually doing it.

First off, I just want to give some background about exactly-once semantics: what problem is it solving, and what guarantees is it actually designed to give? The scenario is really this: think about Kafka Streams, for example, where you have some kind of input topic and an output topic. You're reading from the input, you're doing some kind of processing, and then you're writing the output. The guarantee that exactly-once semantics is trying to give you is that each input you have is reflected in the output exactly once. For example, we read this record A, we do some processing, and out pops A', and so on. We continue in this way and we don't get any duplicates: Kafka will handle the deduplication, handle any kind of failures that come up, and the data will be written exactly once.

The example I gave is a one-to-one mapping from input to output, but it doesn't have to be that way. For example, if we have a log containing some kind of counts, where each record is just a one, and the purpose of our processor is to sum up all those values, there is no one-to-one mapping from the input to the output. We might read the first two values, sum them up, write out the summation, and then continue in this way. So there's no requirement about one-to-one; it's really about guaranteeing that each input is reflected exactly once in the output.

Similarly, in the initial example, and in most of the examples here, we have just one input topic and one output topic, but that also doesn't have to be the case. You can have multiple output topics and multiple output partitions, and we're looking for a similar guarantee: A and B are processed, and the A' and B' values are reflected in the outputs, and so on.
Anyway, just for simplicity, in all these examples I'm going to show one input and one output topic, because that's enough to show most of the interesting cases we want to talk about.

Now, if we want to be a little more precise about what exactly-once is doing: even with a single input and output partition, there's actually another output partition which is implicit in the way the consumer works. When the consumer is reading its data, it's always keeping track of its position in the input partitions, and this position is written to another topic. The guarantee that exactly-once gives you is really about ensuring the atomicity of those two writes: when the data is reflected in the output, so is the position. So even with a single input and output partition, there's another output partition we're concerned about, which is the position.

If we look at this process in a bit more detail, what it really consists of is not just read-process-write: first we read the position from the position topic, we read the input and process it, and then we write both the output and the position, and so on. What we're looking for is that all of this is atomic: either the full pipeline succeeds or none of it does.

The way this works in Kafka is that we need two parts to guarantee this atomicity. Number one is a transactional write across multiple partitions; that solves the part about how we ensure the position and the output are written to the log in a transactional way. The second part, which is more subtle and takes a little more explanation, is a protocol to guarantee a single writer; I'll explain in a little more detail what that means.

First let me briefly talk about the transaction. Surprisingly, this is actually the part that's easiest to explain, because it's a fairly standard two-phase commit protocol. What we're focused on in this protocol is really the last two parts: the writing of the offsets and the writing of the data. I've got some examples to go through this in a bit more detail, so let me describe what's on the screen. We've got the input partition as before, and then we've got the two output partitions: the position, where we're storing our consumer offsets, and the partition where we're storing the data. In addition to that we have a transaction log; the transaction log is what we're using to implement the two-phase commit.

The basic way this works is: we'll read some input, and before we do any writes to any of the partitions, we'll write a message to the transaction log saying we're going to start this transaction and it's going to include this output partition. Then we can go ahead and do the write. I've greyed this out here because at this point the data is in the log but it hasn't been committed yet; its fate is still uncertain, so it's not visible to consumers, and another step has to be taken before it becomes visible.
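That visibility rule is what the consumer's isolation level controls. A minimal sketch of a consumer that only sees transactional data once its commit marker has been written (broker address and group id are assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumer {
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption
        props.put("group.id", "eos-reader");                 // hypothetical group id
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // read_committed holds back transactional records until their commit marker
        // is written, and skips records from aborted transactions entirely.
        props.put("isolation.level", "read_committed");
        return new KafkaConsumer<>(props);
    }
}
```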
So we wrote to the output partition; we also need to write to the consumer offsets to say, okay, we read from this position and now we're going to commit that position. The consumer offsets topic is just another topic, just a partition, so it's not really distinguished from the transactional perspective. What we do is write a message to the transaction log saying, okay, now I'm going to write something to the consumer offsets, and once we have that record, then we can go ahead and commit our position.

Once we've finished all of this, then finally we commit the transaction, and this is the two-phase part. It consists first of a message to the transaction log saying we're preparing to commit. Basically this means that at this point you can't go back; this is the point of no return. Once you write that record, the transaction is doomed to roll forward, as we say, but it hasn't happened yet, so the data is still not visible. The second part is that we write a marker into the output topics saying, okay, this now is committed, and that's when it becomes visible. We do the same thing for the offsets, and once all the markers have been written to those output topics, we write another message to the transaction log saying we've finished this transaction, and we reset our state.

The other case we want to explain is the abort. As before, we started a transaction and wrote records to the transaction log saying these partitions are included in the transaction; now what happens if we crash? If our process crashes, the transaction coordinator is responsible for detecting that the crash has happened, and at this point we abort the transaction. We go through a similar two-phase process: we say we're preparing to abort, which is again the point of no return, then we abort that data, and the consumer will not see that data, it will skip over it, and similarly we finish the transaction.
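A simplified sketch of the coordinator-side state machine implied by the commit and abort flows above; the names here are illustrative rather than the broker's actual code, though the broker tracks broadly similar states.

```java
// Illustrative only: the states a transaction moves through on the coordinator,
// as described above. The "prepare" records are the point of no return.
enum TxnState {
    EMPTY,            // no transaction in progress for this transactional id
    ONGOING,          // partitions have been added; writes may be pending
    PREPARE_COMMIT,   // commit decision logged; markers not yet written
    PREPARE_ABORT,    // abort decision logged; markers not yet written
    COMPLETE_COMMIT,  // commit markers written to all included partitions
    COMPLETE_ABORT    // abort markers written to all included partitions
}
```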
Okay, so now I want to talk about the thing that's a little more difficult to explain, which is this protocol to guarantee a single writer, and the notion we have in Kafka to get at it. What I've been describing so far is: we have this input topic, we do some processing, and we have an output topic. In practice you actually have multiple processors cooperating in order to consume from a number of input partitions, and similarly a number of output partitions you're trying to write to. When we say single writer, it doesn't mean there's only one writer for each output partition; it's actually about the input partition. What we're looking for is a guarantee that for each input partition there is only a single writer which is responsible for reading that data and writing the output. The real guarantee is about the input.

In the diagram, each color represents a different single-writer scope. In Kafka, what we're really talking about is some kind of consumer group, where the partitions we're consuming are divided among the members of the group, and whenever you have some kind of failure, that partition is reassigned to another member, and the single-writer scope can follow along with it.

In order to make this work, there are basically two problems we have to solve. The first problem we'll call the initialization problem, which is about how we do this handoff. Let me get to the example. Let's say that processor one at this moment has been assigned a partition and it's doing a transactional processing loop just like we described before: we're reading the inputs, we're writing the outputs, and then we're trying to commit the data. Now, the transaction can be in any of the states we saw on the previous slides. It could be that we haven't decided yet whether it's going to be aborted or committed; it could be that we've already decided, like we've written that prepare marker but we haven't finished the rest; it could also be the case that we actually completed it. All of those things are possible and they all have to be handled. Suppose we were in the process of committing the transaction: say we wrote the prepare marker to the transaction log, but the markers haven't been written to the outputs, so they're not visible yet. The risk is that now processor two gets assigned this input partition and wants to pick up where processor one left off, but in order to do so it has to take into account the fact that there was this ongoing transaction, which could be committing, aborting, or anything. The danger is that if we don't do that, we read stale data and our guarantees are violated. So that's what I call the initialization problem: how a processor, after it's been assigned a partition, can resume from a consistent state.

The second problem is the fencing problem. The issue this is trying to address is that processor one may have had a clean failure, where someone went and kill -9'd the process and it's gone and never coming back; but it could also be the case that it had some kind of network blip where it's temporarily partitioned from everyone else, or it could be in the middle of a long GC. Any of those things are possible; the point is that it could come back at some point. So when we do a handoff like this, it becomes a zombie. We want to transfer the partition over to the other processor while we're in the middle of this committing state, and it's not enough to make sure that we've recovered from a consistent position; we also have to make sure that processor one is prevented from interfering with what we're trying to do. It gets fenced: as soon as we've taken over, it shouldn't be allowed to make progress anymore. That's what I tried to show there.

Those two problems bring us to the notion of what we call, in Kafka, a transactional ID. This is configured in the producer; it's a user-defined configuration, and its purpose is to define what that single-writer scope is. Within the context of a transactional ID, we guarantee that there can only be a single writer. The way this is enforced, the way we do the fencing, is the way you'd want to do it in any distributed system: we add an epoch. The epoch is bumped on every kind of transition, so if you try to write with an old epoch, you're fenced at that point.
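Here is a sketch of what that fencing looks like from the client's point of view, assuming two producer instances configured with the same (hypothetical) transactional.id: the second instance's initTransactions() bumps the epoch, and the older instance fails with ProducerFencedException on its next transactional operation.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.errors.ProducerFencedException;

public class FencingDemo {
    static KafkaProducer<String, String> newProducer(String txnId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption
        props.put("transactional.id", txnId);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }

    public static void main(String[] args) {
        KafkaProducer<String, String> zombie = newProducer("my-app-input-0"); // hypothetical id
        zombie.initTransactions();
        zombie.beginTransaction();

        // A second instance takes over the same single-writer scope and bumps the epoch.
        KafkaProducer<String, String> successor = newProducer("my-app-input-0");
        successor.initTransactions();

        try {
            zombie.commitTransaction();   // old epoch: the broker rejects this write
        } catch (ProducerFencedException e) {
            zombie.close();               // the zombie cannot recover; it must shut down
        }
        successor.close();
    }
}
```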
Then we have an initialization protocol which guarantees that when we're starting up, like we've just been assigned a partition, we're able to start from a consistent point. This is what the initialization protocol looks like: we start off by bumping the epoch, then we check whether there is a transaction in progress. If there is and it's completing, meaning we've already written that prepare commit or prepare abort message, then we go through an awaiting-completion step; if the transaction was in progress but hasn't been decided yet, we go ahead and abort it right then. In the end, after all this is done, we return the new epoch. The guarantee we're giving is that by the time this protocol has run, the producer has a new epoch, and any previous epoch has completed: if it had a transaction ongoing, either it's been completed or it's going to be fenced off so it can't do anything else, and we're at a clean slate. It doesn't guarantee that a new producer won't be started after us, but the protocol will guarantee fencing against that producer too.

The way this works: in this case I've got three input partitions and three transactional IDs. When the consumer group has some kind of failure, the transactional ID goes along with the consumer, we bump the epoch, and now we're protected.

So, summarizing, this is what we do in Kafka: transactional writes across multiple partitions, and a protocol for guaranteeing a single writer. And just to summarize the flow: we bump the epoch for the transactional ID at the beginning of the process, and then we do the transactional writes.

Here's what this looks like in the code; I'll go through it pretty quickly. We have this initTransactions API; if you've seen it before and wondered what it is, well, that initialization protocol I just described is what it's for. By the time initTransactions returns, we've gone through the protocol, we're guaranteed to have a bumped epoch, we're in a consistent position, and it's safe to read the offsets we want to start consuming from. Then you're consuming from the consumer and starting the transaction through the producer. The interesting thing, of course, is that since the consumer offsets are part of the transaction, we don't commit them through the consumer; we commit them through the producer, so that they're part of the transactional state. And finally, we commit the transaction.
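Putting the pieces together, here is a minimal sketch of the transactional read-process-write loop just described; topic names, group id, transactional id, and broker address are hypothetical, and rebalance handling is omitted for brevity. Note how the offsets are committed through the producer so they ride in the same transaction as the output.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;

public class ExactlyOnceCopy {
    public static void main(String[] args) {
        Properties cprops = new Properties();
        cprops.put("bootstrap.servers", "localhost:9092");   // assumption
        cprops.put("group.id", "eos-copy");                   // hypothetical group id
        cprops.put("enable.auto.commit", "false");
        cprops.put("isolation.level", "read_committed");
        cprops.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cprops.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pprops = new Properties();
        pprops.put("bootstrap.servers", "localhost:9092");
        pprops.put("transactional.id", "eos-copy-input-0");   // defines the single-writer scope
        pprops.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pprops.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cprops);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pprops)) {
            consumer.subscribe(List.of("input"));
            // Runs the initialization protocol: bump the epoch and resolve any
            // transaction left behind by a previous incarnation.
            producer.initTransactions();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>("output", record.key(),
                                                       record.value().toUpperCase()));
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
                }
                // Offsets go through the producer so they are part of the same transaction.
                producer.sendOffsetsToTransaction(offsets, "eos-copy");
                producer.commitTransaction();
            }
        }
    }
}
```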
Okay, so now we're going to start talking about the shortcomings. For the most part I think this has been a good approach and it's worked well, but there have been some shortcomings. Let me cover the first one, and then I'll have Boyang jump in and talk about the other.

The first one is error resiliency. If anyone has used EOS, you might have encountered this UNKNOWN_PRODUCER_ID exception. What does that mean, exactly? Let me briefly describe it. If we look at the Kafka producer and its architecture, one of the components is the transaction manager, which is responsible for managing all of the transaction state. Transaction state for the producer means a few things. Number one, we have a state machine for the transaction: we've started a transaction, now we're ready to commit it, and so on. Additionally, we have some bookkeeping for every partition that we write to, basically to track sequence numbers for deduplication: as we're writing, requests may fail, we rewrite using the same sequence numbers, and the broker is able to deduplicate.

Now, if you look at it on the broker side, the broker is also doing some bookkeeping. The client has these sequence numbers it uses to write to topics; similarly, the broker maintains a corresponding set of sequence numbers for producers, which it uses to deduplicate requests. Inside the broker there is a cache which has, for each producer ID, the epoch and the sequence number. The epoch we already talked about: it's for fencing. The sequence number is used for deduplication, and as data gets written to the log, these sequence numbers get bumped, and so on.

Of course, in Kafka we usually have finite retention, other than compacted topics, and at some point we want to remove the tail of the log. When we do so, we may remove some of the state we had for some of these producers. In this case, for producer ID 3, for example, the first four records are going to be removed, so we're removing all of the state that corresponds to producer ID 3. In Kafka we maintain the invariant that the log is the source of truth for all of this state, and we always update the producer state cache to reflect that. So when you have a big deletion from the tail of the log, that producer state is gone, and that's where the UNKNOWN_PRODUCER_ID comes from.

When we wrote this, of course, we thought producers would be active, they'd be doing stuff, but it turns out there's another case, and I think it came out at about the same time as EOS: we have a mechanism in Kafka where, once you've finished reading some data, you can delete it. Kafka Streams uses this for so-called repartition topics; I won't try to explain repartition topics in depth, but basically they're used for re-keying the data. The point is that once the consumer has finished reading this data, it isn't needed anymore, because it has already been pushed to the output topics, so what Kafka Streams does is delete all of that data from the log. When that happens, you lose all of the producer state. So this went from a rare case, which people should never hit, to a common case, and what you actually get is something like this: that's really what caused all these unknown producer ID errors.
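The delete-after-consuming mechanism described here is exposed through the admin client's delete-records call; below is a minimal sketch of that mechanism (broker address, topic name, and offset are hypothetical). It advances the log start offset, deleting the oldest records, and, before the snapshot fix described next, the producer state rebuilt from those records went with them.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class PurgeConsumedData {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption

        try (AdminClient admin = AdminClient.create(props)) {
            // Truncate partition 0 of a (hypothetical) repartition topic up to offset 1000.
            // Everything before that offset is removed from the log, taking with it the
            // broker-side state of any producer whose records only lived in that range.
            TopicPartition tp = new TopicPartition("app-repartition", 0);
            admin.deleteRecords(Map.of(tp, RecordsToDelete.beforeOffset(1000L)))
                 .all()
                 .get();
        }
    }
}
```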
So how do we fix this? I think there are two things we need to do. First: as Kafka is writing to the log, we're maintaining this state, and the state has to be rebuilt from somewhere, for example after a failure; we said the log is the source of truth. So we keep snapshot files to help us rebuild that state: we can go back to the latest snapshot file, reload that state, and then the amount we have to read from the log is small. We take these snapshots periodically. Today, when we remove data, we remove the snapshot along with it, and in doing so we get rid of that producer ID. The basic idea is pretty simple: maybe we just don't do that. Rather than getting rid of the state, even though we lost it from the log, there's really no harm in keeping it in the snapshots. Even though we lost the records for that producer, we can keep its state.

I'm going to skip through some of this; the main thing I want to say is that this is probably 95% of the solution, but the problem can still happen, because the log is no longer really 100% the source of truth. If you're doing a partition reassignment, for example, the new broker that's coming up as a replica won't have this soft state that's not actually present in the log. So there are still some cases where you can get an unknown producer ID even with this improvement.

The other part, and I'll go through it quickly, is that once we reach a state where we've got this UNKNOWN_PRODUCER_ID, today we would just fail and say, that's it, the producer is done, you need to close it down and start up a new one. Instead, what we have in KIP-360 is a way to get ourselves out of that state. For example, if we were in the middle of a transaction, it should be safe to abort whatever transaction we were in and then start a new one, so we can start from a clean slate again. That's the approach there. So really the summary is that, unlike today, when you get into any kind of error, whether it's an out-of-sequence error because of a bug or something like that, or an unknown producer ID, which is just a different class of the same problem, we have a way to get ourselves out of that state. This is what I mean by resiliency in the producer: we can always abort our transaction, bump our epoch, and resume.
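From the application's side, the resiliency story boils down to the familiar pattern sketched below: truly fatal errors (fencing, authorization) close the producer, while other errors abort the current transaction so the batch can be retried from a clean slate. The KIP-360 changes are about letting that abort-and-retry path succeed in cases, like the unknown producer ID, that used to be fatal. This is a sketch, assuming a producer and a batch of records are already in scope.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TransactionRetry {
    // One transactional attempt; returns true if the batch committed.
    static boolean tryOnce(KafkaProducer<String, String> producer,
                           Iterable<ProducerRecord<String, String>> batch) {
        try {
            producer.beginTransaction();
            for (ProducerRecord<String, String> record : batch) {
                producer.send(record);
            }
            producer.commitTransaction();
            return true;
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            // Fatal: another writer took over, or we are not allowed to write. Give up.
            producer.close();
            throw e;
        } catch (KafkaException e) {
            // Anything else is treated as abortable: throw away the open transaction
            // and let the caller retry the whole batch with a clean slate.
            producer.abortTransaction();
            return false;
        }
    }
}
```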
Okay, so now I'm going to have Boyang come in and talk about the second part we've thought about improving, which is producer scalability.

Thank you, Jason. Let's get to part two. My recent work has been on improving scalability for the client. As Jason talked about, you usually have a consumer and a producer working together: take some input data, produce output. This is what Kafka Streams is doing. Kafka Streams is just a very lightweight Java library; it leverages consumers and producers to do the network traffic. We have the notion of a stream thread: every stream thread is like the processor we just mentioned. They're isolated; they don't talk to each other. If you zoom into one of the threads, it further splits the input data based on the partitions, and we have this notion of stream tasks, which is a mapping from input partitions to tasks. For example, if you have a topic with three partitions and you do an aggregation job on top of it, you get three tasks. That doesn't mean every partition maps to only one task, or the other way around, because you can join two topics, in which case one task maps to two partitions. But basically the task definition maps pretty directly to the input partitions, so you can think of all the task IDs here as unique. And for normal processing we have only one producer and one consumer per thread.

That's without exactly-once. When we turn it on, how do we guarantee the single-writer scope that the transactional ID semantics require? If we just use this model, there's going to be a problem, and Jason already talked about it, so I'll go through it quickly. Because we have rebalances, the consumer can shuffle your assignment across the entire group, and the old tasks may have ongoing transactions. If someone had an unclean shutdown, then when you try to restart your job, you don't know whether those tasks, say zero through five, have any unfinished transactions, or whether you can proceed. The way the single writer is guaranteed is that you need to be able to always bump the epoch, and if partitions can simply be reassigned, that no longer works. You always need some tracking of the transaction state across generations to make sure you can start from a clean state.

So the problem we're solving is this: if you've worked with the consumer before, the consumer always has these dynamic assignment semantics, while the producer, in contrast, does not have any notion of a group; you've never heard of a producer group. There's a semantic mismatch between the two, and we have to find a way to fix it so we can improve scalability.

How do we solve the problem today? It's actually a pretty straightforward solution: we just create a separate producer for every single task, so that the task ID can be used as the transactional ID. Like I just mentioned, a task maps exactly to its input partitions, so you can always make sure that, within the same group, each task ID is taken over by exactly one owner, and every time you initialize a new producer you just re-initialize the transaction and it will be in a clean state. In this case all the dependencies are clearly tracked: no matter where this task goes, it gets a new producer, so if a task moves somewhere else and someone takes over that task ID, I don't need to worry about whether someone else is still working on it, because once they re-initialize the transaction, it will actually be either committed or aborted, and the epoch will be bumped. So it should be safe, and the correctness problem is solved.
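Here is a sketch of that current producer-per-task wiring, where the transactional ID is derived from the application id and the task id so that whoever picks the task up fences the previous owner. The naming scheme and configuration are illustrative, not Kafka Streams' exact internals.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class TaskProducers {
    // One transactional producer per stream task: the task id doubles as the
    // single-writer scope, so a task re-initialized on another thread or instance
    // fences whatever producer owned it before.
    static KafkaProducer<byte[], byte[]> producerForTask(String applicationId, String taskId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumption
        props.put("transactional.id", applicationId + "-" + taskId);   // e.g. "my-app-0_1" (hypothetical)
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        producer.initTransactions();   // completes or fences any transaction from the old owner
        return producer;
    }
}
```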
But we haven't solved the scalability issue here. Looking at the same diagram for the producer: each producer has its own buffer pool holding memory, and it accumulates data for transfer to the broker. One of the most important designs of the producer is that it batches requests to the broker, so if you have a separate producer instance for each of your input partitions, there's no more batching, and you're trading that away for more I/O and more socket connections to the broker. Those are the overheads of turning on exactly-once. So we'd like to go back to what we had, which is one producer per stream thread, but currently we're trapped in this semantics problem.

To find the fix, we need to be clear about our goals. The first one, of course, is that you need to track the dependencies across generations, and the other is that you still need to be able to detect zombies when you effectively turn this into a transactional producer group.

So first, when we initialize the transactions, every stream thread initializes the transactions for its task producers, and we hash each transactional ID to find the correct transaction coordinator, so different threads can talk to different transaction coordinators. This is distributed: you don't have a global view of your stream group, of how everyone is doing and what assignment each of these individual producers got. But if you take a deeper look at the diagram, we haven't talked about one important role here, which is the third topic Jason talked about: the consumer offsets. There's another coordinator, the group coordinator, maintaining the state of the consumer group, and the discovery of the group coordinator is based on hashing the group ID, so all the threads talk to the same group coordinator. And of course the group coordinator knows what partition assignments were given to each thread, so you actually have something centralized that tracks all the state, which makes it a good place to do the dependency resolution.

The other thing is the zombie case. We talked about using the producer epoch to fence out-of-sync clients, but in this new model, with a producer per thread, it no longer works that way. For example, in this case thread p1 starts a transaction and writes offsets to this coordinator. Because this is the first write, it's accepted, and the coordinator remembers the epoch for p1. Then the task gets rebalanced, and t2 starts another transaction session and writes another offset. This one also can't be rejected, because it's just a new producer ID and epoch the coordinator has never seen before. If p1 now comes back online, you have no guarantee that p1's transaction was ever aborted, because no one reuses its transactional ID anymore, and if we're unluckily still within the transaction timeout, the epoch doesn't get bumped either, so this offset commit can put dirty data into the consumer offsets. That's the tricky part of allowing assignments to move between threads: you don't have a unique ID to help you fence the last generation's transactions.

Now that we know the problem, let's talk about the solution, which is currently ongoing in the community as KIP-447. First, based on our observation, we're going to use the group coordinator to solve this initialization problem.
Going back to the diagram where we talked about that awkward failure in the middle of a transaction: in the current single-writer model, it's actually okay to just move forward and ignore any pending offsets when we read the position topic. In the new model that's no longer safe, because say we continue from input B, and somehow the transaction on t1 hasn't finished, and it's actually at the step where, for example, we've already written the prepare-commit message to the transaction log but haven't written the transaction markers yet. In that case it will actually roll forward, and all of that data gets committed in the next step, so we end up reprocessing the data for B and B' comes out twice. That's pretty bad. So instead of just proceeding when we see pending offsets on the position topic, what we should do is hold and wait, so that this t1 transaction gets aborted by the transaction coordinator, because we always have a transaction timeout to help us get rid of these pending offsets. In the new model the single-writer mechanism no longer helps us clear the pending offsets, so you can no longer be sure they'll go away on their own; you just take a conservative approach and back off until you reach the transaction timeout for t1, at which point all the pending offsets are cleared and you can safely proceed. That's the fix.

[Audience question] My question is: how do you actually clear this state, if this is an append-only log we can't really delete from? Oh, it's not deletion. For all these pending records, a marker is eventually written saying whether the transaction succeeded or failed, so it's by appending those markers that the offset goes from being something we call pending to something committed. We're not deleting data from the log; we just write another record that says how this offset ended up. The marker is written right there on the position topic. For all the topics we write to, whether a general output topic or the consumer offsets topic, we produce the data but it's marked as pending. While it's in the pending stage, if you configure your consumer with read_committed, you never see those pending messages; only once a marker is written to those topics is the data marked as committed or aborted, so it's either returned to the consumer or skipped. Thanks for the question.

Let me get through this quickly. I think the solution is pretty clear, with just a couple of observations. By consolidating on the group coordinator, you're relying on the transaction timeout, in the worst case, to abort the last generation's transactions. What we're discussing here is the worst case, because in normal cases, when you rebalance or shut down, you can definitely send an end-transaction request to the broker; it's just that sometimes things don't end that gracefully, and that requires you to be conservative in this way.

The other thing is the zombie case. We haven't yet talked about this notion on the consumer side: if you know a little bit about group rebalances, there's a status marker called the generation ID on the group coordinator, and it remembers the group state on every successful rebalance. If you use the generation ID, you can fence zombie consumers, and we realized we can use the same notion here: we include the generation ID at the time we commit the offsets. If you say, oh, this is generation two, and it matches what the coordinator has on record, then it's a good commit and you can proceed. When you rebalance, the new thread, the new client, gets the new generation ID, say three, so it still works, but the zombie t1 is out of the game: it won't be able to make a dirty commit because its generation ID is out of sync. So that's the fencing: you can't use the producer epoch, because the single-writer scheme is no longer guaranteed, but you can use the consumer group generation ID in this case.
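From the application's point of view, this shows up as the offset commit carrying the consumer group's metadata, including the generation, roughly as sketched below. The overload shown here is the shape the API took in later client releases as KIP-447 landed, so treat it as illustrative of the proposal rather than the exact API at the time of the talk.

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerGroupMetadata;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.TopicPartition;

public class GenerationFencedCommit {
    static void commit(KafkaProducer<String, String> producer,
                       KafkaConsumer<String, String> consumer,
                       Map<TopicPartition, OffsetAndMetadata> offsets) {
        // groupMetadata() carries the group id, generation id, and member id.
        // The group coordinator rejects the commit if the generation is stale,
        // which is what fences a zombie thread that was kicked out by a rebalance.
        ConsumerGroupMetadata metadata = consumer.groupMetadata();
        producer.sendOffsetsToTransaction(offsets, metadata);
        producer.commitTransaction();
    }
}
```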
With all the work we're doing here, eventually we want to reach the stage where we can bring the single-producer-per-thread model back to Kafka Streams, so that you can turn on exactly-once by default; that's the ultimate goal.

Just to recap everything we've talked about: you always start with an init-transactions call, and in the new model you're conservative when you fetch offsets, waiting until any pending transactions have been aborted. Then you fetch data from the input topic, do some work, and produce the results to the output topic. At this point you need to make sure the coordinator knows about the offsets you're going to write, so you talk to the transaction coordinator so it remembers the offsets partition you're writing to, and then you talk to the group coordinator to commit the offsets; when you commit, you need to use the generation ID to make sure you're safe to go and you're not a zombie. Once those steps are done, you commit the transaction, the transaction coordinator writes a commit message to its log, and then it writes the success markers to all the touched partitions. Oh, sorry, actually maybe it's not the input topic, but okay. Last thing: once it's done writing the transaction success markers to all of the output topics, it finishes the transaction.

So, a summary of what we've talked about today. I think this EOS model we developed two years ago is pretty elegant, but we see it still has some gaps to reach maturity, and we're currently working on them. Hopefully we can eventually bring Kafka Streams to a more scalable state and then enable exactly-once by default. Okay, questions?

[Audience question] I have a question about the transaction coordinator and the transaction log: where do those live? Is the transaction log another Kafka topic? Yes. So does each normal topic have one transaction log topic, or does each topic partition have one transaction log? Oh, the transaction log is actually based on the discovery of the transaction coordinator: you hash your transactional ID, and all the transactions ongoing with that transactional ID have their log messages written to that partition. So who is the leader of the transaction log partition? Yeah, that broker is the coordinator. Just to add to what Boyang said: the transaction topic is actually very simple; it's similar to the offsets topic we have for the group coordinator. You predefine the number of partitions for your transaction topic, and all producers with the same transactional ID will always talk to the same coordinator and hence write to the same transaction log. But depending on the number of partitions you have, more than one transactional ID may talk to the same coordinator and write to the same transaction log.
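The discovery rule described in that answer is just a hash of the id onto a fixed number of internal topic partitions; a simplified sketch of the idea follows (not the broker's actual code).

```java
public class CoordinatorLookup {
    // Simplified: the transaction coordinator for a transactional.id is the leader of
    // the __transaction_state partition the id hashes to; the group coordinator is
    // found the same way against __consumer_offsets using the group.id.
    static int coordinatorPartition(String id, int numInternalTopicPartitions) {
        // Mask to keep the hash non-negative before taking the modulo.
        return (id.hashCode() & 0x7fffffff) % numInternalTopicPartitions;
    }
}
```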
[Audience question] If you have multiple input topics, do they go to the same transaction log? The granularity is based on the transactional ID: if your transactional ID, say ID A, is writing to multiple partitions, it all goes to the same transaction log. Okay, one more question.

[Audience question] Thank you. Throughout this presentation you talked about dynamic partition assignment; does any of this change with manual partition assignment? Oh, that's a good question. The consumer has two modes, as I think most of you know: subscribe or assign. In standalone mode, with manual assignment, the current model definitely works; the scalability issue I talked about only applies to the subscription type. So yeah, they're totally different; my case only applies to subscriptions.

Okay, thank you, Boyang. In case you have more questions about the details of KIP-360 and KIP-447, we have pretty well-written documentation on the Confluence page of Kafka Improvement Proposals, aka KIPs, so you can just go to the wiki pages for KIP-360 and KIP-447 for more details on how we want to improve this. Hopefully we'll get both of those features into the next release, which is 2.4, at the end of... not September... let's see. Okay.
Info
Channel: Confluent
Views: 2,361
Keywords: confluent, apache kafka, meetup, streams, logs, hadoop, free, open source, streaming, platform, applications, apps, real-time, processing, data, multi-datacenter, monitoring, developers, operations, ops, java, .net, messaging queue, distributed, commit, fault-tolerant, publish, subscribe, pipelines, Jason Gustafson, Exactly Once Semantics, Kafka Streams
Id: j0l_zUhQaTc
Length: 47min 15sec (2835 seconds)
Published: Mon Jul 29 2019