keyrange_success payload

As I understand it, `keyrange_success` should be followed by semicolon-separated triplets, each containing the two hash endpoints of a key range plus the server address/port. Given that the server hash is derived from the address/port, the included hashes seem unnecessary.

Are we required to actually populate these fields, even if we do not use them? I assume we are given the same freedoms as with M1 in terms of developing our protocol -- apart from any cases covered by the auto tester.

The instructors' answer:

Hashing the server address and port only gives us the endpoint of the server's key range; however, we also need to know the starting point of the key range to accurately determine the responsible server.

So, yes, you should include these fields in the keyrange_success message.

Followup discussions

Henry Tu

12 months ago

> however we also need to know the starting point of the key range to accurately determine the responsible server.

Is this really the case though? If we have a list of all the servers currently online, we can simply hash them and use the position of the predecessor to compute the range.
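To make the idea concrete, here is a minimal sketch of deriving every range from the server list alone. It assumes MD5 over the `host:port` string as the ring hash and an inclusive range running from the predecessor's hash plus one up to the server's own hash; both are assumptions on my part, and `ring_position`, `key_ranges`, and `RING_SIZE` are hypothetical names, so check them against the milestone spec.

```python
import hashlib

# Assumption: a 128-bit ring, matching the size of an MD5 digest.
RING_SIZE = 2 ** 128


def ring_position(addr: str) -> int:
    """Hash a "host:port" string to its position on the ring (MD5 is assumed)."""
    return int(hashlib.md5(addr.encode("utf-8")).hexdigest(), 16)


def key_ranges(servers):
    """Map each server to an inclusive (start, end) range on the ring.

    end is the server's own hash; start is the predecessor's hash + 1.
    With a single server the range wraps around and covers the whole ring.
    """
    ring = sorted(servers, key=ring_position)
    ranges = {}
    for i, server in enumerate(ring):
        predecessor = ring[i - 1]  # index -1 wraps to the last server
        start = (ring_position(predecessor) + 1) % RING_SIZE
        ranges[server] = (start, ring_position(server))
    return ranges
```

Calling `key_ranges(["127.0.0.1:5000", "127.0.0.1:5001"])` returns one `(start, end)` pair per server without reading any range from the wire.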

Luke
12 months ago

I think you are right, the hashes can be calculated from just the server:port part. So we can just send:

server1:port1;server2:port2;

instead of

kr-from1,kr-to1,server1:port1; kr-from2,kr-to2,server2:port2;

The kr part can be calculated and looks redundant. It can be omitted.
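A rough sketch of a parser that tolerates both layouts above, in case it helps while experimenting. The function name `parse_keyrange_payload` is hypothetical, and the separators are taken only from the two examples in this post, not from the official spec.

```python
def parse_keyrange_payload(payload: str):
    """Parse a keyrange payload into (kr_from, kr_to, address) tuples.

    Accepts "kr-from,kr-to,host:port;" triplets as well as bare
    "host:port;" entries; for bare entries the range fields are None.
    """
    entries = []
    for item in payload.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing semicolon
        parts = [p.strip() for p in item.split(",")]
        if len(parts) == 3:
            kr_from, kr_to, addr = parts
            entries.append((kr_from, kr_to, addr))
        elif len(parts) == 1:
            entries.append((None, None, parts[0]))
        else:
            raise ValueError(f"malformed entry: {item!r}")
    return entries
```

A client could call this on whatever follows `keyrange_success` and fall back to computing the ranges itself whenever the range fields come back as `None`.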

However, the autotester may enforce a check on that part. And you may not pass the case if you do not have it.

If we go one step further and think about the real world case which requires balanced sharding, the kr-from and kr-to values may not be calculated from server1:port1 at all. Imagine that in some hypothetical Milestone99 we need to implement balanced sharding. If we already have this structure in M2, we can easily swap that part in.

https://shopify.engineering/mysql-database-shard-balancing-terabyte-scale

On the other hand, keyrange_success is supposed to be cached and isn't transferred frequently. There is little to lose if we append extra data to this message.

Henry Tu
12 months ago

> However, the autotester may enforce a check on that part. And you may not pass the case if you do not have it.

Yes, failing the auto tester is a clear concern, and I'll be sure to verify it passes when the time comes. In my current implementation I simply toss out the first two entries of the triplet and compute them using the host:port information to work with the data structure I've selected.

> If we go one step further and think about the real world case which requires balanced sharding.

Sure, including this information in the protocol now might help with integrating features like that in the future, but I'd argue that this could introduce some additional sources of error/instability as a result of redundant data (e.g. key not in any range, malformed range, overlapping ranges).
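For illustration, this is roughly the kind of check a receiver would need if it trusted the transmitted ranges. It is only a sketch: `validate_ranges` is a hypothetical helper, and it assumes inclusive integer `(start, end)` ranges on a ring of `ring_size` positions, where a range may wrap past the end of the ring.

```python
def validate_ranges(ranges, ring_size):
    """Return a list of problems found in inclusive (start, end) ring ranges."""
    if not ranges:
        return ["no ranges given"]
    problems = []
    for start, end in ranges:
        if not (0 <= start < ring_size and 0 <= end < ring_size):
            problems.append(f"malformed range: ({start}, {end})")
    if problems:
        return problems
    ordered = sorted(ranges)
    for i, (start, end) in enumerate(ordered):
        # Each range should begin right after the previous one ends,
        # with the last range leading back around to the first.
        next_start = ordered[(i + 1) % len(ordered)][0]
        expected = (end + 1) % ring_size
        if next_start != expected:
            problems.append(
                f"gap or overlap after ({start}, {end}): "
                f"next range starts at {next_start}, expected {expected}"
            )
    # Going around the ring exactly once means every key has one owner.
    covered = sum((end - start) % ring_size + 1 for start, end in ordered)
    if covered != ring_size:
        problems.append(f"ranges cover {covered} positions, expected {ring_size}")
    return problems
```

An empty result means the ranges partition the ring. None of this is needed when the ranges are derived locally from the server list, which is the trade-off being argued here.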

I do of course agree with your point that we should be thinking about future extensions when architecting the protocol, so I can see why one might want to include this.
