
keyrange_success payload

As I understand it, `keyrange_success` should be followed by semicolon-separated triplets containing the two hash endpoints of each key range plus the server's address and port. Given that each server's hash is derived from its address and port, the included hashes seem unnecessary.

Are we required to actually populate these fields, even if we do not use them? I assume we have the same freedom as in M1 in designing our protocol, apart from any cases covered by the autotester.
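For reference, here is a minimal Python sketch of reading the triplet payload described above. It assumes that the text after the `keyrange_success` keyword looks like `kr-from,kr-to,host:port;...` and that the hash endpoints are hexadecimal strings. `KeyRange` and `parse_keyrange_payload` are hypothetical names, not taken from the assignment handout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRange:
    start: int  # hash where the range begins (assumed exclusive)
    end: int    # hash where the range ends (assumed inclusive)
    host: str
    port: int


def parse_keyrange_payload(payload: str) -> list[KeyRange]:
    """Parse 'from,to,host:port;from,to,host:port;...' into KeyRange objects.

    Assumption: the hash endpoints are hex strings (hypothetical encoding).
    """
    ranges = []
    for entry in payload.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';' or stray whitespace
        start_hex, end_hex, address = entry.split(",")
        host, port = address.rsplit(":", 1)  # split on the last ':' only
        ranges.append(KeyRange(int(start_hex, 16), int(end_hex, 16), host, int(port)))
    return ranges
```

For example, `parse_keyrange_payload("0a,ff,127.0.0.1:5000; ff,0a,127.0.0.1:5001;")` yields two `KeyRange` entries, the second of which wraps around the ring.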

Updated by Henry Tu

Instructors' answer:

Hashing the server's address and port only gives us the endpoint of that server's key range. However, we also need to know the starting point of the key range to accurately determine the responsible server.

So, yes, you should include these fields in the keyrange_success message.
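A minimal sketch of how both endpoints are used on the client side: with `(start, end]` available, a client can find the responsible server by checking each range, including the one that wraps past the top of the ring. It assumes keys are hashed with MD5 onto a 128-bit ring and that each server owns the interval from its predecessor's hash (exclusive) to its own hash (inclusive). `md5_int`, `in_range` and `responsible_server` are hypothetical helpers.

```python
import hashlib
from typing import Optional


def md5_int(text: str) -> int:
    # Assumption: ring positions are MD5 digests read as 128-bit integers.
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def in_range(h: int, start: int, end: int) -> bool:
    """True if h lies in the ring interval (start, end], handling wraparound."""
    if start < end:
        return start < h <= end
    # start >= end: the interval wraps past the top of the ring.
    # start == end means a single server owns the whole ring.
    return h > start or h <= end


def responsible_server(key: str, ranges: list[tuple[int, int, str]]) -> Optional[str]:
    """Return the 'host:port' whose (start, end] range contains hash(key)."""
    h = md5_int(key)
    for start, end, address in ranges:
        if in_range(h, start, end):
            return address
    return None  # no range covers the key: the metadata is inconsistent
```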

Updated by Jawad Tahir
Followup discussions:
Henry Tu
 

> however we also need to know the starting point of the key range to accurately determine the responsible server. 

Is this really the case, though? If we have a list of all the servers currently online, we can simply hash them and use each server's predecessor on the ring to compute its range.
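The predecessor-based alternative can be sketched as follows: hash every online server, sort by hash, and take each server's predecessor as the start of its range. This is a sketch under the same assumptions as before (MD5 of the `host:port` string as the ring position, ranges of the form `(predecessor, own]`), and the helper names are hypothetical.

```python
import hashlib


def ring_hash(address: str) -> int:
    # Assumption: a server's ring position is the MD5 of its "host:port" string.
    return int(hashlib.md5(address.encode("utf-8")).hexdigest(), 16)


def ranges_from_servers(addresses: list[str]) -> list[tuple[int, int, str]]:
    """Derive (start, end, address) for each server from the hashes alone.

    Each server owns (predecessor_hash, own_hash]. The first server in sorted
    order takes the last server's hash as its start, which wraps the ring.
    """
    ring = sorted((ring_hash(a), a) for a in addresses)
    result = []
    for i, (end, address) in enumerate(ring):
        start = ring[i - 1][0]  # i == 0 picks ring[-1]: the wraparound predecessor
        result.append((start, end, address))
    return result
```

With a single server, its predecessor is itself, so `start == end`, which the `(start, end]` convention reads as "owns the whole ring".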

Luke

I think you are right: the hash ranges can be calculated from just the server address and port, so we could just send:

`server1:port1;server2:port2;`

instead of

`kr-from1,kr-to1,server1:port1; kr-from2,kr-to2,server2:port2;`

The kr part can be calculated, so it looks redundant and could be omitted.

However, the autotester may enforce a check on that part. And you may not pass the case if you do not have it.

If we go one step further and think about the real world case which requires balanced sharding, the kr-from and kr-to values may not be calculated from serveraddr1:port. Imagine that in Milestone99 we need to implement balanced sharding: if we already have this structure in M2, we can easily swap that part in.

https://shopify.engineering/mysql-database-shard-balancing-terabyte-scale

Besides, `keyrange_success` is supposed to be cached and isn't transferred frequently, so there is little to lose by appending extra data to this message.
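To make the redundancy concrete, here is a sketch that produces both payload formats from the same server list. The full form is computed entirely from the compact one, which only holds while ranges are derived from hashed addresses, not under the balanced sharding scenario above. Assumptions: MD5 ring positions, hashes written as 32-digit hex, no space after `;`, and hypothetical function names.

```python
import hashlib


def ring_hash(address: str) -> int:
    # Assumption: MD5 of "host:port", read as a 128-bit integer.
    return int(hashlib.md5(address.encode("utf-8")).hexdigest(), 16)


def full_payload(addresses: list[str]) -> str:
    """One 'kr-from,kr-to,host:port;' entry per server, hashes as 32-digit hex."""
    ring = sorted((ring_hash(a), a) for a in addresses)
    parts = []
    for i, (end, address) in enumerate(ring):
        start = ring[i - 1][0]  # predecessor's hash (wraps around for i == 0)
        parts.append(f"{start:032x},{end:032x},{address};")
    return "".join(parts)


def compact_payload(addresses: list[str]) -> str:
    """One 'host:port;' entry per server; the receiver recomputes the ranges."""
    return "".join(f"{a};" for a in addresses)
```

The full payload costs roughly 66 extra characters per server, which matters little for a message that is cached rather than sent on every request.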

Henry Tu

> However, the autotester may enforce a check on that part. And you may not pass the case if you do not have it.

Yes, failing the autotester is a clear concern, and I'll be sure to verify that my implementation passes when the time comes. In my current implementation, I simply discard the first two entries of each triplet and recompute them from the host:port information so that they fit the data structure I've selected.
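Discarding the supplied endpoints might look like the following sketch: keep only the `host:port` of each entry, rebuild a sorted ring, and find a key's owner with a binary search for the first server hash at or after the key's hash. The MD5 hashing, the `(predecessor, own]` ownership rule and all names are assumptions, not the actual data structure used here. Because it reads only the last field of each entry, it also accepts the compact `host:port;` format unchanged.

```python
import bisect
import hashlib


def ring_hash(text: str) -> int:
    # Assumption: ring positions are MD5 digests read as 128-bit integers.
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def rebuild_ring(payload: str) -> list[tuple[int, str]]:
    """Keep only the host:port of each ';'-separated entry and rebuild a
    sorted ring of (hash, address); any supplied kr-from/kr-to are ignored."""
    addresses = [entry.strip().split(",")[-1]
                 for entry in payload.split(";") if entry.strip()]
    return sorted((ring_hash(a), a) for a in addresses)


def lookup(ring: list[tuple[int, str]], key: str) -> str:
    """Return the address of the first server whose hash is >= hash(key),
    wrapping to the lowest hash when the key lies past the last server."""
    h = ring_hash(key)
    i = bisect.bisect_left(ring, (h, ""))  # "" sorts before any address
    return ring[i % len(ring)][1]
```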

> If we go one step further and think about the real world case which requires balanced sharding.

Sure, including this information in the protocol now might help with integrating features like that in the future, but I'd argue that the redundant data could introduce additional sources of error or instability (e.g. a key that falls in no range, malformed ranges, or overlapping ranges).
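If the endpoints are kept, the failure modes listed here can be checked mechanically. The following is a sketch that assumes a 128-bit ring and `(start, end]` ranges. Sorted by end point, a consistent set of ranges has each start equal to the previous range's end, which rules out both gaps (a key in no range) and overlaps. `validate_ranges` is a hypothetical helper.

```python
def validate_ranges(ranges: list[tuple[int, int, str]],
                    ring_size: int = 2**128) -> list[str]:
    """Return a list of problems found in (start, end, address) ring ranges.

    Assumption: each range is (start, end]; an empty result means the ranges
    partition the ring with no gaps and no overlaps.
    """
    if not ranges:
        return ["no ranges"]
    problems = []
    for start, end, address in ranges:
        if not (0 <= start < ring_size and 0 <= end < ring_size):
            problems.append(f"malformed range for {address}")
    ordered = sorted(ranges, key=lambda r: r[1])
    ends = [r[1] for r in ordered]
    if len(set(ends)) != len(ends):
        problems.append("two ranges share an end point")
    for i, (start, _end, address) in enumerate(ordered):
        expected_start = ordered[i - 1][1]  # predecessor's end; wraps at i == 0
        if start != expected_start:
            problems.append(f"gap or overlap before {address}")
    return problems
```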

I do of course agree with your point that we should be thinking about future extensions when architecting the protocol, so I can see why one might want to include this.
