-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-14212. Make maxFailovers in S3g apply on a per-request basis. #9546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Gargi-jais11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @greenwich for working on this patch.
Left some comments.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java
Outdated
Show resolved
Hide resolved
|
@greenwich your added test cases looks great, but we can have test for below case as well in
Why we need this? Test Scenario:
Without this test: We rely on inference that local variables reset, but don't explicitly verify the fix works for sequential requests. |
@Gargi-jais11, is it possible you missed |
@greenwich You're absolutely right! I missed that |
Gargi-jais11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @greenwich for updating the patch. Overall LGTM. Just some minor comments.
What changes were proposed in this pull request?
This PR makes
maxFailoversandfailoverCountin S3g apply on a per-request basis.Rationale:
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.javaline 93: we haveprivate int failoverCount = 0;- All threads share this counter; it never resets.GrpcOmTransport.shouldRetry(258) we runaction = retryPolicy.shouldRetry((Exception)ex, 0, failoverCount++, true);which is also shared between requests.OMFailoverProxyProviderBase.getRetryPolicy.getRetryAction, we still use that globalfailoverCountcheckingif (failovers < maxFailovers)(258), which always returnsreturn RetryAction.FAIL;(263) once we reached themaxFailoversmaxFailoversfrom above is defined byOZONE_CLIENT_FAILOVER_MAX_ATTEMPTS_KEYI propose to change the value of
failoverCountper request, rather than making it a global flag. So basically, it makesOZONE_CLIENT_FAILOVER_MAX_ATTEMPTS_KEYconfiguration to serve on a per-request basis.Detailed explanation and discussion are here: #9477
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14212
How was this patch tested?
Branch pipeline: https://github.com/greenwich/ozone/actions/runs/20430980506
failoverCountis set for each requestom0toom2. It reproduces the issue, then I used it to test the fix.Concurrent test results before the fix - demonstrating the bug -> failover didn't happen to om2
Concurrent test results after the fix - demonstrating the correct behaviour -> failover to om2