-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-17384. [FGL] HDFS NameNode FGL Phase I #6762
base: trunk
Are you sure you want to change the base?
Conversation
...s-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me.
I think we can merge it after getting the performance benchmark if no other comments.
if (lockMode.equals(FSNamesystemLockMode.GLOBAL)) { | ||
if (this.fsLock.isWriteLockedByCurrentThread()) { | ||
// The bm writeLock should be held by the current thread. | ||
assert this.bmLock.isWriteLockedByCurrentThread(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that hasWriteLock has two way to used:
- (1) Make sure we hold a lock. like
assert fsn.hasReadLock(FSNamesystemLockMode.GLOBAL);
- (2) Check whether there is a lock to determine whether to perform some action. like
boolean hadFsnWriteLock = fsn.hasWriteLock(FSNamesystemLockMode.GLOBAL);
Here add assert
, may throw AssertionError. So I think the new hasWriteLock
only have the semantics described in (1).
I think we need to separate two approaches to handle (1) and (2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @zhengchenyu , for your review.
Regarding point (2), are you suggesting that the hasWriteLock(RwLockMode lockMode)
method should strictly return true
or false
and never throw an AssertionError
?
Indeed, hasWriteLock(RwLockMode lockMode)
is intended to always return true
or false
. However, I included the check to identify potential issue with lock ordering, because the caller often expects to receive a true
result in NameNode.
If this additional check is deemed unnecessary, the code could simply be:
if (lockMode.equals(RwLockMode.GLOBAL)) {
return this.fsLock.isWriteLockedByCurrentThread() && this.bmLock.isWriteLockedByCurrentThread();
}
Looking forward your ideas, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#7250 is used to fix this case. @zhengchenyu please help review it. Thanks
I know this PR is phase 1 work: split global lock into fslock and bmlock. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sir, leave some small comments.
@@ -2058,7 +2080,7 @@ BatchedListEntries<OpenFileEntry> listOpenFiles(long prevId, | |||
|
|||
public BatchedListEntries<OpenFileEntry> getFilesBlockingDecom(long prevId, | |||
String path) { | |||
assert hasReadLock(); | |||
assert hasReadLock(RwLockMode.FS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use GLOBAL mode here, because below codes contain iterate blockManager.getDatanodeManager().getDatanodes()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hfutatzhanghb for your review.
Both dataNode.getLeavingServiceStatus().getOpenFiles()
and blockManager.getDatanodeManager().getDatanodes())
are already thread-safe, so the BMLock is unnecessary here.
@@ -2031,7 +2053,7 @@ BatchedListEntries<OpenFileEntry> listOpenFiles(long prevId, | |||
BatchedListEntries<OpenFileEntry> batchedListEntries; | |||
String normalizedPath = new Path(path).toString(); // normalize path. | |||
try { | |||
readLock(); | |||
readLock(RwLockMode.FS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also use GLOBAL mode here, because it invoke getFilesBlockingDecom method.
Thanks, @zhengchenyu . This is a good idea. Let's briefly consider some common scenarios:
In scenarios 3, 4, and 5, the BM lock already held is normal when acquiring FS lock. However, in scenarios 1 and 2, the BM lock already held is abnormal when acquiring FS lock. Indeed, lock reentry can also be checked, but I think it’s unnecessary as it’s overly complicated. We can use other methods to detect deadlock issues. |
💔 -1 overall
This message was automatically generated. |
* HDFS-17387. [FGL] Abstract the configurable locking mode
💔 -1 overall
This message was automatically generated. |
Failed tests seem to be unrelated to FGL. In particular, hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped failure was caused by HDFS-17080 and hadoop.hdfs.server.federation.store.driver.TestStateStoreFileSystem by HDFS-17496 edit: created HDFS-17706 and HDFS-17707 to track the failed UTs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1.
LGTM. +1. |
💔 -1 overall
This message was automatically generated. |
What's the purpose of this feature?
What changes were proposed in this pull request?
How was this patch tested?