Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lld-macho,BalancedPartition] Simplify relocation hash and avoid xxHash #121729

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Jan 6, 2025

xxHash, inferior to xxh3, is discouraged. We try not to use xxhash in
lld.

Switch to read32le for content hash and xxh3/stable_hash_combine for
relocation hash. Remove the intermediate std::string for relocation
hash.

Change the tail hashing scheme to consider individual bytes instead.
This helps group 0102 and 0201 together. The benefit is negligible,
though.

Created using spr 1.3.5-bogner
@llvmbot
Copy link
Member

llvmbot commented Jan 6, 2025

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: Fangrui Song (MaskRay)

Changes

xxHash, inferior to xxh3, is discouraged. We try not to use xxhash in
lld.

Switch to read32le for content hash and xxh3/stable_hash_combine for
relocation hash. Remove the intermediate std::string for relocation
hash.


Full diff: https://github.com/llvm/llvm-project/pull/121729.diff

2 Files Affected:

  • (modified) lld/MachO/BPSectionOrderer.h (+16-17)
  • (modified) lld/include/lld/Common/BPSectionOrdererBase.h (-9)
diff --git a/lld/MachO/BPSectionOrderer.h b/lld/MachO/BPSectionOrderer.h
index 8ba911fcc546bd..3de815d79b0f4c 100644
--- a/lld/MachO/BPSectionOrderer.h
+++ b/lld/MachO/BPSectionOrderer.h
@@ -19,7 +19,10 @@
 #include "Symbols.h"
 #include "lld/Common/BPSectionOrdererBase.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StableHashing.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Endian.h"
+#include "llvm/Support/xxhash.h"
 
 namespace lld::macho {
 
@@ -91,22 +94,20 @@ class BPSectionMacho : public BPSectionBase {
     constexpr unsigned windowSize = 4;
 
     // Calculate content hashes
-    size_t dataSize = isec->data.size();
-    for (size_t i = 0; i < dataSize; i++) {
-      auto window = isec->data.drop_front(i).take_front(windowSize);
-      hashes.push_back(xxHash64(window));
-    }
+    ArrayRef<uint8_t> data = isec->data;
+    for (size_t i = 0; i <= data.size() - windowSize; i++)
+      hashes.push_back(llvm::support::endian::read32le(data.data() + i));
 
     // Calculate relocation hashes
     for (const auto &r : isec->relocs) {
-      if (r.length == 0 || r.referent.isNull() || r.offset >= isec->data.size())
+      if (r.length == 0 || r.referent.isNull() || r.offset >= data.size())
         continue;
 
       uint64_t relocHash = getRelocHash(r, sectionToIdx);
       uint32_t start = (r.offset < windowSize) ? 0 : r.offset - windowSize + 1;
       for (uint32_t i = start; i < r.offset + r.length; i++) {
-        auto window = isec->data.drop_front(i).take_front(windowSize);
-        hashes.push_back(xxHash64(window) + relocHash);
+        auto window = data.drop_front(i).take_front(windowSize);
+        hashes.push_back(xxh3_64bits(window) ^ relocHash);
       }
     }
 
@@ -124,19 +125,17 @@ class BPSectionMacho : public BPSectionBase {
     std::optional<uint64_t> sectionIdx;
     if (auto it = sectionToIdx.find(isec); it != sectionToIdx.end())
       sectionIdx = it->second;
-    std::string kind;
+    uint64_t kind = -1, value = 0;
     if (isec)
-      kind = ("Section " + Twine(isec->kind())).str();
+      kind = uint64_t(isec->kind());
 
     if (auto *sym = reloc.referent.dyn_cast<Symbol *>()) {
-      kind += (" Symbol " + Twine(sym->kind())).str();
-      if (auto *d = llvm::dyn_cast<Defined>(sym)) {
-        return BPSectionBase::getRelocHash(kind, sectionIdx.value_or(0),
-                                           d->value, reloc.addend);
-      }
+      kind = (kind << 8) | uint8_t(sym->kind());
+      if (auto *d = llvm::dyn_cast<Defined>(sym))
+        value = d->value;
     }
-    return BPSectionBase::getRelocHash(kind, sectionIdx.value_or(0), 0,
-                                       reloc.addend);
+    return llvm::stable_hash_combine(kind, sectionIdx.value_or(0), value,
+                                     reloc.addend);
   }
 };
 
diff --git a/lld/include/lld/Common/BPSectionOrdererBase.h b/lld/include/lld/Common/BPSectionOrdererBase.h
index e2cb41f69cc684..29599afa03bd40 100644
--- a/lld/include/lld/Common/BPSectionOrdererBase.h
+++ b/lld/include/lld/Common/BPSectionOrdererBase.h
@@ -18,7 +18,6 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
-#include "llvm/Support/xxhash.h"
 #include <memory>
 #include <optional>
 
@@ -56,14 +55,6 @@ class BPSectionBase {
     return P1;
   }
 
-  static uint64_t getRelocHash(llvm::StringRef kind, uint64_t sectionIdx,
-                               uint64_t offset, uint64_t addend) {
-    return llvm::xxHash64((kind + ": " + llvm::Twine::utohexstr(sectionIdx) +
-                           " + " + llvm::Twine::utohexstr(offset) + " + " +
-                           llvm::Twine::utohexstr(addend))
-                              .str());
-  }
-
   /// Reorders sections using balanced partitioning algorithm based on profile
   /// data.
   static llvm::DenseMap<const BPSectionBase *, size_t>

hashes.push_back(xxHash64(window));
}
ArrayRef<uint8_t> data = isec->data;
for (size_t i = 0; i <= data.size() - windowSize; i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sections shorter than 4 bytes are trivial functions that are likely
foled by ICF.

We should still guard against the case when data.size() < windowSize because these sections could be data sections in the future

}
ArrayRef<uint8_t> data = isec->data;
for (size_t i = 0; i <= data.size() - windowSize; i++)
hashes.push_back(llvm::support::endian::read32le(data.data() + i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes how we take hashes at the end of the section, but it could be a change for the better. I'm testing this PR on our apps to see if there is a size regression or not

Copy link
Contributor

@ellishg ellishg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only very minor size changes on my end. I'm curious if @Colibrow sees and size regressions. LGTM after fixing the data.size() < windowSize issue.

Created using spr 1.3.5-bogner
.
Created using spr 1.3.5-bogner
@MaskRay
Copy link
Member Author

MaskRay commented Jan 7, 2025

Only very minor size changes on my end. I'm curious if @Colibrow sees and size regressions. LGTM after fixing the data.size() < windowSize issue.

Thanks. I've changed the tail hashing scheme to:

    if (data.size() >= windowSize)
      for (size_t i = 0; i <= data.size() - windowSize; ++i)
        hashes.push_back(llvm::support::endian::read32le(data.data() + i));
    for (uint8_t byte : data.take_back(windowSize - 1))
      hashes.push_back(byte);

This helps group 0102 and 0201 together. The similarity between 030201 and 0201 is now the same as 030201 and 0102, which should be fine.

ArrayRef<uint8_t> data = isec->data;
if (data.size() >= windowSize)
for (size_t i = 0; i <= data.size() - windowSize; ++i)
hashes.push_back(llvm::support::endian::read32le(data.data() + i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using read32le() here and xxh3_64bits() for relocations below? As I understand, read32le() only works here because the window size is exactly 4. I chose this window size because it gave the best results on a few binaries, but other window sizes could work better for other scenarios. If we use xxh3_64bits() in both cases, we are free to change windowSize.

Copy link
Member Author

@MaskRay MaskRay Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is weird. xxh3_64bits(window) for relocation hashing is just because it's easy: no need to handle the shorter-than-4-bytes case.

Hmmm. Reloc::length is actually a logarithm field. For Mach-O arm64, the relocation offsets are aligned to start of the instruction. Shall we compute one single hash for a relocation? I guess the sliding window doesn't help, but happy to be proven wrong.

Copy link
Contributor

@Colibrow Colibrow Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file size gzipped size
libsample.so 3,181,560 1,487,043
libsample.so_xxh3 3,181,544 1,486,208

It seems good although the change is minor. The object file size change confused me. Do you have any ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants