-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(Cache): two small features #20567
base: master
Are you sure you want to change the base?
Conversation
This downloads only cache files for every Lean file in some subdirectory.
…ll cache files were unused
PR summary 2cde5e5336Import changes for modified filesNo significant changes to the import graph Import changes for all files
Declarations diffNo declarations were harmed in the making of this PR! 🐙 You can run this locally as follows## summary with just the declaration names:
./scripts/declarations_diff.sh <optional_commit>
## more verbose report:
./scripts/declarations_diff.sh long <optional_commit> The doc-module for No changes to technical debt.You can run this locally as
|
If a user has 2 active projects on two very different versions of Lean and each depending on Mathlib (but then of course on 2 very different versions of Mathlib), the 80% overlap is likely to continuously get printed as they work on the 2 projects right? I'm not saying this is inherently bad, maybe it's motivation to update the older of the 2 projects, really just trying to check my understanding of the PR. |
if 5 * currentHits < numberAllCacheFiles then | ||
IO.println "note: currently, more than 80% of all files in the cache are not used \ | ||
(they might be useful for other versions of mathlib, or orphaned): \ | ||
to increase disk space, run cache clean to delete these files" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have 6 disjoint caches of Mathlib, all of which I'm actively using in different project/branches, I will get this message? That doesn't seem useful, since that is almost always the situation I'm in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's the racecar emoji...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That question (about the cache location for different projects) is a good one: I'm not sure what the answer is.
@arthurpaulino Do you know more/can you help here, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to suggest also looking at file atime to see whether large portions of the cache were stale but I know atime behavior can be unreliable depending on the filesystem so I'm not sure how well it'd work in practice, but might be worth checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code: it seems like all ltar files are saved in one location (namely CACHEDIR
), and in particular shared between projects. This check is not about (and does not touch) olean
files.
To perhaps answer the main question: the code checks for "80% stale files" - i.e., if I just get the cache on one new and one very old version of mathlib, each run should have something like 50% known files, far from the threshold. Using the cache only on mathlib with little orphaned files is unlikely to trigger this warning. I'm happy to crank up the threshold further, if that's the question. |
@arthurpaulino Do you know why |
I don't see a reason to keep orphan |
I've split the |
add a "miniget" command, which will find all .lean files in the current directory
(excluding .lake directorires), read all "import Mathlib X.Y.Z" lines in them,
and run
lake exe cache get X/Y/Z.lean
for these linesIn short, this just downloads the cache you need for a certain project.
add an informational message to the get command (when run without arguments):
when more than 80% of all cache files present were not used in the last command,
tell the user about the
clean
commanddocument two definitions