Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Cache): two small features #20567

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

feat(Cache): two small features #20567

wants to merge 4 commits into from

Conversation

grunweg
Copy link
Collaborator

@grunweg grunweg commented Jan 8, 2025

  • add a "miniget" command, which will find all .lean files in the current directory
    (excluding .lake directorires), read all "import Mathlib X.Y.Z" lines in them,
    and run lake exe cache get X/Y/Z.lean for these lines
    In short, this just downloads the cache you need for a certain project.

  • add an informational message to the get command (when run without arguments):
    when more than 80% of all cache files present were not used in the last command,
    tell the user about the clean command

  • document two definitions


Open in Gitpod

This downloads only cache files for every Lean file in some subdirectory.
@grunweg grunweg requested a review from arthurpaulino January 8, 2025 13:21
Copy link

github-actions bot commented Jan 8, 2025

PR summary 2cde5e5336

Import changes for modified files

No significant changes to the import graph

Import changes for all files
Files Import difference

Declarations diff

No declarations were harmed in the making of this PR! 🐙

You can run this locally as follows
## summary with just the declaration names:
./scripts/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/declarations_diff.sh long <optional_commit>

The doc-module for script/declarations_diff.sh contains some details about this script.


No changes to technical debt.

You can run this locally as

./scripts/technical-debt-metrics.sh pr_summary
  • The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
  • The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

@Julian
Copy link
Collaborator

Julian commented Jan 8, 2025

If a user has 2 active projects on two very different versions of Lean and each depending on Mathlib (but then of course on 2 very different versions of Mathlib), the 80% overlap is likely to continuously get printed as they work on the 2 projects right?

I'm not saying this is inherently bad, maybe it's motivation to update the older of the 2 projects, really just trying to check my understanding of the PR.

if 5 * currentHits < numberAllCacheFiles then
IO.println "note: currently, more than 80% of all files in the cache are not used \
(they might be useful for other versions of mathlib, or orphaned): \
to increase disk space, run cache clean to delete these files"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have 6 disjoint caches of Mathlib, all of which I'm actively using in different project/branches, I will get this message? That doesn't seem useful, since that is almost always the situation I'm in.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's the racecar emoji...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That question (about the cache location for different projects) is a good one: I'm not sure what the answer is.
@arthurpaulino Do you know more/can you help here, please?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest also looking at file atime to see whether large portions of the cache were stale but I know atime behavior can be unreliable depending on the filesystem so I'm not sure how well it'd work in practice, but might be worth checking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the code: it seems like all ltar files are saved in one location (namely CACHEDIR), and in particular shared between projects. This check is not about (and does not touch) olean files.

@grunweg
Copy link
Collaborator Author

grunweg commented Jan 8, 2025

To perhaps answer the main question: the code checks for "80% stale files" - i.e., if I just get the cache on one new and one very old version of mathlib, each run should have something like 50% known files, far from the threshold. Using the cache only on mathlib with little orphaned files is unlikely to trigger this warning.

I'm happy to crank up the threshold further, if that's the question.

@grunweg
Copy link
Collaborator Author

grunweg commented Jan 8, 2025

@arthurpaulino Do you know why cleanCache only removes ltar files, and never (e.g.) oleans? Should this change?

@arthurpaulino
Copy link
Collaborator

@arthurpaulino Do you know why cleanCache only removes ltar files, and never (e.g.) oleans? Should this change?

I don't see a reason to keep orphan oleans. @digama0 was the one who introduced the custom compression part so maybe he has more context.

@grunweg
Copy link
Collaborator Author

grunweg commented Jan 8, 2025

I've split the miniget changes into a separate PR: it seems the warning needs more discussion/a better design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants