ci: generalise license checker for other controlled-vocabulary fields #3714

Open
tiborsimko opened this issue Jan 6, 2025 · 0 comments

Current behaviour

Recently, the license string checker helper script was added to CI procedures, see #3699.

The script checks the values of the license.attribution field and makes sure that they belong to a controlled vocabulary of allowed values.

This works very well; however, there is nothing license-specific about it, so the script could easily be generalised to check other fields whose values should come from a controlled vocabulary, such as the collision type, MC categories, etc.

Possible improvements

Let's generalise the license checker script so that it can check more metadata fields in a similar way.

For example, one could introduce a configuration file listing the fields and their allowed values, such as:

license:
  attribution:
    - Apache-2.0
    - BSD-3-Clause
    - CC0-1.0
    - GPL-3.0-only
    - MIT
collision_information:
  type:
    - e+e-
    - pp
    - pPb
    - PbPb
experiment:
  - ALICE
  - ATLAS
  - CMS
  - DELPHI
  - LHCb
  - OPERA
  - PHENIX
  - TOTEM

The content curators could define all the controlled vocabularies of interest, and the script would check the values of all listed fields and subfields and report any values that do not match.
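A minimal Python sketch of such a generalised checker follows, assuming the configuration has already been loaded into a nested dict with the same shape as the YAML example, and that records are JSON-like dicts with an optional `recid` key. The names `VOCABULARIES`, `flatten_vocabularies`, `values_at` and `check_records` are hypothetical and not taken from the existing license checker script.

```python
from __future__ import annotations

from typing import Any, Iterator

# Hypothetical vocabulary config mirroring the YAML example above; a real
# script would load it from a file instead of hard-coding it.
VOCABULARIES: dict[str, Any] = {
    "license": {"attribution": ["Apache-2.0", "BSD-3-Clause", "CC0-1.0", "GPL-3.0-only", "MIT"]},
    "collision_information": {"type": ["e+e-", "pp", "pPb", "PbPb"]},
    "experiment": ["ALICE", "ATLAS", "CMS", "DELPHI", "LHCb", "OPERA", "PHENIX", "TOTEM"],
}


def flatten_vocabularies(config: dict[str, Any], prefix: tuple[str, ...] = ()) -> dict[tuple[str, ...], set[str]]:
    """Turn the nested config into {field path: set of allowed values}."""
    paths: dict[tuple[str, ...], set[str]] = {}
    for key, value in config.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Nested mapping (e.g. license -> attribution): descend further.
            paths.update(flatten_vocabularies(value, path))
        else:
            # Leaf list: these are the allowed values for this field path.
            paths[path] = set(value)
    return paths


def values_at(node: Any, path: tuple[str, ...]) -> Iterator[Any]:
    """Yield every value found at `path`, descending into lists along the way."""
    if isinstance(node, list):
        for item in node:
            yield from values_at(item, path)
    elif not path:
        yield node
    elif isinstance(node, dict) and path[0] in node:
        yield from values_at(node[path[0]], path[1:])
    # Missing fields yield nothing, so they are not reported here.


def check_records(records: list[dict[str, Any]], config: dict[str, Any]) -> list[str]:
    """Return one problem message per value outside its controlled vocabulary."""
    problems: list[str] = []
    vocabularies = flatten_vocabularies(config)
    for index, record in enumerate(records):
        recid = record.get("recid", index)
        for path, allowed in vocabularies.items():
            for value in values_at(record, path):
                # Non-string values (e.g. an unexpected dict) are also reported.
                if not isinstance(value, str) or value not in allowed:
                    problems.append(f"record {recid}: {'.'.join(path)} has unexpected value {value!r}")
    return problems
```

For example, with one valid and one invalid record:

```python
records = [
    {"recid": 1, "license": {"attribution": "CC0-1.0"}, "experiment": ["CMS"]},
    {"recid": 2, "license": {"attribution": "CC0"}, "collision_information": {"type": "pp"}, "experiment": "ATLS"},
]
for problem in check_records(records, VOCABULARIES):
    print(problem)
# record 2: license.attribution has unexpected value 'CC0'
# record 2: experiment has unexpected value 'ATLS'
```

In this sketch a field can hold either a single string or a list of strings, and absent fields are silently skipped; making some fields mandatory would be a separate check.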

Notes

The YAML configuration file above is only an illustration; the actual implementation could use another technique, for example JSON Schema with embedded enum types:

  "enum": [
    "Apache-2.0",
    "BSD-3-Clause",
    "CC0-1.0",
    "GPL-3.0-only",
    "MIT"
  ]

Advantage: fast JSON Schema validators already exist, so there is nothing to write on our end. Disadvantage: we would have to release a new metadata schema version for each newly allowed controlled-vocabulary value, leading to a bit of a version jungle.
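To illustrate how a YAML-style vocabulary maps onto JSON Schema `enum` keywords, here is a minimal Python sketch. `VOCABULARIES` and `vocabulary_to_schema` are hypothetical names, and the sketch assumes every leaf field holds a single string; array-valued fields would instead need an `items` schema carrying the `enum`.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical vocabulary config, same shape as the YAML example above.
VOCABULARIES: dict[str, Any] = {
    "license": {"attribution": ["Apache-2.0", "BSD-3-Clause", "CC0-1.0", "GPL-3.0-only", "MIT"]},
    "collision_information": {"type": ["e+e-", "pp", "pPb", "PbPb"]},
    "experiment": ["ALICE", "ATLAS", "CMS", "DELPHI", "LHCb", "OPERA", "PHENIX", "TOTEM"],
}


def vocabulary_to_schema(config: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON Schema object whose leaf fields are restricted by `enum`.

    Assumes each leaf field holds one string; an array-valued field would need
    {"type": "array", "items": {"enum": [...]}} instead.
    """
    properties: dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Nested mapping such as license -> attribution: recurse.
            properties[key] = vocabulary_to_schema(value)
        else:
            # Leaf list of allowed values becomes an enum constraint.
            properties[key] = {"enum": list(value)}
    return {"type": "object", "properties": properties}


if __name__ == "__main__":
    print(json.dumps(vocabulary_to_schema(VOCABULARIES), indent=2))
```

A fragment like this could then be checked with an existing validator, for instance `jsonschema.validate(instance, schema)` from the Python `jsonschema` package; the schema-versioning concern above still applies whenever the allowed values change.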
