I built an OS image on atrium (which is on heliosv2) today, using the omicron shell script, which wraps running gmake setup and then helios-build experiment-image .... After installing the OS on madrid, it had a hostname of unknown; the logs for the compliance/hostname service revealed the cause:
++ pilot gimlet info -i
ld.so.1: pilot: fatal: libssl.so.1.1: open failed: No such file or directory
Looking in my helios working directory, the pilot binary is quite old (~2 weeks) and predates the heliosv2 upgrade. I expect I can work around this by manually cleaning projects/pilot (or, probably safer, blowing away my helios and starting fresh?).
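For what it's worth, a quick way to confirm what the stale binary wants at runtime is to inspect its dynamic dependencies (a sketch; the path to the pilot binary is a placeholder):

# List the NEEDED entries recorded in the binary; a dependency on
# libssl.so.1.1, which Helios 2.0 no longer ships, matches the ld.so.1 error above.
elfdump -d /path/to/pilot | grep NEEDED
# Or ask the runtime linker directly; missing libraries are reported as not found.
ldd /path/to/pilot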
Yeah, this is a somewhat unfortunate artefact of the way the OpenSSL dependency is determined, for at least two reasons:
we're building pilot today during gmake setup, which is appropriate for build tools that run on the build host like the image construction stuff, but not always appropriate for things that we ship to run on the target system; more concretely:
bad: if the shipped binary uses a private interface
bad: if the shipped binary uses a public, but unstable, interface, which is effectively what happened here with OpenSSL: we did a hard break from 1.1 to 3.0 and dropped the old library to avoid accidentally continuing to use it
ok: if the shipped binary uses only public interfaces, and the built image is using the same OS bits as the build machine (or newer), then things are generally OK, because we make strong backwards compatibility guarantees in the OS (though it is hard to know at a glance whether you are only using public interfaces, of course)
even if we were rebuilding pilot against the assembled ramdisk root (or at least a sysroot with analogous packages, including the headers and compilation links that we chuck out of the ramdisk itself), I'm not sure cargo build would notice that the OpenSSL version had changed, because I think the result gets cached by the build.rs business in the crate with the bindings; this implies we would have to cargo clean every time, which is pretty unfortunate, as it would significantly inflate the time taken to create an image
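To illustrate the caching issue: short of a full cargo clean, something like the following would be needed to make the probe run again against the right OpenSSL (a sketch, assuming the bindings come from the stock openssl-sys crate; the sysroot path is made up, and this is not what the build does today):

# Clean only the OpenSSL binding crate, so its build.rs probes again,
# rather than cleaning (and rebuilding) the whole workspace.
cargo clean -p openssl-sys
# openssl-sys honours OPENSSL_DIR; point it at the sysroot's OpenSSL 3
# rather than whatever happens to be installed on the build host.
OPENSSL_DIR=/path/to/sysroot/usr cargo build --release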
Fortunately, I believe there is another way! For files that are built during the OS build, or for any number of other third-party components (e.g., the PostgreSQL libraries), we enumerate and record dependencies when packaging up the resultant files. This lets us decouple parts of the build: we can build pilot binaries once, against the correct OpenSSL packages, and then publish them into the repository. When they are installed into the ramdisk, we'll pull in the correct OpenSSL, or fail to assemble the image because it cannot be pulled in. Another benefit is that gmake setup will take less time, because we won't need to rebuild everything every time.
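For reference, the packaging tooling derives those dependencies from the delivered ELF objects roughly like this (a sketch using the stock IPS pkgdepend tools; the proto area and manifest names are illustrative):

# Generate dependency actions from the ELF NEEDED entries of the files the
# package delivers (e.g., pilot's link against libssl.so.3)...
pkgdepend generate -md /path/to/proto pilot.p5m > pilot.p5m.dep
# ...then resolve them to the packages that actually deliver those objects,
# so installing pilot into the ramdisk pulls in the matching OpenSSL package.
pkgdepend resolve -m pilot.p5m.dep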
I will look at moving the pilot build into something we can shove into the package repository, and at adjusting the process here. In the meantime, if you blow away your entire helios working directory and start fresh, that will definitely get you into a better place after the Helios 2.0 switch, yes.
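Concretely, the fresh-start workaround is roughly this (a sketch, assuming the usual oxidecomputer/helios clone in your home directory):

# Discard the existing working directory, including the stale pilot build...
rm -rf ~/helios
# ...then clone and set up again, so everything is built against the
# Helios 2.0 bits, including OpenSSL 3.
git clone https://github.com/oxidecomputer/helios.git ~/helios
cd ~/helios && gmake setup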