[jan 21 merge] feat(blog): blog post--lessons learned on publish security #538
base: main
Conversation
    python-version: '3.9'

    # Step 3: Cache dependencies
    - name: Cache dependencies
This is not tested and could be wrong.
This is one of the uses, but oftentimes building is performed by `pypa/build`, so it'd be just `pip install build` and not multiple direct deps as one might expect from a `requirements.txt` file. So the example seems a bit unrealistic, except for some untypical projects with surprising build automation required.
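For context, a typical publish build often needs only the build frontend itself; a hedged sketch (step names are illustrative, not taken from the post):

```yaml
# Sketch: the only build-time dependency installed is pypa/build itself.
- name: Install the build frontend
  run: python -m pip install build

- name: Build sdist and wheel
  run: python -m build   # writes the distributions into ./dist
```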
oh, right, good point. What is a good example of caching dependencies (other than what we describe with Ultralytics)? I use caching when i'm publishing.
we `pip install hatch` and use `hatch build`
What do you think about removing the example and just telling people that if they do cache dependencies for any reason, don't do it when the build is for publishing?
for instance, doing it for a website build could be ok if it's a preview, etc. vs publishing a package?
hey @webknjaz @hugovk do you have a bit of time to review this post for accuracy? i'm trying to provide a more beginner-friendly overview of what we learned that our community can quickly understand and implement. We use the workflow @webknjaz helped us create for the pyosmeta package. And we want to link to all of the pypi / psf posts that are relevant too. many thanks for your time!
Here are some quick notes. I might've missed something.
Let's also ask @woodruffw for feedback.
## Is your PyPI publication workflow secure?

The recent Python package breach [involving Ultralytics](https://blog.pypi.org/posts/2024-12-11-ultralytics-attack-analysis/) has highlighted the need for secure PyPI publishing workflows. Hackers exploited a GitHub workflow (`pull_request_target`) to inject malicious code, which was published to PyPI. Users who downloaded the package unknowingly allowed their machines to be hijacked for Bitcoin mining.
> a GitHub workflow (`pull_request_target`)

`pull_request_target` is not a workflow, it's one of the events that may trigger a workflow run. A workflow may “subscribe” to different events like `push` and `pull_request`; this is just another event type with elevated privileges in its security context.
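To make the distinction concrete, here is a hedged sketch (not from the post) of the `on:` block where a workflow subscribes to events:

```yaml
# Sketch: a workflow subscribes to events in its `on:` block;
# `pull_request_target` is just one such event, not a workflow.
on:
  push:                 # runs in the context of the pushed branch
  pull_request:         # fork PRs get a read-only, unprivileged context
  pull_request_target:  # runs in the base repo's privileged context --
                        # risky when combined with untrusted PR code
```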
> unknowingly allowed their machines to be hijacked for Bitcoin mining.

Not only that, but they also did that in Google Colab, which got their accounts banned pretty swiftly...
OOOOH no way. i missed the Google Colab part!! that's a pretty bold move for a hacker? maybe they were new hackers?
See googlecolab/colabtools#4979.
It was unwitting end users who installed the hacked package on their Colab accounts, and the Google automation then suspended their accounts:
> Connection failed
> This account has been blocked from accessing Colab runtimes due to suspected abusive activity. This does not impact access to other Google products. If you believe this has been done in error, review the usage limitations and file an appeal.
Google was pretty quick in restoring accounts, but it happened to a lot of accounts: googlecolab/colabtools#4979 (comment)
geez. it's amazing how quick and far-reaching this single incident was!
### 🔐 Secure your workflows 🔐
- 🚫 Avoid risky events like `pull_request_target` and adopt release-based workflows.
- ♻️ Don’t cache dependencies in your publish workflows to prevent tampering.
That's probably an oversimplification. Not caching the deps may as well lead to something like this if the transitive deps are unpinned and get updated in an uncontrolled manner. The problem was that the cache was poisoned in a job with a security context that had high privileges, and then another job picked it up, effectively reusing what that first one “pinned” in the installed deps.
But if there's no cache, a compromised transitive dependency can be downloaded from PyPI, resulting in the same problem.
Security exists in context and not in isolation. It should come from a place of understanding how different bits fit together.
That said, I wouldn't make a call of whether using caching or not will save us all. The problem was that it was possible to compromise the cache that ended up being used.
I think it's possible to use cache securely by having separate cache keys unique to the dist building job. So caches for building the dists and testing them would not leak into each other. That's the real fix for cache.
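A hedged sketch of that idea (action version and key naming are illustrative): give the dist-building job a cache key that no other job writes to.

```yaml
# Sketch: a cache key scoped to the dist-building job, so caches written by
# test jobs (which may run less-trusted code) are never restored here.
- name: Cache pip downloads (build job only)
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: build-dists-${{ runner.os }}-${{ hashFiles('pyproject.toml') }}
```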
Another side of this is the supply chain problem, which cache ended up being a part of in this instance. But as I mentioned earlier, the supply chain could be compromised differently, without cache being a factor. The solution to this is pinning your dependencies. This means recording every single version number for every single transitive dependency there is. One of the low-level tools to use for this is `pip-tools`, in the context of `pip install` (until we have PEP 751, that is). Non-pip installers have different mechanisms. With pip, you can use `pip install -r requirements.in -c constraints.txt`, where `requirements.in` would contain a direct and unrestricted dependency list, and `constraints.txt` would set the entire tree to exact versions. A complete solution is usually integration of different aspects through multiple tools (or a "one-tool-to-rule-them-all" which some projects aim to become). I've been taking this to extremes with "DIY lock files" in https://github.com/webknjaz/ep2024-multilock-toy-template / https://github.com/ansible/awx-plugins/tree/2c2b22c/dependencies/lock-files.

Additionally, there's `--require-hashes`, which also records every single dist file hash, not just the version (this is not compatible with `-c`, though, due to a bug in pip).

To summarize, the solution here is to be in control of your deps, make sure their updates are transparent and controllable on all levels, and to know what influences them… Out of context, disallowing caching is a band-aid at best. But it may be harmful in that it would cultivate a false sense of security.
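As a hedged illustration of the pinning approach described above (file names follow the comment; a tool like `pip-compile` from pip-tools would generate the constraints file):

```yaml
# Sketch: direct deps come from requirements.in, while constraints.txt pins
# the entire transitive tree to exact versions.
# constraints.txt would typically be generated ahead of time, e.g. with:
#   pip-compile --output-file=constraints.txt requirements.in
- name: Install pinned dependencies
  run: python -m pip install -r requirements.in -c constraints.txt
```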
ok. This is super helpful. I could remove the caching part altogether, given so many moving pieces, and caching seems more nuanced than just locking down the workflow itself with better triggers and tighter permissions. But I also have a question about the dependencies. Because it is my understanding and experience that pinning all deps can also be highly problematic for users. For instance, if you have an environment with many different packages and try installing a package with pinned deps, that could lead to significant user issues.
Unless we're just talking about deps in the workflow itself, such as hatch, pypa/build, etc.
web apps make sense to pin but i can't quite wrap my head around pinning deps for a package. Could you please help me understand? And do you agree - maybe for more beginner users we just drop caching altogether as a topic?
### 🔐 Secure your workflows 🔐
- 🚫 Avoid risky events like `pull_request_target` and adopt release-based workflows.
- ♻️ Don’t cache dependencies in your publish workflows to prevent tampering.
- If you reference branches in a pull request, clean or sanitize branch names in your workflow.
Perhaps, link https://docs.github.com/en/get-started/using-git/dealing-with-special-characters-in-branch-and-tag-names and recommend that people use https://woodruffw.github.io/zizmor/.
And maybe https://securitylab.github.com/resources/github-actions-untrusted-input/.
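For the branch-name item, the pattern those resources describe boils down to never interpolating untrusted values directly into a script; a hedged sketch:

```yaml
# Sketch: pass the attacker-controllable branch name through an environment
# variable instead of expanding ${{ github.head_ref }} inside the script,
# where a crafted name could inject shell commands.
- name: Report the source branch safely
  env:
    HEAD_REF: ${{ github.head_ref }}
  run: echo "Building from branch: $HEAD_REF"
```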
Great suggestions!! I may add these links lower down in the post at the end, so the takeaways are just text... but I'll definitely include them. This is what I have now:
- If you reference branches in a pull request, clean or sanitize branch names in your workflow.
- Consider using zizmor to check your GitHub workflows for security vulnerabilities!
i haven't used zizmor yet but i saw a package that Hugo maintains uses it so i may try it out with stravalib and pyosmeta!!
### Lock down GitHub repo access
- 🔒 Restrict repository access to essential maintainers only.
- ✅ Add automated checks to ensure releases are authorized and secure.
I think this is mostly about setting an environment in the job and making sure that environment has required reviewers in the repo settings.
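Something like this, as a hedged sketch (the environment name is illustrative); the required-reviewers rule itself is configured on that environment in the repo settings:

```yaml
# Sketch: binding the publish job to a protected environment makes the run
# wait for approval from the environment's required reviewers.
jobs:
  publish:
    runs-on: ubuntu-latest
    environment: pypi   # protected environment with required reviewers
```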
Looking at the blog posts, it seemed like the hackers somehow got direct access to that repo. Did i read that wrong?
* [<i class="fa-brands fa-discourse"></i> Discourse](https://pyopensci.discourse.group/)
* [<i class="fa-brands fa-mastodon"></i> Mastodon](https://fosstodon.org/@pyopensci)
* [<i class="fa-solid fa-cloud"></i> Bluesky](https://bsky.app/profile/pyopensci.bsky.social)
Pro tip: consider updating the Bluesky username to `pyopensci.org`. This can be configured and verified in like 3 minutes (through either DNS or a `.well-known` document on the website), but adds credibility to the account. You can create a repository in the org, call it `.well-known`, and add a file with the account DID as follows: webknjaz/.well-known@5a3c310. Just don't forget to enable Pages in the repo settings before clicking `Verify`.
WOAH!! i never knew about this for bluesky!
thank you for this tip - and for all of the thoughtful feedback. i'm going to work on this some more tomorrow!
You can see how to set up GitHub Actions securely in our own [PyPI publishing GitHub workflow](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml), which follows the Trusted Publisher approach.

**Note:** Trusted Publisher workflows are currently only available for GitHub. Support for GitLab may be coming in the future—stay tuned!
In the doc you already linked, there's an example for GitLab as well: https://docs.pypi.org/trusted-publishers/adding-a-publisher/#gitlab-cicd. There are 4 platforms that support this mechanism already, and I saw some Bitbucket folks asking around whether they can integrate.
fixed the breakout and added that link!! I somehow read that GitLab support wasn't available yet, but obviously, I wasn't looking in the right place! thanks!
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>
NOTES: i'm doing a big rewrite (more of a reorg) of this content. I'll start with the high-level concepts of securing the environment / GitHub and GitHub-PyPI, and then the other things like sanitizing branch names and such can come next. i'm super appreciative of all of the feedback.
Ok i've made significant changes to the post and added a new figure that tries to show trusted publisher as a concept with the 2-step ci action part. i've also added links throughout to pypa / pypi documentation. AND I've tested out zizmor!! it's great. i haven't tried it as a pre-commit yet but plan to. Also @webknjaz that Bluesky pro tip was 🔥 !! I set that up quickly, and we now have a pyopensci.org domain. I added a nod to you at the top of the post for providing so much input!!
## Don’t cache package dependencies in your publish step

Caching dependencies involves storing dependencies to be reused in future workflow runs. This approach saves time, as GitHub doesn’t need to redownload all dependencies each time the workflow runs.
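For readers who haven't seen it in a workflow file, dependency caching is often a single line on the setup step; a hedged sketch (versions are illustrative):

```yaml
# Sketch: setup-python can persist pip's download cache between runs --
# convenient for test workflows, but the post recommends against it in
# publish workflows.
- uses: actions/setup-python@v5
  with:
    python-version: '3.9'
    cache: 'pip'
```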
This is where I'm confused about pinning deps, as that doesn't seem like a good idea for users when I think about environments and dependency conflict nightmares.
For deploy workflows or anything creating release artifacts end users might use, I think I've heard from @sethmlarson that it makes sense to reduce the number of dependencies you're using where possible: you can't have problems with a hacked dependency you're not using.
That should also reduce the conflict nightmares when pinning things...
Thanks Hugo! This helps clarify what I was struggling with! I'll rewrite this sentence to discuss the concept more broadly! I see that seth also gave this a thumbs up!
Using a release-based workflow ensures that your publishing step is tightly controlled. A pull request will never accidentally trigger a publish build. This reduces your risk!
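A hedged sketch of what a release-based trigger looks like (the event and type come from GitHub's documented set):

```yaml
# Sketch: the publish workflow fires only when a release is published,
# never on pushes or pull requests.
on:
  release:
    types: [published]
```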
Add: a few sentences on deleting old tokens and secrets containing PyPI tokens... this is easy for people to do and good for them to know about.
Awesome blog post, thanks Leah! I left some comments:
Now that you have a GitHub environment setup, you can set up Trusted Publisher in your PyPI account.
A Trusted Publisher setup creates a short-lived secure link between PyPI and your repository.
Maybe remove "short-lived" from this sentence? The underlying implementation of Trusted Publishers uses short-lived credentials, but the link itself is durable!
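For reference, a hedged sketch of a Trusted Publisher publish job (environment name illustrative; assumes the built dists were placed in `./dist` by an earlier step). The OIDC credential the job requests is what's short-lived, while the publisher configuration on PyPI persists:

```yaml
# Sketch: no long-lived PyPI API token is stored anywhere; the job requests
# a short-lived OIDC token that PyPI exchanges for upload permission.
jobs:
  publish:
    runs-on: ubuntu-latest
    environment: pypi        # protected environment, illustrative name
    permissions:
      id-token: write        # required for Trusted Publishing (OIDC)
    steps:
      - name: Publish distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
```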
fixed!
Named as a playful nod to Dr. Zizmor’s famous “clear skin” ads, Zizmor aims to give you “beautiful clean workflows.”

Learn more about Zizmor on the [official blog post by Yossarian](https://blog.yossarian.net/2024/10/27/Now-you-can-have-beautiful-clean-workflows).
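If it helps readers, running zizmor is a short step; a hedged sketch assuming the PyPI-distributed CLI (it can also be installed via pipx or cargo):

```yaml
# Sketch: audit the repository's workflow files with zizmor in CI.
- name: Audit GitHub Actions workflows with zizmor
  run: |
    python -m pip install zizmor
    zizmor .github/workflows/
```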
Yossarian -> William Woodruff
@sethmlarson thank you so much for the review!! I appreciate it! 🚀
@all-contributors please add @sethmlarson for review
Looking good, mostly nitpicks.
And good to see @sethmlarson here, I was going to summon him!
## 4. Use Trusted Publisher for PyPI

If you only [publish locally to PyPI using the command line](https://www.pyopensci.org/python-package-guide/tutorials/publish-pypi.html), you need to use a PyPI token. However, if you’re using GitHub Actions to automate your publishing process, setting up **Trusted Publisher** is a more secure option.
TP is more secure in that the tokens are short-lived compared to perpetual API tokens.
And it seems like the hackers got hold of forgotten API tokens in the repo to be able to make the second pair of releases in this case.
I love seeing comments by @sethmlarson, @hugovk and @webknjaz. Thank you for being part of the pyOpenSci community and sharing your expertise. 🐍 🐍
Co-authored-by: Hugo van Kemenade <[email protected]>
@all-contributors please add @hugovk for code, review
I've put up a pull request to add @hugovk! 🎉
This is the start of a draft post on best practices for publishing to PyPI. it should be super duper user-friendly and easy to read. I will need feedback!!
i will be working on some graphics and such next.