Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to preserve file attributes #106

Open
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

NodyHub
Copy link
Collaborator

@NodyHub NodyHub commented Dec 9, 2024

Invent option to preserver fle attributes, so that go-extract can be adopted in other extraction libraries that offers this functionality.

This PR invents:

  • Preserve file attributes, access-time, mod-time, file permissions
  • Preserve file ownership for tar archives
  • Added symlink support for 7zip (added by accident while writing test cases)
  • Invented additional security check before file creation (added for the sake of completeness, even if in the currently feature set not necessary)
  • Increased the max files default value. 1k is a bit annoying on the cli.

@NodyHub NodyHub linked an issue Dec 9, 2024 that may be closed by this pull request
added aditional security check
adjusted architecture specifics
adjusted system specific implementation
create system specific path
@NodyHub NodyHub force-pushed the nodyhub/invent-option-preserver-attributes branch from 4eb99d0 to 0405d3f Compare December 9, 2024 11:35
@NodyHub NodyHub marked this pull request as ready for review December 9, 2024 13:52
@NodyHub NodyHub requested a review from a team as a code owner December 9, 2024 13:52
alanpchua
alanpchua previously approved these changes Dec 9, 2024
Copy link

@alanpchua alanpchua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the accidental feature support because of test. 😆

@NodyHub NodyHub changed the title Invent option preserver attributes Add option to preserve file attributes Dec 9, 2024
@NodyHub NodyHub requested a review from samsalisbury December 9, 2024 14:50
Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jan, I've not completed the whole review yet, but have some intermediate questions/comments so submitting this now and will continue the review later/tomorrow.

target_disk_darwin.go Outdated Show resolved Hide resolved
target_disk_darwin.go Outdated Show resolved Hide resolved
Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jan, this is looking great, a few comments/suggestions below, the only blocker I think is the mtime/atime test cases comment below. Would be happy to pair with you on a call to talk in more detail if that helps, thanks.

target_memory_test.go Outdated Show resolved Hide resolved
unpack_test.go Outdated Show resolved Hide resolved
unpack_test.go Outdated Show resolved Hide resolved
extractor.go Outdated Show resolved Hide resolved
extractor.go Outdated Show resolved Hide resolved
target_disk_unix.go Outdated Show resolved Hide resolved
target_disk_others.go Outdated Show resolved Hide resolved
Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great :shipit:

Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I missed something here so I will request changes to dismiss the approval for now!

@NodyHub NodyHub requested a review from samsalisbury December 17, 2024 11:02
Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jan, a few more notes below, I still haven't reviewed the tests in detail, but I will try to get that done tomorrow!

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
config.go Outdated Show resolved Hide resolved
target.go Show resolved Hide resolved
if os.Geteuid() != 0 {
return &fs.PathError{Op: "chown", Path: name, Err: os.ErrPermission}
if err := os.Lchown(name, uid, gid); err != nil {
return &fs.PathError{Op: "chown", Path: name, Err: err}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return the error? Maybe...

Suggested change
return &fs.PathError{Op: "chown", Path: name, Err: err}
return fmt.Errorf("chown failed: %w", err)

Also it would be good to have a test case for this error to check we don't silently fail when we shouldn't. You could force the error by using a nonexistent path maybe?

Copy link
Collaborator Author

@NodyHub NodyHub Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm ... I thought it might be nice to wrap the error to have more structure/context, e.g., path/op, but happy to just return the error "straighter".

I will add an explicit test case to verify the error edge-case.

Edit:
The code path is already tested in TestUnpackWithPreserveOwnershipAsNonRoot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, when I read this diff the first time, it looked like it wasn't returning any error at all, now when I see it it seems as though the error was being returned. Sorry for the confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code path is already tested in TestUnpackWithPreserveOwnershipAsNonRoot.

Looking at those test cases, I can't see one where we actively expect an error, so I don't think the coverage is explicit enough to ensure we don't break this behaviour in the future. I will submit a patch to add explicit coverage, you can decide how to use it :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, that sounds great. Looking forward to your patch! let me know when we can sync on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jan, still catching up after the break, I can't remember if I created this patch or not, I think I stopped because I realised there were other issues with tests that should be addressed as well. I am going to re-familiarise myself with this PR today, and will let you know once I'm back up to speed!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Sam!

@NodyHub NodyHub requested a review from samsalisbury December 18, 2024 09:55
Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jan, thanks for responding to the comments/questions so far.

One thing that occurs to me is that if the extractor currently drops all file attributes, then there is a possibility that this is a breaking change for some users, although I think this is unlikely, so I don't think we need to be very concerned. I wonder if it would be a good time to start a changelog to document updates for users to make sure they aren't surprised by the new behaviour?

I think it would be good to spend a little more time on this before merging, to make sure the tests are covering everything we need, and possibly to refactor them a little as well to separate out tests for errors vs happy paths, and remove some of the branching logic, as I think that would make them more readable. Does that sound ok to you? This is probably safe to release as-is, but my concern is that I don't feel confident in the tests yet.

unpack_other_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a marker review for the latest changes, thanks Jan, still working through it as a whole.

Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK Jan, I left some more comments around testing. I hope these are helpful, I will be available on Slack if you want to follow up quickly with any of these. I think once these are addressed we'll be good to go!

unpack_test.go Outdated
Comment on lines 1614 to 1616
uid int
gid int
}{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the contents[] entries contain UID/GID info, what is the meaning of them at the level of the whole test case? Will it always be the case that all files in an archive will have the same UID/GID pair? My instinct is to either specify this separately for each file, or just specify it for all files at the level of the test case, either of these will make the tests easier to understand I think. If there is a reason to have these at test case level as well then I think this should be documented to avoid future confusion, thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, yes, you are right. The definition on that level can be dropped. I will work on something that will hopefully reduce the confusion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6c8d47a

Comment on lines 69 to 86
// compare archive uid/gid with current user
archiveEntriesEqualCurrentOwner := tc.doesNotSupportOwner

// If the archive supports owner information, check if the current user is the owner of the test data
if !tc.doesNotSupportOwner && (os.Getuid() != tc.uid || os.Getgid() != tc.gid) {
archiveEntriesEqualCurrentOwner = false
}

// Unpack should fail if the user is not root and the uid/gid
// in the archive is different from the current user (only
// if the archive supports owner information)
err := extract.Unpack(ctx, dst, src, cfg)

// chown will only fail if the user is not root and the uid/gid is different
if archiveEntriesEqualCurrentOwner && err != nil {
t.Fatalf("error unpacking archive: %v", err)
}
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will all entries in an archive always have the same uid/gid pair? This seems unlikely to be the case in general.

For more clarity, it would be great if we could set this test up so that the UID/GID pair was a non-match for the current user (as one test case), and then as a match for the current user (as a separate test case). This would remove the logic and ensure that each test is testing the same thing each time it is run. Right now, this test is testing two different things depending on local conditions.

It might be easier at first to set this up optimisitically by picking an "unlikely" uid/gid pair for the case that they are different, and fail the test early on if they do accidentally match the current process. And it may not be necessary to explicitly test the case where the uid/gid does match, what we really want is to trigger the error when unpack fails due to the failed chown, so maybe just that one test case would be enough.

This is related to #106 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100%, I understand what you mean. I am going to work on an adjustment of the tests that adresses your thoughts and removing complexity of the test cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the test function and introduced a helper function (containsDifferentOwnerThenCurrentUser) so that we can check if the test is expected to fail or not.

Changes are made in edfd38f

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm ... I dropped the test case with the current user id and added another one with an invalid uid/gid. Point is that Unix supports 32bit uid/gid since kernel version 2.4 (ref) so, -1 would not be a really invalid value for chown.

However, the helper function from ^ is dropped again and the other changes are made in 064184a

Copy link
Collaborator Author

@NodyHub NodyHub Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, rolling back on the -1 test-case, -1 is an ivalide value or more a value thats ignored by chown according to https://github.com/hashicorp/go-extract/actions/runs/12672763343/job/35317521527?pr=106#step:5:10. The sad point is that the call of target.Chown does not fail, but the uid/gid has not been adjusted :(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I adressed that point in 45cb851 and 5c74796. Should be good to go.

})
}
}
func TestUnpackWithPreserveOwnershipAsRoot(t *testing.T) {
Copy link
Contributor

@samsalisbury samsalisbury Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the comment above, we probably want to make sure that the uid/gid pair of at least one arhive entry in this test case does not match the current process, or it would pass trivially in that case. You can probably add a single new test case to the slice of cases with some "unusual" uid/gid pair...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adressed this comment above in a certain way. uid/gid can be 32bit (#106 (comment)) and so -1 is unexpected, but still a valid 32bit integer. Not sure how we can create a more "unusual" ownership.

However, I think I addressed this request in 064184a and 6c8d47a

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, rolling back on ^ uid=-1 is not applied by chown and fails. However, figured out in #106 (comment) a way to test invalid uid/gid.

unpack_unix_test.go Outdated Show resolved Hide resolved
@NodyHub NodyHub requested a review from samsalisbury January 8, 2025 15:27
Copy link
Contributor

@samsalisbury samsalisbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jan, thanks, this looks better with the test case level uid/gid removed, the error checking logic doesn't quite look right atm in the tests. I wonder if we can work more closely together tomorrow morning to get this done, I don't think it is far away from being ready at all!

Comment on lines 83 to 85
if err == nil {
t.Fatalf("error unpacking archive: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be if err != nil?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment above states:

// chown will only fail if the user is not root and the
// uid/gid in the archive is different from the current user

The call of disk.Chown() as a non-root user will fail, bc/ of missing permissions. Only root is capable of changing the ownership. Due to the fact that the syscall for chown is performed as non-root, an error is expected. If that error is not thrown, that would be unexpected.

The test will be skipped for the user root

func TestUnpackWithPreserveOwnershipAsNonRoot(t *testing.T) {
if os.Getuid() == 0 {
t.Skip("test requires non-root privileges")
}

Comment on lines 76 to 77
if !tc.doesNotSupportOwner && err == nil {
t.Fatalf("error unpacking archive: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be != nil ? As the log message references the error...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is sadly made to an out-of-date state of this PR and already adjusted. The ask for the err == nil is addressed in #106 (comment).

The check for tc.doesNotSupportOwner is moved to the top of the test-case:

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
// skip test if the archive does not store ownership information
if tc.doesNotSupportOwner {
t.Skipf("archive %s does not store ownership information", tc.name)
}

Background: If an archive format does not contain ownership details, we can skip the archive format, bc/ the test for failing chown will never fail, bc/ extracted files will always be the uid/gid of the process and the archive entries will for, e.g. zip (which does not contain file ownership information), will return always the current uid/gid (ref)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: Preserve file attributes
3 participants