Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal resolution for assets in get_url function #2726

Open
wants to merge 11 commits into
base: next
Choose a base branch
from

Conversation

ZzMzaw
Copy link

@ZzMzaw ZzMzaw commented Dec 6, 2024

This PR solves issue #2598 through internal resolution for assets (i.e. using @ in front of the asset path).

It allow to retrieve an assets permalink based on its associated page and according to the language.
Documentation regarding the galery shortcode has been updated accordingly.

Beware supports for resolving assets internally has only been implemented for the get_url function.
I am not sure if it should be generalized to links as well (as get_url can and should be used to feed the href of a link).

This PR could maybe provide an answer to the issue described by the PR #2146 (a not to-be-merged PR for demo purpose only) even if it doesn't fix it where the PR points out (not sure about this at all).

Sanity check:

  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Code changes

(Delete or ignore this section for documentation changes)

  • Are you doing the PR on the next branch?

If the change is a new feature or adding to/changing an existing one:

  • Have you created/updated the relevant documentation page(s)?

Colocated path is required for computing serialized assets and assets permalinks.
Test have been added to check colocated path format as expected by those computations.
@ZzMzaw
Copy link
Author

ZzMzaw commented Dec 7, 2024

There is something wrong with serialization of assets in sub-directories on windows.
I'll look into it but as I don't have windows to reproduce the issue locally, I'll use this PR for testing.
I may force-push several time until I'm able to fix the issue.
I'll keep you posted once it is done.

@ZzMzaw ZzMzaw force-pushed the feat/resolve-internal-assets branch from 2115e03 to 3fb51eb Compare December 7, 2024 05:13
Fix assets serialization behavior which was relying on the output path instead of the source path like for pages.
This behavior was making assets permalinks computation fail.
Compute assets permalinks for serialized assets based on associated page or section permalink.
Assets are duplicated for every language so permalinks should be releated to it in order to be language-aware.
Assets permalinks allows to resolve internally colocated assets as well as pages and sections.
Galery can be fixed by using internal resolution.
Adding the lang as well allows to make the target URL lang-aware even if the image is just duplicated.

Example file has been fixed as well even if it might not be used.
@ZzMzaw ZzMzaw force-pushed the feat/resolve-internal-assets branch from 3fb51eb to f74f551 Compare December 7, 2024 05:31
@ZzMzaw
Copy link
Author

ZzMzaw commented Dec 7, 2024

Tests are successful now, including on Windows.
It would be great if someone could test the galery shortcode on Windows.

Anyway, PR can be reviewed now.

assert!(file.colocated_path.is_some());
if let Some(colocated_path) = file.colocated_path {
assert!(!colocated_path.starts_with('/'));
assert!(colocated_path.ends_with('/'));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you assert what the actual path it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just asserted what is really important for a colocated path which is not starting with a '/' and ending with a '/'.
I wanted to emphasize that because if you change the way you compute colocated path in the future but don't respect those two assertions, you'll break the current feature.

I definitely can split the check (one for the page and one for the section as they have different colocated path) in this test while adding a comment on what is really important to keep through time.
Would it be ok for you?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would be fine

.map(|asset_relative_path_as_string| {
format!(
"/{}{}",
colocated_path.expect("Should have colocated path for assets"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it should have it, just pass &String for colocated_path instead of the Option

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If assets is empty, I don't have any guarantee that colocated_path will be Some.
I tried to reuse the same structure as the existing code but I can add an if !assets.is_empty() to unwrap this variable once instead of doing it in the iteration.
Doing it outside of serialize_assets to get rid of the Option in the signature would require to wrap the call of this function in the if statement mentionned above.
This would result in code duplication as I would have to do it twice, one for the pages and one for the sections.

What would you prefer me to implement?

  1. Keep it as it is
  2. Add the if statement in serialize_assets function to avoid unwrapping the option in the iteration
  3. Add the if statement around the serialize_assets

) -> Vec<String> {
assets
.iter()
.filter_map(|asset| asset.strip_prefix(parent_path.unwrap()).ok())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass parent_path: &Path rather than the option if all we're doing is unwrapping it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If assets is empty, I don't have any guarantee that parent_path will be Some.
I tried to reuse the same structure as the existing code but I can add an if !assets.is_empty() to unwrap this variable once instead of doing it in the iteration.
Doing it outside of serialize_assets to get rid of the Option in the signature would require to wrap the call of this function in the if statement mentionned above.
This would result in code duplication as I would have to do it twice, one for the pages and one for the sections.

What would you prefer me to implement?

  1. Keep it as it is
  2. Add the if statement in serialize_assets function to avoid unwrapping the option in the iteration
  3. Add the if statement around the serialize_assets

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much duplication are we talking about? I'm leaning towards 3

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are just talking about a few lines (if !assets.is_empty() statement and of parent_path and colocated_path options unwrapping) duplicated twice, for pages and sections (where serialize_assets is currently used).

If we go for 3., we should apply the same logic to the get_assets_permalinks function by moving l.95-96 (if !serialized_assets.is_empty() statement and colocated_path option unwraping) around the function call.

Is it ok if I proceed that way?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. Eventually page/sections will converge probably so it's going to reduce the duplication

components/content/src/utils.rs Outdated Show resolved Hide resolved
/// We need that if there are relative links in the content that need to be resolved
pub permalinks: HashMap<String, String>,
/// A map of all assets (non .md files colocated to sections and pages) and their permalink, distributed per lang
/// We need that if there are relative links in the content that need to be resolved
pub assets_permalinks: HashMap<String, HashMap<String, String>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for separating by lang? Otherwise we could just add them to the permalinks maps
Are the assets copied for each language? I can't remember

Copy link
Author

@ZzMzaw ZzMzaw Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, assets are copied for each language.

This will allow to have different permalink depending on the lang for the same asset:

  • For the default lang, path/to/the/asset.jpg would become https://yourwebsite.com/slug/to/the/asset.jpg
  • For the non-default lang lang, path/to/the/asset.jpg would become https://yourwebsite.com/lang/slug/to/the/asset.jpg

I tested several things such as always defaulting to the default lang but I find this solution to be more clean and future proof.
When a lang is selected, your assets point to the same base URL than your page.
In addition, even if today it is not the case, if we introduce per-lang assets, it will just require minor adaptations.

Please let me know if you prefer to have something more simple.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, assets are copied for each language.

Ah that's not ideal from content duplication/processing I guess. We can't even define a language specific asset and I don't think it's something that will be added.

I think it makes more sense for assets to be shared across languages and be located once in the folder of the main language page

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's not ideal from content duplication/processing I guess.

This is mandatory and may remain mandatory in the future from my understanding.
We'll break any relative links in non-default language pages otherwise.
Let's imagine we have a test.file colocated file and a link defined as follow: <a href="test.file">, it would work in default language (resolved as https://yourwebsite.com/slug/to/the/test.file by the browser) but would break in non-default ones in case content was not duplicated (resolved as https://yourwebsite.com/lang/slug/to/the/test.file by the browser).

I think it makes more sense for assets to be shared across languages and be located once in the folder of the main language page

If it is still the case, I'll revert my changes to ignore the lang for assets in get_url function.
Is it ok for you?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep like you have right now. In an ideal world, we would serve the assets from a single place and rewrite the links to assets to point to the right thing but that's not for now.

As discussed during PR review, it is better to check for the entire path than just the required assumptions.
Assumptions have been written as a comment.
As discussed during PR review, move option unwrapping logic outside of each function
@ZzMzaw
Copy link
Author

ZzMzaw commented Jan 3, 2025

Everything may have been fixed as per our discussion.
I let you check and resolve comments if it fits your expectations.
Feel free to let me know if there is still something to adapt.
Many thanks for your review.

@@ -195,7 +197,26 @@ impl Page {
if page.file.name == "index" {
let parent_dir = path.parent().unwrap();
page.assets = find_related_assets(parent_dir, config, true);
page.serialized_assets = page.serialize_assets(base_path);
if !page.assets.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need that if right? It will be an empty vec if page.assets is empty. Same for the one below

Copy link
Author

@ZzMzaw ZzMzaw Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to fail if colocated_path is none while page.assets is not empty because it should not happen (and this is covered by the expect) but no having a colocated_path is totally valid if there are no assets.
(similar reason for serialized_assets below).
In case we have no assets, serialized_assets and assets_permalinks should remain with default values.

I agree it would make more sense to regroup serialize_assets and get_assets_permalinks in the same block and retrieve colocated_path only once.
Would it be better if code was reworked as follow?

if !page.assets.is_empty() {
    let colocated_path = page.file.colocated_path.as_ref().expect("Should have a colocated path if we have assets");
    page.serialized_assets = serialize_assets(
        &page.assets,
        page.file.path.parent().unwrap(),
        colocated_path
    );
    page.assets_permalinks = get_assets_permalinks(
        &page.serialized_assets,
        &page.permalink,
        colocated_path
    );
}

I may have missed something so please don't hesitate to point it out.
Thanks for your feedback.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an error that can happen it should be an error and not a panic. What you propose there is better yes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should never reach the expect within the if so keeping the panic might be ok.
Anyway, user won't be able to circumvent it so it will have to be fixed in zola itself (and it means we may have done some incompatible changes).
All changes have been pushed.

) -> HashMap<String, String> {
let mut permalinks = HashMap::new();
if !serialized_assets.is_empty() {
for asset in serialized_assets {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make it into an iterator, you don't need that check

ZzMzaw added 2 commits January 5, 2025 07:43
As per PR discussion, align get_asset_permalinks with serialize_assets.
As per PR discussion, it makes more sense to have both asset serialization and permalinks generation in the same logical unit.
Change has been made for both page and section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants