escaped HTML entities in markdown content are not being honored when prerendering Light DOM HTML #1375

thescientist13 · 2025-01-06T22:55:45Z

Type of Change

Bug

Summary

It was observed in ProjectEvergreen/www.greenwoodjs.dev#120 (comment) that markdown content like the following is not prerendered correctly

# Server Rendering

<app-ctc-block variant="snippet" heading="src/pages/users.js">

  ```js
  export async function getBody(compilation, page, request) {
    const timestamp = new Date().getTime();

    return `
      <h1>Hello from the server rendered users page! 👋</h1>
      <table>
        <tr>
          <th>Name</th>
          <th>Image</th>
        </tr>
      </table>
      <h6>Last Updated: ${timestamp}</h6>
    `;
  }
  ```

</app-ctc-block>

The output from unified is correct and properly escaped

<h1>Server Rendering</h1>
<app-ctc-block variant="snippet" heading="src/pages/users.js">
    <pre><code class="language-js">export async function getBody(compilation, page, request) {
      const timestamp = new Date().getTime();
    
      return `
        &#x3C;h1>Hello from the server rendered users page! 👋&#x3C;/h1>
        &#x3C;table>
          &#x3C;tr>
            &#x3C;th>Name&#x3C;/th>
            &#x3C;th>Image&#x3C;/th>
          &#x3C;/tr>
        &#x3C;/table>
        &#x3C;h6>Last Updated: ${timestamp}&#x3C;/h6>
      `;
    }
    </code></pre>
</app-ctc-block>

The output from WCC / parse5 has all the HTML entities converted back to HTML

<h1>Server Rendering</h1>
<app-ctc-block variant="snippet" heading="src/pages/users.js">
  <pre><code class="language-js">
    export async function getBody(compilation, page, request) {
      const timestamp = new Date().getTime();
    
      return `
        <h1>Hello from the server rendered users page! 👋</h1>
        <table>
          <tbody><tr>
            <th>Name</th>
            <th>Image</th>
          </tr>
        </tbody></table>
        <h6>Last Updated: ${timestamp}</h6>
      `;
    }
  </code></pre>
</app-ctc-block>

This means that instead of rendering as text, the HTML is rendered literally, breaking the output.

Details

The main issue seems to be in WCC (parse5 specifically), in that parse5 will convert HTML entities automatically when accessing the "raw" value of a node

This means that when building the HTML back out, instead of getting something like &lt;h1&gt;Some text&lt;/h1&gt; to do innerHTML work in WCC, we end up getting the literal HTML <h1>Some text</h1>. This does seem to be the expected behavior, as they state, meaning it will unfortunately be up to application consumers to manage this preservation.

A seemingly simple solution would be to just manually escape < when parsing innerHTML in WCC, though I'm not sure if this is the best solution, or more likely, the best place?

} else if (nodeName === '#text') {
  // escape < brackets
  innerHTML += value.replace(/</g, '&lt;');
}
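
To make the idea concrete outside of WCC, here is a minimal sketch of rebuilding HTML from a node tree while re-escaping text nodes. It assumes nodes shaped like parse5's default tree adapter output (nodeName, tagName, attrs, childNodes, and value on text nodes). The escapeText and serializeNode names are hypothetical and are not WCC's actual implementation, and void elements and raw-text elements like script and style are deliberately not handled.

```js
// Hypothetical helper: escape characters in a text node that would
// otherwise be parsed as markup once the HTML is written back out.
function escapeText(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical serializer for a node shaped like parse5's default tree adapter.
// Simplified: no void elements, no special casing of <script> / <style>.
function serializeNode(node) {
  if (node.nodeName === '#text') {
    return escapeText(node.value);
  }

  const children = (node.childNodes || []).map(serializeNode).join('');

  // document fragments have no tag of their own
  if (!node.tagName) {
    return children;
  }

  const attrs = (node.attrs || [])
    .map(({ name, value }) => {
      const escaped = value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
      return ` ${name}="${escaped}"`;
    })
    .join('');

  return `<${node.tagName}${attrs}>${children}</${node.tagName}>`;
}

// The text node value has already been decoded, as described above.
const tree = {
  nodeName: '#document-fragment',
  childNodes: [{
    nodeName: 'code',
    tagName: 'code',
    attrs: [{ name: 'class', value: 'language-js' }],
    childNodes: [{ nodeName: '#text', value: '<h1>Hello</h1>' }]
  }]
};

const serialized = serializeNode(tree);

console.log(serialized);
// <code class="language-js">&lt;h1&gt;Hello&lt;/h1&gt;</code>
```

Note that this escapes every < in a text node, whether or not the source originally wrote it as an entity.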

The challenge is that, as far as Greenwood is concerned, the input to WCC is correct, so how would we know where to do the substitution on the way out? From reading through similar issues in the parse5 repo, we would have to double parse and convert based on locations, and from what I understand, adding location markers carries a pretty significant performance overhead.

Here's a simplified reproduction repo - https://github.com/thescientist13/parse5-html-entities
naive implementation PR in WCC - bug/preserve light dom html entities in text nodes wcc#182


thescientist13 · 2025-01-07T02:32:59Z

Another complication is that parse5 also seems to encode characters as entities even when they aren't part of HTML markup, which makes this workaround in WCC even more unpredictable :/
ProjectEvergreen/wcc#182

{
  value: '\n' +
    '          <h1>Hello from the server rendered users < page! 👋</h1>\n' +
    '        '
}
{
  html: '\n' +
    '        <x-ctc>\n' +
    '          <h1>Hello from the server rendered users &lt; page! 👋</h1>\n' +
    '        </x-ctc>\n' +
    '        '
}

Wonder if we'll have to do something from the Greenwood side, e.g.
https://github.com/ProjectEvergreen/greenwood/blob/v0.31.0-alpha.2/packages/cli/src/lifecycles/prerender.js#L98

body = await new Promise((resolve, reject) => {
  pool.runTask({
    executeModuleUrl: workerPrerender.executeModuleUrl.href,
    modulePath: null,
    compilation: JSON.stringify(compilation),
    page: JSON.stringify(page),
    prerender: true,
    htmlContents: body.replace(/&#x3C;/g, 'custom-left-bracket'),
    scripts: JSON.stringify(scripts)
  }, (err, result) => {
    if (err) {
      return reject(err);
    }

    return resolve(result.html);
  });
});

body = body.replace(/custom-left-bracket/g, '&#x3C;')
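
The swap-and-restore idea above can be tried in isolation. This is only a sketch: the placeholder token, the protectEntities / restoreEntities helpers, and the fakeRender stand-in (which just mimics a renderer that decodes entities) are all hypothetical and are not Greenwood or WCC APIs.

```js
const ENTITY = '&#x3C;';
// Hypothetical token; it must never occur in real page content.
const PLACEHOLDER = '__GWD_LEFT_BRACKET__';

// Swap escaped left brackets for a token the renderer will leave alone.
function protectEntities(html) {
  return html.split(ENTITY).join(PLACEHOLDER);
}

// Put the original entities back once rendering is done.
function restoreEntities(html) {
  return html.split(PLACEHOLDER).join(ENTITY);
}

// Stand-in for the prerender step: decodes the entity the way the
// issue describes, so the effect of the placeholder is observable.
function fakeRender(html) {
  return html.replace(/&#x3C;/gi, '<');
}

const input = '<pre><code>&#x3C;h1>Hi&#x3C;/h1></code></pre>';

// Without protection the entities are lost.
const unprotected = fakeRender(input);

// With protection the escaped content survives the round trip.
const output = restoreEntities(fakeRender(protectEntities(input)));

console.log(unprotected === input); // false
console.log(output === input); // true
```

The obvious trade-off is that the placeholder has to be unique enough to never collide with real content, and only the entity forms that are explicitly swapped are preserved.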

thescientist13 added bug Something isn't working CLI SSR labels Jan 6, 2025

thescientist13 added this to the 1.0 milestone Jan 6, 2025

thescientist13 self-assigned this Jan 6, 2025

thescientist13 added this to [Greenwood] Phase 10 - Ecosystem Compat Jan 6, 2025

thescientist13 added question Further information is requested needs upstream labels Jan 6, 2025

thescientist13 moved this to 🔖 Ready in [Greenwood] Phase 10 - Ecosystem Compat Jan 6, 2025

thescientist13 mentioned this issue Jan 6, 2025

feature/issue 96 enhanced copy to clipboard with PNPM docs ProjectEvergreen/www.greenwoodjs.dev#120

Draft

20 tasks

thescientist13 changed the title ~~escaped HTML entities from markdown content are not being honored when prerendering~~ escaped HTML entities in markdown content are not being honored when prerendering Light DOM HTML Jan 7, 2025

thescientist13 mentioned this issue Jan 7, 2025

bug/preserve light dom html entities in text nodes ProjectEvergreen/wcc#182

Draft

2 tasks
