In 3.7.2, spaces before <em> or <i> tags get removed after conversion to KEPUB #176

Open
6 of 11 tasks
hengyu95 opened this issue Dec 13, 2024 · 9 comments

Comments

@hengyu95

hengyu95 commented Dec 13, 2024

Bug Checklist

These items are mandatory. If you need help finding this information submit the
bug report with as much completed as you can and ask for help finding the rest.

  • I am using the latest version of calibre to report this bug, which is:
  • I am using an official calibre release, not one from a third party (e.g.
    your Linux distro, Flatpak, Chocolatey package, Homebrew, etc.)
  • I am using the latest version of this plugin, which is:
  • My operating system is Windows 11
  • I have included the full, complete, unmodified debug log from calibre
    • Directions for getting the debug log are under the "Logs" header below.
  • I have translated the text in any screenshots and logs to English, or all
    screenshots and logs included are in English.

These items are optional. Fill in as much of them as possible. If something is
not applicable to your bug report, note that.

  • I have installed the Scramble Epub plugin (see
    https://www.mobileread.com/forums/showthread.php?t=267998) and will attach
    a scrambled copy of the book I'm having problems with (attach a file by
    dragging and dropping onto the Github editor).
    • If this is a conversion bug, I will also attach a scrambled copy of
      the converted book.
  • The path to my calibre library or to a book in my calibre library has
    non-ASCII characters: yes/no
  • If I am using Windows 10, I (have/have not) enabled Windows' beta support
    for Unicode (see
    https://www.mobileread.com/forums/showpost.php?p=3988195&postcount=2052)
  • If I am using Windows 10, does this bug happen with beta Unicode support
    both enabled and disabled, only when enabled, or only when disabled?

Describe the bug

In the EPUB to KEPUB conversion, spaces before italics all get removed.

Steps to Reproduce

Convert any EPUB to KEPUB by sending it to a Kobo device.

Expected behavior

Spaces before italics remain

Actual behaviour

Spaces before italics are deleted. This is corroborated by many other users on the mobileread forums.
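
One way to confirm the symptom without a device is to compare the visible text of the original markup with the converted markup. The sketch below uses only the Python standard library. The two fragments are hypothetical stand-ins (the `koboSpan` wrappers only mimic the reported output), and `visible_text` and `lost_spaces` are illustrative helper names, not part of the plugin.

```python
import xml.etree.ElementTree as ET


def visible_text(markup: str) -> str:
    """Return the concatenated text of an XHTML fragment, ignoring tags."""
    root = ET.fromstring(markup)
    return "".join(root.itertext())


def lost_spaces(original: str, converted: str) -> bool:
    """True when the two texts differ, but only in their whitespace."""
    before = visible_text(original)
    after = visible_text(converted)
    return before != after and "".join(before.split()) == "".join(after.split())


# Hypothetical fragments: the second mimics the reported KEPUB output.
original = "<p>Even in Russia, <i>women</i> are</p>"
converted = (
    '<p><span class="koboSpan">Even in Russia,</span>'
    '<i><span class="koboSpan">women</span></i>'
    '<span class="koboSpan"> are</span></p>'
)

print(repr(visible_text(original)))
print(repr(visible_text(converted)))
print(lost_spaces(original, converted))
```

Running the same comparison on a real chapter file from the EPUB and its converted counterpart would show whether a given plugin version drops the space.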

Screenshots

Original EPUB
image

Converted KEPUB
image

Logs

N/A, conversion itself "works"

Additional context

Add any other information you think might be helpful.

hengyu95 changed the title from "In 3.7.2, spaces before <em> or <i> tags get removed after conversion." to "In 3.7.2, spaces before <em> or <i> tags get removed after conversion to KEPUB" on Dec 13, 2024
@chocolatechipcats

chocolatechipcats commented Dec 14, 2024

Also appears to affect some other tags: https://www.mobileread.com/forums/showpost.php?p=4474070&postcount=3159

@hungrytoast7

hungrytoast7 commented Dec 14, 2024

The space between the chapter number and the chapter title is also missing after converting the file to KEPUB. "Chapter 3: The Sun is Rising" appears as "Chapter 3:The Sun is Rising" in both the table of contents and the header. This issue may be related to the <a id="page-42"/> tag.

@gabri25ele

gabri25ele commented Dec 16, 2024

I confirm the problem with italics. I'm currently back on 3.7.0.

@GAntiko

GAntiko commented Dec 16, 2024

I confirm the problem with italics, but it also happens with bold. I tested it with 3.7.2, calibre 7.22, and a Kobo Libra Colour.

@onecrayon

If this is verified to be working in 3.7.0 and failing in 3.7.2, there's actually not a lot of changed lines that could be the issue. Has anyone tested to see if the problem exists in 3.7.1? Because when I look at the diff between 3.7.2 and 3.7.0 there are only two changes that have any likelihood of triggering this difference to my eye; one was committed as part of 3.7.1 and the other as part of 3.7.2:

@jadehawk

> If this is verified to be working in 3.7.0 and failing in 3.7.2, there's actually not a lot of changed lines that could be the issue. Has anyone tested to see if the problem exists in 3.7.1? Because when I look at the diff between 3.7.2 and 3.7.0 there are only two changes that have any likelihood of triggering this difference to my eye; one was committed as part of 3.7.1 and the other as part of 3.7.2:

I downgraded to 3.7.1 and resent the book I'm currently reading to my KLC. I no longer see the missing space where italics are involved. This was the most noticeable issue for me with 3.7.2, so I can't comment on anything else. I'm staying on 3.7.1 unless I see anything else weird in the books I read.

Either way, thank you for the plugin.

@dragid10

The bug was definitely introduced in commit: https://github.com/jgoguen/calibre-kobo-driver/commits/e30f7a4

I ran a git bisect, and this was the commit it identified as introducing the broken spacing.

@Boswell-Scrubbs

I have this problem on my Kobo Libra 2 when using 3.7.2. Downgraded to 3.7.1 and the problem went away.

P.S. Thank you for the awesome plugin.

@dragid10

dragid10 commented Dec 23, 2024

Looking at it even further, it seems to be caused by the new regex used to fix the highlight breaks between sentences in commit: 8638dc0.

I added this paragraph to page_github_106.html (it's just a random excerpt from one of the books I have):

<p class="nonindent">
		“Even in Russia, <i class="calibre2">women</i> are”
</p>

I then added a bunch of print statements to the _append_kobo_spans_from_text method in container.py.

The proper text should read:

Even in Russia, women are

but it instead reads:

Even in Russia,women are

It seems that this new regex is causing spaces before indentation tags to be missed (or it's causing a space that follows regular text but precedes an indentation tag to be missed).
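
To illustrate that hypothesis, here is a minimal sketch of how trimming trailing whitespace from each text segment before wrapping it in a span would produce exactly this output. It is not the plugin's `_append_kobo_spans_from_text` code: `wrap_text`, `make_span`, `render`, and the `trim_trailing` flag are hypothetical names, and the real regex is not reproduced. It relies only on the standard ElementTree model, where the text before a child element lives in the parent's `.text` and the text after it lives in the child's `.tail`.

```python
import xml.etree.ElementTree as ET

SOURCE = "<p>Even in Russia, <i>women</i> are</p>"


def make_span(text: str) -> ET.Element:
    """Build a <span> holding one text segment."""
    span = ET.Element("span")
    span.text = text
    return span


def wrap_text(elem: ET.Element, trim_trailing: bool) -> None:
    """Wrap elem's text and each child's tail in <span> elements.

    Hypothetical helper: trim_trailing=True mimics a segmenter that
    drops whitespace at the end of every text segment.
    """
    clean = (lambda s: s.rstrip()) if trim_trailing else (lambda s: s)
    children = list(elem)
    for child in children:
        elem.remove(child)
    rebuilt = []
    if elem.text:
        # Text before the first child, e.g. "Even in Russia, "
        rebuilt.append(make_span(clean(elem.text)))
        elem.text = None
    for child in children:
        wrap_text(child, trim_trailing)
        tail, child.tail = child.tail, None
        rebuilt.append(child)
        if tail:
            # Text after the child, e.g. " are"
            rebuilt.append(make_span(clean(tail)))
    elem.extend(rebuilt)


def render(trim_trailing: bool) -> str:
    """Wrap SOURCE and return the text a reader would see."""
    root = ET.fromstring(SOURCE)
    wrap_text(root, trim_trailing)
    return "".join(root.itertext())


print(repr(render(trim_trailing=False)))
print(repr(render(trim_trailing=True)))
```

With trimming enabled, the space that belongs to the end of the segment before `<i>` disappears while the space after the closing tag survives, which matches the "Russia,women are" output above. A fix along these lines would need to keep trailing whitespace on segments that are followed by an inline element.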

Edit: Okay, I've re-checked, and I think this specific code change is actually what is causing the problem. Perhaps it ignores the space between punctuation and indentation? I believe that line was added to work with the new regex mentioned above, though. Unfortunately, I struggled for about 4 hours trying to set up my development environment just to test this. I'd love to help submit a fix that isn't just removing that change, but the calibre dev environment setup was a bit frustrating for me 😅
