UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 8: ordinal not in range(128) #106

Open
Artucuno opened this issue Feb 9, 2023 · 3 comments

@Artucuno
Artucuno commented Feb 9, 2023

Running the example gives me this error:

Traceback (most recent call last):
  File "C:\Users\user\Desktop\icc\download.py", line 2, in <module>
    save_website(
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\__init__.py", line 164, in save_website
    crawler.save_complete(pop=open_in_browser)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\core.py", line 218, in save_complete
    self.scheduler.handle_resource(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 156, in handle_resource
    return self._handle_resource(resource)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 191, in _handle_resource
    resource.retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 368, in retrieve
    return self._retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 456, in _retrieve
    context = self.extract_children(self.parse())
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 439, in extract_children
    self.scheduler.handle_resource(ans)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 156, in handle_resource
    return self._handle_resource(resource)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 191, in _handle_resource
    resource.retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 368, in retrieve
    return self._retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 612, in _retrieve
    self.extract_children(self.parse()),
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 591, in extract_children
    source = re.sub(
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 560, in repl
    url, _ = unquote_match(match.group(1).decode(encoding), match.start(1))
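
For anyone reading along: the last frame decodes a regex match with `.decode(encoding)`, and the message in the issue title suggests `encoding` was `ascii` while the matched bytes held a non-ASCII byte. The snippet below is only a sketch that reproduces the same message outside pywebcopy and shows one possible lenient fallback. The helper `decode_lenient` and the sample bytes are hypothetical and not part of pywebcopy.

```python
def decode_lenient(raw: bytes, encoding: str = "ascii") -> str:
    """Hypothetical helper (not pywebcopy API): decode bytes without raising."""
    # Try the declared encoding first, then UTF-8.
    for candidate in (encoding, "utf-8"):
        try:
            return raw.decode(candidate)
        except (UnicodeDecodeError, LookupError):
            continue
    # latin-1 maps every byte to a code point, so this last step cannot fail.
    return raw.decode("latin-1")


# Made-up sample: 8 ASCII bytes, then 0xd1 0x84 (UTF-8 for a Cyrillic letter),
# so the failing byte sits at position 8 as in the reported error.
raw = b"img/foto" + b"\xd1\x84" + b".png"

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)

print(decode_lenient(raw))
```

Whether a fallback like this is appropriate inside the library is for the maintainer to decide; it only illustrates why a strict ASCII decode fails on such input.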
@rajatomar788
Owner

Please attach the version you are using and the code you are running, and say whether the error occurs only on this particular site or on other sites as well.
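
One possible way to collect the version details requested above is sketched here. It assumes the package is installed under the distribution name `pywebcopy`; the helper `describe_environment` is hypothetical and uses only the standard library.

```python
import platform
import sys
from importlib import metadata


def describe_environment(package: str = "pywebcopy") -> dict:
    """Hypothetical helper: gather version details for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "package": package,
        "version": version,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


# Print one "key: value" line per field, ready to paste into a comment.
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```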

@ckhordiasma

I have the same issue. I'm not sure how many sites it fails on, but here is some example code:

from pywebcopy import save_webpage

kwargs = {'project_name': 'test'}

save_webpage(
    # url of the website
    url='https://www.wix.com',
    # folder where the copy will be saved
    project_folder='./temp',
    **kwargs
)

@Artucuno
Author

I got the error while trying to download a Wix site too.

I forgot that I had opened this issue.
