Client get_screenshot error handling doesn't work for missing screenshots #62

@nazywam

I've noticed there's an issue with how client.get_screenshot behaves when URLscan was unable to grab a screenshot.

The /api/v1/result API doesn't seem to know if a screenshot was successfully grabbed or not - task.screenshotURL is always populated with the result URL.

When no screenshot is available, the endpoint returns an HTTP 404 response whose body is a "No Screenshot Available" image. This breaks the exception handler in https://github.com/urlscan/urlscan-python/blob/main/src/urlscan/client.py#L377, which always expects a valid JSON response:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 375, in _get_error
    res.raise_for_status()
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 88, in raise_for_status
    self._res.raise_for_status()
  File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://urlscan.io/screenshots/<redacted>.png'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  <redacted>
    self.client.download(
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 356, in download
    error = self._get_error(res)
            ^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 377, in _get_error
    data: dict = exc.response.json()
                 ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/__init__.py", line 341, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
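
The 0x89 byte in the final error is the first byte of the PNG file signature, which is consistent with the 404 body being the placeholder image rather than JSON. A minimal reproduction using only the standard library is sketched below. The payload is just the 8-byte PNG signature standing in for the real response body, and `try_parse_json` is a hypothetical helper, not part of urlscan-python:

```python
import json

# The 8-byte PNG file signature; a stand-in for the placeholder image body.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def try_parse_json(body: bytes):
    """Return the parsed JSON value, or the exception raised while parsing."""
    try:
        return json.loads(body)
    except ValueError as exc:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        return exc


result = try_parse_json(PNG_SIGNATURE)
print(type(result).__name__, result)
```

Because json.loads decodes bytes before parsing, the failure surfaces as a UnicodeDecodeError rather than a json.JSONDecodeError, so a handler that only catches the latter would still let it through.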

A fix I came up with is to parametrize the _get method with how the exception should be handled, and then modify the default "raw" parameter in get_json.
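
As a rough illustration of the kind of tolerant handling this needs, here is a standalone sketch that falls back to a plain status message when the error body is not JSON. It is not the library's code: `parse_error_body` and `describe_error` are hypothetical names, and the inputs are plain values rather than an httpx response so that the example runs on its own:

```python
import json
from typing import Any, Optional


def parse_error_body(body: bytes) -> Optional[dict[str, Any]]:
    """Return the error payload if the body is a JSON object, else None."""
    try:
        data = json.loads(body)
    except ValueError:
        # Covers both undecodable bytes (e.g. a PNG) and malformed JSON.
        return None
    return data if isinstance(data, dict) else None


def describe_error(status_code: int, body: bytes) -> str:
    """Build an error message without assuming the body is JSON."""
    data = parse_error_body(body)
    if data is None:
        return f"HTTP {status_code} (non-JSON response body, {len(body)} bytes)"
    return f"HTTP {status_code}: {data}"


# A binary body such as a placeholder image no longer raises.
print(describe_error(404, b"\x89PNG\r\n\x1a\n"))
```

With this shape, the original HTTPStatusError can still be raised or wrapped by the caller; only the attempt to read a JSON payload becomes optional.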

Thanks!