Closed
Description
I've noticed there's an issue with how client.get_screenshot behaves when URLscan was unable to grab a screenshot.
The /api/v1/result API does not seem to indicate whether a screenshot was successfully grabbed: task.screenshotURL is always populated with the result URL.
When no screenshot is available, the endpoint returns an HTTP 404 response whose body is a "No Screenshot Available" image. This breaks the exception handler in https://github.com/urlscan/urlscan-python/blob/main/src/urlscan/client.py#L377, as it always expects a valid JSON response:
Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 375, in _get_error
    res.raise_for_status()
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 88, in raise_for_status
    self._res.raise_for_status()
  File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://urlscan.io/screenshots/<redacted>.png'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  <redacted>
    self.client.download(
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 356, in download
    error = self._get_error(res)
            ^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/urlscan/client.py", line 377, in _get_error
    data: dict = exc.response.json()
                 ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/__init__.py", line 341, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

A fix I came up with is to parametrize the _get method with how the exception should be handled, and then modify the default "raw" parameter in get_json.
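
To make the idea concrete, here is a rough, standard-library-only sketch of error handling that tolerates a non-JSON error body. It is not the actual urlscan-python code: `FakeResponse`, `APIError`, `get_error`, and the `json_error` flag are hypothetical names used for illustration, and the `"message"` key in JSON error bodies is an assumption. The byte string is the PNG file signature, whose first byte (0x89) is the one reported in the traceback.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response; holds only what this sketch needs."""
    status_code: int
    content: bytes


class APIError(Exception):
    """Hypothetical error type carrying the status code and a message."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def get_error(res: FakeResponse, json_error: bool = True) -> Optional[APIError]:
    """Return an APIError for a 4xx/5xx response, or None otherwise.

    Pass json_error=False for endpoints whose error body is not JSON
    (for example a screenshot download), so the body is never parsed.
    """
    if res.status_code < 400:
        return None
    message = f"HTTP {res.status_code}"  # generic fallback message
    if json_error:
        try:
            data = json.loads(res.content)
        except ValueError:
            # json.JSONDecodeError and UnicodeDecodeError both subclass
            # ValueError, so a binary body falls back to the generic message.
            data = None
        # Assumption: JSON error bodies carry a "message" field.
        if isinstance(data, dict) and "message" in data:
            message = str(data["message"])
    return APIError(res.status_code, message)


# A 404 whose body starts with the PNG signature, as in the report above.
png_404 = FakeResponse(404, b"\x89PNG\r\n\x1a\n")
print(get_error(png_404, json_error=False))
```

With this shape, a download-style call could pass `json_error=False`, while JSON endpoints keep the default and still degrade gracefully if the body turns out not to be JSON.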
Thanks!