502 on upload

Hi,

When I upload a large number (>1000) of small files into Girder from a web browser (e.g. Chrome), I eventually get 502 Bad Gateway errors. Here are the logs on the server:

root@ofsep:~/Projects/logs# tail -n0 -f error.log
[2019-11-14 15:18:03,374] ERROR: 500 Error
Traceback (most recent call last):
  File "/root/Projects/girder/girder/api/rest.py", line 630, in endpointDecorator
    val = fun(self, path, params)
  File "/root/Projects/girder/girder/api/rest.py", line 1205, in POST
    return self.handleRoute(method, path, params)
  File "/root/Projects/girder/girder/api/rest.py", line 947, in handleRoute
    val = handler(**kwargs)
  File "/root/Projects/girder/girder/api/describe.py", line 679, in wrapped
    return fun(*args, **kwargs)
  File "/root/Projects/girder/girder/api/v1/file.py", line 236, in readChunk
    return UploadModel().handleChunk(upload, chunk, filter=True, user=user)
  File "/root/Projects/girder/girder/models/upload.py", line 137, in handleChunk
    upload = adapter.uploadChunk(upload, chunk)
  File "/root/Projects/girder/girder/utility/filesystem_assetstore_adapter.py", line 167, in uploadChunk
    data = chunk.read(BUF_SIZE)
  File "/root/Projects/girder/girder/utility/__init__.py", line 144, in read
    return self.stream.read(*args, **kwargs)
  File "/root/Projects/env/lib/python3.5/site-packages/cherrypy/_cpreqbody.py", line 480, in read
    return self.fp.read(size, fp_out)
  File "/root/Projects/env/lib/python3.5/site-packages/cherrypy/_cpreqbody.py", line 817, in read
    data = self.fp.read(chunksize)
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/server.py", line 381, in read
    data = self.rfile.read(size)
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/makefile.py", line 410, in read
    val = super().read(*args, **kwargs)
  File "/usr/lib/python3.5/_pyio.py", line 1000, in read
    return self._read_unlocked(size)
  File "/usr/lib/python3.5/_pyio.py", line 1040, in _read_unlocked
    chunk = self.raw.read(wanted)
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
Additional info:
  Request URL: POST http://ofsep.kitware.fr/api/v1/viewer/upload/chunk
  Query string: offset=0&uploadId=5dcdb650e4970173fb36100b&token=eyJ...iTASwA
  Remote IP: 90.63.248.229
  Request UID: 49fd4326-2db4-488a-ab7a-2340768bbf0d
[14/Nov/2019:15:18:03] ENGINE socket.error 'cannot read from timed out object'
Traceback (most recent call last):
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/server.py", line 1273, in communicate
    req.respond()
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/server.py", line 1077, in respond
    self.server.gateway(self).respond()
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/wsgi.py", line 148, in respond
    self.write(chunk)
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/wsgi.py", line 232, in write
    self.req.ensure_headers_sent()
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/server.py", line 1124, in ensure_headers_sent
    self.send_headers()
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/server.py", line 1190, in send_headers
    self.rfile.read(remaining)
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/server.py", line 381, in read
    data = self.rfile.read(size)
  File "/root/Projects/env/lib/python3.5/site-packages/cheroot/makefile.py", line 410, in read
    val = super().read(*args, **kwargs)
  File "/usr/lib/python3.5/_pyio.py", line 1000, in read
    return self._read_unlocked(size)
  File "/usr/lib/python3.5/_pyio.py", line 1040, in _read_unlocked
    chunk = self.raw.read(wanted)
  File "/usr/lib/python3.5/socket.py", line 572, in readinto
    raise OSError("cannot read from timed out object")
OSError: cannot read from timed out object

Here is some information that may be relevant:

  • multiple files are uploaded in parallel using the Girder Upload JS component
  • upload requests stall for longer and longer until the 502 errors appear

Does this mean I need to increase some timeout? Can it be done via Girder?

Thanks,
Julien.

This is happening below the Girder application level, in the guts of CherryPy’s HTTP server. It looks like a result of network oversaturation. How many parallel uploads are we talking about? You may need to scale your deployment to accommodate more traffic.
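
If the uploads can be scripted instead of driven from the browser, one way to check whether parallelism is the trigger is to cap the number of uploads in flight and see if the timeouts go away. Below is a minimal sketch of that idea. Note that `upload_one` is a hypothetical placeholder for whatever function uploads a single file; it is not a Girder API, and `max_parallel=4` is an arbitrary starting value.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(paths, upload_one, max_parallel=4):
    """Call upload_one(path) for every path, at most max_parallel at a time.

    upload_one is a caller-supplied (hypothetical) function that uploads
    a single file and raises an exception on failure.
    Returns (succeeded, failed): a list of paths and a list of
    (path, exception) pairs.
    """
    succeeded, failed = [], []
    # The pool size is what bounds the number of concurrent uploads.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [(path, pool.submit(upload_one, path)) for path in paths]
        for path, future in futures:
            try:
                future.result()
                succeeded.append(path)
            except Exception as exc:  # keep going; report failures at the end
                failed.append((path, exc))
    return succeeded, failed
```

Lowering `max_parallel` until the errors stop gives a rough idea of how much concurrency the current deployment can take.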

Yes, this looks like the kind of thing that happens when a server is overloaded. Increasing a timeout is unlikely to solve anything. I would monitor the server's resources while this is happening to find the bottleneck: file I/O? Low memory? CPU?
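
As a rough starting point, something like the following could be left running on the server while reproducing the upload. It is only a sketch using the Python standard library, assuming a Linux host (it reads `/proc/meminfo`); the function names and the 5-second interval are made up for illustration. It does not measure disk I/O throughput, so a dedicated tool would still be needed for that.

```python
import os
import shutil
import time


def read_meminfo_kb(path="/proc/meminfo"):
    """Return {field: kB} parsed from /proc/meminfo, or {} if unreadable."""
    info = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    info[key.strip()] = int(parts[0])
    except OSError:
        pass  # not Linux, or /proc not mounted
    return info


def sample(disk_path="/"):
    """Take one snapshot of load average, available memory and free disk."""
    try:
        load_1min = os.getloadavg()[0]  # Unix only
    except (AttributeError, OSError):
        load_1min = None
    mem = read_meminfo_kb()
    return {
        "load_1min": load_1min,
        "mem_available_kb": mem.get("MemAvailable"),  # None if not reported
        "disk_free_bytes": shutil.disk_usage(disk_path).free,
    }


def watch(interval=5.0, count=None, disk_path="/"):
    """Print a timestamped snapshot every `interval` seconds.

    Runs forever when count is None, otherwise prints `count` snapshots.
    """
    n = 0
    while count is None or n < count:
        print(time.strftime("%H:%M:%S"), sample(disk_path))
        n += 1
        time.sleep(interval)
```

Calling `watch()` in a separate terminal during the upload, with `disk_path` pointed at the assetstore's filesystem, should show whether load climbs or available memory drops as the requests start to stall.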