The symptom is quite simple: some (less than 1%) of the requests from the frontend app to the API are timing out when reading from the socket. I've ruled out everything on the backend today, including the API itself.
Which still leaves a lot of possibilities, like the load balancer between the API and the frontend being mischievous, and/or poor error handling by the frontend client.
Oh, and everything is over SSL, so although I've tested the network between all the machines, I can't dump any traffic without a huge amount of hassle.
@Dammit we can see all the timeouts, and also when someone closes the tab or hits refresh (as the server then can't send the response, even if it gets one).