• We strongly recommend configuring a health check path in your container; otherwise, any inference request sent before your server is ready will fail. The only case in which you do not need a health check path is when your server is ready for inference almost instantly because your model is tiny, which is rarely true for AI use cases.
  • If you define a health check, your endpoint has 5 minutes from the time the image is pulled to return a 200 OK response. Returning 200 OK marks the endpoint as available, and the same criterion applies to each additional replica. If 3 consecutive calls to the health check endpoint return a non-200 status, the replica is restarted.
  • You can see an example of a health check path in our Flan T5 container in Advanced: Build a Container from Scratch in Python. The health check path exposed by that container is /healthcheck. After the endpoint is created, a request to https://<endpoint-name>-<account-id>.octoai.run/healthcheck should return a 200 response whenever the server is healthy and ready.
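
The behavior described above can be sketched with Python's standard library. This is a minimal illustration, not the code from the Flan T5 container: the names `model_ready`, `load_model`, `HealthHandler`, and `wait_until_healthy` are hypothetical, and the 300-second default simply mirrors the 5-minute window mentioned above. The server answers /healthcheck with 503 until the model has loaded and with 200 afterwards; the helper polls a health check URL until it sees a 200 or the deadline passes.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the (hypothetical) model has finished loading.
model_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    """Serves /healthcheck: 200 when the model is loaded, 503 otherwise."""

    def do_GET(self):
        if self.path == "/healthcheck":
            status = 200 if model_ready.is_set() else 503
        else:
            status = 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def load_model():
    """Placeholder for slow model loading; a real server would load weights here."""
    time.sleep(0.2)
    model_ready.set()


def wait_until_healthy(url, timeout_s=300.0, interval_s=1.0):
    """Poll `url` until it returns 200; give up after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet (non-200 status or connection error); retry
        time.sleep(interval_s)
    return False
```

To try it locally, one could start `ThreadingHTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()` in a background thread, call `load_model()`, and then call `wait_until_healthy("http://127.0.0.1:8080/healthcheck")`. The port number is an arbitrary choice for the example. The same helper can be pointed at a deployed endpoint's /healthcheck URL before sending the first inference request.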