<!--
Copyright (c) 2022-2023, NVIDIA CORPORATION. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Custom HTTP/gRPC headers and parameters

This document provides guidelines for using custom HTTP/gRPC headers and parameters with PyTriton.
The original Triton documentation on parameters can be found [here](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_parameters.md).
An undecorated inference function now accepts a list of Request instances.
The Request class contains the following fields, illustrated in the sketch below:

- `data` - the inputs, stored as a dictionary (also accessible through the request's dict interface, e.g. `request["input_name"]`)
- `parameters` - the combined parameters and custom HTTP/gRPC headers
!!! warning "Parameters/headers usage limitations"

    Currently, custom parameters and headers can only be accessed in an undecorated inference function (they do not work with decorators).

There is a separate example showing how to use parameters/headers in a preprocessing step (see [here](downloaded_input_data.md)).
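A minimal sketch of how both fields surface inside an undecorated inference function (the input name `input` and parameter name `scale` are hypothetical):

```python
def _infer_fn(requests):
    responses = []
    for request in requests:
        x = request.data["input"]  # equivalent to request["input"]
        # A body parameter or a forwarded header, depending on how the client sent it:
        scale = float(request.parameters.get("scale", 1.0))
        responses.append({"output": x * scale})
    return responses
```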
## Parameters

Parameters are passed to the inference callable as a dictionary.
The dictionary is carried in the HTTP/gRPC request body payload.
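On the client side, the dictionary is supplied through the `parameters` keyword of `ModelClient.infer_batch` (the model name `MyModel` is a placeholder; a complete, working flow is shown in the Usage section below):

<!--pytest.mark.skip-->
```python
import numpy as np
from pytriton.client import ModelClient

with ModelClient("localhost", "MyModel") as client:
    # The parameters dict travels in the request body payload and is exposed
    # to the inference callable as request.parameters.
    result = client.infer_batch(
        np.ones((2, 1), dtype=np.float32),
        parameters={"parameter_multiplier": 2},
    )
```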
## HTTP/gRPC headers

Custom HTTP/gRPC headers are passed to the inference callable in the same dictionary as parameters,
but they are carried in the HTTP/gRPC request headers instead of the request body payload.

For headers, it is also necessary to specify a header prefix in the Triton config. The prefix distinguishes custom headers from standard ones: only headers matching the specified prefix are passed to the inference callable.
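For example, the forward pattern used in the Usage section below matches any header whose name starts with `header`:

```python
from pytriton.triton import TritonConfig

# Only HTTP headers whose names match this pattern are forwarded to the
# inference callable, where they appear in request.parameters alongside
# the body parameters.
config = TritonConfig(http_header_forward_pattern="header.*")
```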
## Usage

1. Define the inference callable (this one uses one parameter and one header):
```python
import numpy as np

from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig


def _infer_with_params_and_headers(requests):
    responses = []
    for req in requests:
        a_batch, b_batch = req.values()
        # "header_divisor" arrives as a custom HTTP/gRPC header, "parameter_multiplier"
        # as a request parameter; both surface in req.parameters.
        scaled_add_batch = (a_batch + b_batch) / float(req.parameters["header_divisor"])
        scaled_sub_batch = (a_batch - b_batch) * float(req.parameters["parameter_multiplier"])
        responses.append({"scaled_add": scaled_add_batch, "scaled_sub": scaled_sub_batch})
    return responses
```
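To make the contract concrete, here is a standalone check that drives the callable with a stand-in object mimicking the `data`/`parameters`/`values()` interface described above (`_FakeRequest` is illustrative, not the real PyTriton Request class):

<!--pytest-codeblocks:cont-->
```python
class _FakeRequest:
    """Stand-in exposing the same fields the callable relies on."""

    def __init__(self, data, parameters):
        self.data = data
        self.parameters = parameters

    def values(self):
        return self.data.values()


req = _FakeRequest(
    data={
        "a": np.array([[2.0]], dtype=np.float32),
        "b": np.array([[1.0]], dtype=np.float32),
    },
    parameters={"header_divisor": "3", "parameter_multiplier": "2"},
)
# Expected: scaled_add = (2 + 1) / 3 = 1.0, scaled_sub = (2 - 1) * 2 = 2.0
print(_infer_with_params_and_headers([req]))
```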
2. Bind the inference callable to Triton (`header` is the prefix for custom headers):

<!--pytest.mark.skip-->
```python
# Forward only HTTP headers matching "header.*" to the inference callable.
with Triton(config=TritonConfig(http_header_forward_pattern="header.*")) as triton:
    triton.bind(
        model_name="ParamsAndHeaders",
        infer_func=_infer_with_params_and_headers,
        inputs=[
            Tensor(dtype=np.float32, shape=(-1,)),
            Tensor(dtype=np.float32, shape=(-1,)),
        ],
        outputs=[
            Tensor(name="scaled_add", dtype=np.float32, shape=(-1,)),
            Tensor(name="scaled_sub", dtype=np.float32, shape=(-1,)),
        ],
        config=ModelConfig(max_batch_size=128),
    )
    triton.serve()
```
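Note that `triton.serve()` blocks the current process while serving requests (hence the skip marker on this snippet); PyTriton also provides a non-blocking `Triton.run()` for cases where the server and the client code live in the same script.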
3. Call the model using ModelClient:

<!--pytest-codeblocks:cont-->
```python
import numpy as np

from pytriton.client import ModelClient

batch_size = 2
a_batch = np.ones((batch_size, 1), dtype=np.float32) * 2
b_batch = np.ones((batch_size, 1), dtype=np.float32)
```
<!--pytest.mark.skip-->
```python
with ModelClient("localhost", "ParamsAndHeaders") as client:
    result_batch = client.infer_batch(
        a_batch,
        b_batch,
        parameters={"parameter_multiplier": 2},  # sent in the request body payload
        headers={"header_divisor": 3},  # sent as an HTTP header; matches "header.*"
    )
```
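For reference, here is a sketch of the same call made directly against Triton's HTTP endpoint with the `requests` library, showing where each value physically travels: the parameter in the JSON body, the custom header in the HTTP headers. The input names `INPUT_0`/`INPUT_1` are an assumption about the auto-generated names of the unnamed input tensors above; check the generated model configuration for the actual names.

<!--pytest.mark.skip-->
```python
import requests

payload = {
    "inputs": [
        # Assumed auto-generated names for the two unnamed inputs.
        {"name": "INPUT_0", "shape": [2, 1], "datatype": "FP32", "data": [2.0, 2.0]},
        {"name": "INPUT_1", "shape": [2, 1], "datatype": "FP32", "data": [1.0, 1.0]},
    ],
    "parameters": {"parameter_multiplier": 2},  # travels in the body payload
}
response = requests.post(
    "http://localhost:8000/v2/models/ParamsAndHeaders/infer",
    json=payload,
    headers={"header_divisor": "3"},  # matches the "header.*" forward pattern
)
print(response.json())
```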