<!--
Copyright (c) 2022-2023, NVIDIA CORPORATION. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Custom HTTP/gRPC headers and parameters

This document provides guidelines for using custom HTTP/gRPC headers and parameters with PyTriton.
The original Triton documentation on parameters can be found [here](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_parameters.md).
An undecorated inference function now accepts a list of Request instances.
The Request class contains the following fields, illustrated in the sketch below:

- `data` - the inputs, stored as a dictionary (also accessible through the request's dict interface, e.g. `request["input_name"]`)
- `parameters` - the combined parameters and custom HTTP/gRPC headers
!!! warning "Parameters/headers usage limitations"

    Currently, custom parameters and headers can only be accessed in an undecorated inference function (they do not work with decorators).

There is a separate example showing how to use parameters/headers in a preprocessing step (see [here](downloaded_input_data.md)).
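A minimal sketch of how both fields surface inside an undecorated inference function (the input name `input` and parameter name `scale` are hypothetical):

```python
def _infer_fn(requests):
    responses = []
    for request in requests:
        x = request.data["input"]  # equivalent to request["input"]
        # A body parameter or a forwarded header, depending on how the client sent it:
        scale = float(request.parameters.get("scale", 1.0))
        responses.append({"output": x * scale})
    return responses
```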
## Parameters

Parameters are passed to the inference callable as a dictionary.
The dictionary is carried in the HTTP/gRPC request body payload.
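On the client side, the dictionary is supplied through the `parameters` keyword of `ModelClient.infer_batch` (the model name `MyModel` is a placeholder; a complete, working flow is shown in the Usage section below):

<!--pytest.mark.skip-->
```python
import numpy as np
from pytriton.client import ModelClient

with ModelClient("localhost", "MyModel") as client:
    # The parameters dict travels in the request body payload and is exposed
    # to the inference callable as request.parameters.
    result = client.infer_batch(
        np.ones((2, 1), dtype=np.float32),
        parameters={"parameter_multiplier": 2},
    )
```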
## HTTP/gRPC headers

Custom HTTP/gRPC headers are passed to the inference callable in the same dictionary as parameters,
but they are carried in the HTTP/gRPC request headers instead of the request body payload.

For headers, it is also necessary to specify a header prefix in the Triton config. The prefix distinguishes custom headers from standard ones: only headers matching the specified prefix are passed to the inference callable.
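For example, the forward pattern used in the Usage section below matches any header whose name starts with `header`:

```python
from pytriton.triton import TritonConfig

# Only HTTP headers whose names match this pattern are forwarded to the
# inference callable, where they appear in request.parameters alongside
# the body parameters.
config = TritonConfig(http_header_forward_pattern="header.*")
```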
## Usage

1. Define the inference callable (this one uses one parameter and one header):
```python
import numpy as np

from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig


def _infer_with_params_and_headers(requests):
    responses = []
    for req in requests:
        a_batch, b_batch = req.values()
        # "header_divisor" arrives as a custom HTTP/gRPC header, "parameter_multiplier"
        # as a request parameter; both surface in req.parameters.
        scaled_add_batch = (a_batch + b_batch) / float(req.parameters["header_divisor"])
        scaled_sub_batch = (a_batch - b_batch) * float(req.parameters["parameter_multiplier"])
        responses.append({"scaled_add": scaled_add_batch, "scaled_sub": scaled_sub_batch})
    return responses
```
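To make the contract concrete, here is a standalone check that drives the callable with a stand-in object mimicking the `data`/`parameters`/`values()` interface described above (`_FakeRequest` is illustrative, not the real PyTriton Request class):

<!--pytest-codeblocks:cont-->
```python
class _FakeRequest:
    """Stand-in exposing the same fields the callable relies on."""

    def __init__(self, data, parameters):
        self.data = data
        self.parameters = parameters

    def values(self):
        return self.data.values()


req = _FakeRequest(
    data={
        "a": np.array([[2.0]], dtype=np.float32),
        "b": np.array([[1.0]], dtype=np.float32),
    },
    parameters={"header_divisor": "3", "parameter_multiplier": "2"},
)
# Expected: scaled_add = (2 + 1) / 3 = 1.0, scaled_sub = (2 - 1) * 2 = 2.0
print(_infer_with_params_and_headers([req]))
```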
2. Bind the inference callable to Triton (`header` is the prefix for custom headers):

<!--pytest.mark.skip-->
```python
# Forward only HTTP headers matching "header.*" to the inference callable.
with Triton(config=TritonConfig(http_header_forward_pattern="header.*")) as triton:
    triton.bind(
        model_name="ParamsAndHeaders",
        infer_func=_infer_with_params_and_headers,
        inputs=[
            Tensor(dtype=np.float32, shape=(-1,)),
            Tensor(dtype=np.float32, shape=(-1,)),
        ],
        outputs=[
            Tensor(name="scaled_add", dtype=np.float32, shape=(-1,)),
            Tensor(name="scaled_sub", dtype=np.float32, shape=(-1,)),
        ],
        config=ModelConfig(max_batch_size=128),
    )
    triton.serve()
```
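Note that `triton.serve()` blocks the current process while serving requests (hence the skip marker on this snippet); PyTriton also provides a non-blocking `Triton.run()` for cases where the server and the client code live in the same script.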
3. Call the model using ModelClient:

<!--pytest-codeblocks:cont-->
```python
import numpy as np

from pytriton.client import ModelClient

batch_size = 2
a_batch = np.ones((batch_size, 1), dtype=np.float32) * 2
b_batch = np.ones((batch_size, 1), dtype=np.float32)
```
<!--pytest.mark.skip-->
```python
with ModelClient("localhost", "ParamsAndHeaders") as client:
    result_batch = client.infer_batch(
        a_batch,
        b_batch,
        parameters={"parameter_multiplier": 2},  # sent in the request body payload
        headers={"header_divisor": 3},  # sent as an HTTP header; matches "header.*"
    )
```
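For reference, here is a sketch of the same call made directly against Triton's HTTP endpoint with the `requests` library, showing where each value physically travels: the parameter in the JSON body, the custom header in the HTTP headers. The input names `INPUT_0`/`INPUT_1` are an assumption about the auto-generated names of the unnamed input tensors above; check the generated model configuration for the actual names.

<!--pytest.mark.skip-->
```python
import requests

payload = {
    "inputs": [
        # Assumed auto-generated names for the two unnamed inputs.
        {"name": "INPUT_0", "shape": [2, 1], "datatype": "FP32", "data": [2.0, 2.0]},
        {"name": "INPUT_1", "shape": [2, 1], "datatype": "FP32", "data": [1.0, 1.0]},
    ],
    "parameters": {"parameter_multiplier": 2},  # travels in the body payload
}
response = requests.post(
    "http://localhost:8000/v2/models/ParamsAndHeaders/infer",
    json=payload,
    headers={"header_divisor": "3"},  # matches the "header.*" forward pattern
)
print(response.json())
```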