Batch processing of requests
This section contains information about batch processing of requests (batch requests), with usage examples for the Python, C++, and Java APIs.
When working with the YTsaurus system, each command generates a separate request to the master server. A request has its own cost, often higher than the execution of a command. Therefore, combining several commands into a single request can significantly speed up processes that send many easy commands to Cypress and are waiting for a response most of the time. This request is called a batch request. The Cypress master server will execute the commands from the batch request in random order and return all the obtained results. Errors that occur during the execution of individual commands do not affect other commands.
Concurrency
A batch request on the Cypress master server is executed as a set of independent commands, the specific command execution order is not guaranteed. All commands are counted in the user command quota. A batch request has the concurrency parameter which regulates the concurrency of commands. The parameter value limits the number of concurrently running commands, thus enabling you not to exceed the allowable number of concurrent requests to the master server for a user. Otherwise, the User 'user' has exceeded its request queue size limit error will be returned in response to the request. The default value of the parameter is 50 (concurrently running commands) and the user has a limit of 100, which helps avoid errors involving exceeding the limit. Exceeding the limit is not a fatal error. If it occurs, we recommend sending a request again. If you use high-level SDKs, the request will be sent again automatically.
Python API
In the Python API, there is a create_batch_client() method that enables you to create a client to combine multiple commands into a single batch request (client.create_batch_client()) if you have a simple client (client = yt.YtClient(<cluster-name>)). A usage example is shown in Listing 1:
import yt.wrapper as yt
if __name__ == "__main__":
client = yt.YtClient(<cluster-name>)
batch_client = client.create_batch_client()
list_rsp = batch_client.list("/")
exists_rsp = batch_client.exists("/")
batch_client.commit_batch()
print list_rsp.get_result()
print exists_rsp.get_result()
#['cooked_logs', 'home', 'logs', 'projects', 'statbox', 'sys', 'tmp', 'user_sessions', 'userdata', 'userfeat', 'userstats']
#true
As a result of executing the request using batch_client, a special object of the BatchResponse type, which has the structure shown below, is returned.
class BatchResponse(object):
...
def get_result(self):
...
def get_error(self):
...
def is_ok(self):
...
You cannot specify the format of the returned data by a separate command via batch_client. The reason is that the format of the returned data is the same for the entire batch request.
C++ API
In the C++ API, to execute a batch request, you need to:
- Create an object of the
TBatchRequesttype. - Use it to specify commands (the
TBatchRequestmethods returnTFutureto the appropriate type). - Pass it to the
ExecuteBatchmethod of the client. - Get the results from the previously obtained
TFuture.
However, if some commands were completed with an error, you can find this out by calling future.GetValue or future.HasException (i.e. calling ExecuteBatch will not return an error). A usage example is given below.
#include <mapreduce/yt/interface/client.h>
#include <mapreduce/yt/common/helpers.h>
using namespace NYT;
int main()
{
auto client = CreateClient(<cluster-name>);
TBatchRequest batchRequest;
auto listRsp = batchRequest.List("/", TListOptions().MaxSize(5));
auto existsRsp = batchRequest.Exists("//tmp");
client->ExecuteBatch(batchRequest);
for (auto item : listRsp.GetValue()) {
Cerr << NodeToYsonString(item) << Endl;
}
Cerr << NodeToYsonString(existsRsp.GetValue()) << Endl;
return 0;
}
/* Output of program:
"tmp"
"projects"
"logs"
"userfeat"
"cooked_logs"
%true
*/
Java API
In the Java API, there is an executeBatch method that enables you to create an object of the BatchRequest type to execute multiple commands within a single batch request if you have a Cypress client yt.cypress(). All commands sent via BatchRequest return objects of the future type, which will be executed after calling the execute method.
BatchRequest request = yt.cypress().executeBatch(transactionId, pingAncestorTransactions, Option.empty());
ListF<CompletableFuture<YTreeNode>> futures = Cf.arrayList();
for (YPath path : spec.getInputTables()) {
futures.add(request.get(transactionId, pingAncestorTransactions, path.attribute("sorted"), Cf.set()));
}
request.execute().join();
boolean result = futures.forAll(x -> {
YTreeNode node = x.join();
return node.isBooleanNode() && node.boolValue();
});