Debugging jobs locally
To make job debugging more convenient (for example, with GDB), you can use a dedicated utility, `yt job-tool`. It downloads the job environment along with the job's input data and generates a script to run the job.

The utility ships as part of the Python API package. All job types that run custom code are supported.

The binary supports two commands: `prepare-job-environment` and `run-job`. For detailed information, run `yt job-tool --help`.
Job input data

By default, `job-tool` fetches the full job input, but it is not available for every job:
- The specification required to reconstruct the full input is saved only for a few failed and a few successfully completed jobs. It is also available for all running jobs.
- Getting the full input requires the data the job read: the input tables must still exist and be unmodified.
- The "reduce_combiner" and "reduce" jobs of a MapReduce operation read temporary data that is only available until the operation completes. To debug such jobs, you can rewrite the operation as a combination of Map, Sort, and Reduce, or debug them while the operation is still running (including while it is suspended).
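To see why the rewrite helps, the sketch below emulates the three-stage Map, Sort, Reduce pipeline locally in plain Python. The names `mapper`, `reducer`, and `run_locally` are illustrative, not part of any API; on a real cluster each stage would correspond to a separate `yt.run_map`, `yt.run_sort`, and `yt.run_reduce` operation, so the intermediate tables are materialized and stay available for inspection.

```python
# Local emulation of a MapReduce operation rewritten as separate
# Map, Sort, and Reduce stages. On a cluster each stage would be a
# separate operation (yt.run_map, yt.run_sort, yt.run_reduce), so the
# intermediate data survives and can be fed to yt job-tool.
from itertools import groupby
from operator import itemgetter

def mapper(row):
    # Illustrative word-count mapper.
    yield {"word": row["word"], "count": 1}

def reducer(key, rows):
    yield {"word": key, "count": sum(r["count"] for r in rows)}

def run_locally(rows):
    # Map stage.
    mapped = [out for r in rows for out in mapper(r)]
    # Sort stage: materializes the intermediate data that a plain
    # MapReduce operation keeps only transiently.
    mapped.sort(key=itemgetter("word"))
    # Reduce stage: rows grouped by the sort key.
    reduced = []
    for key, group in groupby(mapped, key=itemgetter("word")):
        reduced.extend(reducer(key, list(group)))
    return reduced
```

Because each stage is then an ordinary operation, a failing reduce job can be captured with `prepare-job-environment` like any other job.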
Example
- Let's run this operation:

  ```python
  import yt.wrapper as yt

  def mapper(rec):
      raise RuntimeError("fail")

  if __name__ == "__main__":
      yt.run_map(mapper, "//home/user/tables/dsv", "//home/user/output", format="dsv")
  ```
- When the operation fails, diagnose it (the arguments are the operation ID and the ID of the failed job):

  ```bash
  yt job-tool prepare-job-environment ffc8f462-7f68f8c3-3fe03e8-433fe11f ffe4ff5c-d2fac8c1-3fe0384-816a7fd0
  ```
- Once the command has finished, the `job_ffe4ff5c-d2fac8c1-3fe0384-816a7fd0` folder will contain the following files:

  ```bash
  $ ls job_ffe4ff5c-d2fac8c1-3fe0384-816a7fd0
  command  input  run_gdb.sh  run.sh  sandbox
  ```
Where:

- The `command` file contains the command that runs the job. Since the operation was run using the Python API, it looks like this:

  ```bash
  $ cat job_ffe4ff5c-d2fac8c1-3fe0384-816a7fd0/command
  python _py_runner.py mapper.lV4l9c config_dump6RRYa4 _modulesEFMotR _main_moduleJALUsC.py _main_module PY_SOURCE
  ```

- `input` is the job input; in some cases it is worth analyzing on its own.
- `run.sh` is a shell script that runs the job locally.
- `run_gdb.sh` is a shell script that runs the job under GDB, which is useful for debugging C++ programs.

  Note: you can modify the `run_gdb.sh` and `run.sh` scripts, for example, to integrate a custom debugger.

- `sandbox` is the directory where the job was run, along with all the files needed to run it. As you debug, you can try updating files in this directory.
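To get a feel for what `run.sh` ultimately replays, the sketch below parses DSV input (tab-separated `key=value` pairs, the format used by the example operation) and feeds each record to the mapper, reproducing the job's failure locally. `parse_dsv` and `replay` are hypothetical helper names, not part of `yt.wrapper`.

```python
# Minimal sketch of replaying a saved job input against the user code.
# parse_dsv and replay are illustrative helpers, not part of yt.wrapper.

def parse_dsv(lines):
    # DSV: one record per line, tab-separated key=value fields.
    for line in lines:
        yield dict(field.split("=", 1) for field in line.rstrip("\n").split("\t"))

def mapper(rec):
    # Same mapper as in the example operation above.
    raise RuntimeError("fail")

def replay(lines):
    for rec in parse_dsv(lines):
        mapper(rec)

# Feeding the saved `input` file to replay() raises RuntimeError("fail"),
# matching the traceback printed by run.sh.
```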
- Run the job locally using the `run.sh` script:

  ```bash
  $ ./run.sh
  2016-07-22 12:00:33,499 INFO Started job process
  User job exited with non-zero exit code 1 with stderr:
  Traceback (most recent call last):
    File "_py_runner.py", line 56, in <module>
      main()
    File "_py_runner.py", line 53, in main
      yt.wrapper.py_runner_helpers.process_rows(__operation_dump_filename, __config_dump_filename, start_time=start_time)
    File "/home/user/yt/python/yt/wrapper/py_runner_helpers.py", line 154, in process_rows
      output_format.dump_rows(result, streams.get_original_stdout(), raw=raw)
    File "/home/user/yt/python/yt/wrapper/format.py", line 137, in dump_rows
      self._dump_rows(rows, stream)
    File "/home/user/yt/python/yt/wrapper/format.py", line 251, in _dump_rows
      for row in rows:
    File "/home/user/yt/python/yt/wrapper/py_runner_helpers.py", line 89, in process_frozen_dict
      for row in rows:
    File "/home/user/yt/python/yt/wrapper/py_runner_helpers.py", line 49, in generator
      result = func(*args)
    File "<stdin>", line 2, in mapper
  RuntimeError: fail
  origin  hostname in 2016-07-22T12:00:35.427161Z
  ```
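The stderr above is an ordinary Python traceback, and when iterating on a fix it can be convenient to capture the same text programmatically with the standard `traceback` module. The sketch below is illustrative; `run_and_capture` is a hypothetical helper name.

```python
# Capture the traceback of a failing mapper as a string, mirroring the
# stderr that run.sh prints. run_and_capture is an illustrative helper.
import traceback

def mapper(rec):
    raise RuntimeError("fail")

def run_and_capture(rec):
    try:
        mapper(rec)
    except Exception:
        return traceback.format_exc()
    return None

# run_and_capture({"a": "1"}) returns a string ending with
# "RuntimeError: fail", just like the run.sh output above.
```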