Overview of UDFs in C++

In C++ code, a UDF is a subclass of TBoxedValue that overrides the mandatory Run() method.

A convenient way to implement a particular UDF inside a UDF module is a class with three methods: the static Name() and DeclareSignature(...), which the module calls when registering and declaring the function, and the overridden Run(...), which holds the function logic.
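To show how these three methods fit together without depending on the YQL headers, here is a minimal standalone sketch. The types TMockBoxedValue, TMockBuilder and TAddUdf are hypothetical stand-ins, not the real NUdf API, and Run() is simplified to take a vector of integers instead of an IValueBuilder* and an array of TUnboxedValuePod.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for TBoxedValue: the mandatory piece is Run().
class TMockBoxedValue {
public:
    virtual ~TMockBoxedValue() = default;
    // Simplified: the real Run() takes an IValueBuilder* and TUnboxedValuePod args.
    virtual long long Run(const std::vector<long long>& args) const = 0;
};

// Hypothetical stand-in for IFunctionTypeInfoBuilder: records the declared
// signature and takes ownership of the implementation.
struct TMockBuilder {
    std::string Signature;
    std::unique_ptr<TMockBoxedValue> Impl;

    void Implementation(TMockBoxedValue* impl) { Impl.reset(impl); }
};

// A hypothetical UDF that adds its two arguments.
class TAddUdf : public TMockBoxedValue {
public:
    // Function-local static: the same name object is returned on every call.
    static const std::string& Name() {
        static const std::string name = "Add";
        return name;
    }

    // Declares the signature and registers an instance as the implementation.
    static bool DeclareSignature(TMockBuilder& builder) {
        builder.Signature = "(Int64, Int64) -> Int64";
        builder.Implementation(new TAddUdf());
        return true;
    }

    // The function logic, invoked on every call.
    long long Run(const std::vector<long long>& args) const override {
        return args[0] + args[1];
    }
};
```

In this sketch the builder owns the instance through a std::unique_ptr, so calling Run() through the base-class pointer dispatches to TAddUdf::Run().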

Example of a UDF module class:

class TSomeYQLModule: public IUdfModule {
public:
    TStringRef Name() const {
        return TStringRef::Of("SomeModule");
    }

    void GetAllFunctions(IFunctionsSink& sink) const final {
        // Register every function name exported by the module; type-aware
        // functions also receive the user type in BuildFunctionTypeInfo.
        sink.Add(TSomeUdfFunction::Name());
        sink.Add(TSomeTypeAwareUdfFunction::Name())->SetTypeAwareness();
    }

    void BuildFunctionTypeInfo(const TStringRef& name,
                               TType* userType,
                               const TStringRef& typeConfig,
                               ui32 flags,
                               IFunctionTypeInfoBuilder& builder) const override {
        try {
            Y_UNUSED(typeConfig);

            // With TypesOnly set, only the signature is needed, not an implementation.
            bool typesOnly = (flags & TFlags::TypesOnly);

            if (TSomeUdfFunction::Name() == name) {
                TSomeUdfFunction::DeclareSignature(typesOnly, builder);
            } else if (TSomeTypeAwareUdfFunction::Name() == name) {
                TSomeTypeAwareUdfFunction::DeclareSignature(
                        typesOnly, builder, userType);
            }
        } catch (const std::exception&) {
            // Report the failure to YQL instead of letting the exception escape.
            builder.SetError(CurrentExceptionMessage());
        }
    }
};
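The module's job is mostly dispatch: list the exported names, then declare whichever function is requested and turn exceptions into builder errors. The sketch below reproduces that control flow with hypothetical stand-ins (TMockSink, TMockTypeBuilder, TMockModule, and a string pointer in place of TType* userType); it is not the real IUdfModule interface.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for IFunctionsSink: collects exported names and
// remembers which ones are type-aware.
struct TMockSink {
    struct TEntry {
        std::string Name;
        bool TypeAware;
    };
    std::vector<TEntry> Entries;

    TEntry& Add(const std::string& name) {
        Entries.push_back(TEntry{name, false});
        return Entries.back();
    }
};

// Hypothetical stand-in for IFunctionTypeInfoBuilder (only what dispatch touches).
struct TMockTypeBuilder {
    std::string Declared;
    std::string Error;

    void SetError(const std::string& message) { Error = message; }
};

class TMockModule {
public:
    void GetAllFunctions(TMockSink& sink) const {
        sink.Add("Plain");
        sink.Add("TypeAware").TypeAware = true;
    }

    // Dispatches on the requested name; exceptions become builder errors.
    void BuildFunctionTypeInfo(const std::string& name,
                               const std::string* userType,
                               TMockTypeBuilder& builder) const {
        try {
            if (name == "Plain") {
                builder.Declared = "Plain";
            } else if (name == "TypeAware") {
                if (userType == nullptr) {
                    throw std::invalid_argument("TypeAware requires a user type");
                }
                builder.Declared = "TypeAware<" + *userType + ">";
            }
            // Unknown names are ignored, as in the module class above.
        } catch (const std::exception& e) {
            builder.SetError(e.what());
        }
    }
};
```

Catching inside BuildFunctionTypeInfo keeps a failure in one function's declaration local to that request, which mirrors the try/catch around the dispatch in the module class above.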

Methods of a UDF class:

  • static const TStringRef& Name()

    Returns the name under which the function is available in YQL. In this implementation, the UDF module uses it in GetAllFunctions to register the function and in BuildFunctionTypeInfo to recognize which function is being requested.

  • static bool DeclareSignature(bool typesOnly, IFunctionTypeInfoBuilder& builder [, TType* userType])

    Declares the function signature. The UDF module calls it from BuildFunctionTypeInfo. It uses the builder to obtain the indexes of the struct fields that the class works with and, unless typesOnly is set, creates an instance of the UDF and registers it as the implementation. The optional userType argument is passed only to type-aware functions.

  • TUnboxedValue Run(const IValueBuilder* valueBuilder, const TUnboxedValuePod* args) const override

    Implements the UDF logic. This is the method invoked each time the UDF is called from YQL: args points to the call arguments, and valueBuilder is used to construct the returned values.
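The typesOnly flag decides whether DeclareSignature registers an implementation at all: the signature is always declared, but an instance is created only when the function will actually run. A minimal sketch of that rule, using a hypothetical flag constant TypesOnlyFlag in place of TFlags::TypesOnly and hypothetical mock types rather than the real builder:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical flag bit playing the role of TFlags::TypesOnly.
constexpr std::uint32_t TypesOnlyFlag = 0x01;

struct TMockRunnable {
    virtual ~TMockRunnable() = default;
    virtual int Run(int arg) const = 0;
};

// Hypothetical builder: records the return type and owns the implementation.
struct TMockBuilder {
    std::string ReturnType;
    std::unique_ptr<TMockRunnable> Impl;

    void Returns(const std::string& type) { ReturnType = type; }
    void Implementation(TMockRunnable* impl) { Impl.reset(impl); }
};

class TDoubleUdf : public TMockRunnable {
public:
    static bool DeclareSignature(bool typesOnly, TMockBuilder& builder) {
        // The signature is declared in both modes.
        builder.Returns("Int32");
        // An implementation is needed only when the function will be executed.
        if (!typesOnly) {
            builder.Implementation(new TDoubleUdf());
        }
        return true;
    }

    int Run(int arg) const override { return arg * 2; }
};

// Mirrors `bool typesOnly = (flags & TFlags::TypesOnly);` in the module class.
void BuildFunctionTypeInfo(std::uint32_t flags, TMockBuilder& builder) {
    const bool typesOnly = (flags & TypesOnlyFlag) != 0;
    TDoubleUdf::DeclareSignature(typesOnly, builder);
}
```

With the flag set, the builder ends up with a return type but no implementation; without it, the same call also produces an instance whose Run() can be invoked.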

Example of a UDF class:

class TMyUdfFunction : public ::NKikimr::NUdf::TBoxedValue {
public:
    // Each member struct (such as TMemberIndices) is a C++ description of a
    // YQL struct and has its own constructor that takes a builder instance
    // and initializes the description from it. To build your own YQL struct
    // and work with it in C++, write the same kind of constructor. These
    // structs contain indices from the YQL runtime (in C++ you can't use
    // field names, only indices) and TType* ResultStructType, a description
    // of the struct that YQL understands and that is used later to specify
    // the return type of the UDF.
    struct TMemberIndices {
        TMemberIndices(::NKikimr::NUdf::IFunctionTypeInfoBuilder& builder) {
            ...
        }
    };

    // Name() is needed to register the UDF in YQL: the function is available
    // in YQL under the name returned here. The UDF module class adds all
    // function names to the module in its GetAllFunctions method.
    static const ::NKikimr::NUdf::TStringRef& Name() {
        static auto name = ::NKikimr::NUdf::TStringRef::Of("MyUdfFunction");
        return name;
    }

    // Also called by the UDF module class, from its BuildFunctionTypeInfo
    // method. It uses the passed-in builder instance
    //      1) to obtain the indices this class needs to work, and
    //      2) to create an instance of this UDF.
    // This is the only function that can obtain the indices required by the
    // class constructor, so it is the only place where the constructor is
    // called.
    static bool DeclareSignature(
            bool typesOnly, ::NKikimr::NUdf::IFunctionTypeInfoBuilder& builder,
            ::NKikimr::NUdf::TType* userType) {
        auto members = buildReturnSignature(builder, userType);
        // If typesOnly flag is specified, we don't need to register
        // implementation, just build the signature
        if (!typesOnly) {
            // The constructor also takes the source position, used for logging.
            builder.Implementation(
                    new TMyUdfFunction(members, builder.GetSourcePosition()));
        }
        return true;
    }

    // Run() is a virtual method of TBoxedValue that we override; it contains
    // the UDF logic and is the function executed when the UDF is called from
    // YQL.
    // Its signature is always
    // Run(const IValueBuilder*, const TUnboxedValuePod*) -> TUnboxedValue
    ::NKikimr::NUdf::TUnboxedValue Run(
            const ::NKikimr::NUdf::IValueBuilder* valueBuilder,
            const ::NKikimr::NUdf::TUnboxedValuePod* args) const override {
        ...
    }

private:
    TMemberIndices IndicesDescription;
    // Source position of the call, kept for logging.
    ::NKikimr::NUdf::TSourcePosition Pos;

    /// The constructor accepts the YQL struct member descriptions and the
    /// source code position (for logging).
    /// Instances of this class are created only within DeclareSignature,
    /// as that is the only method that knows the YQL struct member indices.
    /// For this reason, the constructor is private.
    TMyUdfFunction(const TMemberIndices& indicesDescription,
                   const ::NKikimr::NUdf::TSourcePosition& pos)
        : IndicesDescription(indicesDescription)
        , Pos(pos)
    {
        ...
    }

    /// Initializes the proper signature on the given builder and returns a
    /// struct with the indices of the YQL struct members (in C++ you have to
    /// access YQL struct members by index, not by name).
    static TMemberIndices buildReturnSignature(
            ::NKikimr::NUdf::IFunctionTypeInfoBuilder& builder,
            ::NKikimr::NUdf::TType* userType) {
        ...
    }
};
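The comments above stress that YQL struct members are reached by index, not by name, and that the indices are obtained once while the signature is declared. The following standalone sketch illustrates that pattern with hypothetical types (TMockStructType, TMockMemberIndices, TTotalUdf) rather than the real NUdf builder API: names are resolved to positions up front, and Run() then reads each field only by its stored position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a YQL struct type: an ordered list of field
// names. A struct value is then just a row addressed by position.
struct TMockStructType {
    std::vector<std::string> FieldNames;

    // Resolves a field name to its position; done once, at signature time.
    std::uint32_t IndexOf(const std::string& name) const {
        auto it = std::find(FieldNames.begin(), FieldNames.end(), name);
        if (it == FieldNames.end()) {
            throw std::invalid_argument("no such field: " + name);
        }
        return static_cast<std::uint32_t>(it - FieldNames.begin());
    }
};

// Plays the role of TMemberIndices: positions resolved from the type.
struct TMockMemberIndices {
    std::uint32_t Price;
    std::uint32_t Quantity;

    explicit TMockMemberIndices(const TMockStructType& type)
        : Price(type.IndexOf("price"))
        , Quantity(type.IndexOf("quantity"))
    {}
};

// Plays the role of the UDF class: Run() uses only the stored indices.
class TTotalUdf {
public:
    explicit TTotalUdf(const TMockMemberIndices& indices)
        : Indices(indices)
    {}

    double Run(const std::vector<double>& row) const {
        return row[Indices.Price] * row[Indices.Quantity];
    }

private:
    TMockMemberIndices Indices;
};
```

Because the name lookup happens in the TMockMemberIndices constructor, a missing field is reported when the function is declared rather than on every call, and Run() stays a plain indexed read.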