`ReduceHandle`

RXMesh supports parallel reductions over data stored in attributes: dot products and norms, min/max and argmin/argmax, and arbitrary reductions with a user-supplied binary operator via CUB. The ReduceHandle class owns the temporary device storage and runs a two-stage pipeline. First, each patch is reduced on the GPU. Then CUB's DeviceReduce combines the per-patch results. The scalar result is returned on the host.
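The two-stage pipeline can be illustrated with a minimal host-side C++ sketch. For illustration only, it assumes that an attribute is stored as one array per patch. `two_stage_reduce` is a hypothetical name and is not part of RXMesh. Stage 1 stands in for the per-patch GPU reduce, and stage 2 stands in for the CUB DeviceReduce over the per-patch partials.

```cpp
#include <vector>

// Host-side sketch of a two-stage reduction (illustration only, not RXMesh code).
// Assumption: the attribute is stored as one contiguous array per patch.
// On the GPU, stage 1 would roughly map to one thread block per patch and
// stage 2 to a single device-wide reduce over the partials.
template <typename T, typename Op>
T two_stage_reduce(const std::vector<std::vector<T>>& patches, Op op, T init)
{
    // Stage 1: reduce each patch independently to one partial result.
    std::vector<T> partials;
    partials.reserve(patches.size());
    for (const auto& patch : patches) {
        T acc = init;
        for (const T& x : patch) {
            acc = op(acc, x);
        }
        partials.push_back(acc);
    }

    // Stage 2: combine the per-patch partials into the final scalar.
    T result = init;
    for (const T& p : partials) {
        result = op(result, p);
    }
    return result;
}
```

In this sketch, `init` seeds every per-patch partial as well as the final combine. The result is therefore only correct when `init` is the operator's identity element.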

ReduceHandle<T, HandleT> is templated on two types: the data type T that the reduction operates on, and the mesh element handle type HandleT. The following aliases are defined for convenience:

VertexReduceHandle<T> → ReduceHandle<T, VertexHandle>
EdgeReduceHandle<T> → ReduceHandle<T, EdgeHandle>
FaceReduceHandle<T> → ReduceHandle<T, FaceHandle>
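Aliases like these are ordinary C++ alias templates that fix HandleT and leave T open. The sketch below shows the pattern using empty placeholder types in place of RXMesh's `VertexHandle`, `EdgeHandle`, `FaceHandle` and `ReduceHandle`. It is an assumption about how such aliases can be declared, not a copy of the RXMesh headers.

```cpp
#include <type_traits>

// Placeholder stand-ins (hypothetical), used only to show the alias pattern.
struct VertexHandle {};
struct EdgeHandle {};
struct FaceHandle {};

template <typename T, typename HandleT>
class ReduceHandle {};

// Alias templates: fix the handle type, keep the data type T as a parameter.
template <typename T>
using VertexReduceHandle = ReduceHandle<T, VertexHandle>;
template <typename T>
using EdgeReduceHandle = ReduceHandle<T, EdgeHandle>;
template <typename T>
using FaceReduceHandle = ReduceHandle<T, FaceHandle>;

// An alias and its full spelling name exactly the same type.
static_assert(std::is_same<VertexReduceHandle<float>,
                           ReduceHandle<float, VertexHandle>>::value,
              "alias must match the full type");
```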

Construction

Before running any reduction, you must construct a ReduceHandle. This object encapsulates the temporary memory and configuration needed for efficient GPU reductions. A single ReduceHandle can be reused for several reductions launched one after another. There are two constructors:

ReduceHandle(const Attribute<T, HandleT>& attr)

Use when you already have an attribute on the mesh.

ReduceHandle(const RXMesh& rx)

Use when you only have the mesh. Works with RXMeshStatic or RXMeshDynamic.

Copy construction is allowed (the copy constructor ReduceHandle(const ReduceHandle&) is defaulted). The destructor releases the internal temporary allocations.

```cpp
RXMeshStatic rx("mesh.obj");
auto v = rx.add_vertex_attribute<float>("v", 3, DEVICE);

ReduceHandle reduce_handle(*v);  // C++17 class template argument deduction: ReduceHandle<float, VertexHandle>
```

Dot Product

Computes the inner product of two attributes defined on the same mesh elements, that is, the sum of the products of their matching entries.

```cpp
auto v1 = rx.add_vertex_attribute<float>("v1", 3, DEVICE);
auto v2 = rx.add_vertex_attribute<float>("v2", 3, DEVICE);
// ... fill v1 and v2 on DEVICE ...

ReduceHandle rh(*v1);
float inner = rh.dot(*v1, *v2);
```

dot(attr1, attr2, attribute_id, stream)

```cpp
T dot(const Attribute<T, HandleT>& attr1,
      const Attribute<T, HandleT>& attr2,
      uint32_t                     attribute_id = INVALID32,
      cudaStream_t                 stream       = NULL);
```

attribute_id: index of a single component (column) to reduce, or INVALID32 (the default) to reduce over all components together.
stream: CUDA stream on which the device work is launched (default: NULL, the default stream).
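The following host-side C++ reference illustrates what dot() computes and how attribute_id selects columns. It assumes a hypothetical row-major layout, with component j of element i at index i * num_cols + j. This is not necessarily how RXMesh stores attributes. `reference_dot` and `kInvalid32` are illustrative names; `kInvalid32` stands in for INVALID32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for RXMesh's INVALID32 ("all columns"); its value is an assumption.
constexpr uint32_t kInvalid32 = 0xFFFFFFFFu;

// Host-side reference for dot(). Attributes are modeled as row-major arrays
// with num_cols components per mesh element (hypothetical layout).
template <typename T>
T reference_dot(const std::vector<T>& a,
                const std::vector<T>& b,
                uint32_t              num_cols,
                uint32_t              attribute_id = kInvalid32)
{
    assert(a.size() == b.size() && num_cols > 0 && a.size() % num_cols == 0);
    T sum = T(0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        const uint32_t col = static_cast<uint32_t>(i % num_cols);
        // With kInvalid32 every component contributes; otherwise only the
        // selected column does.
        if (attribute_id == kInvalid32 || col == attribute_id) {
            sum += a[i] * b[i];
        }
    }
    return sum;
}
```

With attribute_id left at its default, `reference_dot` corresponds to the `dot(*v1, *v2)` call above, where every component of every vertex contributes.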

L2 Norm

Compute the squared L2 norm of an attribute:

```cpp
auto v = rx.add_vertex_attribute<float>("v", 3, DEVICE);
// ... fill v on DEVICE ...
ReduceHandle reduce_handle(*v);
float n = reduce_handle.norm2(*v);
```

norm2(attr, attribute_id, stream)

```cpp
T norm2(const Attribute<T, HandleT>& attr,
        uint32_t                     attribute_id = INVALID32,
        cudaStream_t                 stream       = NULL);
```

attribute_id and stream: same as in dot().
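Per the description above, norm2() yields the squared L2 norm: the sum of squares of all entries, which equals the dot product of the attribute with itself. The hypothetical host-side helpers below compute that quantity on a flattened array. For comparison, they also compute the Euclidean norm as its square root.

```cpp
#include <cmath>
#include <vector>

// Hypothetical host-side reference: sum of squares over every component of
// an attribute flattened into one array. This is the squared L2 norm.
template <typename T>
T reference_squared_norm(const std::vector<T>& v)
{
    T sum = T(0);
    for (const T& x : v) {
        sum += x * x;  // same as a dot product of v with itself
    }
    return sum;
}

// The Euclidean (L2) norm is the square root of the squared norm.
template <typename T>
T reference_norm(const std::vector<T>& v)
{
    return std::sqrt(reference_squared_norm(v));
}
```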

ArgMin / ArgMax

Returns the minimum (or maximum) value of the attribute, together with the handle of the element that holds it:

```cpp
auto kv = reduce_handle.arg_max(attr);
// kv.key:   handle of the element with the maximum value
// kv.value: the corresponding scalar / component value
```

arg_max / arg_min

```cpp
KeyValue arg_max(const Attribute<T, HandleT>& attr,
                 uint32_t                     attribute_id = INVALID32,
                 cudaStream_t                 stream       = NULL);

KeyValue arg_min(const Attribute<T, HandleT>& attr,
                 uint32_t                     attribute_id = INVALID32,
                 cudaStream_t                 stream       = NULL);
```

attribute_id and stream: same as in dot().
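The following is a host-side reference for arg_max / arg_min on a single-component attribute flattened into one array. `KeyValue` here is a hypothetical struct whose key is a plain element index rather than a mesh handle. The tie-breaking rule (keep the earliest index) is a choice made by this sketch, not documented RXMesh behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type mirroring kv.key / kv.value; key is an index here.
template <typename T>
struct KeyValue
{
    std::size_t key;
    T           value;
};

// Returns the best element according to `better`. The current best is only
// replaced when `better(candidate, best)` is true, so ties keep the earliest index.
template <typename T, typename Better>
KeyValue<T> reference_arg(const std::vector<T>& values, Better better)
{
    assert(!values.empty());
    KeyValue<T> best{0, values[0]};
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (better(values[i], best.value)) {
            best = KeyValue<T>{i, values[i]};
        }
    }
    return best;
}

template <typename T>
KeyValue<T> reference_arg_max(const std::vector<T>& values)
{
    return reference_arg(values, std::greater<T>());
}

template <typename T>
KeyValue<T> reference_arg_min(const std::vector<T>& values)
{
    return reference_arg(values, std::less<T>());
}
```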

Custom Reduction

Generic reduction with an associative binary operator reduction_op and an initial value init, which should be the operator's neutral (identity) element. You can pass one of CUB's functors (cub::Sum(), cub::Max(), cub::Min(), …) or a small device functor that implements T operator()(const T& a, const T& b) const.

```cpp
auto e = rx.add_edge_attribute<uint32_t>("e", 3, DEVICE);
// ... fill e on DEVICE ...

ReduceHandle reduce_handle(*e);
uint32_t m = reduce_handle.reduce(*e, cub::Max(), uint32_t(0));
```

A user-defined reduction functor looks like this:

```cpp
struct CustomMin
{
    template <typename U>
    __device__ __forceinline__ U operator()(const U& a, const U& b) const
    {
        return (b < a) ? b : a;
    }
};
```
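You can check a functor like CustomMin on the host before using it on the GPU by running a host-only copy through a sequential fold. In the sketch below, the CUDA qualifiers are dropped so it compiles with any C++ compiler. `sequential_reduce` is a hypothetical stand-in for reduce(attr, reduction_op, init), not the RXMesh implementation.

```cpp
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Host-only copy of CustomMin: __device__ __forceinline__ are removed so it
// builds without nvcc. The templated operator() serves any comparable type U.
struct CustomMin
{
    template <typename U>
    U operator()(const U& a, const U& b) const
    {
        return (b < a) ? b : a;
    }
};

// Sequential stand-in for reduce(attr, reduction_op, init): fold every value
// into init with the operator. For a minimum, init should be the largest
// representable value so it never wins over real data.
template <typename T, typename ReductionOp>
T sequential_reduce(const std::vector<T>& values, ReductionOp op, T init)
{
    return std::accumulate(values.begin(), values.end(), init, op);
}
```

On the device, you would pass the original functor as reduction_op, with std::numeric_limits<T>::max() as init for a minimum. For example: `reduce_handle.reduce(*e, CustomMin(), std::numeric_limits<uint32_t>::max())`.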

reduce(attr, reduction_op, init, attribute_id, stream)

```cpp
template <typename ReductionOp>
T reduce(const Attribute<T, HandleT>& attr,
         ReductionOp                  reduction_op,
         T                            init,
         uint32_t                     attribute_id = INVALID32,
         cudaStream_t                 stream       = NULL);
```

init: identity element of the operator (e.g. 0 for cub::Sum(), or 0 for cub::Max() on unsigned types, as in the tests).
attribute_id, stream: same as in dot().
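The choice of init matters because 0 is the identity for a maximum only on unsigned types. For signed integers and floating point, the identity is std::numeric_limits<T>::lowest(). Note that numeric_limits<float>::min() is the smallest positive normal value, not the most negative one. The hypothetical helpers below show the difference using a sequential max-reduction on the host.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Sequential max-reduction starting from init (hypothetical stand-in for a
// reduce call with a max operator).
template <typename T>
T max_reduce(const std::vector<T>& values, T init)
{
    T acc = init;
    for (const T& x : values) {
        acc = std::max(acc, x);
    }
    return acc;
}

// Identity element for max: the smallest representable value of T.
// This is 0 for unsigned types and the most negative finite value otherwise.
template <typename T>
T max_identity()
{
    return std::numeric_limits<T>::lowest();
}
```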