
for_each

RXMesh uses for_each for two different kinds of parallel work:

| Mode | What it does | Templates |
|---|---|---|
| Element-wise | Visits each vertex, edge, or face with only its handle, i.e., no neighborhood or connectivity information | No query Op; an optional overload dispatches on VertexHandle, EdgeHandle, or FaceHandle |
| Connectivity-based | For each seed element, traverses its neighbors through an iterator (same neighborhood model as the Op enum) | for_each<Op, blockThreads>(...): the template arguments are the query Op and the block thread count |

If one kernel needs several different queries (more than one Op) or custom shared memory, use run_kernel.


Element-wise for_each

Use this when each invocation needs only the element's handle and no adjacency information.

location is a bitmask that selects where to run. For example, setting both HOST and DEVICE runs both code paths. stream applies to the device path and with_omp applies to the host path (defaults: stream = NULL, with_omp = true).

RXMeshStatic rx("mesh.obj");
// Dereference the returned pointer so the lambda can capture the attribute by value
auto vertex_pos = *rx.get_input_vertex_coordinates();
auto vertex_color = *rx.add_vertex_attribute<float>("vColor", 3, DEVICE);

rx.for_each_vertex(
    DEVICE,
    [vertex_color, vertex_pos] __device__(const VertexHandle vh) {
        // Channels 0 and 2 are fixed at 0.9; channel 1 copies the vertex's y coordinate
        vertex_color(vh, 0) = 0.9f;
        vertex_color(vh, 1) = vertex_pos(vh, 1);
        vertex_color(vh, 2) = 0.9f;
    });

for_each_vertex(location, apply, stream = NULL, with_omp = true) const

Calls apply once for each vertex. apply takes a VertexHandle. location is a bitmask that selects host and/or device execution. stream is the CUDA stream used when the device path runs. with_omp controls OpenMP parallelism on the host path (default true) and is ignored unless location includes HOST.
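The host path's two switches are location and with_omp. The host-only C++ sketch below shows one way to picture them. It is not RXMesh's implementation. The Location bits, the VertexHandle struct, and for_each_vertex_host are hypothetical stand-ins, and the sketch omits the device path entirely. location is tested as a bitmask, and with_omp toggles an OpenMP parallel loop. The pragma only has an effect when the code is compiled with OpenMP enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical location bits; RXMesh defines its own HOST and DEVICE flags.
enum Location : uint32_t { kHost = 1u << 0, kDevice = 1u << 1 };

// Hypothetical stand-in for RXMesh's VertexHandle.
struct VertexHandle {
    int id;
};

// Host-only model of for_each_vertex: the loop runs only if the HOST bit is set,
// and with_omp decides whether OpenMP parallelizes it.
template <typename ApplyT>
void for_each_vertex_host(uint32_t location, int num_vertices, ApplyT apply,
                          bool with_omp = true) {
    if ((location & kHost) == 0) {
        return;  // HOST not requested, so the host path does nothing
    }
#ifdef _OPENMP
#pragma omp parallel for if (with_omp)
#endif
    for (int i = 0; i < num_vertices; ++i) {
        apply(VertexHandle{i});
    }
    (void)with_omp;  // silences the unused-parameter warning in non-OpenMP builds
}
```

For example, passing kHost | kDevice runs this loop, while passing kDevice alone leaves the host untouched.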

for_each_edge(location, apply, stream = NULL, with_omp = true) const

Same as for_each_vertex, but apply receives an EdgeHandle.

for_each_face(location, apply, stream = NULL, with_omp = true) const

Same as for_each_vertex, but apply receives a FaceHandle.

for_each<HandleT>(location, apply, stream = NULL, with_omp = true)

Dispatches to for_each_vertex, for_each_edge, or for_each_face when HandleT is VertexHandle, EdgeHandle, or FaceHandle respectively. Other HandleT types are not supported by this overload.
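Dispatch on HandleT can be resolved entirely at compile time. The C++17 sketch below illustrates the idea with if constexpr and std::is_same_v. The handle structs and the visit_vertices / visit_edges / visit_faces functions are hypothetical stand-ins for RXMesh's handles and for for_each_vertex, for_each_edge, and for_each_face. They only report which path ran. Any other HandleT fails to compile, which matches the statement that other handle types are not supported.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical handle types standing in for RXMesh's handles.
struct VertexHandle { int id; };
struct EdgeHandle { int id; };
struct FaceHandle { int id; };

// Hypothetical element-specific entry points; each reports which one ran.
inline std::string visit_vertices() { return "vertex"; }
inline std::string visit_edges() { return "edge"; }
inline std::string visit_faces() { return "face"; }

// Dependent false value so the static_assert fires only for unsupported types.
template <typename>
inline constexpr bool kUnsupportedHandle = false;

// Compile-time dispatch on HandleT, in the spirit of for_each<HandleT>.
template <typename HandleT>
std::string visit_elements() {
    if constexpr (std::is_same_v<HandleT, VertexHandle>) {
        return visit_vertices();
    } else if constexpr (std::is_same_v<HandleT, EdgeHandle>) {
        return visit_edges();
    } else if constexpr (std::is_same_v<HandleT, FaceHandle>) {
        return visit_faces();
    } else {
        static_assert(kUnsupportedHandle<HandleT>,
                      "HandleT must be VertexHandle, EdgeHandle, or FaceHandle");
        return {};
    }
}
```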

CUDA callables

For DEVICE execution, apply must be a device-callable lambda (e.g., a __device__ or __host__ __device__ extended lambda) that captures data by value. For host-only execution, an ordinary host callable is enough.

More background: the extended lambdas section of the CUDA C++ Programming Guide.
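Capturing by value is less limiting than it may sound when the captured object is a lightweight view, meaning a small struct that holds a pointer to the underlying buffer. The examples on this page capture attributes this way, and this sketch assumes they behave like such views. The host-only C++ code below uses a hypothetical FloatView type to show the effect. Copying the view into a lambda copies only the pointer, so writes through the copy still reach the shared storage. On the device that storage would be GPU memory. Here it is a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: copying it copies the pointer, not the data.
struct FloatView {
    float*      data;
    std::size_t size;
    float& operator()(std::size_t i) const { return data[i]; }
};

// Fills every entry through a by-value copy of the view.
inline void fill_through_copy(FloatView view, float value) {
    // [view] captures a copy of the struct; view.data still points at the
    // caller's buffer, so the writes are visible outside the lambda.
    auto apply = [view, value](std::size_t i) { view(i) = value; };
    for (std::size_t i = 0; i < view.size; ++i) {
        apply(i);
    }
}
```

Capturing the view by reference would instead store a reference to the local variable view. On the device, that reference would point at host stack memory and would not be valid there.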

Device launch shape (element-wise)

On the device path, this entry point launches one CUDA block per patch with 256 threads per block and no dynamic shared memory, on the given stream.
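The launch shape above is one block per patch, 256 threads, and no dynamic shared memory. The C++ sketch below writes that shape down and emulates it on the CPU. The text does not say how threads map to elements when a patch holds more than 256 of them. The block-stride mapping used here (thread t visits local elements t, t + 256, ...) is an assumption, not RXMesh's code. LaunchShape, element_wise_shape, and emulate_visits are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Launch shape stated in the text: one block per patch, 256 threads, no dynamic smem.
struct LaunchShape {
    uint32_t grid_blocks;
    uint32_t block_threads;
    uint32_t dynamic_smem_bytes;
};

inline LaunchShape element_wise_shape(uint32_t num_patches) {
    return LaunchShape{num_patches, 256u, 0u};
}

// CPU emulation of one plausible mapping (an assumption, not RXMesh's code):
// thread t of the block for patch p visits local elements t, t + 256, t + 512, ...
// Returns how many times each element was visited; patches are laid out back to back.
inline std::vector<int> emulate_visits(const std::vector<uint32_t>& patch_sizes) {
    const LaunchShape shape =
        element_wise_shape(static_cast<uint32_t>(patch_sizes.size()));
    uint32_t total = 0;
    for (uint32_t s : patch_sizes) total += s;
    std::vector<int> visits(total, 0);
    uint32_t offset = 0;
    for (uint32_t p = 0; p < shape.grid_blocks; ++p) {          // one block per patch
        for (uint32_t t = 0; t < shape.block_threads; ++t) {    // threads in the block
            for (uint32_t e = t; e < patch_sizes[p]; e += shape.block_threads) {
                ++visits[offset + e];
            }
        }
        offset += patch_sizes[p];
    }
    return visits;
}
```

Under this mapping, every element is visited exactly once, whether its patch is smaller or larger than the block.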


Connectivity-based for_each

These overloads launch a kernel in which each thread processes one seed mesh element (a vertex, edge, or face, depending on Op) and receives that element's neighbors through an iterator. The iterator type depends on Op. For example, Op::FV yields the vertices of each face. Connectivity-based for_each runs on the device only.
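The seed-plus-iterator model is easiest to see on the CPU. The sketch below plays the role of an Op::FV launch over a plain triangle list. Each loop iteration stands in for one device thread and hands the user lambda a face handle and an iterator over that face's three vertices. FaceHandle, VertexHandle, VertexIterator, and query_fv are hypothetical host-side types, not RXMesh's. They only mimic the (InputHandle, OutputIterator&) calling shape.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical handles standing in for RXMesh's handle types.
struct FaceHandle { uint32_t id; };
struct VertexHandle { uint32_t id; };

// Hypothetical neighbor iterator: a read-only window onto one seed's neighbors.
struct VertexIterator {
    const uint32_t* neighbors;
    uint32_t        count;
    uint32_t size() const { return count; }
    VertexHandle operator[](uint32_t i) const { return VertexHandle{neighbors[i]}; }
};

// CPU model of an FV query: one "thread" per face, each given an iterator over
// that face's three vertices.
template <typename LambdaT>
void query_fv(const std::vector<std::array<uint32_t, 3>>& faces, LambdaT user_lambda) {
    for (uint32_t f = 0; f < faces.size(); ++f) {
        VertexIterator iter{faces[f].data(), 3};
        user_lambda(FaceHandle{f}, iter);
    }
}
```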

The following example computes squared edge lengths using Op::EV:

RXMeshStatic rx("mesh.obj");

auto x = *rx.get_input_vertex_coordinates();
auto len = *rx.add_edge_attribute<float>("eLength", 1);

constexpr int blockSize = 256;

rx.for_each<Op::EV, blockSize>(
    [=] __device__(const EdgeHandle& eh, const VertexIterator& iter) {
        // For Op::EV, iter[0] and iter[1] are the two endpoints of edge eh
        Eigen::Vector3f a = x.to_eigen<3>(iter[0]);
        Eigen::Vector3f b = x.to_eigen<3>(iter[1]);
        len(eh) = (a - b).squaredNorm();  // squared length, no square root
    });
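A host-side reference can help when checking the device result. The C++ sketch below computes the same squared lengths from a plain edge list, where each entry holds an edge's two vertex indices, and a coordinate array. Those containers are hypothetical stand-ins for RXMesh's EV connectivity and vertex attribute. The sketch does not use Eigen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Squared Euclidean distance between two points.
inline float squared_distance(const Vec3& a, const Vec3& b) {
    float s = 0.0f;
    for (std::size_t k = 0; k < 3; ++k) {
        const float d = a[k] - b[k];
        s += d * d;
    }
    return s;
}

// Host reference for the Op::EV example: edge e joins edges[e][0] and edges[e][1].
inline std::vector<float> squared_edge_lengths(
    const std::vector<Vec3>& positions,
    const std::vector<std::array<std::size_t, 2>>& edges) {
    std::vector<float> len(edges.size());
    for (std::size_t e = 0; e < edges.size(); ++e) {
        len[e] = squared_distance(positions[edges[e][0]], positions[edges[e][1]]);
    }
    return len;
}
```

For a triangle with corners (0,0,0), (1,0,0), and (1,2,0), the three squared lengths are 1, 4, and 5.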

for_each<Op, blockThreads> is limited to one query Op per launch. For kernels that combine multiple queries or extra shared memory, use run_kernel.

for_each<op, blockThreads>(user_lambda, oriented = false, stream = NULL) const
template <Op op, uint32_t blockThreads, typename LambdaT>
void for_each(const LambdaT user_lambda,
              const bool    oriented = false,
              cudaStream_t  stream   = NULL) const;
  • op: compile-time query (e.g. Op::EV).
  • blockThreads: CUDA block size (e.g. 256).
  • user_lambda: invoked per seed element on the device; signature (InputHandle, OutputIterator&).
  • oriented (default false): oriented traversal where supported (e.g. Op::VV, Op::VE).
  • stream: CUDA stream for the launch (default null stream).

RXMesh fills a LaunchBox<blockThreads> via prepare_launch_box for this query and launches the internal query kernel.

for_each<op, blockThreads>(lb, user_lambda, oriented = false, stream = NULL) const

template <Op op, uint32_t blockThreads, typename LambdaT>
void for_each(LaunchBox<blockThreads> lb,
              const LambdaT           user_lambda,
              const bool              oriented = false,
              cudaStream_t            stream   = NULL) const;

Same contract as above, but you supply a precomputed LaunchBox (built with prepare_launch_box), so repeated launches can reuse the launch configuration. lb must have been prepared for the same op, blockThreads, and internal query kernel.
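The reuse pattern is to prepare once and launch many times, and the matching constraint guards it. The C++ sketch below models both. Query, LaunchConfig, prepare, and launch are hypothetical and do not mirror LaunchBox or prepare_launch_box internals. Carrying blockThreads as a template parameter turns a thread-count mismatch into a compile error. The query a config was prepared for is checked at launch time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical query tags standing in for RXMesh's Op enum.
enum class Query { VV, EV, FV };

// Hypothetical launch configuration. Putting blockThreads in the type makes a
// config prepared for 256 threads unusable where 128 is expected (compile error).
template <uint32_t blockThreads>
struct LaunchConfig {
    Query    prepared_for;  // query this config was prepared for
    uint32_t blocks;        // e.g., one block per patch
};

// Counts calls to the preparation step, the part worth doing only once.
inline int g_prepare_calls = 0;

template <uint32_t blockThreads>
LaunchConfig<blockThreads> prepare(Query q, uint32_t num_patches) {
    ++g_prepare_calls;
    return LaunchConfig<blockThreads>{q, num_patches};
}

// "Launch" with a reused config: refuses (returns false) if the query differs.
template <Query q, uint32_t blockThreads>
bool launch(const LaunchConfig<blockThreads>& cfg) {
    return cfg.prepared_for == q;
}
```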

Supported neighbor query types

Vertex queries

| Query | Description |
|---|---|
| VV | For vertex V, adjacent vertices |
| VE | For vertex V, incident edges |
| VF | For vertex V, incident faces |
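The host-only C++ sketch below makes the vertex queries concrete. It derives VV and VF from an indexed triangle list and omits VE for brevity. It illustrates what the queries return, not how RXMesh computes them on the device. VertexQueries and build_vertex_queries are hypothetical names. Neighbors come out sorted by index, with no particular cyclic order.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Tri = std::array<uint32_t, 3>;

// Per-vertex answers to the VV and VF queries.
struct VertexQueries {
    std::vector<std::vector<uint32_t>> vv;  // adjacent vertices per vertex
    std::vector<std::vector<uint32_t>> vf;  // incident faces per vertex
};

inline VertexQueries build_vertex_queries(uint32_t num_vertices,
                                          const std::vector<Tri>& faces) {
    VertexQueries q{std::vector<std::vector<uint32_t>>(num_vertices),
                    std::vector<std::vector<uint32_t>>(num_vertices)};
    for (uint32_t f = 0; f < faces.size(); ++f) {
        for (int k = 0; k < 3; ++k) {
            const uint32_t v = faces[f][k];
            q.vf[v].push_back(f);                        // f is incident to v
            q.vv[v].push_back(faces[f][(k + 1) % 3]);    // the other two corners
            q.vv[v].push_back(faces[f][(k + 2) % 3]);    // are adjacent to v
        }
    }
    for (auto& n : q.vv) {  // an edge shared by two faces is seen twice: deduplicate
        std::sort(n.begin(), n.end());
        n.erase(std::unique(n.begin(), n.end()), n.end());
    }
    return q;
}
```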
Edge queries

| Query | Description |
|---|---|
| EV | For edge E, incident vertices |
| EF | For edge E, incident faces |
| EVDiamond | For edge E, incident vertices and the vertices opposite E in its incident faces |
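The edge queries can be illustrated the same way. The host-only C++ sketch below numbers the unique edges of a triangle list, then records each edge's endpoints (EV), its incident faces (EF), and the vertex opposite it in each incident face. Together, ev[e] and opposite[e] hold the four vertices an EVDiamond query reports for an interior edge. The sketch does not model the order in which RXMesh returns them. EdgeQueries and build_edge_queries are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Tri = std::array<uint32_t, 3>;

// Per-edge answers to EV and EF, plus the opposite vertices used by EVDiamond.
struct EdgeQueries {
    std::vector<std::array<uint32_t, 2>> ev;        // endpoints per edge
    std::vector<std::vector<uint32_t>>   ef;        // incident faces per edge
    std::vector<std::vector<uint32_t>>   opposite;  // opposite vertex in each incident face
};

inline EdgeQueries build_edge_queries(const std::vector<Tri>& faces) {
    EdgeQueries q;
    std::map<std::pair<uint32_t, uint32_t>, uint32_t> edge_id;  // sorted endpoints -> index
    for (uint32_t f = 0; f < faces.size(); ++f) {
        for (int k = 0; k < 3; ++k) {
            uint32_t a = faces[f][k];
            uint32_t b = faces[f][(k + 1) % 3];
            const uint32_t opp = faces[f][(k + 2) % 3];
            if (a > b) std::swap(a, b);
            const std::pair<uint32_t, uint32_t> key{a, b};
            auto [it, inserted] =
                edge_id.try_emplace(key, static_cast<uint32_t>(q.ev.size()));
            if (inserted) {  // first time this edge is seen: allocate its slots
                q.ev.push_back({a, b});
                q.ef.emplace_back();
                q.opposite.emplace_back();
            }
            q.ef[it->second].push_back(f);
            q.opposite[it->second].push_back(opp);
        }
    }
    return q;
}
```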
Face queries

| Query | Description |
|---|---|
| FV | For face F, incident vertices |
| FE | For face F, incident edges |
| FF | For face F, adjacent faces |
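FV and FE only read a face's own corners and sides. FF needs a lookup across faces, because two faces are adjacent when they share an edge. The host-only C++ sketch below computes FF from a triangle list by grouping faces under each undirected edge. build_ff is a hypothetical name, and the result is sorted by face index rather than in any order RXMesh guarantees.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Tri = std::array<uint32_t, 3>;

// Host-side illustration of FF: faces that share an edge with face f.
inline std::vector<std::vector<uint32_t>> build_ff(const std::vector<Tri>& faces) {
    // Map each undirected edge (sorted endpoints) to the faces containing it.
    std::map<std::pair<uint32_t, uint32_t>, std::vector<uint32_t>> edge_faces;
    for (uint32_t f = 0; f < faces.size(); ++f) {
        for (int k = 0; k < 3; ++k) {
            uint32_t a = faces[f][k];
            uint32_t b = faces[f][(k + 1) % 3];
            if (a > b) std::swap(a, b);
            edge_faces[{a, b}].push_back(f);
        }
    }
    // Every pair of distinct faces on the same edge is adjacent.
    std::vector<std::vector<uint32_t>> ff(faces.size());
    for (const auto& entry : edge_faces) {
        for (uint32_t x : entry.second)
            for (uint32_t y : entry.second)
                if (x != y) ff[x].push_back(y);
    }
    for (auto& n : ff) {
        std::sort(n.begin(), n.end());
        n.erase(std::unique(n.begin(), n.end()), n.end());
    }
    return ff;
}
```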