Resource Manager

Introduction

Industrial Edge apps may claim certain resources provided by the system. Resources can be hardware devices, external interfaces, software entities or the like. They are organized in resource classes, which define the type of a resource (e.g., a processor core, network interface, or a GPU). For each resource class, there can be multiple instances representing the actual devices, interfaces, etc. These instances are available for usage by Industrial Edge apps, and the mapping of instances to containers is done by the Resource Manager. This documentation provides information for app developers on how to specify resource claims.

Prerequisites

Using the Resource Manager requires an IEDK of at least version 1.16.0.

Overview

Each resource class is managed by a device plugin. This device plugin must be running on the system for the resource class to be available. The plugins automatically register with the Resource Manager and immediately provide claimable resources. When a plugin is not running, any attempt to allocate a resource of that type will fail (the Resource Manager will return an error and the app will fail to start).

Resource claims are specified in an app's Docker Compose file using extensions. Essentially, as an app developer, you only have to add a <service>:x-resources:limits:<resname>:<count> entry in the Docker Compose file to claim <count> resources of type <resname> for the service <service>. The count can be any natural number.

NOTICE

Additionally, you have to add the entry runtime: iedge so that the extension field x-resources is handled correctly. If runtime: iedge is missing, the resource claim is silently ignored.

For example, here is a minimalistic docker-compose.yml, where the app claims one instance of resource class my_resource:

version: '2.4'
services:
  my_app:
    image: my_image
    runtime: iedge
    x-resources:
      limits:
        my_resource: 1

By default, all resource allocations are done exclusively. If one app has successfully claimed a resource, it is not available to any other app. Consequently, if all instances of a resource class are exhausted, an error is returned.

NOTICE

If the same resource is claimed multiple times in the same limits or definitions section, only the last claim is considered, which is consistent with Docker's standard behavior for duplicate keys.

Advanced Usage

Tweaking Docker Compose Files

In case there are multiple services in the Docker Compose file, you must add an entry <service>:environment:IEDGE_SERVICE_NAME:<service> for each service. Here is an example:

version: '2.4'
services:
  foo:
    image: my_image
    runtime: iedge
    environment:
      - IEDGE_SERVICE_NAME=foo
    x-resources:
      limits:
        resource_foo: 1
  bar:
    image: debian
    runtime: iedge
    environment:
      - IEDGE_SERVICE_NAME=bar
    x-resources:
      limits:
        resource_bar: 1

NOTICE

Not every service needs the x-resources tag, but IEDGE_SERVICE_NAME must always be set if there is more than one service.

Docker Compose files utilizing Industrial Edge's Resource Manager can also be used stand-alone for development and testing. docker-compose up will work as expected, provided the iedge container runtime is installed on the system and the Resource Manager is running in the background. Apart from an additional variable <service>:environment:IEDGE_COMPOSE_PATH:<path-of-compose-file> indicating the path to the Docker Compose file, no other modifications are required. This variable is not needed when building an Industrial Edge app.

The above two environment variables work around a current limitation of Docker Compose: the iedge runtime needs access to the x-resources section, which docker-compose does not pass to the OCI runtime.

Resource Metadata Filtering

Besides claiming an arbitrary instance of a certain resource class, it is also possible to specify filter criteria such that only instances satisfying given conditions with respect to their metadata are allocated. This is useful when apps require certain hardware-specific features.

Metadata is a set of key-value pairs associated with resource instances, where all keys are strings and the values can be either numbers, strings, objects containing further key-value pairs, or a list of those types. The metadata values of individual resource instances can differ, however the schema stays the same per resource class. Each resource class has its own metadata schema, or no metadata at all.

The filtering mechanism is inspired by the "Label selector" feature described in the Kubernetes documentation. The supported filtering conditions are In, NotIn, Contains, and NotContains.

  • In checks if the metadata value for the given key is in the given set of values
  • NotIn is the opposite of In
  • Contains checks if all entries in values are contained in the metadata value for the given key (which must be a list or dict)
  • NotContains checks if no entry in values is contained in the metadata value for the given key (which must be a list or dict) (this is NOT the opposite of Contains)

NOTICE

When using the Contains filter condition, if the provided values are a list of strings and the metadata value associated with the given key is also a list of strings, the system checks whether all values are contained within the metadata list.
If the provided values are a list of maps and the metadata value associated with the given key is a map, the system checks whether the given values form a sub-map of the metadata corresponding to that key, i.e., whether all keys and values match.
If the provided values are a list of strings and the metadata value associated with the given key is a map, the system checks whether the given values correspond to keys within the metadata map.
For NotContains, containment is checked the same way.
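
To make these rules concrete, here is a minimal Python sketch of the Contains and NotContains checks as described above. This is illustrative only; the Resource Manager's actual implementation may differ.

```python
def contains(values, meta):
    """Contains: all entries in values are contained in the metadata value.

    values is the list from the matchExpression; meta is the metadata value
    for the given key (a list of strings or a map)."""
    if isinstance(meta, dict):
        if all(isinstance(v, dict) for v in values):
            # list of maps vs. map: the given maps must form a sub-map of meta
            return all(meta.get(k) == v for m in values for k, v in m.items())
        # list of strings vs. map: the given values must be keys of the map
        return all(v in meta for v in values)
    # list of strings vs. list of strings: all values must be in the list
    return all(v in meta for v in values)


def not_contains(values, meta):
    """NotContains: no entry in values is contained (not the opposite of Contains)."""
    if isinstance(meta, dict):
        if all(isinstance(v, dict) for v in values):
            return not any(meta.get(k) == v for m in values for k, v in m.items())
        return not any(v in meta for v in values)
    return not any(v in meta for v in values)
```

For example, contains(["avx512"], ["sse", "avx512"]) evaluates to True, while not_contains(["avx512"], ["sse", "avx512"]) evaluates to False.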

An example leveraging those keywords for claiming CPU resources is provided below. In this example, a CPU core must satisfy the following hardware characteristics (cf. section on CPU isolation):

  • Supports avx512 instructions: cpuinfo.flags is a list of strings, and one can filter using Contains if the given string avx512 is contained in that list.
  • Has a CPU architecture of either x86_64 or i386: lscpu.Architecture has a string value, and one can filter using In if one of the given values is equal to the lscpu.Architecture value.

version: '2.4'
services:
  my_cpu_filter_app:
    image: my_image
    runtime: iedge
    x-resources:
      definitions:
        my_specific_cpu:
          type: siemens.com/isolated_core
          matchExpressions:
          - {key: "cpuinfo.flags", operator: "Contains", values: ["avx512"]}
          - {key: "lscpu.Architecture", operator: "In", values: ["x86_64", "i386"]}
      limits:
        my_specific_cpu: 1

Exclusive and Shared Resources

The resource plugins define whether their resources get mapped exclusively to applications or whether they are shared between applications by default. Application developers can override the default by creating a custom resource class definition and explicitly requesting shared or exclusive resources by specifying shared: true or shared: false:

version: '2.4'
services:
  my_app:
    image: my_image
    runtime: iedge
    x-resources:
      definitions:
        my_shared_resource_class:
          type: siemens.com/some_sharable_resource_class
          shared: true
      limits:
        my_shared_resource_class: 1

Each plugin separates its advertised resources into disjoint resource pools for exclusive and shared use in applications. Plugins can decide to advertise only shared resources (such as shared memories) or only exclusive resources (such as isolated cores), in which case one of the pools is empty and specifying the shared keyword is not recommended.

Resource Configuration

Certain resource classes allow application developers to provide additional configuration parameters when claiming a resource instance. This way, individual instances can be tailored to the needs of the application. It is recommended to use this mechanism rather than configuring the resources as part of the application to minimize the required application permissions.

Within the application manifest, the configuration is added to a custom resource class definition via the config section. In this config section, developers specify properties using key-value pairs that are applied during the application startup phase, i.e., prior to execution of the application's entry point.

version: '2.4'
services:
  my_app:
    image: my_image
    runtime: iedge
    x-resources:
      definitions:
        my_configured_resource_class:
          type: siemens.com/some_configurable_resource_class
          config:
            first_key: 3
            second_key:
              - "first list entry string"
              - "second list entry string"
            third_key:
              first_subkey: false
      limits:
        my_configured_resource_class: 1

NOTICE

Valid configuration parameters along with their value ranges and the effects they produce are described in the documentation for the respective resource classes.

CPU Isolation

Isolating processor cores is one of the "ingredients" for real-time applications. To ensure that an app runs exclusively on one or more cores, use the siemens.com/isolated_core resource class, for example:

x-resources:
  limits:
    siemens.com/isolated_core: 1

This way, no other app can run on the same core(s).

NOTICE

When requesting isolated cores for shared usage, the claim will be rejected by the Resource Manager.

NOTICE

The request specifies the number of isolated cores needed by an app, with the Resource Manager deciding which cores will be allocated.

NOTICE

For hybrid CPU devices, which include both performance cores and efficiency cores, the Resource Manager always allocates performance cores with higher priority.

The configuration passed to the container runtime contains an environment variable IEDGE_CPUSET_ISOLATED with a cpuset string specifying the isolated cores.
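
An application can parse this variable at startup, for example, to pin its real-time threads to the allocated cores. A minimal Python sketch, assuming the variable follows the standard Linux cpuset-list syntax (e.g., 2-3,5); the exact format is not specified here:

```python
import os


def parse_cpuset(spec):
    """Parse a Linux cpuset-list string such as "2-3,5" into sorted CPU IDs."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


# The variable is only present when running under the iedge runtime.
isolated = parse_cpuset(os.environ.get("IEDGE_CPUSET_ISOLATED", ""))
```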

If your device provides the CPU isolation plugin, it is also prepared for executing real-time applications. This means that it ships with a real-time capable Linux kernel and comes with basic system tunings. Depending on your app's requirements, additional measures may be necessary that cannot or should not be dealt with by Industrial Edge as the underlying platform.

Metadata

Continuing the resource filtering example from above, we now present a shortened CPU resource metadata filtering schema:

Excerpt from the CPU Metadata Schema
{
  "schema": {
    "properties": {
      "cpuinfo": {
        "properties": {
          ...
          "CPU architecture": {
            "type": "integer"
          },
          "core id": {
            "type": "integer"
          },
          "cpu cores": {
            "type": "number"
          },
          "flags": {
            "items": {
              "type": "string"
            },
            "type": "array"
          },
          ...
        },
        "title": "CPU Info",
        "type": "object"
      },
      "cpuset": {
        "items": {
          "type": "integer"
        },
        "type": "array"
      },
      "lscpu": {
        "properties": {
          ...
          "Architecture": {
            "type": "string"
          },
          "CPU max MHz": {
            "type": "string"
          },
          "CPU min MHz": {
            "type": "string"
          },
          "Virtualization": {
            "type": "string"
          },
          ...
        },
        "title": "Lscpu",
        "type": "object"
      },
      "lscpue": {
        "properties": {
          ...
          "cpu": {
            "type": "integer"
          },
          "maxmhz": {
            "type": "number"
          },
          "mhz": {
            "type": "number"
          },
          "minmhz": {
            "type": "number"
          },
          ...
        },
        "title": "LscpuE",
        "type": "object"
      }
    },
    "title": "CPU Metadata",
    "type": "object"
  }
}

The metadata is generated by combining the outputs of /proc/cpuinfo, lscpu, and lscpu -e -J (nested objects are linearized using . (dots)), which are then grouped together for the individual CPU cores. Not all metadata fields are supported on all architectures. The lscpu and lscpue fields (as provided by the lscpu tool) aim to provide data in a format independent from the processor architecture, but not all systems provide all values. The contents of cpuinfo (as provided by /proc/cpuinfo) are more low-level and specific to the processor architecture.
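
The dot linearization of nested objects can be illustrated with a short Python sketch (not the plugin's actual code):

```python
def flatten(obj, prefix=""):
    """Linearize nested objects into dotted keys, mirroring how metadata keys
    such as lscpu.Architecture are formed from the nested schema."""
    out = {}
    for key, value in obj.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, dotted))
        else:
            out[dotted] = value
    return out
```

For example, flatten({"lscpu": {"Architecture": "x86_64"}}) yields {"lscpu.Architecture": "x86_64"}, which is the key used in the filter example above.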

NOTICE

It is not recommended to filter for specific core numbers / IDs as the isolatable CPU cores might differ between IEDs and are thus not generic.

Tuning of Kernel Threads

The CPU isolation plugin allows application developers to tune the scheduling policies and real-time priorities of selected per-cpu kernel threads on the isolated cores of the application.

This way, the real-time priority of the kernel threads relative to the application threads can be adjusted. Scheduling policies are specified as resource configuration parameters in a custom resource definition.

The plugin supports the configuration fields ksoftirqd_sched and ktimers_sched.

Possible values are strings in the format [fbroi][0-9]+, where the leading character specifies the requested scheduling policy, and the trailing number corresponds to the requested real-time priority in the range 0-99.

The supported characters and their corresponding scheduling policies are listed below:

  • f: SCHED_FIFO
  • b: SCHED_BATCH
  • r: SCHED_RR
  • o: SCHED_OTHER
  • i: SCHED_IDLE
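
For illustration, here is a small Python sketch that validates and decodes such a value. The parse_sched helper is hypothetical and not part of any Industrial Edge API:

```python
import re

# Mapping from format characters to scheduling policies, as listed above.
POLICIES = {"f": "SCHED_FIFO", "b": "SCHED_BATCH", "r": "SCHED_RR",
            "o": "SCHED_OTHER", "i": "SCHED_IDLE"}


def parse_sched(value):
    """Parse a value like "f51" into (policy, priority); reject anything
    outside the documented [fbroi][0-9]+ pattern or the 0-99 priority range."""
    match = re.fullmatch(r"([fbroi])([0-9]+)", value)
    if match is None:
        raise ValueError(f"invalid scheduling string: {value!r}")
    priority = int(match.group(2))
    if priority > 99:
        raise ValueError(f"priority out of range 0-99: {priority}")
    return POLICIES[match.group(1)], priority
```

For example, parse_sched("f51") returns ("SCHED_FIFO", 51).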

The scheduling policies and real-time priorities are applied prior to application start and reverted after the application has stopped.

For example, to request the ksoftirqd and ktimers kernel thread policies to be set to SCHED_FIFO and to obtain real-time priorities 51 and 52, respectively, on the isolated CPUs of the application, the application developer can adapt the resource claim as follows:

x-resources:
  definitions:
    application_tuned_core:
      type: siemens.com/isolated_core
      config:
        ksoftirqd_sched: "f51"
        ktimers_sched: "f52"
  limits:
    application_tuned_core: 1

NOTICE

Application-specific tuning of kernel threads per CPU requires the Industrial Edge device to support dynamic tuning of kernel threads. The tuning parameters are considered to be mandatory. If the edge device does not support dynamic tuning, the application will refuse to start.

Network Interface Isolation

For applications claiming isolated network interfaces, it is essential to obtain a network interface connected to a specific network. The matching of interfaces with apps is accomplished using labels. As an Industrial Edge administrator, you can specify which interfaces shall be isolatable and attach one or more labels to them (optionally with a VLAN tag). However, when marking interfaces as isolatable, the administrator is limited to those interfaces the device builder allows to isolate.

Isolated network interfaces can be requested in docker-compose.yml, for example:

services:
  my_service:
    ...
    networks:
      - my_isolated_network

networks:
  my_isolated_network:
    driver: iedge
    driver_opts:
      label: foobar
      prefix: rt
    ipam:
      driver: "null"

The driver option prefix specifies the prefix of the network interface name inside the container. The driver option label can be used to filter isolatable network interface candidates based on the given label.

NOTICE

Network interface isolation uses a different syntax compared to the conventional resource claiming syntax. Trying to isolate network interfaces using the conventional way will fail.

The Resource Manager cannot provide information about the isolated network interface to the application due to technical limitations. Instead, the application has to rely on the prefix specified in the driver options in order to determine the isolated network interface name inside the container (e.g., rt0). The Docker network plugin ensures that an optionally configured VLAN tag is used when communicating across the given network. The ipam subsection ensures that the network interface is handed over to the container without any IP configuration. IP configuration will also be reset when the network is deleted before the interface moves back into the host's network namespace.

PTP Device Support

The Precision Time Protocol (PTP) is a protocol to synchronize clocks between devices. Some Network Interface Cards have hardware support for this protocol to achieve accurate time synchronization over the network. The device on the Network Interface Card which enables this hardware support is referred to as "PTP Device". If the Network Interface Card of an isolated network has a PTP device, then the PTP device is also mounted to the requesting Docker container as a read-write device.

The application inside a Docker container knows about the PTP device and other network-related information by reading the following environment variables:

  • IEDGE_NETWORKS_PREFIXES: The prefix given in the driver_opts section of the iedge network in docker-compose.yml. If none is given, the default is eth.
  • IEDGE_NETWORKS_LABELS: The label given in the driver_opts section of the iedge network in docker-compose.yml.
  • IEDGE_NETWORKS_PTP_DEVICES: The path of the PTP devices mounted to the Docker container.
  • IEDGE_NETWORKS_ANNOTATIONS: A JSON array containing all the labels and their VLAN tag of the isolated network. This information can be configured in the "Resource Manager" tab of the IED settings where NICs are assigned to labels and VLANs.
  • IEDGE_NETWORKS_ISOLATED: The host name of the isolated and renamed network. Networks are renamed according to the prefix given in docker-compose.yml with an additional counter. If a container claims two isolated networks, both with prefix rt, one network will be called rt0 and the other network will be called rt1 inside the container.

The environment variable order is not deterministic, but the columns of the environment variables are always matching. Let us consider a concrete example. Assume a docker-compose.yml that claims two networks, foo and bar:

  • foo is called eno3 on the host, its PTP device is at /dev/ptp1, and its VLAN tag is 123.
  • bar is called eno2 on the host and its PTP device is at /dev/ptp0.

Then, both of the following variable assignments are possible:

IEDGE_NETWORKS_PREFIXES=foo,bar
IEDGE_NETWORKS_PTP_DEVICES=/dev/ptp1,/dev/ptp0
IEDGE_NETWORKS_ISOLATED=eno3,eno2
IEDGE_NETWORKS_ANNOTATIONS=[{"foo":123},{"bar":0}]
IEDGE_NETWORKS_LABELS=foo,bar
IEDGE_NETWORKS_PREFIXES=bar,foo
IEDGE_NETWORKS_PTP_DEVICES=/dev/ptp0,/dev/ptp1
IEDGE_NETWORKS_ISOLATED=eno2,eno3
IEDGE_NETWORKS_ANNOTATIONS=[{"bar":0},{"foo":123}]
IEDGE_NETWORKS_LABELS=bar,foo

As one can see, the order has changed, but the columns and their values are still matching.
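
An application can therefore reconstruct per-network records by splitting each variable and matching the columns. A Python sketch (the networks_table helper is illustrative, not an official API; the variable names are as documented above):

```python
import json
import os


def networks_table(env=None):
    """Join the comma-separated IEDGE_NETWORKS_* variables column-wise into
    one record per isolated network."""
    env = os.environ if env is None else env
    columns = zip(
        env.get("IEDGE_NETWORKS_PREFIXES", "").split(","),
        env.get("IEDGE_NETWORKS_PTP_DEVICES", "").split(","),
        env.get("IEDGE_NETWORKS_ISOLATED", "").split(","),
        env.get("IEDGE_NETWORKS_LABELS", "").split(","),
        json.loads(env.get("IEDGE_NETWORKS_ANNOTATIONS", "[]")),
    )
    return [
        # An empty PTP entry (no PTP device for this network) becomes None.
        {"prefix": p, "ptp_device": d or None, "interface": i,
         "label": l, "annotations": a}
        for p, d, i, l, a in columns
    ]
```

Regardless of which of the two orderings the runtime produces, each record stays internally consistent because all variables are split and zipped column by column.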

NOTICE

A single Network Interface Card can have multiple labels with different VLAN tags. This is reflected in the individual JSON array elements of the IEDGE_NETWORKS_ANNOTATIONS environment variable, where the keys are the labels and the values are their respective VLAN tags. The JSON array length remains unchanged and reflects the number of isolated networks, so the columns can still be matched. Given the above configuration of eno2 with label bar and no VLAN tag, assume eno3 has label foo with VLAN tag 123 and label foobar with VLAN tag 42. Then, the value of IEDGE_NETWORKS_ANNOTATIONS could be either [{"foo":123,"foobar":42},{"bar":0}] or [{"bar":0},{"foo":123,"foobar":42}].

In all cases, the IEDGE_NETWORKS_* environment variables are matching: if, for example, only the second (out of two) isolated networks has a PTP device associated, the value of IEDGE_NETWORKS_PTP_DEVICES will be ,/dev/ptp0 or /dev/ptp0,, depending on the ordering. In the case of three isolated networks, none of which has a PTP device, the value of IEDGE_NETWORKS_PTP_DEVICES will be ,,.

Tuning of Network I/O Path

The network plugin allows application developers to fine-tune the I/O path of their isolated network interfaces. This comprises the affinities of the hardware IRQs, the IRQ threads, and NAPI threads of the isolated network interface. The tuning is requested by specifying additional configuration parameters in the driver_opts section next to label and prefix.

Affinities are specified as CPU lists.

A CPU list is a string of comma-separated CPU ranges, where a CPU range is either a single integer or two integers separated by a dash.

In contrast to CPU lists as they are used, for example, in the Linux sysfs to identify online CPUs in the system, the CPU lists specified in driver_opts refer to the set of isolated CPUs of the application.

As an example, the affinity 0,2-3 would translate to the first, third, and fourth isolated CPU of the application.
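
A Python sketch of this mapping (the resolve_affinity helper is hypothetical, for illustration only):

```python
def resolve_affinity(spec, isolated_cpus):
    """Map a driver_opts CPU list such as "0,2-3" onto the application's
    isolated CPU IDs: index 0 is the first isolated core, and so on."""
    indices = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            indices.extend(range(lo, hi + 1))
        else:
            indices.append(int(part))
    return [isolated_cpus[i] for i in indices]
```

If the application's isolated cores are CPUs 4-7, the affinity 0,2-3 resolves to CPUs 4, 6, and 7.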

The following overview summarizes all supported configuration parameters for tuning the network I/O path:

  • irq_affinity (string; empty, * or CPU list): The affinity to be applied to all IRQs associated with the isolated network interface. If the string is empty or *, the affinities of all IRQs remain unchanged and they stay on the housekeeping cores. In all other cases, the string encodes a CPU list, where the CPU IDs refer to the isolated CPUs.
  • irq_threads_affinity (string; empty, * or CPU list): Same as irq_affinity, but applied to all IRQ threads associated with the isolated network interface.
  • napi_threads (string; true or false): Activates or deactivates NAPI threads for the isolated network interface if specified. NAPI activation remains unchanged otherwise.
  • napi_threads_affinity (string; empty, * or CPU list): Same as irq_affinity, but applied to all NAPI threads associated with the isolated network interface.
  • irq_threads_sched (string; empty, * or [fbroi][0-9]+): The scheduling parameters to be applied to the IRQ threads when their affinity is changed to the isolated cores of the application using the irq_threads_affinity setting. If the string is empty or *, the policy and priority of the IRQ threads remain unchanged. In all other cases, the string encodes a scheduling policy and a real-time priority applied to all IRQ threads associated with the isolated network interface: a leading character identifying the scheduling policy (e.g., f for SCHED_FIFO or o for SCHED_OTHER) followed by the real-time priority in the range 0-99.
  • napi_threads_sched (string; empty, * or [fbroi][0-9]+): Same as irq_threads_sched, but applied to the NAPI threads when their affinity is changed using the napi_threads_affinity setting.

The IRQ tuning parameters apply equally to all IRQs (legacy and MSI IRQs) associated with the network interface. Similarly, the NAPI tuning parameters apply equally to all NAPI threads associated with the network interface.

Queue-specific tuning is currently not supported.

All tuning parameters are applied prior to application start and reverted after stopping the application.

NOTICE

Tuning of the network I/O path is only possible if the application requests at the same time one or more isolated cores. Scheduling parameters will not be applied unless the corresponding affinities relocate the kernel threads from the housekeeping cores to the isolated application cores.

The following listing shows an exemplary compose manifest that isolates a network interface as described before and, in addition, claims an isolated core. All affinity values are set to the CPU list 0 in this example, which identifies the first isolated core of the application:

services:
  my_service:
    ...
    x-resources:
      limits:
        siemens.com/isolated_core: 1    
    networks:
      - my_isolated_network

networks:
  my_isolated_network:
    driver: iedge
    driver_opts:
      label: foobar
      prefix: rt
      irq_affinity: "0"
      irq_threads_affinity: "0"
      irq_threads_sched: "f53"
      napi_threads: "true"
      napi_threads_affinity: "0"
      napi_threads_sched: "f54"
    ipam:
      driver: "null"

NOTICE

When requesting multiple isolated cores, the application will only obtain access to the first sibling of each isolated core even on systems with hyperthreading. Thus, the CPU list 1 always refers to the second isolated core.

NOTICE

Application-specific tuning of the network I/O path requires support by the Industrial Edge device for dynamic tuning of kernel threads. The tuning parameters are considered to be mandatory. If the edge device does not offer dynamic tuning support, the application will refuse to start.

Real-time Applications

As mentioned in the section on CPU isolation, executing real-time applications requires several measures to be taken. Some of them are taken care of by the platform, i.e., via device builders or Industrial Edge itself, whereas others are in the responsibility of the app developers. As a guidance for app development, we summarize these measures in the following.

  • Real-time capable Linux kernel (required). Platform: device builders install a kernel with the PREEMPT-RT patch. App developers: no action required.
  • Disable hyper-threading (recommended). Platform: handled by the Resource Manager. App developers: no action required.
  • CPU isolation, user space (required). Platform: device builders configure a partitioning into housekeeping and isolatable cores, and the Resource Manager (CPU isolation plugin) allocates them on demand. App developers: specify the desired number of isolated cores in the Docker Compose file.
  • CPU isolation, kernel threads (required). Platform: device builders move kernel threads to housekeeping cores as far as possible (typically via TuneD). App developers: no action required.
  • CPU isolation, RCU offloading (optional). Platform: device builders set rcu_nocbs= on the kernel cmdline (note: this feature also needs to be enabled at compile time). App developers: no action required.
  • CPU isolation, IRQs (required). Platform: device builders move interrupts to housekeeping cores (typically via TuneD). App developers: no action required.
  • Scheduling policy and priority (required). Platform: device builders set the priority of kernel threads (ksoftirqd, ktimersoftd, ktimers, rcuc, cpuhp) to SCHED_FIFO 50, i.e., above the application. App developers: set the policy of relevant application threads to SCHED_FIFO and their priority to something below 50. Optionally, tune the scheduling parameters of the per-CPU ktimers and ksoftirqd kernel threads, and tune the I/O path of any isolated network interfaces.
  • Real-time throttling (required). Platform: as a last resort, device builders leave kernel real-time throttling (via /proc/sys/kernel/sched_rt_*_us) active (the default behavior) to protect against starvation. If it kicks in (because the application hogs the CPU), the application will experience latency impact. App developers: ensure that a portion (e.g., 5%) of the CPU time is left to the kernel for housekeeping tasks.
  • Memory locking (recommended). Platform: none. App developers: use mlock to prevent paging.
  • C-states (optional). Platform: if /dev/cpu_dma_latency is available, the Resource Manager passes it into the container. App developers: set the desired C-state via /dev/cpu_dma_latency (if available).

Graphics Processing Units (GPUs)

Industrial Edge apps may utilize GPUs for compute-intensive applications, e.g., rendering, machine learning, or complex simulations. At present, only Nvidia GPUs are supported. Allocating a GPU is straightforward and follows the standard schema.

For example, to request one Nvidia GPU, add the following resource claim to your app manifest:

x-resources:
  limits:
      nvidia.com/gpu: 1

NOTICE

GPU resources currently can only be claimed exclusively. Claims explicitly requesting GPUs for shared usage are rejected by the Resource Manager.

NOTICE

The GPU's device drivers must be installed on the host system and match the kernel version. Ask your device builder for support in case the drivers are not available or unsuited.

If you are using a container provided by Nvidia, all the necessary libraries should be ready to use. When creating a custom container from scratch, be sure to set the following environment variables in the Dockerfile:

ENV CUDADIR /usr/local/cuda
ENV PATH ${CUDADIR}/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib64:${CUDADIR}/lib64:${LD_LIBRARY_PATH}

Of course, you can install additional libraries, e.g., for machine learning, in your container.

Shared Memory Allocation

An application on the host system can allocate memory through the resource manager, enabling other applications to read from and write to this memory. Resources are matched with applications using labels. As an Industrial Edge administrator, you assign labels to each shared memory resource, and applications filter for these labels in their compose manifests. The application developer must specify an unambiguous label. Additionally, applications can allocate multiple resources. For more information see Shared Memory.

Allocated shared memories can be requested in docker-compose.yml, for example:

services:
  my_service:
    ...
    x-resources:
      definitions:
        sharedmemory1:
          type: siemens.com/shared_memory
          matchExpressions:
            - {key: "labels", operator: "Contains", values: ["label1"]}
          config:
            posix_mount: true
            id: "foo"
        sharedmemory2:
          type: siemens.com/shared_memory
          matchExpressions:
            - {key: "labels", operator: "Contains", values: ["label2"]}
          config:
            posix_mount: false
            id: "bar"
            permissions: ["ro"]
      limits:
        sharedmemory1: 1
        sharedmemory2: 1

NOTICE

In the example, "label1" and "label2" refer to the strings set in the labels property in the ie-shared-memory-resource-plugin configuration file(s) and cause the resource manager to make a unique selection of the shared memory resource.

  • The option label can be used to filter allocatable shared memory candidates based on the given label.

  • The IEDGE_SHM_MOUNTPOINTS environment variable is a comma-separated list stating the path of the mounted shared memory tmpfs.

IEDGE_SHM_MOUNTPOINTS=/dev/iedge/shm0,/dev/iedge/shm1
  • The config field in the shared memory resource definition supports three fields:

    • posix_mount: This flag can be set to true for only one shared memory resource definition in a service inside the application manifest. If more than one shared memory resource definition has this flag set to true, the application start-up will fail. When the flag is set to true, the /dev/shm mount path is shared with other applications claiming that shared memory resource, thus providing POSIX compliance, similar to Docker's ipc keyword but without System V IPC capability. When this flag is set to false, the /dev/shm mount path is not shared across containers, therefore POSIX compliance is not given. The POSIX mount feature is optional and backward compatible. While this setting might be suitable for certain applications, it is important to note that configuring posix_mount: false may break existing applications that expect the POSIX-compliant mount point to be shared between containers.

    • id: The id fields in the individual config sections of the resource definitions are defined by the application developer to associate mountpoints with their respective IDs at container runtime. At runtime, all id fields are joined together using comma separation, resulting in an environment variable named IEDGE_SHM_IDS:

IEDGE_SHM_IDS=foo,bar

    • permissions: The permissions field is an array of permission strings, where each string specifies a capability requested by the application. Currently supported permission strings are "ro" and "rw" for read-only and read-write access, respectively. If permissions is not specified, the default is ["rw"], i.e., the application requests read-write access. An empty permissions array [] is not allowed. Application developers can restrict access to read-only, which prevents accidental modifications to a shared memory resource; note that read-only access with posix_mount: true may prevent POSIX semaphores from working properly. Furthermore, the device operator can specify in the shared memory resource configuration which applications are allowed to obtain which permissions. The operator can use this to prevent users from installing and/or running applications with write access to certain shared memory resources, except for those applications that the operator has explicitly whitelisted.
  • To make use of this plugin, an application needs to iterate over the comma-separated lists of IEDGE_* environment variables and associate each individual ID with its respective mountpoint (the first ID maps to the first mountpoint, the second ID to the second mountpoint, etc.). The order of entries within the IEDGE_* environment variables is not deterministic; however, the entries always correspond with each other across variables, allowing a deterministic mapping of IDs to mountpoints. The application then has to build a lookup table that links shared memory IDs to mountpoints, identifying which shared memory resource, marked with an ID, is mounted at which location. Once this lookup table is constructed, shared memory communication can be implemented using memory-mapped files within the directory of the respective mountpoint, or POSIX mechanisms if posix_mount: true is set. For more information on this "comma-matching" mechanism, refer to the ptp device support section.
  • The Shared Memory resource class offers functionality not available in default Docker: it allows for the construction of shared-memory pipeline architectures with isolated SHMs. Consider a scenario with three applications, App A, B, and C, forming a pipeline with data flow A->B->C. With the standard Docker IPC mechanism, Apps A, B, and C would need to be in the same IPC namespace, preventing any isolation. With the Industrial Edge Shared Memory functionality, SHM isolation is introduced: App A and App B claim one SHM, and App B and App C claim another, so A and C are no longer visible to each other. App B then builds up the aforementioned lookup table using the IDs to deterministically identify which mountpoint is used for communication with App A and which for App C.
  • The shared memory filesystem is a tmpfs with the given maximum size.
  • The size of the tmpfs can be changed during runtime, but only if the shared memory is not in use by any app.
  • The memory allocated for shared memory only exists "virtually": only once it is accessed by an application is it actually allocated, and it then counts towards the memory used by that application. Therefore, the overall RAM limits set for applications via Docker or cgroups still apply; shared memory is not "on top" but must be accounted for by the application developer. Specifically, memory is attributed to the application that accesses it first, which applications writing to shared memory must take into account. Thus, shared memory is indirectly limited by the application's memory limits. Additionally, the responsibility for cleanup lies with the app developers; otherwise, files or segments continue to occupy memory until reboot, contributing to the total memory limit specified by the device builder for IE applications. The used shared memory counts towards the mem_limit docker-compose setting, so the app developer must increase mem_limit by the amount of shared memory the app accesses.
  • The operator can create, delete, or resize a shared memory filesystem at runtime; resizing is only possible while the shared memory is not in use.
  • The size of the tmpfs is not reserved or allocated; it is just a maximum limit. Only when applications create a shared memory file/object in the shared memory resource, memory is actually allocated.
  • It is acceptable for the operator to set the size of the shared memory filesystem to a rather large value.
  • Shared memory contents are not preserved across reboot.
  • POSIX message queues are not supported (see man mq_overview), since no IPC namespace sharing takes place.
  • The System V IPC API is also not supported (see man sysvipc), since no IPC namespace sharing takes place.
  • App developers should refrain from using Docker-provided shared memory features because:
    • No good app isolation is possible.
    • The operator cannot define/override which apps should communicate with each other.
    • The feature might get disabled in future versions of IE.

GPIO Chip Allocation

An app can request access to General Purpose I/O (GPIO) chips of the Edge device in its docker-compose.yml, for example:

services:
  my_service:
    ...
    x-resources:
      definitions:
        gpio1:
          type: siemens.com/gpio
          matchExpressions:
            - {key: "labels", operator: "Contains", values: ["label1"]}
          config:
            id: "foo"
      limits:
        gpio1: 1

NOTICE

In the example, "label1" references a label set by the device operator in the GPIO resource configuration. This causes the resource manager to make a unique selection of the GPIO resource.

The config field in the GPIO resource definition supports one field:

  • id: The id fields in the individual config sections of the resource definitions are defined by the application developer to associate GPIO resources with their respective IDs at container runtime. If omitted, the id will be an empty string. At runtime, all id fields are joined together using comma separation, resulting in an environment variable named IEDGE_GPIO_IDS.

Upon successful allocation, the app can access the assigned GPIO chips via their character devices (/dev/gpiochip*, as documented in https://docs.kernel.org/userspace-api/gpio/chardev.html). Note that manipulation of the I/Os via the (deprecated) sysfs ABI is not supported.

The outcome of the allocation is provided to the app via environment variables. Each of the variables is a comma-separated list, with one item per allocated GPIO resource.

  • IEDGE_GPIO_IDS: IDs of the allocated GPIO resources, as set in the config section of the resource definition.
  • IEDGE_GPIO_DEVICES: Paths of the character devices of the allocated GPIO chips.
  • IEDGE_GPIO_LINE_OFFSETS: GPIO line offsets available on the GPIO chips, as a space-separated list per allocated chip. The offsets reflect the information obtained from the device driver and match the output of gpioinfo.
  • IEDGE_GPIO_LINE_LABELS: GPIO line labels of the GPIO chips, as a space-separated list per allocated chip. The labels reflect the information obtained from the device driver and match the output of gpioinfo.

To make use of this plugin, an application needs to iterate over the comma-separated lists of IEDGE_* environment variables and associate each individual ID with its respective character device (the first ID maps to the first device, the second ID to the second device, etc.). The order of entries within the IEDGE_* environment variables is not deterministic; however, the entries always correspond with each other across variables, allowing a deterministic mapping of IDs to devices.
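
As a minimal sketch (assuming the hypothetical helper name `parse_gpio_env` and illustrative values), the GPIO environment variables can be combined into a lookup structure like this:

```python
def parse_gpio_env(ids_csv: str, devices_csv: str, offsets_csv: str) -> dict:
    """Map each GPIO resource ID to its character device and line offsets.
    Entries of the comma-separated lists correspond by position."""
    ids = ids_csv.split(",")
    devices = devices_csv.split(",")
    # Line offsets form a space-separated list per allocated chip.
    offsets = [[int(x) for x in chunk.split()] for chunk in offsets_csv.split(",")]
    return {
        rid: {"device": dev, "line_offsets": off}
        for rid, dev, off in zip(ids, devices, offsets)
    }

# Illustrative values for a single allocated chip; in a container these
# would come from IEDGE_GPIO_IDS, IEDGE_GPIO_DEVICES, IEDGE_GPIO_LINE_OFFSETS.
gpio = parse_gpio_env("foo", "/dev/gpiochip0", "0 1 2")
# gpio["foo"]["device"] is "/dev/gpiochip0"; line offsets 0, 1, 2 are usable.
```

The resulting paths can then be passed to a GPIO userspace library that speaks the character-device API.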

Optional Resources

The Resource Manager supports resource preferences: an app can be started even if there is no instance of the requested resource class available. Such optional resource claims allow for a fallback, e.g., from GPU to CPU, as supported by many machine learning frameworks such as TensorFlow; there is no need to provide the app in different versions (with and without GPU). Apps specify optional resources by adding the field optional in the resource definition part of docker-compose.yml.

Examples for the docker-compose.yml file:

# Isolation of CPU core(s)
x-resources:
  definitions:
    my_cpu:
      type: siemens.com/isolated_core
      optional: true
  limits:
    my_cpu: 1

# Nvidia GPUs
x-resources:
  definitions:
    my_gpu:
      type: nvidia.com/gpu
      optional: true
  limits:
    my_gpu: 1

Developers can claim both mandatory and optional resources in one app. The semantics is defined as follows, where n denotes the number of available resources, m the number of claimed mandatory resources, and o the number of claimed optional resources:

  • If n >= m+o, the app will get m+o resources.
  • If m <= n < m+o, the app will get m resources and the optional resource claim is skipped.
  • If n < m, the app will fail to start with an error indicating that there are not enough available resources.
# Isolation of both mandatory and optional resources
x-resources:
  definitions:
    my_cpu:
      type: siemens.com/isolated_core
      optional: true
  limits:
    my_cpu: 1
    siemens.com/isolated_core: 2
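
The three allocation rules above can be expressed as a small function. This is only a sketch of the documented semantics, not an actual Resource Manager API:

```python
def resources_granted(n: int, m: int, o: int) -> int:
    """Allocation semantics: n available instances, m mandatory claims,
    o optional claims. Returns the number of instances granted."""
    if n >= m + o:
        return m + o  # both mandatory and optional claims are satisfied
    if n >= m:
        return m      # optional claim is skipped
    raise RuntimeError("not enough available resources; app fails to start")

# With 2 instances available, 2 mandatory and 1 optional claim,
# the mandatory claims are satisfied and the optional one is skipped:
granted = resources_granted(2, 2, 1)  # granted == 2
```

For the combined example above (optional my_cpu: 1 plus mandatory siemens.com/isolated_core: 2), a device with exactly two isolated cores would start the app with two cores and skip the optional claim.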

For network interfaces, the optional keyword is part of the driver_opts section:

# Network interfaces
services:
  my_service:
    ...
    networks:
      - my_isolated_network

networks:
  my_isolated_network:
    driver: iedge
    driver_opts:
      label: foobar
      prefix: rt
      optional: "true"
    ipam:
      driver: "null"

NOTICE

The default value is false if no optional field is specified. Unlike for ordinary resource classes like CPUs and GPUs, the type of optional in network resource definitions is String, i.e., it must be optional: "true" or optional: "false". The reason is that Docker Compose does not allow Boolean values in the driver_opts section.