OpenCL™ (Open Computing Language)

OpenCL is a framework for expressing programs in which parallel computation is dispatched across heterogeneous devices. It is an open, royalty-free standard maintained by the Khronos Group, a technology consortium.

On a heterogeneous device, OpenCL views one of the programmable cores as a host and the other cores as OpenCL devices. For example, on a Sitara™ AM572x SoC, the host is the Arm® Cortex®-A15 cluster running SMP/Linux or TI-RTOS and the OpenCL device is the C66x DSP cluster.

The OpenCL runtime consists of two components:

  1. An API for the host program to create and submit kernels for execution
  2. A cross-platform language for expressing kernels, OpenCL C, which is based on C99 with some additions and restrictions
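
To make the split between the two components concrete, here is a minimal sketch. The kernel name vec_add and the helper functions are hypothetical and chosen only for illustration. The OpenCL C source is held as a string, the form in which a host program typically passes it to the API (for example through clCreateProgramWithSource). A plain-C stand-in for the kernel body follows, so the logic can be checked on the host without an OpenCL runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* OpenCL C source for a hypothetical kernel named vec_add. A host program
 * would hand this text to the OpenCL API to be compiled for the device. */
const char *vec_add_source =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Plain-C stand-in for one work-item of vec_add.
 * gid plays the role of get_global_id(0) in the kernel above. */
void vec_add_work_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Host-only emulation of a one-dimensional launch:
 * run the kernel body once for every global ID, sequentially. */
void vec_add_emulate(size_t global_size, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_work_item(gid, a, b, c);
}
```

Calling vec_add_emulate(4, a, b, c) fills c with the element-wise sums of a and b. On a real device the loop disappears, because the runtime creates one work-item per global ID.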

OpenCL supports both data parallel and task parallel programming paradigms. 

  • Data parallel execution parallelizes the execution across compute units on a device. 
  • Task parallel execution enables asynchronous dispatch of tasks to each compute unit.
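
The data parallel case can be pictured with a small host-only emulation. This is a sketch under stated assumptions: the names wg_range, groups_for_unit, and scale_on_units are hypothetical, and the contiguous block assignment of work-groups to compute units is just one plausible policy. How a real OpenCL runtime (TI's included) schedules work-groups is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of work-group IDs given to one compute unit. */
typedef struct {
    size_t begin;
    size_t end;
} wg_range;

/* Assumed policy: split num_groups work-groups into contiguous blocks.
 * The first (num_groups % num_units) units receive one extra group. */
wg_range groups_for_unit(size_t num_groups, size_t num_units, size_t unit)
{
    assert(num_units > 0 && unit < num_units);
    size_t base = num_groups / num_units;
    size_t extra = num_groups % num_units;
    wg_range r;
    r.begin = unit * base + (unit < extra ? unit : extra);
    r.end = r.begin + base + (unit < extra ? 1 : 0);
    return r;
}

/* Emulated data parallel kernel: every work-item scales one element.
 * The outer loop stands in for compute units that would run concurrently. */
void scale_on_units(float *data, size_t num_groups, size_t local_size,
                    size_t num_units, float factor)
{
    for (size_t unit = 0; unit < num_units; ++unit) {
        wg_range r = groups_for_unit(num_groups, num_units, unit);
        for (size_t g = r.begin; g < r.end; ++g) {
            for (size_t l = 0; l < local_size; ++l) {
                /* global ID = work-group ID * local size + local ID */
                size_t gid = g * local_size + l;
                data[gid] *= factor;
            }
        }
    }
}
```

The result does not depend on num_units: the same call produces identical data whether the work-groups are spread over two emulated compute units or eight. Only the share of work per unit changes.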

For more detailed information on TI’s OpenCL implementation and supported devices, see TI’s OpenCL User’s Guide.

Benefits of using OpenCL

Using a standard approach to programming heterogeneous SoCs simplifies programming; it allows the programmer to use standard, well-documented APIs to handle the mechanics of dispatching code and data to the DSPs and focus on optimizing the dispatched code. Other benefits include:

  • Seamless migration of applications between TI SoCs (e.g., take an OpenCL application written for a 66AK2H SoC with eight C66x DSP cores and run it on an AM572x SoC with two C66x DSP cores with only a recompile).
  • TI extensions to OpenCL that let programmers call TI's optimized DSP libraries such as DSPLIB, MATHLIB, and IMGLIB.
  • Offloading computation within open source libraries, such as linear algebra libraries and OpenCV, to the DSPs.

Key features of TI’s OpenCL implementation

  • OpenCL host is the Arm® Cortex®-A15 cluster running Linux
  • One OpenCL device with the set of C66x DSP cores available on the device
  • Compute unit is a single C66x DSP
  • TI-specific extensions to OpenCL improve performance of code offloaded to the C66x DSP
  • OpenCL implementation conformant to v1.1 (full profile) on AM57x and 66AK2H devices

Notes: TI’s OpenCL implementation does not provide support for images; image support is optional in the OpenCL v1.1 specification. Support for the double precision floating-point data type is enabled as an OpenCL extension and is not included in the conformance testing.

Included examples

TI’s OpenCL implementation includes many examples illustrating various aspects of the implementation, including TI-specific extensions to OpenCL and optimizing OpenCL kernels for the C66x DSP. A detailed description of the examples can be found in the examples section of the OpenCL User’s Guide.

Measuring OpenCL overheads for Arm® – DSP communication

The null example reports the time overhead that OpenCL incurs to submit and dispatch a kernel. A null (empty) kernel is created and dispatched so that the profiling times queried from the OpenCL events reflect only the overhead needed to submit and execute the kernel on the device. This overhead covers the round trip for a single kernel dispatch. In practice, when multiple kernels are enqueued, the dispatch overhead overlaps with kernel execution and can approach zero.
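
The arithmetic behind such a measurement can be sketched as follows. This is not the code of the null example itself. The struct and function names are hypothetical. The four fields mirror the standard OpenCL event profiling queries (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START, and CL_PROFILING_COMMAND_END), which a real host program would read in nanoseconds with clGetEventProfilingInfo on a queue created with profiling enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Event timestamps in nanoseconds, one per OpenCL profiling query. */
typedef struct {
    uint64_t queued;  /* command enqueued by the host        */
    uint64_t submit;  /* command submitted to the device     */
    uint64_t start;   /* kernel started executing            */
    uint64_t end;     /* kernel finished executing           */
} event_times;

/* Per-stage and total round-trip times in microseconds. */
typedef struct {
    double queue_to_submit_us;
    double submit_to_start_us;
    double start_to_end_us;
    double total_us;
} dispatch_overhead;

/* Convert raw timestamps into stage durations. For an empty kernel,
 * total_us approximates the pure dispatch overhead of one kernel. */
dispatch_overhead overhead_from_times(event_times t)
{
    assert(t.queued <= t.submit && t.submit <= t.start && t.start <= t.end);
    dispatch_overhead o;
    o.queue_to_submit_us = (double)(t.submit - t.queued) / 1000.0;
    o.submit_to_start_us = (double)(t.start - t.submit) / 1000.0;
    o.start_to_end_us    = (double)(t.end - t.start) / 1000.0;
    o.total_us           = (double)(t.end - t.queued) / 1000.0;
    return o;
}
```

Any timestamps fed to this function in a test are placeholders, not measurements from a TI device. Actual overheads should be taken from running the null example on the target SoC.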

Download OpenCL

OpenCL is available on the following devices and is included as part of TI’s free Processor SDK that can be downloaded using the links below:

Sitara™ AM57x

  • Single/dual Cortex®-A15 and C66x DSPs

C6000 DSP + Arm® 66AK2Ex

  • Highest performance dual/quad Cortex®-A15

C6000 DSP + Arm® 66AK2Hx

  • High perf. multicore C66x DSPs + Arm® Cortex®-A15s

C6000 DSP + Arm® 66AK2Gx

  • Single Cortex®-A15 and single C66x DSP

Note: TI’s OpenCL implementation is conformant to v1.1 (full profile) on AM57x and 66AK2H devices. The complete set of conformance tests has not been run on 66AK2Ex and 66AK2Gx devices.