ahh.cl is a collection of utilities for OpenCL programming. It extends pyopencl with additional conveniences by modifying the pyopencl classes directly:
>>> import ahh.cl, pyopencl
>>> ahh.cl.Platform is pyopencl.Platform
True
ahh.cl can function as a drop-in replacement for pyopencl. All your programs should continue to work – none of the original functionality is discarded. Only the extended functionality is documented below. Refer to the pyopencl API documentation for additional capabilities.
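To make "directly modified" concrete, here is a minimal sketch of the general pattern using a stand-in class. The names UpstreamPlatform and describe are hypothetical and exist only for illustration; this is not ahh.cl's actual code.

```python
# Stand-in for a class defined by an upstream library (hypothetical).
class UpstreamPlatform(object):
    def __init__(self, name):
        self.name = name


def describe(self):
    """Convenience method added after the fact (hypothetical)."""
    return "Platform(%s)" % self.name


# Extend the upstream class in place instead of subclassing it.
UpstreamPlatform.describe = describe

# The extending module then re-exports the very same class object.
Platform = UpstreamPlatform

print(Platform is UpstreamPlatform)             # True
print(UpstreamPlatform("example").describe())   # Platform(example)
```

Because the class object is shared, code that only ever imported the upstream name sees the added method too, which is why existing programs keep working.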
A presentation (PDF) I gave introducing OpenCL and GPU programming, which also covers ahh.cl and ahh.cl.oquence, is a useful introduction. For more details about OpenCL, see the documentation at the OpenCL website.
The source materials for the presentation are also available in Keynote and PowerPoint formats. They are Creative Commons licensed, but please email me before using them!
Here is an ahh.cl version of the example provided on the pyopencl website:
import numpy
import numpy.linalg as la
from ahh.cl import Context
## Create an OpenCL context
ctx = Context.for_device(0,0)
## Initialize memory
a = numpy.random.rand(50000).astype(numpy.float32)
b = numpy.random.rand(50000).astype(numpy.float32)
c = numpy.empty_like(a)
## Compile a program
prg = ctx.compile('''
__kernel void sum(__global const float *a,
                  __global const float *b,
                  __global float *c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
''')
## Run a program (implicitly copying memory in or out, see also InOut)
prg.sum(a.shape, ctx.In(a), ctx.In(b), ctx.Out(c)).wait()
## Verify the results
print(la.norm(c - (a+b)))
See ahh.cl.oquence for an even more concise and readable version. If you would rather not use the In/Out conveniences, the presentation walks through a version without them in detail.
Bases: pyopencl._cl.Platform
A Platform encompasses the compilers and other tools which implement the OpenCL spec. A platform can support one or more devices.
For example, on Snow Leopard, the Apple platform supports OpenCL. On other systems, NVidia, AMD and IBM provide platforms.
If idx is
Raises an Error otherwise.
Returns a Platform instance somehow.
Returns a platform index somehow. See get_somehow().
Bases: pyopencl._cl.Device
Represents a single OpenCL-capable device.
Prints a numbered list of available devices.
Returns the device with the provided index in the provided platform.
platform can be a platform index or Platform instance.
Returns a Device instance somehow.
Returns a tuple (device index, Platform instance) somehow.
Bases: pyopencl._cl.Context
Represents the OpenCL context from which all commands are issued.
Note
Although a Context can be bound to multiple devices, not all features will be available in that case. Use multiple Contexts instead.
Creates an instance of Context bound to one device, somehow.
See Device.get_somehow().
Creates an instance of Context for the specified device and platform.
platform can be a Platform instance or a platform index.
Provides a default CommandQueue for this context.
If the context is bound to more than one device, raises an Error.
Returns the device bound to this Context.
If multiple devices were specified, raises an Error.
Allocates an uninitialized Buffer with provided metadata.
Note
This isn’t named malloc to emphasize that the allocated buffer maintains type information, unlike malloc in C.
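As a rough illustration of why the metadata matters, the sketch below shows how a typed allocation can derive its byte count from shape and element type, which a bare malloc-style size cannot express. BufferSpec and ITEM_SIZES are hypothetical names for this example and are not part of ahh.cl.

```python
from functools import reduce
from operator import mul

# Element sizes in bytes for a few OpenCL scalar types.
ITEM_SIZES = {"char": 1, "short": 2, "int": 4, "long": 8,
              "float": 4, "double": 8}


class BufferSpec(object):
    """Hypothetical record of the metadata a typed allocation keeps."""

    def __init__(self, shape, cl_type, order="C"):
        self.shape = tuple(shape)
        self.cl_type = cl_type
        self.order = order

    @property
    def nbytes(self):
        # Number of elements times the size of one element.
        count = reduce(mul, self.shape, 1)
        return count * ITEM_SIZES[self.cl_type]


spec = BufferSpec((50000,), "float")
print(spec.nbytes)  # 200000
```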
Copies the src buffer to the dest buffer.
Supports host-device, device-host and device-device copies.
wait_for can be specified as a list of Event instances to block on before copying.
If block is True, does not return until transfer completes. Otherwise, returns immediately. Returns an Event instance.
If queue is not provided, uses the default queue.
Copies the src host buffer to a new Buffer, inferring metadata from src.
Copies the src device buffer to a new numpy array synchronously.
If not explicitly specified, the shape, dtype and order of the resulting array are inferred from the src if possible, or like if not. The default order is “C”.
wait_for can be specified as a list of Event instances to block on before copying. Always blocks.
If queue is not specified, uses the default queue.
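The fallback order described above (explicit argument, then src, then like, with "C" as the default order) can be sketched as follows. The function resolve_metadata is a hypothetical helper written for this example, and it assumes the metadata lives in attributes named shape, dtype and order; the real implementation may differ.

```python
from types import SimpleNamespace


def resolve_metadata(src, like=None, shape=None, dtype=None, order=None):
    """Pick shape, dtype and order: explicit value, then src, then like."""
    def pick(name, explicit, default=None):
        if explicit is not None:
            return explicit
        for candidate in (src, like):
            value = getattr(candidate, name, None)
            if value is not None:
                return value
        return default

    resolved = (pick("shape", shape), pick("dtype", dtype),
                pick("order", order, default="C"))
    if resolved[0] is None or resolved[1] is None:
        raise ValueError("could not infer shape and dtype")
    return resolved


untyped = SimpleNamespace()  # a src that carries no metadata
template = SimpleNamespace(shape=(4, 4), dtype="float32")
print(resolve_metadata(untyped, like=template))  # ((4, 4), 'float32', 'C')
```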
Bases: Boost.Python.instance
Bases: pyopencl._cl.Buffer
Represents a buffer in global memory.
Creates a Buffer without saving any metadata (the pyopencl default).
Creates a Buffer with shape, element type and order metadata.
It is highly recommended that you use the methods available in Context to create buffers rather than doing so explicitly.
Infers a shape from the provided Python object.
Attempts to infer a cl_dtype for the source buffer.
If a cl_dtype attribute is defined, uses that. If not, but a dtype is defined, uses to_cl_type to look it up.
Raises an Error if not able to infer a cl_dtype.
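The lookup order described here can be sketched with stand-ins. Error, DTYPE_TO_CL and infer_cl_dtype below are hypothetical substitutes for the library's Error class, to_cl_type and the inference method, and the type names are plain strings rather than real descriptors.

```python
from types import SimpleNamespace


class Error(Exception):
    """Stand-in for the library's Error class."""


# Hypothetical lookup table standing in for to_cl_type.
DTYPE_TO_CL = {"int32": "int", "int64": "long",
               "float32": "float", "float64": "double"}


def infer_cl_dtype(src):
    # 1. An explicit cl_dtype attribute wins.
    cl_dtype = getattr(src, "cl_dtype", None)
    if cl_dtype is not None:
        return cl_dtype
    # 2. Otherwise translate a dtype attribute, if there is one.
    dtype = getattr(src, "dtype", None)
    if dtype is not None and str(dtype) in DTYPE_TO_CL:
        return DTYPE_TO_CL[str(dtype)]
    # 3. Nothing usable: report the failure.
    raise Error("cannot infer cl_dtype for %r" % (src,))


print(infer_cl_dtype(SimpleNamespace(cl_dtype="half")))    # half
print(infer_cl_dtype(SimpleNamespace(dtype="float32")))    # float
```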
Bases: pyopencl._cl.LocalMemory
Represents an allocation in local memory.
Creates a LocalMemory instance with shape, type and order metadata.
See Buffer.shaped() for an explanation of the arguments.
Bases: pyopencl._cl.Program
Represents an OpenCL program.
Bases: pyopencl._cl.Kernel
Represents an OpenCL kernel.
Extended to save program, name, context and the default queue as attributes.
Extended to use the default queue if the first argument is not a CommandQueue, and to use the shape of the first kernel argument as the global_size if it is not explicitly specified.
Also provides support for Context.Out() and Context.InOut().
Python ints and floats are converted to numpy ints and floats automatically. The default floating point data type is float, not double, which is only used if the number cannot fit into the range of the float. The default integer data type is int, with long being used if the number is out of range of int.
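The selection rule above can be sketched in plain Python. This is only an approximation of the described behavior, not the library's code: pick_scalar_type is a hypothetical name, it returns OpenCL type names as strings rather than numpy scalars, and it does not treat inf or nan specially.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
FLOAT_MAX = 3.4028234663852886e38  # largest finite 32-bit float


def pick_scalar_type(value):
    """Return the OpenCL scalar type name chosen for a Python number."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        # int by default, long only when the value is out of int range.
        return "int" if INT_MIN <= value <= INT_MAX else "long"
    if isinstance(value, float):
        # float by default, double only when float cannot hold the value.
        return "float" if abs(value) <= FLOAT_MAX else "double"
    raise TypeError("expected an int or a float")


print(pick_scalar_type(10))       # int
print(pick_scalar_type(2**40))    # long
print(pick_scalar_type(1.5))      # float
print(pick_scalar_type(1e300))    # double
```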
Bases: pyopencl.Error
Base class for errors in ahh.cl.
Does NOT replace pyopencl.Error, but does extend it.
The process-wide default context to use. None until set explicitly.
Bases: object
An OpenCL extension descriptor.
Note
The list of extensions defined below is not comprehensive. Feel free to add additional ones that you know about.
Standard 64-bit floating point extension.
See section 9.3 in the spec.
Standard extension supporting use of the half type as a full type.
See section 9.10 in the spec.
Standard 32-bit base atomic operations for global memory.
See section 9.5 in the spec.
Standard 32-bit extended atomic operations for global memory.
See section 9.5 in the spec.
Standard 32-bit base atomic operations for local memory.
See section 9.6 in the spec.
Standard 32-bit extended atomic operations for local memory.
See section 9.6 in the spec.
Tuple containing both the base and extended 32-bit global atomic extensions.
Tuple containing both the base and extended 32-bit local atomic extensions.
Tuple containing all 32-bit atomics extensions.
Standard 64-bit base atomic operations.
See section 9.7 in the spec.
Standard 64-bit extended atomic operations.
See section 9.7 in the spec.
Tuple containing all 64-bit atomics extensions.
Standard extension to support byte addressable arrays.
See section 9.9 in the spec.
Standard extension to support 3D image memory objects.
See section 9.8 in the spec.
Apple extension for OpenGL sharing.
Apple SetMemObjectDestructor extension.
Apple ContextLoggingFunctions extension.
Tuple containing all Apple extensions.
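One common use for a tuple of extensions is to emit the pragmas that enable them at the top of a kernel source. The sketch below works on plain extension-name strings as given in the OpenCL specification; ATOMICS_32 and enable_pragmas are hypothetical names for this example, and how ahh.cl's own descriptors expose their names is not shown in this document.

```python
# Extension names as defined by the OpenCL specification.
ATOMICS_32 = (
    "cl_khr_global_int32_base_atomics",
    "cl_khr_global_int32_extended_atomics",
    "cl_khr_local_int32_base_atomics",
    "cl_khr_local_int32_extended_atomics",
)


def enable_pragmas(extension_names):
    """Build the source lines that enable each named extension."""
    return "\n".join(
        "#pragma OPENCL EXTENSION %s : enable" % name
        for name in extension_names)


print(enable_pragmas(("cl_khr_fp64",) + ATOMICS_32[:1]))
```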
Bases: object
Base class for descriptors for OpenCL data types.
Do not initialize this or any subclasses directly – singletons have already been defined below.
The name of the type.
The minimum size, in bytes, of this type, independent of device.
The maximum size, in bytes, of this type, independent of device.
The GlobalPtrType corresponding to this type.
The LocalPtrType corresponding to this type.
The ConstantPtrType corresponding to this type.
The PrivatePtrType corresponding to this type.
Bases: ahh.cl.Type
Base class for descriptors for OpenCL builtin types.
Bases: ahh.cl.BuiltinType
Base class for descriptors for OpenCL scalar types.
Calling a type descriptor will produce an appropriate numpy scalar suitable for calling into a kernel with:
>>> cl_int(10).__class__
<type 'numpy.int32'>
A string representing the unqualified name of the numpy dtype corresponding to this scalar type, or None if unsupported by numpy.
The numpy dtype corresponding to this scalar type.
None if unsupported by numpy.
The minimum value this type can take.
None if device-dependent.
The maximum value this type can take.
None if device-dependent.
Converts a bare literal into an appropriately typed literal.
Adds a suffix, if one exists. If not, uses a cast.
The suffix appended to literals for this type, or None.
(e.g. ‘f’ for float)
Note that either case can normally be used. The lowercase version is provided here.
Raw integer and floating point literals default to int and double, respectively, unless the integer exceeds the bounds for 32-bit integers in which case it is promoted to a long.
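The suffix-or-cast rule can be sketched with a pared-down descriptor. ScalarSketch is a hypothetical class, and the exact text of the cast it produces is an assumption; only the rule itself (suffix if one exists, cast otherwise) comes from the description above.

```python
class ScalarSketch(object):
    """Hypothetical, pared-down scalar type descriptor."""

    def __init__(self, name, literal_suffix=None):
        self.name = name
        self.literal_suffix = literal_suffix

    def make_literal(self, bare_literal):
        text = str(bare_literal)
        if self.literal_suffix is not None:
            # A suffix exists for this type: append it.
            return text + self.literal_suffix
        # No suffix available: fall back to an explicit cast.
        return "(%s)%s" % (self.name, text)


float_sketch = ScalarSketch("float", "f")
short_sketch = ScalarSketch("short")
print(float_sketch.make_literal(4.0))   # 4.0f
print(short_sketch.make_literal(7))     # (short)7
```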
Bases: ahh.cl.ScalarType
Base class for descriptors for OpenCL scalar integer types.
A boolean indicating whether this is an unsigned integer type.
If integer, this provides the signed variant of the type.
If integer, this provides the unsigned variant of the type.
Bases: ahh.cl.ScalarType
Base class for descriptors for OpenCL scalar float types.
A map from numpy.dtype descriptors to ScalarType descriptors.
8-bit signed integer type.
Short name for cl_char.
Note
These short names are non-standard.
8-bit unsigned integer type.
Short name for cl_uchar.
16-bit signed integer type.
Short name for cl_short.
16-bit unsigned integer type.
Short name for cl_ushort.
32-bit signed integer type.
Short name for cl_int.
32-bit unsigned integer type.
Short name for cl_uint.
64-bit signed integer type.
Short name for cl_long.
64-bit unsigned integer type.
Short name for cl_ulong.
16-bit floating point type.
See the spec if you intend to use this; it's complicated.
Short name for cl_half.
32-bit floating point type.
Short name for cl_float.
64-bit floating point type.
Short name for cl_double.
cl_bool is cl_int
The void type.
Can only be used as the return type of a function or the target type of a pointer.
Signed integer type with size equal to Device.address_bits.
Short name for cl_intptr_t.
Unsigned integer type with size equal to Device.address_bits.
Short name for cl_uintptr_t.
Signed integer type large enough to hold the result of subtracting pointers.
Short name for cl_ptrdiff_t.
Unsigned integer type large enough to hold the maximum length of a buffer.
Short name for cl_size_t.
Bases: ahh.cl.Type
Base class for descriptors for OpenCL pointer types.
The address space the pointer refers to, e.g. “__global”.
The short name of the address space, e.g. “global”.
Bases: ahh.cl.PtrType
Base class for descriptors for OpenCL pointers to global memory.
Bases: ahh.cl.PtrType
Base class for descriptors for OpenCL pointers to local memory.
Bases: ahh.cl.PtrType
Base class for descriptors for OpenCL pointers to constant memory.
Bases: ahh.cl.PtrType
Base class for descriptors for OpenCL pointers to private memory.
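The relationship between the full and short address space names can be sketched as below. PtrSketch and its attribute names are hypothetical, and the format of the rendered declaration is an assumption made for illustration.

```python
class PtrSketch(object):
    """Hypothetical pointer type descriptor."""

    def __init__(self, target_name, short_address_space):
        self.target_name = target_name
        self.short_address_space = short_address_space   # e.g. "global"

    @property
    def address_space(self):
        # The full qualifier is the short name with a "__" prefix.
        return "__" + self.short_address_space

    @property
    def name(self):
        return "%s %s*" % (self.address_space, self.target_name)


global_float_ptr = PtrSketch("float", "global")
print(global_float_ptr.address_space)   # __global
print(global_float_ptr.name)            # __global float*
```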
Produces an OpenCL numeric literal from a number-like value heuristically.
See source for full algorithm.
>>> to_cl_numeric_literal(4)
"4"
>>> to_cl_numeric_literal(4.0)
"4.0f"
>>> to_cl_numeric_literal(4, report_type=True)
(<ahh.cl.Type <int>>, "4")
>>> to_cl_numeric_literal(4.0, report_type=True)
(<ahh.cl.Type <float>>, "4.0f")
>>> to_cl_numeric_literal(2**50, report_type=True)
(<ahh.cl.Type <long>>, "1125899906842624L")
>>> to_cl_numeric_literal(2**50, unsigned=True, report_type=True)
(<ahh.cl.Type <ulong>>, "1125899906842624uL")
>>> to_cl_numeric_literal(cl_double.max, report_type=True)
(<ahh.cl.Type <double>>, "1.79769313486e+308")
Non-numeric values raise an AssertionError.
See also: ScalarType.make_literal() to specify the type explicitly.
Produces a Type descriptor from a number-like value heuristically.
See examples in to_cl_numeric_literal(), which calls this function.
Bases: object
A stub for built-in functions available to OpenCL kernels.
The name of the function.
A function which, when provided the types of the input arguments, gives you the return type of the builtin function, or raises an Error if the types are invalid.
If not None, returns a tuple of extensions required for arguments of the specified types.
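A stub of this kind can be sketched as a name paired with a return-type function. BuiltinFnSketch, Error and the use of plain strings for types are hypothetical choices for this example. The signature used for get_global_id (one uint argument, size_t result) follows the OpenCL specification, though this sketch ignores implicit conversions and accepts uint only.

```python
class Error(Exception):
    """Stand-in for the library's Error class."""


class BuiltinFnSketch(object):
    """Hypothetical stub: a name plus a return-type function."""

    def __init__(self, name, return_type_fn):
        self.name = name
        self.return_type_fn = return_type_fn


def _get_global_id_return_type(*arg_types):
    # get_global_id takes one uint dimension index and returns size_t.
    if arg_types != ("uint",):
        raise Error("get_global_id expects (uint), got %r" % (arg_types,))
    return "size_t"


get_global_id_sketch = BuiltinFnSketch("get_global_id",
                                       _get_global_id_return_type)
print(get_global_id_sketch.return_type_fn("uint"))  # size_t
```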
Bases: object
A descriptor for builtin constants available to OpenCL kernels.
Bases: object
A descriptor for OpenCL reserved keywords.
A map from built-in and reserved names to their corresponding descriptor.
The get_work_dim builtin function.
The get_global_size builtin function.
The get_global_id builtin function.
The get_local_size builtin function.
The get_local_id builtin function.
The get_num_groups builtin function.
The get_group_id builtin function.
The abs builtin function.
The abs_diff builtin function.
The add_sat builtin function.
The hadd builtin function.
The rhadd builtin function.
The clz builtin function.
The mad_hi builtin function.
The mad24 builtin function.
The mad_sat builtin function.
The max builtin function.
The min builtin function.
The mul_hi builtin function.
The mul24 builtin function.
The rotate builtin function.
The sub_sat builtin function.
The upsample builtin function.
The clamp builtin function.
The degrees builtin function.
The mix builtin function.
The radians builtin function.
The step builtin function.
The smoothstep builtin function.
The sign builtin function.
The acos builtin function.
The acosh builtin function.
The acospi builtin function.
The asin builtin function.
The asinh builtin function.
The asinpi builtin function.
The atan builtin function.
The atan2 builtin function.
The atanh builtin function.
The atanpi builtin function.
The atan2pi builtin function.
The cbrt builtin function.
The ceil builtin function.
The copysign builtin function.
The cos builtin function.
enqueue_barrier( (CommandQueue)arg1) -> None :
- C++ signature :
- void enqueue_barrier(pyopencl::command_queue {lvalue})
The half_cos builtin function.
The native_cos builtin function.
The cosh builtin function.
The cospi builtin function.
The half_divide builtin function.
The native_divide builtin function.
The erfc builtin function.
The erf builtin function.
The exp builtin function.
The half_exp builtin function.
The native_exp builtin function.
The exp2 builtin function.
The half_exp2 builtin function.
The native_exp2 builtin function.
The exp10 builtin function.
The half_exp10 builtin function.
The native_exp10 builtin function.
The expm1 builtin function.
The fabs builtin function.
The fdim builtin function.
The floor builtin function.
The fma builtin function.
The fmax builtin function.
The fmin builtin function.
The fmod builtin function.
The fract builtin function.
The frexp builtin function.
The hypot builtin function.
The ilogb builtin function.
The ldexp builtin function.
The lgamma builtin function.
The lgamma_r builtin function.
The log builtin function.
The half_log builtin function.
The native_log builtin function.
The log2 builtin function.
The half_log2 builtin function.
The native_log2 builtin function.
The log10 builtin function.
The half_log10 builtin function.
The native_log10 builtin function.
The log1p builtin function.
The logb builtin function.
The mad builtin function.
The modf builtin function.
The nextafter builtin function.
The pow builtin function.
The pown builtin function.
The powr builtin function.
The half_powr builtin function.
The native_powr builtin function.
The half_recip builtin function.
The native_recip builtin function.
The remainder builtin function.
The remquo builtin function.
The rint builtin function.
The rootn builtin function.
The round builtin function.
The rsqrt builtin function.
The native_rsqrt builtin function.
The half_rsqrt builtin function.
The sin builtin function.
The native_sin builtin function.
The half_sin builtin function.
The sincos builtin function.
The sinh builtin function.
The sinpi builtin function.
The sqrt builtin function.
The half_sqrt builtin function.
The native_sqrt builtin function.
The tan builtin function.
The half_tan builtin function.
The native_tan builtin function.
The tanh builtin function.
The tanpi builtin function.
The tgamma builtin function.
The trunc builtin function.
The dot builtin function.
The distance builtin function.
The length builtin function.
The normalize builtin function.
The fast_distance builtin function.
The fast_length builtin function.
The fast_normalize builtin function.
The isequal builtin function.
The isnotequal builtin function.
The isgreater builtin function.
The isgreaterequal builtin function.
The isless builtin function.
The islessequal builtin function.
The islessgreater builtin function.
The isfinite builtin function.
The isinf builtin function.
The isnan builtin function.
The isnormal builtin function.
The isordered builtin function.
The isunordered builtin function.
The signbit builtin function.
The any builtin function.
The all builtin function.
The bitselect builtin function.
The select builtin function.
The atom_add builtin function.
The atom_sub builtin function.
The atom_xchg builtin function.
The atom_inc builtin function.
The atom_dec builtin function.
The atom_cmpxchg builtin function.
The atom_min builtin function.
The atom_max builtin function.
The atom_and builtin function.
The atom_or builtin function.
The atom_xor builtin function.
The vload_half builtin function.
The vstore_half builtin function.
The sizeof builtin operator.