Learn mainframe SIMD instructions for the IBM z13's processor

SIMD instructions in the z13 vector extension facility accelerate processing of social media and big data workloads.

IBM's z13 processor increases computing capacity in high-performance mainframe servers with bigger cache, simultaneous multithreading, large page frames, instruction pipeline management and single instruction multiple data.

The z13 is intended to bring mobile computing to the mainframe. It also marks a return to single instruction, multiple data (SIMD) processing, delivered through the z13 vector extension facility.

The SIMD vector instructions accelerate processing for languages like C and Java. Vector instructions perform operations on multiple data elements in parallel, enabling the mainframe to quickly process large amounts of data. This is a boon to social media and big data workloads, but probably not much use to the average systems programmer writing a job accounting exit.

SIMD instructions increase throughput in more than one way. Besides operating on several data elements at once, most of them are non-destructive. Unlike most machine instructions, where the result overlays one of the input operands, most SIMD instructions take two input registers and store the result in a third. This means programs spend less time juggling registers.


Vector registers are 128 bits (16 bytes) long, and there are 32 of them. The first 16 actually coexist with the 64-bit floating point registers (FPRs): each FPR occupies the leftmost 64 bits of the corresponding vector register. Changing an FPR therefore changes that half of the vector register, and the remaining 64 bits should be treated as unpredictable afterward. There are some squirrely rules about preserving vector registers across program calls, which are explained in IBM's Assembler Services Guide.

The SIMD vector instructions include all the math functions in integer and floating point mode. There are also string operations, as well as methods for getting data to and from storage.

A vector register's contents are made up of elements of one, two, four, eight or 16 bytes. Mask fields in the vector instructions specify the size of the elements to be manipulated. All vector instruction mnemonics begin with V, although IBM also provides some extended mnemonics for specifying element sizes, which are documented in chapters 21 through 24 of the z13 Principles of Operation manual.
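To make the element layout concrete, here is a minimal sketch in portable C that models a register as 16 bytes and reads one element out of it. It is an illustration only, not vector code: the names vreg_t, element_bytes, element_count and get_element are made up for this example, it assumes the size codes 0 through 3 for byte, halfword, word and doubleword, and it leaves out 16-byte elements because they do not fit in a 64-bit return value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: a 128-bit vector register as 16 bytes,
   with byte 0 being the leftmost (most significant) byte. */
typedef struct { uint8_t byte[16]; } vreg_t;

/* Assumed element-size codes: 0 = byte, 1 = halfword,
   2 = word, 3 = doubleword. Returns the element length in bytes. */
unsigned element_bytes(unsigned es) {
    assert(es <= 3);
    return 1u << es;
}

/* Number of elements of that size in one register. */
unsigned element_count(unsigned es) {
    return 16u / element_bytes(es);
}

/* Read element `index` (0 = leftmost) as an unsigned big-endian value. */
uint64_t get_element(const vreg_t *v, unsigned es, unsigned index) {
    unsigned n = element_bytes(es);
    assert(index < element_count(es));
    uint64_t value = 0;
    for (unsigned i = 0; i < n; i++)
        value = (value << 8) | v->byte[index * n + i];
    return value;
}
```

With size code 1, the same 16 bytes are seen as eight halfwords; with size code 3, as two doublewords. Nothing in the register changes, only how the instruction slices it.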

How SIMD instructions work

The manual outlines some additional nuances, but these simpler vector instructions show how an SIMD instruction set works on z13.

The Vector Load instruction looks familiar:

            VL  V1,D2(X2,B2)

Here V1 is the vector register, D2 is the displacement, and X2 and B2 are the index and base registers.

But, since a vector register's contents consist of elements, there are instructions for dealing with them individually. An example is the Vector Load Element instruction, which updates one element:

            VLEx V1,D2(X2,B2),M3

Here the x specifies the size of the element: B for byte, H for half word (16 bits), F for full word (32 bits) and G for double word (64 bits). The V1, D2, X2 and B2 operands play familiar roles, but the M3 mask field specifies the index of the element to be updated, counting from zero at the left. Thus the instruction VLEH V1,HALFWORD,3 updates the fourth half word of vector register 1, leaving all the other elements unchanged.
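The effect of VLEH can be sketched in portable C. This is a model of the behavior described above, not the instruction itself; vreg_t and load_halfword_element are hypothetical names, and the register is assumed to be 16 bytes numbered from the left.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: a vector register as 16 bytes, byte 0 leftmost. */
typedef struct { uint8_t byte[16]; } vreg_t;

/* Model of VLEH: copy two bytes from storage into halfword element
   `index` (0 = leftmost) and leave the other 14 bytes alone. */
void load_halfword_element(vreg_t *v1, const uint8_t *storage, unsigned index) {
    assert(index < 8);              /* only eight halfwords fit */
    v1->byte[index * 2]     = storage[0];
    v1->byte[index * 2 + 1] = storage[1];
}
```

Calling load_halfword_element(&v, halfword, 3) corresponds to the VLEH example: bytes 6 and 7 of the register change and everything else stays put.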

Along with regular loads, the SIMD vector load instructions include ways to generate masks, insert elements from general registers and pack elements from two vectors into one register. This is not the familiar packed decimal; it's the ability to halve the size of elements and squeeze them into another register.
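As a rough picture of packing, the following C sketch halves sixteen halfwords from two sources into sixteen bytes of one result. It assumes the simplest variant, where the high-order half of each element is just dropped; pack_halfwords is a hypothetical name, and the saturating forms of pack are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a truncating pack: the elements of `a` followed by the
   elements of `b` are cut to their low-order byte and stored left
   to right in `dst`. Assumption: no saturation is applied. */
void pack_halfwords(uint8_t dst[16], const uint16_t a[8], const uint16_t b[8]) {
    assert(dst && a && b);
    for (int i = 0; i < 8; i++) {
        dst[i]     = (uint8_t)(a[i] & 0xFF);  /* left half comes from a  */
        dst[i + 8] = (uint8_t)(b[i] & 0xFF);  /* right half comes from b */
    }
}
```

Truncation throws away data when a value does not fit in the smaller element, so this form only makes sense when the program already knows the high-order bytes are not needed.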

Vector register examples

Assuming we've loaded up two vector registers with eight half-word integers each, we could add each pair of corresponding elements with one vector add instruction:

            VA  V1,V2,V3,M4

In this case, the processor adds the signed half-word elements in V2 and V3 and stores the sums in V1, which demonstrates the non-destructive nature of an SIMD instruction. The mask, M4, specifies the element size. The mask value should equal one for half words. The addition does not signal overflow: a carry out of the magnitude bits simply flows into the integer's sign bit and the result wraps, which can make the math a little tricky.
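The half-word add can be modeled in a few lines of C. This is a scalar sketch of the semantics described above, with the hypothetical name vector_add_halfwords; the real instruction does all eight additions at once, while the loop here only shows what each element ends up holding.

```c
#include <assert.h>
#include <stdint.h>

#define HALFWORDS 8  /* eight 16-bit elements per 128-bit register */

/* Model of VA with element size 1 (half words): dst[i] = a[i] + b[i].
   The inputs are left untouched, and each sum wraps modulo 2^16
   instead of reporting overflow. */
void vector_add_halfwords(uint16_t dst[HALFWORDS],
                          const uint16_t a[HALFWORDS],
                          const uint16_t b[HALFWORDS]) {
    assert(dst && a && b);
    for (int i = 0; i < HALFWORDS; i++)
        dst[i] = (uint16_t)(a[i] + b[i]);
}
```

Adding 1 to an element holding 0x7FFF gives 0x8000, which reads as -32768 when the half word is treated as signed. That is the sign-bit surprise the text warns about, so a program that cares has to check its ranges itself.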

The vector instructions support string functions as well. While complicated with several options, SIMD string functions become a little easier if you think of them as hardware implementations of C string handling functions. For instance, here's the Vector Find Element Equal instruction:

            VFEE V1,V2,V3,M4[,M5]

At a high level, this instruction compares the elements in V2 and V3 and records the position of the first match in V1. Mask M4 denotes the element size, while M5 specifies two things. Setting bit 2 tells the processor to compare the elements of V2 against zero as well as against V3. When bit 3 equals one, the processor will set the condition code.

At any rate, the instruction compares the elements of the second and third operands from left to right. When it finds equal elements, it stores the byte index of that element in byte seven of the first operand. If no elements are equal, byte seven of the first operand will contain an index equal to the number of bytes in the register, which is 16. When the zero search bit is on, a zero element in the second operand ends the search in the same way a match does, so the index reported is that of whichever comes first.
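In the spirit of thinking of these as C string functions, here is a small C model of the search for byte-sized elements. It is a sketch under the assumptions stated in the text, with hypothetical names; it returns the index the instruction would place in byte seven of the first operand and does not model the condition code.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_BYTES 16

/* Model of Vector Find Element Equal for byte elements.
   Scans left to right and returns the index of the first position
   where v2 and v3 hold the same byte. If zero_search is nonzero,
   a zero byte in v2 also ends the search. Returns 16 when nothing
   is found. */
unsigned find_element_equal_bytes(const uint8_t v2[VECTOR_BYTES],
                                  const uint8_t v3[VECTOR_BYTES],
                                  int zero_search) {
    assert(v2 && v3);
    for (unsigned i = 0; i < VECTOR_BYTES; i++) {
        if (v2[i] == v3[i])
            return i;                 /* first equal element */
        if (zero_search && v2[i] == 0)
            return i;                 /* end of a C-style string */
    }
    return VECTOR_BYTES;              /* no match in this register */
}
```

A result of 16 tells the caller to load the next 16 bytes and try again, which is how a loop over a long string would use it.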
