Vectorization exploits parallelism at granularity level of machine instruction. A vector processor has some extra computing units and fast memory. For example addition of two N-element vectors on an ordinary scalar processor will execute one instruction at a time.
DO 10 I = 1,N
A(I) = B(I) + C(I)
10 CONTINUE
But on a vector processor all N instructions can be executed on N separate processor which reduces the total time to slightly more than N times that needed to execute a single addition. Vectorizing compiler takes the sequential statements equivalent to the vector statement and performs a translation into vector machine instruction. The condition that allows vectorization is that the elements of the source operands must be independent of the result operands. For example:
DO 20 I = 1,N
X(I) = B(I) + A(I) * X(I-1)
20 CONTINUE
in this example X(I-1) must be calculated before X(I).This loop carried dependence makes it impossible to use vectorisation. But in some cases it is possible to modify code in such a way that vectorization is possible. for example:
DO 100 J = 1,N
DO 100 I = 1,N
DO 100 K = 1,N
C(I,J) = C(I,J) + A(I,K) * B(K,J)
100 CONTINUE
In this matrix multiplication example at each iteration C(I,J) is calculated using previous value of C(I,J) calculated in previous iteration so vectorization is not possible. But if code is written as:
DO 100 J = 1,N
DO 100 K = 1,N
DO 100 I = 1,N
C(I,J) = C(I,J) + A(I,K) * B(K,J)
100 CONTINUE
In this case vectorization is possible because consecutive instructions calculate C(I-1,J) and C(I,J) which are independent of each other and can be executed concurrently on different processors.
Thus dependency analysis at instruction level can help us to recognize operand level dependencies and apply appropriate optimization to allow vectorization.