What is SIMD Vectorization?
A vector is an instruction operand holding a set of data elements packed into a one-dimensional array. The elements can be integer or floating-point values. Most Vector/SIMD Multimedia Extension and SPU instructions operate on vector operands. Vectors are also called SIMD operands or packed operands.
What is Auto Vectorization?
Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which processes one operation on multiple pairs of operands at once.
The purpose of this post is to show how SIMD vectorization and auto-vectorization work in C code, and to understand the result by breaking down the generated assembly. We will compile with the GCC compiler to see what its auto-vectorizer can do. We will write a short program that creates two 1000-element integer arrays and fills them with random numbers in the range -1000 to +1000, then sums those two arrays element by element into a third array, and finally sums the third array and prints the result. So our first step is to write that program.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
int array1[1000];
int array2[1000];
long array3[1000];
long arraySum = 0;
srand(time(NULL));
// One loop does everything; the rand() calls inside it are what keep GCC from vectorizing it
for (int i = 0; i < 1000; i++) {
array1[i] = (rand()% 2001) - 1000;
array2[i] = (rand()% 2001) - 1000;
array3[i] = array1[i] + array2[i];
arraySum += array3[i];
}
printf("The total array sum is: %li\n", arraySum);
return 0;
}
Compiling this code with GCC using the command
gcc -O3 -fopt-info-vec-missed=vect_v0.miss vect_v0.c -o vect_v0
writes notes like these about the loops GCC could not vectorize into vect_v0.miss:
vector.c:14:1: note: not vectorized: loop contains function calls or data references that cannot be analyzed
vector.c:12:1: note: not vectorized: not enough data-refs in basic block.
vector.c:16:22: note: not vectorized: not enough data-refs in basic block.
vector.c:14:1: note: not vectorized: not enough data-refs in basic block.
vector.c:20:9: note: not vectorized: not enough data-refs in basic block.
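But if we make a few changes to our code, moving the rand() calls into their own loop so that the addition and the summation happen in a second loop without any function calls: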
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
int array1[1000];
int array2[1000];
long array3[1000];
long arraySum = 0;
srand(time(NULL));
// Loop 1: fill the arrays (calls rand(), so it stays scalar)
for (int i = 0; i < 1000; i++) {
array1[i] = (rand()% 2001) - 1000;
array2[i] = (rand()% 2001) - 1000;
}
// Loop 2: no function calls, so GCC can vectorize the addition and the sum
for (int i = 0; i < 1000; i++) {
array3[i] = array1[i] + array2[i];
arraySum += array3[i];
}
printf("The total array sum is: %li\n", arraySum);
return 0;
}
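After compiling with the command gcc -O3 -fopt-info-vec-missed=vect_v0.miss vect_v0.c -o vect_v0, GCC reports:
note: loop vectorized
This time the second loop, which adds the arrays and accumulates the sum, has been vectorized. (Strictly speaking, "loop vectorized" is one of GCC's "optimized" messages, so it shows up with -fopt-info-vec or -fopt-info-vec-all; -fopt-info-vec-missed on its own only reports the loops that were not vectorized.)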
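Now let's disassemble the auto-vectorized binary with objdump -d followed by the name of your executable (here, objdump -d vect_v0). The listing below is AArch64 code; the lines starting with // are my annotations, while the // comments at the end of some instructions are printed by objdump itself.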
0000000000400560 <main>:
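// Prologue: reserve 8064 (0x1f80) bytes of stack: two 4000-byte int arrays plus 64 bytes for saved registers. array3 gets no space because GCC never stores it; its values go straight into the sum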
400560: d283f010 mov x16, #0x1f80 // #8064
400564: cb3063ff sub sp, sp, x16
400568: d2800000 mov x0, #0x0 // #0
40056c: a9007bfd stp x29, x30, [sp]
400570: 910003fd mov x29, sp
400574: a90153f3 stp x19, x20, [sp, #16]
400578: 529a9c74 mov w20, #0xd4e3 // #54499
40057c: a9025bf5 stp x21, x22, [sp, #32]
400580: 72a83014 movk w20, #0x4180, lsl #16
400584: f9001bf7 str x23, [sp, #48]
400588: 910103b5 add x21, x29, #0x40
40058c: 913f83b6 add x22, x29, #0xfe0
400590: 5280fa33 mov w19, #0x7d1 // #2001
400594: d2800017 mov x23, #0x0 // #0
400598: 97ffffd6 bl 4004f0 <time@plt>
40059c: 97ffffe9 bl 400540 <srand@plt>
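// Scalar loop 1 (not vectorized): two rand() calls per iteration. rand() % 2001 is computed with the magic multiplier in w20 (0x4180d4e3) and a shift by 41 instead of a divide, then 1000 (0x3e8) is subtracted
4005a0: 97ffffdc bl 400510 <rand@plt>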
4005a4: 9b347c01 smull x1, w0, w20
4005a8: 9369fc21 asr x1, x1, #41
4005ac: 4b807c21 sub w1, w1, w0, asr #31
4005b0: 1b138020 msub w0, w1, w19, w0
4005b4: 510fa000 sub w0, w0, #0x3e8
4005b8: b8376aa0 str w0, [x21, x23]
4005bc: 97ffffd5 bl 400510 <rand@plt>
4005c0: 9b347c01 smull x1, w0, w20
4005c4: 9369fc21 asr x1, x1, #41
4005c8: 4b807c21 sub w1, w1, w0, asr #31
4005cc: 1b138020 msub w0, w1, w19, w0
4005d0: 510fa000 sub w0, w0, #0x3e8
4005d4: b8376ac0 str w0, [x22, x23]
4005d8: 910012f7 add x23, x23, #0x4
4005dc: f13e82ff cmp x23, #0xfa0
4005e0: 54fffe01 b.ne 4005a0 <main+0x40> // b.any
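// Vectorized loop 2: v1 holds two 64-bit running sums, zeroed first; each iteration loads four ints from each array into q0 and q2
4005e4: 4f000401 movi v1.4s, #0x0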
4005e8: d2800000 mov x0, #0x0 // #0
4005ec: 3ce06ac0 ldr q0, [x22, x0]
4005f0: 3ce06aa2 ldr q2, [x21, x0]
4005f4: 91004000 add x0, x0, #0x10
4005f8: f13e801f cmp x0, #0xfa0
// This is what it's all for: one vector addition of four pairs of 32-bit ints
4005fc: 4ea28400 add v0.4s, v0.4s, v2.4s
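// Sign-extend the four 32-bit sums to 64 bits and accumulate them: saddw takes lanes 0-1, saddw2 takes lanes 2-3
400600: 0ea01021 saddw v1.2d, v1.2d, v0.2s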
400604: 4ea01021 saddw2 v1.2d, v1.2d, v0.4s
400608: 54ffff21 b.ne 4005ec <main+0x8c> // b.any
// After the loop, add the two 64-bit lanes of v1 together: d1 now holds arraySum
40060c: 5ef1b821 addp d1, v1.2d
// x0 = address of the format string, printf's first argument
400610: 90000000 adrp x0, 400000 <_init-0x4b8>
400614: 91200000 add x0, x0, #0x800
// Move the 64-bit sum (lane 0 of v1) into x1, printf's second argument (arraySum)
400618: 4e083c21 mov x1, v1.d[0]
40061c: 97ffffcd bl 400550 <printf@plt>
// Epilogue: restore saved registers, return 0 in w0, free the stack frame
400620: f9401bf7 ldr x23, [sp, #48]
400624: a94153f3 ldp x19, x20, [sp, #16]
400628: 52800000 mov w0, #0x0 // #0
40062c: a9425bf5 ldp x21, x22, [sp, #32]
400630: d283f010 mov x16, #0x1f80 // #8064
400634: a9407bfd ldp x29, x30, [sp]
400638: 8b3063ff add sp, sp, x16
40063c: d65f03c0 ret
Although GCC's auto-vectorization can improve performance, it is not useful in every case, and it can't simply be trusted to happen. There are a number of limitations to consider: GCC may need assurance that the arrays and their data are aligned, and code will often have to be rewritten to simplify its loops, as we did above by moving the rand() calls out of the summing loop.