Loading...

Generative AI for coding: tales from the field

7 months ago

Generative AI

Quasar is an AI company in the broad sense of the term. At Quasar, in a very abstract and simplified way, we think it’s possible to predict the future by analyzing the past, and we help our customers do that by doing numerical analysis of timeseries data.

This differs from Generative AI, a system capable of generating text, images, and anything you fancy from prompts.

In this article, I will talk about Generative AI and my experiences with it to help me write software. Although it’s a different field, Generative AI might eventually be an input or output of systems like Quasar, and we need to understand what’s going on in this field.

Since we don’t have a dog in this fight, hopefully, you’ll find this article unbiased.

I will start with a platitude: the possibilities created by the interactions between a generator and human imagination will yield massive productivity gains we can’t even fathom.

In scorn of nature, art gave lifeless life

Here is an example of an image I generated; I wanted the painting of a computer scientist in the style of the famous painter Sandro Botticelli:

AI generated computer scientist in the style of Sandro Botticelli

At first glance, this looks fantastic, an excellent imitation of the style (as far as I can tell).

Until you realize the hands (look at the fingers), the computer (why two keyboards?), and the position doesn’t work.

First, hands are one of the most challenging things to paint because the human mind is uniquely trained to identify anything abnormal with hands, and thus the execution must be perfect.

Second, the problem is that the generator does not know what it is generating. It can connect my text with its vast knowledge base, but its ability is purely mechanical. It doesn’t understand how a human would use a computer. There’s no why. It attempts to connect my prompt with a knowledge base statistically.

To be clear, I’m not referencing the Chinese Room experiment here, as the generators can’t even give the illusion of intelligence.

Had we demanded a strong beginner painter to do similar work, probably the finishes wouldn’t have been as good. Still, the result would have been much more coherent because a human would understand that we don’t have six fingers and don’t need two keyboards for a single computer.

If I give a more generic input, such as “draw me a car”, I obtain this:

An AI generated car

For some reason, it rendered a toy car on the ground. I’m not sure I can make sense of the roof.

Don’t get me wrong, I’m fascinated by the possibility of the tools, but they still require a lot of supervision. We’re light years away from the Star Trek holodeck: “computer, render a saloon”.

AI to write code

Maybe asking our current Generative AI to rival our artists is a tall order.

What about code generation? There’s a mathematical limit to what a computer will ever be able to do in terms of code generation and analysis due to something called the halting problem. However, intuitively, we can imagine AI could get very good at generating correct programs.

In this article, we will ask Bard and ChatGPT 4 to solve a software engineering problem to which I already have an answer and see how it does.

The code will be in C++, but you don’t need to be able to understand C++ to read and understand the discussions with the AIs.

Delta compression

One of our distinctive features at Quasar is our compression suite (Delta4C) which places us number one in terms of compression speed and ratio, things that matter a lot when you have large datasets. This is not me bragging (ok, maybe a little bit). I am using this part as we probably have a great reference point for everything related to compression code.

I won’t ask Bard or ChatGPT to rewrite our compression suite, just one component: delta compression. The principle of delta compression is that if you are working on data that doesn’t vary a lot, storing the differences is more efficient than storing the actual data, as the changes are smaller.

For example, if you are storing 1, 2, 4, 5, 8, storing 1, 1, 2, 1, 3 will be more compact as the lowest the number, the lower the number of bits required to encode them.

How efficiently can AI implement delta compression in C++?

The Quasar implementation

Here is the code in Quasar, feel free to reuse it as you wish, and it’s given here as an example with no guarantee of any kind. This code is pasture-raised, certified humane, with the right amount of protein to ensure you get those made gains at the gym:

void compute_deltas_no_simd(const std::uint32_t * in, std::uint32_t * out, std::uint64_t l) noexcept
{
    if (!l) return;

    out[0] = in[0];

    for (std::uint64_t i = 1; i < l; ++i)
    {
        // Compute the difference between consecutive elements
        out[i] = in[i] - in[i - 1];
    }
}

#ifdef BOOST_SIMD_ASSUME_SSE4_2
void compute_deltas(const std::uint32_t * in, std::uint32_t * out, std::uint64_t l) noexcept
{
    if (l <= 4u)
    {
        compute_deltas_no_simd(in, out, l);
        return;
    }

    std::uint64_t i = 0;

    auto prev = _mm_set1_epi32(0);

    for (; i < l / 4; ++i)
    {
        auto curr  = _mm_lddqu_si128(reinterpret_cast<const __m128i *>(in) + i);
        auto delta = _mm_sub_epi32(curr, _mm_alignr_epi8(curr, prev, 12));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(out) + i, delta);
        prev = curr;
    }

    // handle remainder
    for (i = 4 * i; i < l; ++i)
    {
        out[i] = in[i] - in[i - 1];
    }
}

There’s an optimization when the pointers are aligned on 128-bit boundaries that does non-temporal loads that I left out as irrelevant to our topic.

Any median-skill C++ software engineer can write the non-SIMD version in less than 15 minutes. However, the SIMD version starts to dabble in black magic, necessitating a Pr. Falcou seal of approval.

Let’s see what an AI can do.

Bard

Bard is Google’s attempt at catching up with OpenAI.

The discussion

I have an array of std::uint32_t accessed through a pointer, and I would like to compute the delta of each value compared to its predecessor and store that into another array of std::uint32_t also accessed through a pointer. Assume the destination is allocated correctly. For example, for the input {1, 2, 3, 4}, I expect the output {1, 1, 1, 1}.

Bard answer:

void compute_deltas(const std::uint32_t* source, std::uint32_t* destination, size_t size) {
  for (size_t i = 1; i < size; i++) {
    destination[i] = source[i] - source[i - 1];
  }
}

This answer isn’t correct. I thus ask Bard: “What happens with your code if the input has a size of 1?”

To which Bard answers:

void compute_deltas(const std::uint32_t* source, std::uint32_t* destination, size_t size) {
  for (size_t i = 1; i < size; i++) {
    if (i > 0) {
      destination[i] = source[i] - source[i - 1];
    }
  }
}

Still not correct. I tell Bard: “Then the result is incorrect; for an input of size 1, the output should be that element.”

void compute_deltas(const std::uint32_t* source, std::uint32_t* destination, size_t size) {
  for (size_t i = 0; i < size; i++) {
    if (i > 0) {
      destination[i] = source[i] - source[i - 1];
    } else {
      destination[i] = source[i];
    }
  }
}

Code is super inefficient with a condition inside the loop. I ask Bard to move it out: “Can you move the condition out of the loop? The condition will prevent optimizations.”

void compute_deltas(const std::uint32_t* source, std::uint32_t* destination, size_t size) {
  bool is_first_element = true;
  for (size_t i = 0; i < size; i++) {
    if (is_first_element) {
      is_first_element = false;
      destination[i] = source[i];
    } else {
      destination[i] = source[i] - source[i - 1];
    }
  }
}

At this point, I give up and give Bard the answer so we can move on. Bard tells me my code is correct. Well, thank you Bard, but are you just saying that to be nice? Just an FYI Bard, I’m going to crank up the difficulty: “Can you write a version of the code I just gave you that leverages SSE 4.2 using the Intel intrinsics?”

void compute_deltas_sse42(const std::uint32_t * in, std::uint32_t * out, std::uint64_t l) noexcept
{
    // Check if the input array is empty.
    if (!l) return;

    // Load the first element of the input array into a vector.
    __m128i v_in = _mm_loadu_si128((const __m128i *)in);

    // Initialize the output vector.
    __m128i v_out = _mm_set1_epi32(0);

    // Loop over the rest of the input array.
    for (std::uint64_t i = 1; i < l; i++)
    {
        // Load the next element of the input array into a vector.
        __m128i v_next_in = _mm_loadu_si128((const __m128i *)(in + i));

        // Compute the difference between the two vectors.
        __m128i v_delta = _mm_sub_epi32(v_next_in, v_in);

        // Store the difference vector into the output array.
        _mm_storeu_si128((__m128i *)out + i, v_delta);
    }
}

The answer is super false, or to be more accurate, false². I ask Bard if it can see what’s wrong. The answer?

Yes, I see what’s wrong in my answer. The code that I wrote does not handle inputs with a size of 1 correctly. The output for an input of size 1 should be that element, but the code that I wrote will output 0.

I tell Bard: “There’s another problem, when you load with _mm_loadu_si128, you’re loading 128-bit, but at each iteration, you advance 32-bit”

Then Bard gives me another incorrect answer:

void compute_deltas_sse42(const std::uint32_t * in, std::uint32_t * out, std::uint64_t l) noexcept
{
    // Check if the input array is empty.
    if (!l) return;

    // Load the first element of the input array into a vector.
    __m128i v_in = _mm_loadu_si128((const __m128i *)in);

    // Initialize the output vector.
    __m128i v_out = _mm_set1_epi32(0);

    // Loop over the rest of the input array.
    for (std::uint64_t i = 1; i < l; i += 4)
    {
        // Load the next 4 elements of the input array into a vector.
        __m128i v_next_in = _mm_loadu_si128((const __m128i *)(in + i));

        // Compute the difference between the two vectors.
        __m128i v_delta = _mm_sub_epi32(v_next_in, v_in);

        // Store the difference vector into the output array.
        _mm_storeu_si128((__m128i *)out + i, v_delta);
    }
}

I’ll save you time, Bard could not come up with the correct answer. Vectorization is hard, even for humans. I’m not mad at Bard.

Rating Bard

  • The first proposed solution didn’t correctly handle arrays of size 1
  • The proposed fix was not efficient
  • Bard was incapable of providing a remotely working SSE 4.2 implementation

On this test, Bard behaved as a below-median C++ software engineer. It’s, however, still a great tool as it can send you in the right direction. The problem is Bard doesn’t understand what it’s doing. It’s just mapping your words into a knowledge base and trying to give a statistically probable answer. It also doesn’t ask any questions.

ChatGPT is better but has the same flaws.

ChatGPT 4

ChatGPT is OpenAI’s prompt that took the internet by storm and fueled headlines about we were all going to be jobless, poor, mocked by our friends, abandoned by our families, and ridiculed by strangers in the street.

The discussion

I have an array of std::uint32_t accessed through a pointer, and I would like to compute the delta of each value compared to its predecessor and store that into another array of std::uint32_t also accessed through a pointer. Assume the destination is allocated correctly. For example, for the input {1, 2, 3, 4}, I expect the output {1, 1, 1, 1}.

ChatGPT answer:

/* Given that you have an input array of std::uint32_t and want to compute the difference between each value and its predecessor, you can perform this in C++ using pointer arithmetic. Here is an example function that does this: */

#include <cstdint>

void computeDeltas(std::uint32_t* input, std::uint32_t* output, std::size_t size) {
    if (size == 0) return;
    
    // Handle the first element
    *output = *input;
    ++output;
    ++input;
    
    // Handle remaining elements
    for (std::size_t i = 1; i < size; ++i, ++input, ++output) {
        *output = *input - *(input - 1);
    }
}
/*
In this function, the input and output parameters are pointers to the first elements of the input and output arrays, respectively, and size is the number of elements in the arrays.

The function starts by checking if the size of the array is zero. If it isn't, it assigns the first element of the input array to the first element of the output array. It then increments both pointers to move to the next elements in the arrays.

The function then enters a loop that computes the difference between each element in the input array and its predecessor, storing the result in the output array. It does this by dereferencing the input pointer, subtracting the value of the element before it (which it gets by subtracting 1 from the input pointer), and assigning the result to the element of the output array that the output pointer points to. It then increments both pointers to move to the next elements in the arrays.

This function assumes that the output array has been properly allocated and that it has the same size as the input array. */

My response: “Great answer. Can you give me the SSE4.2 version of it?”

#include <cstdint>
#include <emmintrin.h>
#include <smmintrin.h>

void computeDeltas(std::uint32_t* input, std::uint32_t* output, std::size_t size) {
    if (size == 0) return;
    
    // Handle the first element
    *output = *input;
    ++output;
    ++input;
    
    // Handle remaining elements
    for (std::size_t i = 1; i < size; i += 4, input += 4, output += 4) {
        __m128i current = _mm_loadu_si128((__m128i*)input);
        __m128i previous = _mm_loadu_si128((__m128i*)(input - 1));
        
        // Shift and duplicate last element of current
        __m128i current_shifted = _mm_shuffle_epi32(current, _MM_SHUFFLE(0, 3, 2, 1));
        // Put first element of current to the last of current_shifted
        current_shifted = _mm_blend_epi16(current_shifted, current, 0x11);

        __m128i deltas = _mm_sub_epi32(current, current_shifted);

        _mm_storeu_si128((__m128i*)output, deltas);
    }
}

My response: “You need to process the remaining elements.”

#include <cstdint>
#include <emmintrin.h>
#include <smmintrin.h>

void computeDeltas(std::uint32_t* input, std::uint32_t* output, std::size_t size) {
    if (size == 0) return;
    
    // Handle the first element
    *output = *input;
    ++output;
    ++input;
    
    // Handle remaining elements in chunks of 4
    std::size_t i = 1;
    for (; i + 4 <= size; i += 4, input += 4, output += 4) {
        __m128i current = _mm_loadu_si128((__m128i*)input);
        __m128i previous = _mm_loadu_si128((__m128i*)(input - 1));
        
        // Shift and duplicate last element of current
        __m128i current_shifted = _mm_shuffle_epi32(current, _MM_SHUFFLE(0, 3, 2, 1));
        // Put first element of current to the last of current_shifted
        current_shifted = _mm_blend_epi16(current_shifted, current, 0x11);

        __m128i deltas = _mm_sub_epi32(current, current_shifted);

        _mm_storeu_si128((__m128i*)output, deltas);
    }

    // Handle remaining elements individually
    for (; i < size; ++i, ++input, ++output) {
        *output = *input - *(input - 1);
    }
}

My answer: “At the first iteration, you are loading from an invalid memory address because you are doing i-1”

ChatGPT answer:

#include <cstdint>
#include <emmintrin.h>
#include <smmintrin.h>

void computeDeltas(std::uint32_t* input, std::uint32_t* output, std::size_t size) {
    if (size == 0) return;
    
    // Handle the first element
    *output = *input;
    ++output;
    ++input;
    
    // Handle the first chunk of four elements
    if (size >= 5) {
        __m128i current = _mm_loadu_si128((__m128i*)input);
        __m128i current_shifted = _mm_shuffle_epi32(current, _MM_SHUFFLE(0, 3, 2, 1));
        current_shifted = _mm_blend_epi16(current_shifted, current, 0x11);
        __m128i deltas = _mm_sub_epi32(current, current_shifted);
        _mm_storeu_si128((__m128i*)output, deltas);
        input += 4;
        output += 4;
    }
    
    // Handle remaining elements in chunks of 4
    for (std::size_t i = 5; i + 4 <= size; i += 4, input += 4, output += 4) {
        __m128i current = _mm_loadu_si128((__m128i*)input);
        __m128i previous = _mm_loadu_si128((__m128i*)(input - 1));
        __m128i current_shifted = _mm_shuffle_epi32(current, _MM_SHUFFLE(0, 3, 2, 1));
        current_shifted = _mm_blend_epi16(current_shifted, current, 0x11);
        __m128i deltas = _mm_sub_epi32(current, current_shifted);
        _mm_storeu_si128((__m128i*)output, deltas);
    }

    // Handle remaining elements individually
    for (std::size_t i = size - (size % 4); i < size; ++i, ++input, ++output) {
        *output = *input - *(input - 1);
    }
}

I’ll save you some reading: despite many attempts to nudge ChatGPT in the right direction, it fails at giving me a proper answer, sometimes cycling between two incorrect answers.

Rating ChatGPT-4

  • The first proposed solution was correct and efficient, although it could have been cleaner.
  • ChatGPT was incapable of providing a correct SSE 4.2 implementation.

What is annoying with ChatGPT is that the answers are VERY VERBOSE. I can’t tell you how many times I’ve rolled my eyes at ChatGPT writing me a novel about how I must take its answer very carefully because, you know, as an AI, it may, sometimes, be a possibility that eventually some of the things could maybe perhaps be incorrect. Where is the “no disclaimer” option?

Also, ChatGPT gives me a lot of context and details that are, quite frankly, not very useful. I think Bard is better at providing replies, it’s more concise and easier to consume, and it gives me an example.

That being said, ChatGPT-4 behaved as a median C++ software engineer on this test, which is impressive. However, as soon as the complexity of the task increases, the magic stops, and it behaves like an idiot savant, randomly applying corrections.

ChatGPT-3.5 was also evaluated and behaved worse than Bard on this test.

Beyond Delta Compression: a review of Generative AI for coding

As you can imagine, I’ve tested generators with many more questions than delta compression, mostly with ChatGPT-4 as it’s the best generator.

Here is my review:

  • If you ask canonical questions with well-known answers, ChatGPT does excellent. I suspect it’s because it trained on Stack Overflow and used the ratings to measure good responses.
  • ChatGPT is also good at “looking into the documentation for you.” For example, you can ask ChatGPT about API examples or tools usage, and the answer is often excellent.
  • All generators fail miserably for anything requiring adding one or two levels of abstraction. A good example is vectorizing a program, which involves understanding and translating the problem into a new paradigm. This requires understanding the problem. Another example is transforming a SSE 4.2 code into AVX2, whose instructions don’t map perfectly.
  • The answer given is never the best, but it is often a good answer and generally at least a median answer. That’s very useful, but you should always be critical of the result. For example, if ChatGPT gives a different answer than what you are currently doing, this doesn’t mean what you’re doing isn’t good.
  • We’re encountering the same problem we had when we generated images. The only difference is that a 90% correct program is as good as a 0% correct one, sometimes even worse.
  • Generally speaking, ChatGPT is a great tool to do exploration and discovery as it’s a much, much better search interface than Google.

My main beef with these tools is that they don’t ask questions. They would get one order of magnitude better if they asked questions instead of answering my prompt immediately.

A temporary conclusion about tools under construction

These AI tools are unique productivity tools but can’t replace software engineers and probably will make bad software engineers even worse because they won’t be able to see if the answers are incorrect.

Not unlike the generated pictures, answers appear great at first glance, but significant flaws appear when you pay attention.

AI will yield immense productivity gains in use cases with fewer possible answers. Artists, programmers, and creators, in the broadest sense of the term, will be able to use generators wherever minutia is needed.

Regarding programming, I think there would be tremendous value in having tools trained on your code base: find errors, answer questions, and give advice. It’s also a great tool to have a first draft answer to a problem you don’t necessarily have inspiration for.

However, the amount of supervision required to use the output is extremely high. If anything, AI raised the bar for the skill level you need to be a top percentile performer.

We must never forget that the real challenge of software engineering isn’t writing the code but knowing what kind of code should be written. It’s not the what or the how. It’s the why.

AI is light years from being able to help us with this.

If you’d like to learn how Quasar can help you be successful with your digital transformation, check out our products.


Top