Classroom practice: Dynamic arrays, part II.

Gábor Horváth / Zsolt Kohári · 2020.11.20.

Dynamic memory allocation. Dynamic arrays. Data type for storing sets.

Dynamic arrays, part II.

Preparation for the practice:

The lecture notes on arrays should be reviewed.
The lecture notes on pointers and strings should be reviewed.
The lecture notes on dynamic memory management should be reviewed.

1. Where is the mistake?

The code snippets contain at least one (or potentially more) errors. Find them! Explain what the problem is! What makes the codes wrong? What is the principal problem with them? How to fix them?

int fun(void)[100] {
    int array[100];
    return array;
}

Solution

Nice try, but it does not work. First of all, it is syntactically wrong: arrays can neither be received nor be returned by functions like this. In case we would want to return an array created in the function, we have to allocate it dynamically, and return the pointer pointing to it.

struct DynArray {
    int n;
    int array[n];
};

Solution

In the C language the size of a struct must always be fixed, and known at compile time, the compiler must know its size even before the program is executed. This is one of the reasons that make the above solution wrong: changing member variable n in the structure won't resize the array. The other issue is related to the scopes: member variable n will never exist in itself as a stand-alone variable in the code, only if we create an instance from structure DynArray. Then, something.n is a valid variable, but without creating an instance it is not possible to use n (for instance, to set the array size to it).

Fix: the array should not be put into the structure, it should be created externally. The structure should store a pointer to it only:

struct DynArray {
    int n;
    int *array;
};

Then, the array must be allocated dynamically:

struct DynArray da1;
da1.n = 100;
da1.array = (int*) malloc(sizeof(int) * 100);

struct DynArray {
    int n;
    int *array = (int*) malloc(sizeof(int) * n);
};

Solution

Total nonsense, in C language no executable statements can be put into the structures. How should it be executed, when variable n does not even have a value yet? It can be fixed as shown in the previous example.

2. Data type for storing sets

Write a program that implements various operations on a set consisting of real numbers. The number of elements in the set can be arbitrary large. The following operations must be implemented (by a function for each):

It must be possible to check whether an element is in the set or not
It must be possible to insert a new element into the set (if it is already there, nothing should happen)
It must be possible to remove an element from the set

Draw a figure about the memory layout of the set data structure!

Theoretically, it is not recommended to compare real numbers using the == operator. For simplicity, let us now ignore this issue and focus on the dynamic memory operations instead!

Solution

The requirement "the number of elements in the set can be arbitrary large" indicates that we have to rely on the dynamic memory management. The elements of the set can be stored in a dynamically growing array. A set is characterized by two properties: the pointer pointing to this array and the number of elements in the array. These two data must be managed together all the time, hence it makes sense to enclose them in a structure:

typedef struct Set {
    int len;
    double *data;
} Set;

When an instance of the set is created, both the len and the data member variables contain memory garbage. Consequently, the set must be properly initialized before the first usage. To avoid memory leaks, the memory allocated for the array must be released when the set is not used any more. Therefore, apart from the above listed three functions we need to implement two more: one for initializing and one for destroying a set.

Many functions operating of the set modify the member variables of the structure. For instance, the data array may get reallocated, the length of the array can grow, etc. For this reason, we will pass the pointer to the set structure to these functions, instead of passing the structure itself. This is only correct solution in case of the insert and the remove functions, but for convenience (to make the call interface of the functions uniform) we are going to implement all functions like this. In this case we do not have to remember the exact parameter list of each function, since they are the same. Those functions, that dont modify the set (e.g., the one that checks if a given element is included) will receive a pointer pointing to a constant structure, so that the compiler can check whether we wrote a wrong code in the function that modifies the set by mistake.

Inserting a new number to the set or removing one from it is a costly operation. Since arrays can not be resized, the a new array must be created every time the arrray is modified, and the old elements must be copied from the old place to the new one. This copy operation makes the implementation inefficient. After the new array is fully filled with the updated content, the old one must be released and the pointer in the structure pointing to the data array must be updated to point to the new array. (The realloc() function does basically the same)

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct Set {
    int len;
    double *data;
} Set;

/* receives an uninitialized set and makes a valid empty set from it */
void set_init(Set *s) {
    s->len = 0;
    s->data = NULL;
}

/* releases the set when it is not needed anymore */
void set_destroy(Set *s) {
    free(s->data);
}

/* returns true is the element is in the set */
bool set_contains(Set const *s, double what) {
    for (int i = 0; i < s->len; ++i)
        if (s->data[i] == what)
            return true;  /* found in the set */
    return false;         /* not found */
}

void set_insert(Set *s, double what) {
    /* do nothing if already in the set */
    if (set_contains(s, what))
        return;

    /* copy elements to a bigger by one array  */
    double *newarr = (double*) malloc((s->len + 1) * sizeof(double));
    for (int i = 0; i < s->len; ++i)
        newarr[i] = s->data[i];

    newarr[s->len] = what;  /* put the new element to the end */
    free(s->data);    /* the old array is not needed any more */
    s->data = newarr;     /* the pointer points to the new array from now on */
    s->len++;          /* number of elements has increased by one */
}

void set_remove(Set *s, double what) {
    /* do nothing if not in the set */
    if (!set_contains(s, what))
        return;

    /* new array, number of elements is less by one */
    double *newarr = (double*) malloc((s->len - 1) * sizeof(double));
    int j = 0;
    for (int i = 0; i < s->len; ++i)
        if (s->data[i] != what)
            newarr[j++] = s->data[i];
    free(s->data);
    s->data = newarr;
    s->len--;         /* number of elements has decreased by one */
}

/* lists the content of the set */
void set_print(Set const *s) {
    for (int i = 0; i < s->len; ++i)
        printf("%g ", s->data[i]);
    printf("\n");
}

int main(void) {
    Set s;

    set_init(&s);
    set_insert(&s, 3.14);
    set_insert(&s, 2.0);
    set_insert(&s, 3.14);
    set_print(&s);
    set_insert(&s, 6.1);
    set_print(&s);
    set_remove(&s, 3.14);
    set_print(&s);
    set_destroy(&s);

    return 0;
}

In the solution above we have exploited that free(NULL) is accepted by the standard. For some older C compilers this can be a problem, though. In this case, writing if (ptr!=NULL) free(ptr); solves the problem.

After the main() function has put number 6.1 to the set, the memory layout is as follows:

Set s is in the stack, consisting of the length and the pointer, and the pointer points to a dynamically allocated array, given by the malloc() call.

3. Set operations with dynamically allocated structures

Let us modify the previous solution such that the initializer function returns a new, dynamically created set, instead of initializing an existing one. After this modification using the set should be as simple as follows:

Set *s;

s = set_new();
set_insert(s, 3.14);
set_insert(s, 6.1);
set_print(s);
set_remove(s, 3.14);
set_print(s);
set_destroy(s);

This way the sets can be passed freely among the functions, they wont be deleted since they are not local variables any more. Observe how benefitial the modification is to the syntax: since s is always a pointer, we dont need to use the address-of operator & any more.

Solution

This approach leads to the following memory layout

The local variable in the main() is only a pointer, both the set structure and the data array are located in the heap, since they were allocated dynamically.

The difference is only in the initializing and the destroying functions. When initializing, the structure has to be created by malloc(), too, consequently, when destroying, the structure has to be released, too:

Set *set_init() {
    Set *ps;
    ps = (Set*) malloc(sizeof(Set));
    ps->data = NULL;
    ps->len = 0;
    return ps;
}

void set_destroy(Set *ps) {
    free(ps->data);
    free(ps);
}