Lab 5: My Vector, aka Dynamic Array ADT (5 Points)

Chris Tralie

Overview / Logistics

The purpose of this lab is to give you practice designing collections, and to explore what goes on under the hood of popular data structures such as the Java Arraylist, the C++ vector, or the Python list.

You can obtain the starter code by typing

git clone https://github.com/ursinus-cs174-s2022/Lab5_MyVector.git

I have provided a makefile, and the entry point for running the code with a main is in driver.cpp. I have also provided a file tester.cpp with a main, which you can use to test your code rigorously when it's finished.

Learning Objectives

  • Manipulate public and private variables/methods in classes to accomplish information hiding
  • Gain experience using information hiding and object references to implement a user-friendly collection class.
  • Maintain an underlying array to implement a flexible ArrayList-like data structure

Background

The vector container class in the C++ standard template library (STL) provides a more flexible alternative to the traditional array. It still has the "random access" property like regular arrays, so we can jump instantly to particular indices. But it is much more flexible for adding and deleting elements than a traditional array. Recall, for instance, if we wanted to add an element to the end of a regular array, we would have to make an entirely new array, copy everything over that was in there before, and then finally put in the new element:

By contrast, if we have an vector, all we have to do is say

And, as we will see in a moment, adding things to the end of a vector can be done in "constant amortized time"; that is, on average, it only takes some small number of operations independent of how many elements there are in the array. This contrasts to the addElement code above, which always takes N operations if there are N elements in the array. In this way, the vector has all of the flexible benefits of a linked list, while also maintaining the random access property of an ordinary array.

Finally, the vector class has the advantage that it is an object, not just a pointer like a regular array. This means we can implement safeguards like bounds checking in its access methods.

Flexible resizing algorithm

Behind the scenes, a C++ vector actually has an array of generic items, which we call the "underlying array." This private array is resized and maintained to accommodate the elements that are added to /removed from it.

One possible implementation of a vector would create a new array every time an element is added to the end, but this is very inefficient. A better strategy is actually to keep an array that's longer than we need, and only to resize it when we run out of room. When it runs out of room, we can double its size. This ensures that the average "time complexity" (amount of time it takes per task) of adding an element to the end of the list is still a constant number of steps on average. Or in more CS language, we have a "constant amortized cost for an add operation." For example, let's say we start with an underlying array of size 10 and we add 100 elements to it. We have to resize at the following places:

  • At 10, copying over 10 elements
  • At 20, copying over 20 elements
  • At 40, copying over 40 elements
  • At 80, copying over 80 elements
Otherwise, it's a simple assignment to an array index which takes constant time. So the total number of steps is (100 + 10 + 20 + 40 + 80), or an average of 2.875 operations per add task (and this will never be more than 3 with the doubling scheme).

By the same token, if a lot of items are removed, it's wasteful to keep too large of an array in memory. So if the list is holding fewer than 1/3 the capacity of the underlying array after you remove an element, then you should halve the underlying array.

Bearing this in mind, below are the private variables of the myvector class

There are actually three member variables per object: one for the underlying array, one for its size, and another for storing how many elements have actually been added to it (since the underlying array is usually larger than the number of elements in the vector). By default, the underlying array starts off with a capacity of DEFAULT_CAPACITY, but it is possible to initialize it with an overloaded constructor where this initial capacity is a parameter:

Programming Tasks

Your job will be to fill in methods to add and remove items from the data structure, while maintaining the underlying array. You should not use any methods from the built-in C++ vector class. Rather, you will be making your own implementation from scratch that has the same performance characteristics of C++'s version.

First, study the provided code in myvector.cpp and myvector.h. A number of methods have been provided to you already, and numerous hints have been provided in the comments. Below are the methods you should implement. It is recommended that you implement them in the order specified below.

Operations To Implement

  • (1 Point) void doubleCapacity()

    Double the capacity of the underlying array by creating a new underlying array and copying everything over that was there before up to N. You may want to refer to halveCapacity() as a reference.

    NOTE: doubleCapacity and halveCapacity are both private methods, since someone using the class need not know what's going on behind the scenes.

    NOTE ALSO: Be sure to avoid memory leaks from the old underlying array!

  • (1 Point) void push_back(Item item)

    Add an item to the end of the list and update its size. The default implementation throws an exception if the underlying array runs out of space, but you should call your doubleCapacity() method to make space if needed so this doesn't happen.

  • (1 Point) void insert(int index, Item item)

    This is an overloaded version of the add method in which an item is added at a specific index in the array. You should move the item that was at that index before, and everything to the right of it, over by one index to accommodate this new item. You should double the capacity of the underlying array if necessary.

  • (1 Point) Item remove(int index): Remove an item at a particular index and return it. You should also move everything to the right of this index over one to the left and update the size N. Also, be mindful of the following things:
    • If the new number of elements being stored is small enough, you should halve the capacity (using the provided halveCapacity() method).
    • Be sure to throw an out of bounds error if the index to remove is out of bounds (look at the at method for an example of this)
  • (1 Point) Item removeItem(Item item)

    Remove the first occurrence of a particular item in the list if it is present and returns it. You may want to use the remove method as a helper method.

Testing

Play around with some simple tests in the driver.cpp file. Once you feel your code is working well, test it out rigorously with the file tester.cpp, which compares your method implementations to an actual C++ vector object on a random set of many of these operations. This program takes two command line arguments: the number of operations, and a seed. Like the taps/initial state in a linear feedback shift register, the seed will lead to a repeatable, pseudorandom sequence. So if you get a bug for a particular number of operations and a particular seed, you will get that same bug when running it again with the same parameters.

As an example, if your program is working properly and you run

/tester 10000 0

Then the last 5 operations should be