Resizing Arrays means lots of memory usage
Matz has done a lot of magic in MRI to make arays fast. Rubinius and JRuby do even more, but fast is of course relative in Ruby.
Say you have an Array with 10,000 space ships. These are player ships, and they are structs. I will give you an example definition, but all that you need to consider, is that this a complex object. It's size can radically change from one element to the next.
Ship = Struct.new(:player, :type, :forward_weapons, :aft_weapons, ...)
A new player has just joined the game. That of course means we need to
add the new ship to the array of ships. So we shovel
<< a new ship
onto the array. No sweat! But wait, there is more!
When you mutated that array, Ruby, under the covers, had to create a new array of a larger size and stuff the two arrays into a new array. Now Ruby does a lot of tricky things like use pointers to split and join arrays at the machine level, but there is still a lot of copying going on.
Copying means memory. Lots of memory if your list is large. And who doesn't want to tackle Big Data?
Enter the Linked List
I know, I know. Ruby core has no concept of a Linked List, and Matz knows his C, so he obviously knew about Linked Lists, and he chose to leave them out. Why should we care about lists?
I am not going to make the argument that Ruby needs a Linked List. I will examine what a Linked List would look like in Ruby, how it would perform, and what problems it could help us solve.
Return to that giant array. Now instead of an array, let's insert a linked list instead. Here is a basic example of such a structure:
Node = Struct.new(:object, :next) class LinkedList def push(object) # adds an item to the end of the list end ... end
You can see the implementation of this at the github page
The big difference between the Array and the LinkedList is that the List knows something about the next item in the list (and in a doubly linked, the previous node) while an array is accessed via an index.
When you add a new item to a linked list, you are simply creating a new node, and then pointing the old tail.next to the new node, then setting the new tail for the list to that new node.
list = LinkedList.new a_node = list.push "a" b_node = list.push "b" a_node.next #=> b_node list.head #=> a_node list.tail #=> b_node c_node = list.push "c" list.head = a_node list.tail = c_node a_node.next #=> b_node a_node.next.next #=> c_node
When dealing with a linked list, you walk the list:
current = list.head while current.next != nil # this will run until there is a nil @ next! current = current.next end
If this were a doubly linked list, it would point backwards as well, so you could traverse from head or tail. You might even be able to see how a list would be useful in common sorting algorithms.
More memory efficient?
After looking into this question over the last week, the answer is no. Remember how I said Matz (and co) does clever tricks with C pointers? Well, Master Yoda has schooled me. Benchmarks count more then elloquent waxing (which I highly doubt I achieve), and thus:
BIG THANKS to my friend Phil Cohen for helping me find flaws in my methods of benchmarking this.
In conclusion I guess it was a little silly to think something written in Ruby could compare with something written in C. So this means to really test a linked list to an array is to implement this as a C extension.
So stay tuned. I will shout my success or failure from a roof top near you.