[after Ray spills a box of toothpicks on the floor]
Raymond: 82, 82, 82.
Charlie: 82 what?
Raymond: Toothpicks.
Charlie: There’s a lot more than 82 toothpicks, Ray.
Raymond: 246 total.
Charlie: How many?

Sally Dibbs: 250.
Charlie: Pretty close.
Sally Dibbs: There’s four left in the box.

The idea behind this famous Rain Man toothpick scene actually comes from the true story “The Twins” in the fascinating book “The Man Who Mistook His Wife for a Hat” by the British neurologist and author Oliver Sacks. In that story, a box of 111 matches falls on the floor, after which the twins immediately notice that what they’re seeing in front of them is three times the prime number 37. Sacks confirms the number by counting the matches by hand. In the movie, the filmmakers probably wanted to leave the counting out of the scene, and solved this by leaving a few toothpicks behind in the unused box. Unfortunately, they missed the fact that the twins were primarily interested in prime numbers, which 82 and 246 clearly are not.
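For those who, unlike the twins, prefer to let a machine do the counting: a quick trial-division check (my own aside, not part of Sacks’ story or the movie) confirms the arithmetic. 111 is 3 × 37, while 82 is 2 × 41 and 246 is 2 × 3 × 41, so neither of the movie’s numbers is prime.

```c
#include <stdio.h>

/* Print the prime factorization of n using plain trial division. */
static void factorize(int n)
{
    printf("%d =", n);
    const char *sep = " ";
    for (int d = 2; d * d <= n; d++) {
        while (n % d == 0) {
            printf("%s%d", sep, d);
            sep = " x ";
            n /= d;
        }
    }
    if (n > 1)
        printf("%s%d", sep, n);   /* whatever remains is prime */
    printf("\n");
}

int main(void)
{
    factorize(82);   /* 82  = 2 x 41     -> not prime */
    factorize(111);  /* 111 = 3 x 37     -> the twins' observation */
    factorize(246);  /* 246 = 2 x 3 x 41 -> not prime either */
    return 0;
}
```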

For all those software and hardware developers who can’t count multiples of prime numbers in a pile of matches in the blink of an eye, there are models, which operate at higher levels of abstraction. Software and hardware these days are incredibly complex. Even small systems consist of millions of lines of code. In my pocket, I typically carry about 30 million lines of code, in the form of an Android phone. On my desk there’s a PC, which includes roughly 50 million lines of code as part of Windows, and another 50 million lines of code in the form of an Ubuntu installation that runs in a virtual machine. That’s just the base configuration. I’m not counting any of the applications that run on top of this machine. I’m not counting all the code that’s just a split second away from my fingers, by means of running a Google search.

But complexity doesn’t necessarily scale with the number of lines of code. It can be incredibly difficult to find a bug in a snippet of code of just 20 lines. Now imagine trying to find a bug in 100 million+ lines of code. Or being given the assignment to parallelize an application that consists of 1 million lines of code.
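To make the 20-line claim concrete, here is a minimal, made-up sketch in C: a classic data race on a shared counter. The program compiles and runs, yet the printed total is rarely 2,000,000, and spotting why requires reasoning about thread interleavings rather than reading any single line.

```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;            /* shared between both threads */

static void *work(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                  /* read-modify-write: not atomic, races */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);
    pthread_create(&t2, NULL, work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```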

One remedy to control such huge complexity is to split such a system up into modules, and to keep the code within the modules as simple, readable, and understandable as possible. Another remedy is to raise those sections of the design that allow it to a higher level of abstraction. The nice thing about hardware and software is that these high-level models can be automatically translated to an implementation. Since this translation is fully automated and follows rules that are proven to be correct, the implementation is, in addition, often guaranteed to be faithful to the model.
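OpenMP is a familiar, if modest, example of this idea in the C/C++ world (my illustration here, not something the tools discussed below rely on): the programmer states at a high level that the iterations of a loop are independent, and the compiler generates the thread management, work splitting, and synchronization.

```c
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {   /* set up some input data */
        a[i] = i * 0.5;
        b[i] = i * 2.0;
    }

    /* One line of "model": the loop iterations are independent.
       The compiler translates this into a threaded implementation. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```

Compiled with OpenMP support (for example, gcc -fopenmp), the pragma becomes a parallel implementation; without it, the pragma is ignored and the loop simply runs sequentially.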

Using a model, one can briefly describe what would otherwise be too long-winded or complex to state in a programming language. This keeps the complexity manageable for the designer. As the complexity of systems increases, the use of models should increase as well. We are, after all, not Rain Man, who always sees the trees through the forest.

Vector Fabrics adheres to this approach, and applies modeling techniques in several different areas. The Vector Fabrics tools help you optimize your C/C++ code for multicore.

They first analyze the source code and run-time behaviour of an application, and then present that behaviour in a GUI that is easy to understand and browse around in. You could say they extract a model from the source code and present it to the user.
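What kind of property might such an extracted model surface? A purely hypothetical example, not the tools’ actual output: a loop that looks parallel at first glance, but where a dependency model of the run-time behaviour would reveal a loop-carried dependency that blocks naive parallelization.

```c
#include <stdio.h>

#define N 16

int main(void)
{
    double x[N];
    x[0] = 1.0;

    /* Each iteration reads the value written by the previous one:
       a loop-carried dependency on x[i - 1]. A dependency model of
       this code would show that the iterations cannot simply be
       distributed over cores in their current form. */
    for (int i = 1; i < N; i++)
        x[i] = 0.5 * x[i - 1] + 1.0;

    printf("x[N-1] = %f\n", x[N - 1]);
    return 0;
}
```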

This is part of gaining insight into your application, the first step of the optimization process. During the next step, the investigation phase, the tool helps you figure out where it’s best to add parallelism: where it is possible, where it actually increases performance, and what may be blocking efficient parallelization. During this phase, they run a model of the parallelized application on top of a model of the actual multicore target hardware. This way they give you an in-depth understanding of the resulting parallel performance, without having to make any code changes. This allows you to look before you leap, preventing you from wasting precious time on flawed parallelization strategies.
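The “look before you leap” idea can be illustrated with a much cruder model than the one described above: a back-of-the-envelope Amdahl’s law estimate. The 70% figure below is made up for illustration; the point is that even a very simple model can tell you, before touching any code, that parallelizing a routine covering only part of the run time has a hard ceiling.

```c
#include <stdio.h>

/* Amdahl's law: predicted speedup when a fraction p of the run time
   is parallelized perfectly over n cores and the rest stays serial. */
static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    double p = 0.70;                 /* assume 70% of run time is parallelizable */
    int cores[] = { 2, 4, 8, 16 };

    for (int i = 0; i < 4; i++)
        printf("%2d cores: predicted speedup %.2fx\n",
               cores[i], amdahl(p, cores[i]));

    /* Even with infinitely many cores the speedup is capped at 1/(1-p). */
    printf("upper bound: %.2fx\n", 1.0 / (1.0 - p));
    return 0;
}
```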

This article originally ran in Bits & Chips magazine.