Previously I have discussed the Seven Planets Riddle and, gosh, that solving program runs hilariously slowly! One million combinations was processed in more than a minute even on a 2.8GHz Core i7 with 8 processes. It may take hours or even days to complete. I stated back then that I would try to use the GPU. And even though at the moment I acknowledge that I failed miserably, I have discovered some other tricks that have made the program run tremendously fast even on one CPU core! Let’s discuss that today.
Police routes. A quicker way.
Previously, we have come up with an algorithm that would generate all the possible routes for the police ship. Essentially, it was a list of all possible combinations of 11 numbers from 0 to 6. And I dwelled whether there was a way to do it with itertools.
Well, yes, there is a way, and it’s pretty simple.
itertools.product takes several iterables and creates all possible combinations of their elements. So we can simply provide eleven
range(7) objects to yield all posiible combinations of eleven numbers from 0 to 6. But as we stated previously, we need only the routes which start on planets 0, 1 and 2. So I made this generator yield 10 elements instead and I just add the first number at the front.
Just in time came the just-in-time
It’s all fine and dandy, but if you run our previous code, it was ridiculously slow.
So I used a just-in-time compilation provided by
numba package. When you run a function decorated by
@jit, it is compiled into something faster than an interpreted Python code; and the following executions of the same function are blazingly fast!
So, let me show you the final code of my program and we’ll discuss it.
The functions that generate the routes are the same. Look at the
main() function now. First, we generate a list of one million combinations, as before. Then we convert it into a
numpy array. The reason is, they actually work faster in a compiled code. I will probably make an article with benchmarks someday. This approach is a bit memory-inefficient, but will suffice for now. Additionally, I could make the type
numpy.int8 to preserve memory, since the numbers in our array cannot exceed 6 anyway.
And finally, I pass this array to the analyzing function
test_po. Its previous version did not have that outer
for loop. Also, I have implemented a more C-like approach: I don’t iterate over the data structures directly but rather get the elements by indices. It seems that in this particular case it works faster, even though some of my other experiments have shown that the two approaches are equal in speed on jit. I will conduct more experiments and write a blog post if I find something interesting.
test_po function is now decorated with
@jit. So, I’ve run this code and was totally smashed. A one-million long iteration that took over a minute to complete now takes just slightly over a second! The performance on this stage has increased sixty- or seventy-fold! So the whole program took a bit over 24 minutes in just one process, not eight! Now that’s a boost!
There was, of course, an overhead on generation of police routes; I could make the program even faster. But for now it will suffice; the aim of this article was to introduce you to jit.
You may ask, if we could get such an enourmous boost with only one core, can we utilize all eight now? Well, yes, we can. But it will require a significant reworking of our code, so I will leave it for the next article, where we will introduce some better coding practices as well.
So stay tuned! And as always, thanks for reading, and I will see you eventually!