Good evening!

Previously I have discussed the Seven Planets Riddle and, gosh, that solving program runs hilariously slowly! One million combinations was processed in more than a minute even on a 2.8GHz Core i7 with 8 processes. It may take hours or even days to complete. I stated back then that I would try to use the GPU. And even though at the moment I acknowledge that I failed miserably, I have discovered some other tricks that have made the program run tremendously fast even on one CPU core! Let’s discuss that today.

Police routes. A quicker way.

Previously, we have come up with an algorithm that would generate all the possible routes for the police ship. Essentially, it was a list of all possible combinations of 11 numbers from 0 to 6. And I dwelled whether there was a way to do it with itertools.

Well, yes, there is a way, and it’s pretty simple.

import itertools


def police_gen():
	for i in range(3):
		for j in itertools.product(*((range(PLANETS_N),)*(MAXLEN-1))):
			yield (i,) + j

pgen = police_gen()

The itertools.product takes several iterables and creates all possible combinations of their elements. So we can simply provide eleven range(7) objects to yield all posiible combinations of eleven numbers from 0 to 6. But as we stated previously, we need only the routes which start on planets 0, 1 and 2. So I made this generator yield 10 elements instead and I just add the first number at the front.

Just in time came the just-in-time

It’s all fine and dandy, but if you run our previous code, it was ridiculously slow.

So I used a just-in-time compilation provided by numba package. When you run a function decorated by @jit, it is compiled into something faster than an interpreted Python code; and the following executions of the same function are blazingly fast!

So, let me show you the final code of my program and we’ll discuss it.

from itertools import product
from datetime import datetime
from multiprocessing import Pool
from timeit import default_timer as timer

import numpy as np
from numba import jit


# connections between planets

def police_gen():
	for i in range(3):
		for j in product(*((range(PLANETS_N),)*(MAXLEN-1))):
			yield (i,) + j

pgen = police_gen()

def _traverse(start, l=[]):
	ll = l + [start]
	if len(ll) < MAXLEN:
		for i in NEIGH[start]:
			yield from _traverse(i,ll)
		yield ll

def get_criminal_routes():
	for start_i in range(len(NEIGH)):
		yield from _traverse(start_i)

criminal_routes = np.array(tuple(get_criminal_routes()))

def test_po(police_routes):
	for po_i in range(len(police_routes)):
		for cr_i in range(len(criminal_routes)):
			# checking every element in both routes
			for i in range(MAXLEN):
				if police_routes[po_i][i]==criminal_routes[cr_i][i]:
					# caught, checking the next possible route of the criminal
				# not caught, this policeman's strategy is bad
			# every strategy works! This is the result
			print("result:", police_routes[po_i])

def main():
	counter = 0
	done = False
	while not done:
		src = []
		for _ in range(1000000):
			except StopIteration:
				done = True

		src = np.array(src, dtype=np.int32)
		print('%D %H:%M:%S'), "Starting calculation for iteration:", counter)	
		print('%D %H:%M:%S'), "Finished calculation for iteration:", counter)

		counter += 1

if __name__ == '__main__':
	start_time = timer()
	print("Elapsed time:", timer() - start_time, "seconds.")

The functions that generate the routes are the same. Look at the main() function now. First, we generate a list of one million combinations, as before. Then we convert it into a numpy array. The reason is, they actually work faster in a compiled code. I will probably make an article with benchmarks someday. This approach is a bit memory-inefficient, but will suffice for now. Additionally, I could make the type numpy.int8 to preserve memory, since the numbers in our array cannot exceed 6 anyway.

And finally, I pass this array to the analyzing function test_po. Its previous version did not have that outer for loop. Also, I have implemented a more C-like approach: I don’t iterate over the data structures directly but rather get the elements by indices. It seems that in this particular case it works faster, even though some of my other experiments have shown that the two approaches are equal in speed on jit. I will conduct more experiments and write a blog post if I find something interesting.

Ultimtely, the test_po function is now decorated with @jit. So, I’ve run this code and was totally smashed. A one-million long iteration that took over a minute to complete now takes just slightly over a second! The performance on this stage has increased sixty- or seventy-fold! So the whole program took a bit over 24 minutes in just one process, not eight! Now that’s a boost!

In conclusion

There was, of course, an overhead on generation of police routes; I could make the program even faster. But for now it will suffice; the aim of this article was to introduce you to jit.

You may ask, if we could get such an enourmous boost with only one core, can we utilize all eight now? Well, yes, we can. But it will require a significant reworking of our code, so I will leave it for the next article, where we will introduce some better coding practices as well.

So stay tuned! And as always, thanks for reading, and I will see you eventually!