Good evening!

In the previous article we managed to speed our program up with JIT compilation, cutting execution time by a factor of roughly 50-60 while using only a single process. But hey, I have a Core i7! Can we use its full potential?

The refactored program

I had to change the previous program quite significantly. It now reads a bit less like idiomatic Python and a bit more like C; JIT compilation required that, as the more Pythonic constructs caused significant slowdowns. Here’s the full program.

from datetime import datetime
from multiprocessing import Process, Manager
from timeit import default_timer as timer

import numpy as np
import numba

MAX_MOVES = 11
PLANETS_N = 7
PROCS = 16
MAX_POLICE_ITINERARY_INDEX = 3*(PLANETS_N**MAX_MOVES)//PLANETS_N
ROUTES_PER_ITERATION = 2*(10**7)

# connections between planets
NEIGH = (
	(1,),#0
	(0,2,),#1
	(1,3,5,),#2
	(2,4,),#3
	(3,),#4
	(2,6,),#5
	(5,),#6
)

def _traverse(start, l=[]):
	# recursively extend the route until it spans MAX_MOVES planets
	ll = l + [start]
	if len(ll) < MAX_MOVES:
		for i in NEIGH[start]:
			yield from _traverse(i,ll)
	else:
		yield ll

def get_criminal_routes():
	for start_i in range(len(NEIGH)):
		yield from _traverse(start_i)

criminal_routes = np.array(tuple(get_criminal_routes()))


@numba.jit
def test_po(minim, lim, out):
	# test `lim` consecutive police routes starting at index `minim`
	# against every criminal route; winning routes are written into
	# the flat output array `out`
	out_cursor = 0

	police_routes = np.zeros((lim,MAX_MOVES), dtype=np.int8)
	for i in range(0, lim):
		police_routes[i] = dec_to_base(minim+i)

	for po_i in range(police_routes.shape[0]):
		for cr_i  in range(criminal_routes.shape[0]):
			# checking every element in both routes
			for i in range(MAX_MOVES):
				if police_routes[po_i][i]==criminal_routes[cr_i][i]:
					# caught, checking the next possible route of the criminal
					break
			else:
				# not caught, this policeman's strategy is bad
				break
		else:
			# all caught! This is the result!
			print(police_routes[po_i])
			for i in range(MAX_MOVES):
				out[out_cursor] = police_routes[po_i][i]
				out_cursor += 1


@numba.jit
def dec_to_base(n):
	# decode a route index into its MAX_MOVES base-7 digits,
	# most significant digit first
	m = n
	result = np.zeros(MAX_MOVES, dtype=np.int8)
	for i in range(MAX_MOVES-1, -1, -1):
		d, m = divmod(m, PLANETS_N**i)
		result[MAX_MOVES-i-1] = d
	return result


def process(proc_index, mq):
	counter = 0
	results = []
	lim = ROUTES_PER_ITERATION

	minim = proc_index*lim

	while minim < MAX_POLICE_ITINERARY_INDEX:

		out = np.zeros(lim*MAX_MOVES, dtype=np.int8)
		print(datetime.now().strftime('%D %H:%M:%S'),
			"Started analysis. Process: {}. Iteration: {}. Minimum: {}".format(proc_index, counter, minim))
		test_po(minim, min(lim, MAX_POLICE_ITINERARY_INDEX-minim), out)
		print(datetime.now().strftime('%D %H:%M:%S'),
			"Finished analysis. Process: {}. Iteration: {}. Minimum: {}".format(proc_index, counter, minim))
		
		out = out.reshape(lim,MAX_MOVES)
		for r in out:
			# results are written contiguously from the start of `out`,
			# so the first all-zero row marks the end of them
			if r.any():
				results.append(tuple(r))
			else:
				break

		minim += lim*PROCS
		minim = min(minim, MAX_POLICE_ITINERARY_INDEX)

		counter += 1

	print("Result from {}:".format(proc_index), results)
	mq.extend(results)
	print("Process {} finished!".format(proc_index))


def main():
	procs = []
	manager = Manager()
	q = manager.list()
	for i in range(PROCS):
		p = Process(target=process, args=(i, q))
		procs.append(p)
		p.start()

	for p in procs:
		p.join()

	result = q

	print("Final results:", result)

	return result

if __name__ == '__main__':
	start_time = timer()
	result = tuple(main())
	print("Elapsed time:", timer() - start_time, "seconds.")

The smuggler’s route generator stayed the same; we’ll use its result throughout the program. Its output is rather small: only 6144 entries.
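That count is easy to double-check. Here’s a standalone sketch (`count_routes` is my own helper, not part of the program) that re-runs the same traversal over the `NEIGH` graph and just counts the leaves:

```python
# connections between planets, same graph as in the main program
NEIGH = (
	(1,),
	(0, 2),
	(1, 3, 5),
	(2, 4),
	(3,),
	(2, 6),
	(5,),
)
MAX_MOVES = 11

def count_routes(start, depth=1):
	"""Count all routes of MAX_MOVES planets beginning at `start`."""
	if depth == MAX_MOVES:
		return 1
	return sum(count_routes(n, depth + 1) for n in NEIGH[start])

total = sum(count_routes(s) for s in range(len(NEIGH)))
print(total)  # 6144
```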

As for the police routes, their generation is now implemented right inside the loop of each child process; it is no longer a separate generator.
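Each worker walks the global route-index space in interleaved blocks: process i starts at block i and then jumps ahead by PROCS blocks at a time. Stripped of the analysis itself, the scheme from process() boils down to this sketch (`blocks_for` is my own name for illustration):

```python
PROCS = 16
BLOCK = 2 * (10**7)            # ROUTES_PER_ITERATION
TOTAL = 3 * (7**11) // 7       # MAX_POLICE_ITINERARY_INDEX

def blocks_for(proc_index):
	"""Yield the (start, length) chunks that one process handles."""
	start = proc_index * BLOCK
	while start < TOTAL:
		yield start, min(BLOCK, TOTAL - start)
		start += BLOCK * PROCS

# together, the processes cover the whole index space exactly once
covered = sum(length for p in range(PROCS) for _, length in blocks_for(p))
assert covered == TOTAL
```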

Since a police route is a list of 11 numbers from 0 to 6, it can be treated as an 11-digit base-7 number. Representing each route as a single int instead makes things much easier for the computer.
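To illustrate the encoding, here is a minimal standalone sketch (`index_to_route` and `route_to_index` are my own names; the program itself only needs the decoding direction, in `dec_to_base`). It is plain base-7 positional notation, so the mapping round-trips exactly:

```python
PLANETS_N = 7   # the base: planets are numbered 0..6
MAX_MOVES = 11  # the number of digits

def index_to_route(n):
	"""Decode an integer into its 11 base-7 digits, most significant first."""
	route = []
	for i in range(MAX_MOVES - 1, -1, -1):
		d, n = divmod(n, PLANETS_N ** i)
		route.append(d)
	return route

def route_to_index(route):
	"""Encode a list of 11 digits (0-6) back into a single integer."""
	n = 0
	for d in route:
		n = n * PLANETS_N + d
	return n

route = index_to_route(123456)
assert len(route) == MAX_MOVES
assert route_to_index(route) == 123456  # exact round trip
```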

I ran this program with 8 processes, and it took just over 3 minutes; with 16 processes, a bit less, probably because some processes finish earlier and free up virtual cores that the remaining ones can then fully occupy.

In conclusion

I haven’t managed to make it run on a GPU so far, but the program works extremely fast nonetheless. Optimizing for a GPU would probably make it even less readable, so I’ll shelve that idea for now; maybe I’ll revisit it with a different task.

Thanks for tuning in! I’ll see you eventually.