I'm stuck on a data-structure problem and don't know how to proceed. My program produces the correct output, but it runs too slowly and fails the automated tests on time.
I use a min-heap as a priority queue keyed on each worker's next free time. How can I make it more efficient? Running time is critical here.
Task description
You have a program which is parallelized and uses m independent threads to process the given list of n jobs. Threads take jobs in the order they are given in the input. If there is a free thread, it immediately takes the next job from the list. Once a thread has started processing a job, it doesn't interrupt or stop until it finishes processing the job. If several threads try to take jobs from the list simultaneously, the thread with the smaller index takes the job. For each job you know exactly how long it will take any thread to process it, and this time is the same for all the threads. You need to determine, for each job, which thread will process it and when it will start processing.
Input Format. The first line of the input contains integers m (the number of workers) and n (the number of jobs). The second line contains n integers: the times in seconds it takes any thread to process each job. The times are given in the same order as the jobs appear in the list from which threads take them.
Output Format. Output exactly n lines. The i-th line (0-based index is used) should contain two space-separated integers: the 0-based index of the thread which will process the i-th job and the time in seconds when it will start processing that job.
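The rules above amount to repeatedly picking the worker with the smallest (next free time, index) pair. For comparison, here is a minimal sketch of that simulation, assuming the standard-library `heapq` module is allowed by the grader. The function name `assign_jobs_heapq` is hypothetical and only for illustration.

```python
import heapq


def assign_jobs_heapq(num_workers, job_durations):
    """Return a list of (worker_index, start_time) pairs, one per job.

    Hypothetical helper: keeps a heap of (next_free_time, worker_index)
    tuples, so ties on time are broken by the smaller worker index
    through ordinary tuple comparison.
    """
    # Every worker is free at time 0; a sorted list is already a valid heap.
    heap = [(0, worker) for worker in range(num_workers)]
    schedule = []
    for duration in job_durations:
        free_at, worker = heap[0]  # earliest-free worker, smallest index on ties
        schedule.append((worker, free_at))
        # Pop the root and push the worker back with its new free time in one step.
        heapq.heapreplace(heap, (free_at + duration, worker))
    return schedule


# Small example: 2 workers and 5 jobs with durations 1..5.
for worker, start in assign_jobs_heapq(2, [1, 2, 3, 4, 5]):
    print(worker, start)
```

Each job costs one `heapreplace`, which is O(log m), so the whole run should be O(n log m) under these assumptions.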
from collections import deque


class Node(dict):
    '''
    Dict whose keys can also be read as attributes (missing keys give None).
    '''
    def __getattr__(self, attr):
        return self.get(attr, None)


# Nodes are ordered by next free time; ties go to the smaller worker index.
Node.__eq__ = lambda self, other: self.nextfreetime == other.nextfreetime and self.worker == other.worker
Node.__ne__ = lambda self, other: self.nextfreetime != other.nextfreetime or self.worker != other.worker
Node.__lt__ = lambda self, other: self.nextfreetime < other.nextfreetime or (self.nextfreetime == other.nextfreetime and int(self.worker) < int(other.worker))
Node.__le__ = lambda self, other: self.nextfreetime <= other.nextfreetime
Node.__gt__ = lambda self, other: self.nextfreetime > other.nextfreetime or (self.nextfreetime == other.nextfreetime and int(self.worker) > int(other.worker))
Node.__ge__ = lambda self, other: self.nextfreetime >= other.nextfreetime


class nextfreetimeQueue:
    def __init__(self, nodes):
        # `nodes` is currently unused; the heap is filled through insert().
        self.size = 0
        self.heap = deque([None])  # index 0 is a placeholder so the root sits at index 1
        self.labeled = False

    def __str__(self):
        return str(list(self.heap)[1:])

    def swap(self, i, j):
        '''
        Swap the values of nodes at index i and j.
        '''
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        # if self.labeled:
        #     I, J = self.heap[i], self.heap[j]
        #     self.position[I.label] = i
        #     self.position[J.label] = j

    def shift_up(self, i):
        '''
        Move the value at index i upward to restore the heap property.
        '''
        p = i // 2  # index of parent node
        while p:
            if self.heap[i] < self.heap[p]:
                self.swap(i, p)  # swap with parent
            i = p  # move one level up, whether or not a swap happened
            p = p // 2  # new parent index

    def shift_down(self, i):
        '''
        Move the value at index i downward to restore the heap property.
        '''
        c = i * 2
        while c <= self.size:
            c = self.min_child(i)
            if self.heap[i] > self.heap[c] or self.heap[i] == self.heap[c]:
                self.swap(i, c)
            i = c  # move one level down, whether or not a swap happened
            c = c * 2  # new child index

    def min_child(self, i):
        '''
        Return index of minimum child node.
        '''
        l, r = (i * 2), (i * 2 + 1)  # indices of left and right child nodes
        if r > self.size:
            return l
        else:
            return l if self.heap[l] < self.heap[r] else r

    @property
    def min(self):
        '''
        Return minimum node in heap.
        '''
        return self.heap[1]

    def insert(self, node):
        '''
        Append `node` to the heap and move up
        if necessary to maintain heap property.
        '''
        # if has_label(node) and self.labeled:
        #     self.position[node.label] = self.size
        self.heap.append(node)
        self.size += 1
        self.shift_up(self.size)


class solveJobs:
    def read_data(self):
        # first number is the number of workers, second is the number of jobs
        self.num_workers, jobcount = map(int, input().split())
        self.job_durations = list(map(int, input().split()))  # one duration per job
        self.wq = nextfreetimeQueue([])
        for i in range(self.num_workers):
            self.wq.insert(Node(worker=i+1, nextfreetime=0))  # workers are stored 1-based
        # assert jobcount == len(self.job_durations)
        self.assigned_workers = [None] * len(self.job_durations)  # which thread takes each job
        self.start_times = [None] * len(self.job_durations)  # when each job is started

    def write_response(self):
        for i in range(len(self.job_durations)):  # for each job, do:
            print(self.assigned_workers[i]-1, self.start_times[i])  # print the 0-based worker and when it starts job i

    def assign_jobs(self):
        for i in range(len(self.job_durations)):  # loop over all jobs
            next_worker_node = self.wq.min  # node (worker, nextfreetime) with the minimum free time
            # nft = next_worker_node['nextfreetime']
            self.assigned_workers[i] = next_worker_node['worker']  # record the worker index for job i
            self.start_times[i] = next_worker_node['nextfreetime']  # that worker's next free time is the job's start time
            self.wq.min['nextfreetime'] += self.job_durations[i]  # increase the worker's next free time
            self.wq.shift_down(1)

    def solve(self):
        self.read_data()
        self.assign_jobs()
        self.write_response()


# Entry point (not shown in my original paste): read from stdin and print the schedule.
if __name__ == '__main__':
    solveJobs().solve()
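One possible cost I have not ruled out: the heap above is stored in a `collections.deque` and indexed at arbitrary positions. The Python documentation states that indexed access on a deque is O(1) at both ends but slows to O(n) in the middle. A minimal sketch to measure this against a plain list follows. The helper name `time_middle_access` and the size `n` are arbitrary choices for illustration, not taken from the task.

```python
from collections import deque
from timeit import timeit


def time_middle_access(container, repeats=2000):
    """Return the seconds taken by `repeats` reads of the middle element."""
    middle = len(container) // 2
    return timeit(lambda: container[middle], number=repeats)


n = 200_000  # arbitrary size chosen only for illustration
list_seconds = time_middle_access(list(range(n)))
deque_seconds = time_middle_access(deque(range(n)))

# Absolute numbers depend on the machine; only the ratio is of interest.
print(f"list : {list_seconds:.6f} s")
print(f"deque: {deque_seconds:.6f} s")
```

If the deque turns out to be much slower here, replacing `deque([None])` with `[None]` in `nextfreetimeQueue.__init__` would be the first thing I would try, since the heap only uses `append` and indexing.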