www.prismmodelchecker.org

Task-Graph Scheduling

(Bouyer, Fahrenberg, Larsen & Markey)

Contents

Related publications: [NPS13]

Introduction

One common use of (non-probabilistic) timed automata is the formalisation and solution of scheduler optimisation problems. In this case study, we consider an extension of the task-graph scheduling problem described in [BFLM11]. We show how PTAs can be used to introduce uncertainty with regards to time delays and to consider the possibility of failures. We demonstrate how the digital clocks method can determine optimal schedulers for these problems, in terms of the (expected) time and energy consumption required to complete all tasks.

The Basic Model

The basic model concerns computing the arithmetic expression D × (C × (A + B)) + ((A + B) + (C × D)) using two processors (P1 and P2) that have different speed and energy requirements. The specification of the processors is as follows.

  P1 P2
+2 picoseconds5 picoseconds
×3 picoseconds7 picoseconds
idle10 Watts20 Watts
active90 Watts30 Watts

The following figure presents a task graph for computing this expression and shows both the tasks that need to be performed (the subterms of the expression) and the dependencies between the tasks (the order the tasks must be evaluated in).

task graph representation

The system is formed as the parallel composition of three PTAs (PRISM modules), one for the scheduler and one for each processor is presented below.

// basic task graph model from
// Bouyer, Fahrenberg, Larsen and Markey
// Quantitative analysis of real-time systems using priced timed automata
// Communications of the ACM, 54(9):78–87, 2011

pta // model is a PTA

module scheduler

	task1 : [0..3]; // A+B
	task2 : [0..3]; // CxD
	task3 : [0..3]; // Cx(A+B)
	task4 : [0..3]; // (A+B)+(CxD)
	task5 : [0..3]; // DxCx(A+B)
	task6 : [0..3]; // (DxCx(A+B)) + ((A+B)+(CxD))

	// task status: 
	// 0 - not started
	// 1 - running on processor 1
	// 2 - running on processor 2
	// 3 - task complete
	
	// start task 1
	[p1_add] task1=0 -> (task1'=1);
	[p2_add] task1=0 -> (task1'=2);

	// start task 2
	[p1_mult] task2=0 -> (task2'=1);
	[p2_mult] task2=0 -> (task2'=2);
	
	// start task 3 (must wait for task 1 to complete)
	[p1_mult] task3=0 & task1=3 -> (task3'=1);
	[p2_mult] task3=0 & task1=3 -> (task3'=2);

	// start task 4 (must wait for tasks 1 and 2 to complete)
	[p1_add] task4=0 & task1=3 & task2=3 -> (task4'=1);
	[p2_add] task4=0 & task1=3 & task2=3 -> (task4'=2);
	
	// start task 5 (must wait for task 3 to complete)
	[p1_mult] task5=0 & task3=3 -> (task5'=1);
	[p2_mult] task5=0 & task3=3 -> (task5'=2);
	
	// start task 6 (must wait for tasks 4 and 5 to complete)
	[p1_add] task6=0 & task4=3 & task5=3 -> (task6'=1);
	[p2_add] task6=0 & task4=3 & task5=3 -> (task6'=2);
	
	// a task finishes on processor 1
	[p1_done] task1=1 -> (task1'=3);
	[p1_done] task2=1 -> (task2'=3);
	[p1_done] task3=1 -> (task3'=3);
	[p1_done] task4=1 -> (task4'=3);
	[p1_done] task5=1 -> (task5'=3);
	[p1_done] task6=1 -> (task6'=3);

	// a task finishes on processor 2
	[p2_done] task1=2 -> (task1'=3);
	[p2_done] task2=2 -> (task2'=3);
	[p2_done] task3=2 -> (task3'=3);
	[p2_done] task4=2 -> (task4'=3);
	[p2_done] task5=2 -> (task5'=3);
	[p2_done] task6=2 -> (task6'=3);
	
endmodule
	
// processor 1
module P1

	p1 : [0..2];
	// 0 - idle
	// 1 - add
	// 2 - multiply

	x1 : clock; // local clock
	
	invariant
        (p1=1 => x1<=2) &
        (p1=2 => x1<=3)
    endinvariant

	// addition
	[p1_add] p1=0 -> (p1'=1) & (x1'=0); // start
	[p1_done] p1=1 & x1=2 -> (p1'=0) & (x1'=0); // finish

	// multiplication
	[p1_mult] p1=0 -> (p1'=2) & (x1'=0); // start
	[p1_done] p1=2 & x1=3 -> (p1'=0) & (x1'=0);  // finish

endmodule

// processor 2
module P2

	p2 : [0..2];
	// 0 - idle
	// 1 - add
	// 2 - multiply

	x2 : clock; // local clock
	
	invariant
        (p2=1 => x2<=5) &
        (p2=2 => x2<=7)
    endinvariant

	// addition
	[p2_add] p2=0 -> (p2'=1) & (x2'=0); // start
	[p2_done] p2=1 & x2=5 -> (p2'=0) & (x2'=0); // finish

	// multiplication
	[p2_mult] p2=0 -> (p2'=2) & (x2'=0); // start
	[p2_done] p2=2 & x2=7 -> (p2'=0) & (x2'=0);  // finish

endmodule

// reward structure: elapsed time
rewards "time"
	true : 1;
endrewards

// reward structures: energy consumption
rewards "energy"
	p1=0 : 10/1000;
	p1>0 : 90/1000;
	p2=0 : 20/1000;
	p2>0 : 30/1000;
endrewards

// target state (all tasks complete)
label "tasks_complete" = (task6=3);
View: printable version          Download: task_graph.prism

The actions p1_add and p1_mult represent an addition and multiplication task being scheduled on P1 respectively, while the action p1_done indicates that the current task has been completed on P1. There are similar actions for P2. The systems includes clocks x1 and x2 which is used to keep track of the time that a task has been running on P1 and P2 respectively, and are therefore reset when a task starts and the invariants and guards correspond to the time required to complete the tasks of addition and multiplication on the processors.

We consider the following properties of the system.

// scheduler can complete all tasks (max probability all tasks complete equals 1)
Pmax=?[F "tasks_complete" ]; 

// min expected time to complete all tasks
R{"time"}min=?[F "tasks_complete" ];

// min expected energy to complete all tasks
R{"energy"}min=?[F "tasks_complete" ];
View: printable version          Download: task_graph.props

Using the digital clocks PTA engine we find the following optimal scheduler for minimising the elapsed time to complete all tasks, which takes 12 picoseconds to complete all tasks.

time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task5 task4 task6
P2 task2

In the case of energy consumption, we find an optimal scheduler which requires 1.3200 nanojoules and schedules tasks as shown below.

time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task4
P2 task2 task5 task6

Random Task Execution Times

We now extend the basic model making the time required for each processor to perform a task probabilistic. More precisely, if, in the original problem the time for a processor to perform a task was k, we suppose now that the time taken is uniformly distributed between the delays k-1, k and k+1.

The PRISM model for this scenario is presented below. Note that to prevent the scheduler from seeing into the future when making decisions, the probabilistic choice for task completion is made on completion rather than on initialisation.

// task graph model with random execution times
// extends the task graph problem from
// Bouyer, Fahrenberg, Larsen and Markey
// Quantitative analysis of real-time systems using priced timed automata
// Communications of the ACM, 54(9):78–87, 2011

pta // model is a PTA

module scheduler

	task1 : [0..3]; // A+B
	task2 : [0..3]; // CxD
	task3 : [0..3]; // Cx(A+B)
	task4 : [0..3]; // (A+B)+(CxD)
	task5 : [0..3]; // DxCx(A+B)
	task6 : [0..3]; // (DxCx(A+B)) + ((A+B)+(CxD))

	// task status: 
	// 0 - not started
	// 1 - running on processor 1
	// 2 - running on processor 2
	// 3 - task complete
	
	// start task 1
	[p1_add] task1=0 -> (task1'=1);
	[p2_add] task1=0 -> (task1'=2);

	// start task 2
	[p1_mult] task2=0 -> (task2'=1);
	[p2_mult] task2=0 -> (task2'=2);
	
	// start task 3 (must wait for task 1 to complete)
	[p1_mult] task3=0 & task1=3 -> (task3'=1);
	[p2_mult] task3=0 & task1=3 -> (task3'=2);

	// start task 4 (must wait for tasks 1 and 2 to complete)
	[p1_add] task4=0 & task1=3 & task2=3 -> (task4'=1);
	[p2_add] task4=0 & task1=3 & task2=3 -> (task4'=2);
	
	// start task 5 (must wait for task 3 to complete)
	[p1_mult] task5=0 & task3=3 -> (task5'=1);
	[p2_mult] task5=0 & task3=3 -> (task5'=2);
	
	// start task 6 (must wait for tasks 4 and 5 to complete)
	[p1_add] task6=0 & task4=3 & task5=3 -> (task6'=1);
	[p2_add] task6=0 & task4=3 & task5=3 -> (task6'=2);
	
	// a task finishes on processor 1
	[p1_done] task1=1 -> (task1'=3);
	[p1_done] task2=1 -> (task2'=3);
	[p1_done] task3=1 -> (task3'=3);
	[p1_done] task4=1 -> (task4'=3);
	[p1_done] task5=1 -> (task5'=3);
	[p1_done] task6=1 -> (task6'=3);

	// a task finishes on processor 2
	[p2_done] task1=2 -> (task1'=3);
	[p2_done] task2=2 -> (task2'=3);
	[p2_done] task3=2 -> (task3'=3);
	[p2_done] task4=2 -> (task4'=3);
	[p2_done] task5=2 -> (task5'=3);
	[p2_done] task6=2 -> (task6'=3);
	
endmodule
	
// processor 1
module P1

	p1 : [0..3]; // 0 inactive, 1 - adding, 2 - multiplying, 3 - done
	c1 : [0..2]; // used for the uniform probabilistic choice of execution time
	x1 : clock; // local clock
	
	invariant
        (p1=1 => x1<=1) &
        ((p1=2 & c1=0) => x1<=2) &
        ((p1=2 & c1>0)=> x1<=1) &
        (p1=3 => x1<=0)
    endinvariant

	// addition
	[p1_add] p1=0 -> (p1'=1) & (x1'=0); // start
	[] p1=1 & x1=1 & c1=0 -> 1/3 : (p1'=3) & (x1'=0) & (c1'=0) 
	                             + 2/3 : (c1'=1) & (x1'=0); // k-1
	[] p1=1 & x1=1 & c1=1 -> 1/2 : (p1'=3) & (x1'=0) & (c1'=0)
	                             + 1/2 : (c1'=2) & (x1'=0); // k
	[p1_done] p1=1 & x1=1 & c1=2 -> (p1'=0) & (x1'=0) & (c1'=0); // k+1

	// multiplication
	[p1_mult] p1=0 -> (p1'=2) & (x1'=0); // start
	[] p1=2 & x1=2 & c1=0 -> 1/3 : (p1'=3) & (x1'=0) & (c1'=0) 
	                             + 2/3 : (c1'=1) & (x1'=0); // k-1
	[] p1=2 & x1=1 & c1=1 -> 1/2 : (p1'=3) & (x1'=0) & (c1'=0)
	                             + 1/2 : (c1'=2) & (x1'=0); // k
	[p1_done] p1=2 & x1=1 & c1=2 -> (p1'=0) & (x1'=0) & (c1'=0); // k+1
	
	[p1_done] p1=3 -> (p1'=0);  // finish
	
endmodule

// processor 2
module P2

	p2 : [0..3]; // 0 inactive, 1 - adding, 2 - multiplying, 3 - done
	c2 : [0..2]; // used for the uniform probabilistic choice of execution time
	x2 : clock; // local clock
	
	invariant
        ((p2=1 & c2=0) => x2<=4) &
        ((p2=1 & c2>0)=> x2<=1) &
        ((p2=2 & c2=0) => x2<=6) &
        ((p2=2 & c2>0)=> x2<=1) &
        (p2=3 => x2<=0)
    endinvariant

	// addition
	[p2_add] p2=0 -> (p2'=1) & (x2'=0); // start
	[] p2=1 & x2=4 & c2=0 -> 1/3 : (p2'=3) & (x2'=0) & (c2'=0) 
	                             + 2/3 : (c2'=1) & (x2'=0); // k-1
	[] p2=1 & x2=1 & c2=1 -> 1/2 : (p2'=3) & (x2'=0) & (c2'=0) 
	                             + 1/2 : (c2'=2) & (x2'=0); // k
	[p2_done] p2=1 & x2=1 & c2=2 -> (p2'=0) & (x2'=0) & (c2'=0); // k+1

	// multiplication
	[p2_mult] p2=0 -> (p2'=2) & (x2'=0); // start
	[] p2=2 & x2=6 & c2=0 -> 1/3 : (p2'=3) & (x2'=0) & (c2'=0) 
	                             + 2/3 : (c2'=1) & (x2'=0); // k-1
	[] p2=2 & x2=1 & c2=1 -> 1/2 : (p2'=3) & (x2'=0) & (c2'=0) 
	                             + 1/2 : (c2'=2) & (x2'=0); // k
	[p2_done] p2=2 & x2=1 & c2=2 -> (p2'=0) & (x2'=0) & (c2'=0); // k+1
	[p2_done] p2=3 -> (p2'=0);  // finish
	
endmodule

// reward structure: elapsed time
rewards "time"
	true : 1;
endrewards

// reward structures: energy consumption
rewards "energy"
	p1=0 : 10/1000;
	p1>0 : 90/1000;
	p2=0 : 20/1000;
	p2>0 : 30/1000;
endrewards

// target state (all tasks complete)
label "tasks_complete" = (task6=3);
View: printable version          Download: task_graph_execution.prism

Analysing this model, we find that the optimal expected time and energy consumption to complete all tasks equal 12.226 picoseconds and 1.3201 nanojoules, respectively. This improves on the results obtained using the optimal schedulers for the original model, where the expected time and energy consumption equal 13.1852 picoseconds and 1.3211 nanojoules, respectively.

Synthesising optimal schedulers, we find that they change their decision based upon the delays of previously completed tasks. For example, for elapsed time, the optimal scheduler starts as for the non-probabilistic case, first scheduling task1 followed by task3 on P1 and task2 on P2. However, it is now possible for task2 to complete before task3 (if the execution times for task1, task2 and task3 are 3, 6 and 4 picoseconds respectively), in which case the optimal scheduler now makes a different decision from the non-probabilistic case. Under one possible set of execution times for the remaining tasks, the optimal scheduling is as follows:

time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task5 task6
P2 task2 task4

Adding a Faulty Processor

As a second extension of the scheduling problem, we add a third processor P3 which consumes the same energy as P2 but is faster (addition takes 3 picoseconds and multiplication 5 picoseconds). However, this comes at a cost: there is a chance (probability p) that the processor fails and the computation must be rescheduled and performed again.

The PRISM model for this scenario is presented below. Note that to prevent the scheduler from seeing into the future when making decisions, the probabilistic choice for task completion is made on completion rather than on initialisation.

// basic task graph model from
// Bouyer, Fahrenberg, Larsen and Markey
// Quantitative analysis of real-time systems using priced timed automata
// Communications of the ACM, 54(9):78–87, 2011

pta // model is a PTA

// probability P3 fails
const double p;

module scheduler

	task1 : [0..4]; // A+B
	task2 : [0..4]; // CxD
	task3 : [0..4]; // Cx(A+B)
	task4 : [0..4]; // (A+B)+(CxD)
	task5 : [0..4]; // DxCx(A+B)
	task6 : [0..4]; // (DxCx(A+B)) + ((A+B)+(CxD))

	// task status: 
	// 0 - not started
	// 1 - running on processor 1
	// 2 - running on processor 2
	// 3 - running on processor 3
	// 4 - task complete
	
	// start task 1
	[p1_add] task1=0 -> (task1'=1);
	[p2_add] task1=0 -> (task1'=2);
	[p3_add] task1=0 -> (task1'=3);

	// start task 2
	[p1_mult] task2=0 -> (task2'=1);
	[p2_mult] task2=0 -> (task2'=2);
	[p3_mult] task2=0 -> (task2'=3);
	
	// start task 3 (must wait for task 1 to complete)
	[p1_mult] task3=0 & task1=4 -> (task3'=1);
	[p2_mult] task3=0 & task1=4 -> (task3'=2);
	[p3_mult] task3=0 & task1=4 -> (task3'=3);

	// start task 4 (must wait for tasks 1 and 2 to complete)
	[p1_add] task4=0 & task1=4 & task2=4 -> (task4'=1);
	[p2_add] task4=0 & task1=4 & task2=4 -> (task4'=2);
	[p3_add] task4=0 & task1=4 & task2=4 -> (task4'=3);
	
	// start task 5 (must wait for task 3 to complete)
	[p1_mult] task5=0 & task3=4 -> (task5'=1);
	[p2_mult] task5=0 & task3=4 -> (task5'=2);
	[p3_mult] task5=0 & task3=4 -> (task5'=3);
	
	// start task 6 (must wait for tasks 4 and 5 to complete)
	[p1_add] task6=0 & task4=4 & task5=4 -> (task6'=1);
	[p2_add] task6=0 & task4=4 & task5=4 -> (task6'=2);
	[p3_add] task6=0 & task4=4 & task5=4 -> (task6'=3);
	
	// a task finishes on processor 1
	[p1_done] task1=1 -> (task1'=4);
	[p1_done] task2=1 -> (task2'=4);
	[p1_done] task3=1 -> (task3'=4);
	[p1_done] task4=1 -> (task4'=4);
	[p1_done] task5=1 -> (task5'=4);
	[p1_done] task6=1 -> (task6'=4);

	// a task finishes on processor 2
	[p2_done] task1=2 -> (task1'=4);
	[p2_done] task2=2 -> (task2'=4);
	[p2_done] task3=2 -> (task3'=4);
	[p2_done] task4=2 -> (task4'=4);
	[p2_done] task5=2 -> (task5'=4);
	[p2_done] task6=2 -> (task6'=4);

	// a task finishes on processor 2
	[p3_done] task1=3 -> (task1'=4);
	[p3_done] task2=3 -> (task2'=4);
	[p3_done] task3=3 -> (task3'=4);
	[p3_done] task4=3 -> (task4'=4);
	[p3_done] task5=3 -> (task5'=4);
	[p3_done] task6=3 -> (task6'=4);
	[p3_fail] task1=3 -> (task1'=0);
	[p3_fail] task2=3 -> (task2'=0);
	[p3_fail] task3=3 -> (task3'=0);
	[p3_fail] task4=3 -> (task4'=0);
	[p3_fail] task5=3 -> (task5'=0);
	[p3_fail] task6=3 -> (task6'=0);
	
endmodule
	
// processor 1
module P1

	p1 : [0..2];
	// 0 - idle
	// 1 - add
	// 2 - multiply
	
	x1 : clock; // local clock
	
	invariant
        (p1=1 => x1<=2) &
        (p1=2 => x1<=3)
    endinvariant

	// addition
	[p1_add] p1=0 -> (p1'=1) & (x1'=0); // start
	[p1_done] p1=1 & x1=2 -> (p1'=0) & (x1'=0); // finish

	// multiplication
	[p1_mult] p1=0 -> (p1'=2) & (x1'=0); // start
	[p1_done] p1=2 & x1=3 -> (p1'=0) & (x1'=0);  // finish

endmodule

// processor 2
module P2

	p2 : [0..2];
	// 0 - idle
	// 1 - add
	// 2 - multiply
	
	x2 : clock; // local clock
	
	invariant
        (p2=1 => x2<=5) &
        (p2=2 => x2<=7)
    endinvariant

	// addition
	[p2_add] p2=0 -> (p2'=1) & (x2'=0); // start
	[p2_done] p2=1 & x2=5 -> (p2'=0) & (x2'=0); // finish

	// multiplication
	[p2_mult] p2=0 -> (p2'=2) & (x2'=0); // start
	[p2_done] p2=2 & x2=7 -> (p2'=0) & (x2'=0);  // finish

endmodule

// processor 3 (faster than processor 2 but can fail)
module P3

	p3 : [0..3];
	// 0 - idle
	// 1 - add
	// 2 - multiply
	fail3 : [0..1]; // failed to complete task
	
	x3 : clock; // local clock
	
	invariant
        (p3=1 => x3<=3) &
        (p3=2 => x3<=5) &
        (p3=3 => x3<=0)
    endinvariant

	// addition
	[p3_add] p3=0 -> (p3'=1) & (x3'=0); // start
	[] p3=1 & x3=3 -> p : (fail3'=1) & (p3'=3) & (x3'=0)  // fail
	                + (1-p) : (fail3'=0) & (p3'=3) & (x3'=0); // finish

	// multiplication
	[p3_mult] p3=0 -> (p3'=2) & (x3'=0); // start
	[] p3=2 & x3=5 -> p : (fail3'=1) & (p3'=3) & (x3'=0) // fail
	                + (1-p) : (fail3'=0) & (p3'=3) & (x3'=0); // finish

	[p3_fail] p3=3 & fail3=1 -> (p3'=0) & (x3'=0) & (fail3'=0); // failed
	[p3_done] p3=3 & fail3=0 -> (p3'=0) & (x3'=0) & (fail3'=0); // completed

endmodule

// reward structure: elapsed time
rewards "time"
	true : 1;
endrewards

// reward structures: energy consumption
rewards "energy"
	p1=0 : 10/1000;
	p1>0 : 90/1000;
	p2=0 : 20/1000;
	p2>0 : 30/1000;
endrewards

// target state (all tasks complete)
label "tasks_complete" = (task6=4);
View: printable version          Download: task_graph_failure.prism

Analysing this model, we find that for small values of p (the probability of P3 failing to successfully complete a task), as P3 has better performance than P2, both the optimal expected time and energy consumption can be improved using P3. However, as the probability of failure increases, P3’s better performance is out-weight by the chance of its failure and using it no longer yields optimal values. This can be seen in the following graphs which plot the optimal expected time and energy consumption for this extended model as the failure probability p varies.

plot: optimal expected time to complete

plot: optimal expected energy to complete

For example, we give an optimal scheduler for the expected time when p=0.25 which takes 11.0625 picoseconds (the optimal expected time is 12 picoseconds when P3 is not used). The dark boxes are used to denote the cases when P3 is scheduled to complete a task, but experiences a fault and does not complete the scheduled task correctly.

time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task5 task6
P2
P3 task2 task4


time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task5 task6
P2
P3 task2 task4


time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task5 task4 task6
P2
P3 task2 task4


time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task5 task6
P2
P3 task2 task4


time    1   2   3   4   5   6   7   8   9  10 11 12 13 14 15 16 17 18 19 20
P1 task1 task3 task5 task4 task6
P2
P3 task2 task4

Case Studies