Using Multiple Processors


FlexPDE versions 6 and later use multi-threaded computation to support modern multi-core and multi-processor hardware configurations.  Only shared-memory multi-processors are supported, not clusters.

Each opened problem runs in its own computation thread, and can use up to twenty-four additional computation threads.  A single main thread controls the graphic interface and screen display.

Matrix construction, residual calculations, linear system solvers and plot mesh generation are all multi-threaded.  Computation mesh generation and plot display functions are not, although graphics load is shared between the problem thread and the main graphics thread.

Individual Problem Control

Each individual script can declare the number of worker threads to be used in the computation:

SELECT THREADS = <number>

requests that <number> worker threads be used, in addition to the main graphics thread and the individual problem thread.
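As a quick check of the thread accounting described above, the following Python sketch adds up the threads in use for a set of open problems. The function name and the range check are illustrative assumptions, not part of FlexPDE; the limit of 24 workers per problem comes from the description above, and the sketch assumes each open problem gets its own set of worker threads.

```python
MAX_WORKERS = 24  # upper limit on additional computation threads per problem


def total_threads(workers_per_problem):
    """Return the total thread count for a list of per-problem worker counts.

    Counts one main graphics thread, plus one problem thread and the
    requested worker threads for each open problem.
    """
    for workers in workers_per_problem:
        # Assumption: a negative count is meaningless and 24 is the ceiling.
        if not 0 <= workers <= MAX_WORKERS:
            raise ValueError(f"worker count {workers} outside 0..{MAX_WORKERS}")
    return 1 + sum(1 + workers for workers in workers_per_problem)


if __name__ == "__main__":
    # One open script containing SELECT THREADS = 4
    print(total_threads([4]))
```

For a single script with SELECT THREADS = 4, this counts four workers, one problem thread and one graphics thread.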

Setting the Default

The default number of worker threads can be changed in the General Settings tab of the Preferences Window.

Command-Line Control

If you run FlexPDE from a command line and include the switch -T<number>, the default thread count will be set to <number>.  For example, the command line

flexpde8 -T4 problem

will set the default to 4 threads and load the script file "problem.pde".  The selected thread count will be written to the .ini file at the end of the FlexPDE session and will become the default the next time FlexPDE is run.

Speed Effects of Multiple Processors

Many factors influence the timing of a multi-threaded run.

The dominant factor is the memory bandwidth. If the memory cannot keep up with the processor speed, then more threads will run slower due to the overhead of constructing and synchronizing threads and merging data.

The size of the problem will also affect the speedup, because with a larger problem a smaller proportion of data can be held in cache memory. The memory bandwidth limitation will therefore be greater with a larger problem.
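The memory bandwidth limit described above can be illustrated with a toy timing model. This is only a sketch for building intuition, not a description of how FlexPDE schedules work: the function name, the model itself and all three parameter values are hypothetical. It assumes that compute time divides evenly across threads, that memory traffic takes a fixed minimum time regardless of thread count, and that each extra thread adds a fixed synchronization cost.

```python
def run_time(threads, compute_s, memory_s, overhead_s):
    """Estimate run time in seconds under a simple bandwidth-limited model.

    compute_s  : single-thread compute time (divides across threads)
    memory_s   : minimum time needed to move the data (does not shrink)
    overhead_s : cost added per extra thread for setup, sync and merging
    """
    compute = compute_s / threads
    # The run can go no faster than the memory system allows.
    return max(compute, memory_s) + overhead_s * (threads - 1)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the shape of the curve.
    for n in range(1, 5):
        t = run_time(n, compute_s=100.0, memory_s=35.0, overhead_s=2.0)
        print(f"threads={n}  estimated time={t:.1f} s")
```

With these made-up values the estimated time stops improving once the compute share per thread drops below the memory time, and the added overhead then makes the fourth thread slower than the third.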

The following table shows our experience with speeds in versions 5 and 6. These tests were run on a 4-core AMD Phenom with 667 MHz 128-bit memory. Notice that the Black_Oil problem is significantly faster in version 6, even though it takes many more timesteps. The higher timestep count indicates that the timestep control in V6 is more pessimistic than in V5.  The speedup with one thread in V6 is partly due to the fact that graphic redraws are run in a separate thread in V6 but not in V5.

Notice that on this machine the memory saturates at 3 threads, so that the fourth thread produces no significant speed improvement (and may in fact be slower).

 


 

                     Black_Oil.pde            3D_FlowBox.pde
Version   Threads    CPU time    timesteps    CPU time
5         1          14:37       534          8:15
5         2          12:17       540          6:09

6         1          10:21       688          8:06
6         2           6:58       684          4:14
6         3           6:16       696          3:30
6         4           7:13       703          3:22
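
The saturation effect can be read off the version 6 timings by converting them to speedup and per-thread efficiency relative to the one-thread run. The Python sketch below does this arithmetic; the helper names are illustrative, and it assumes the CPU times are in minutes:seconds (the ratios come out the same if they are hours:minutes).

```python
# Version 6 CPU times copied from the table above, keyed by thread count.
V6_TIMES = {
    "Black_Oil.pde": {1: "10:21", 2: "6:58", 3: "6:16", 4: "7:13"},
    "3D_FlowBox.pde": {1: "8:06", 2: "4:14", 3: "3:30", 4: "3:22"},
}


def to_seconds(mmss):
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedups(times):
    """Return {threads: speedup} relative to the one-thread time."""
    base = to_seconds(times[1])
    return {n: base / to_seconds(t) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for problem, times in V6_TIMES.items():
        for n, s in speedups(times).items():
            # Efficiency is the speedup divided by the thread count.
            print(f"{problem:15s} threads={n}  speedup={s:.2f}  efficiency={s / n:.0%}")
```

By this measure Black_Oil peaks at about 1.65 times the one-thread speed with 3 threads and falls back with 4, while 3D_FlowBox reaches about 2.4 times with 4 threads, only slightly better than with 3.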