Results April 6, 2006

Tpetra

Tpetra (Templated Linear Algebra Services Package) is a software package developed at Sandia National Laboratories. We ran it on the T2000 using the shared-memory standard OpenMP. The results are shown in Table 1, and the analysis is at the bottom of this page.

 

MIOPS (rows: threads; columns: NEQ)
                   10        100       1000      10000     100000    1000000
1 thread       12.244    34.5855    31.8034    32.2643    26.2351    25.1227
2 threads     2.81033    17.9475    50.2481    62.7786    52.0425    48.2585
4 threads      1.2174    11.4638    62.9097    115.496    101.905    92.4524
8 threads    0.766131    7.80281    61.1201     187.29    194.294    169.859
16 threads   0.629272    3.45052    32.5254    186.942    278.437    239.742
32 threads   0.627869   0.240922    2.50485    24.0638     173.12    205.873
               
               
SPEEDUP (rows: threads; columns: NEQ)
                   10        100       1000      10000     100000    1000000
2 threads        0.23       0.52       1.58       1.95       1.98       1.92
4 threads        0.10       0.33       1.98       3.58       3.88       3.68
8 threads        0.06       0.23       1.92       5.80       7.41       6.76
16 threads       0.05       0.10       1.02       5.79      10.61       9.54
32 threads       0.05       0.01       0.08       0.75       6.60       8.19
               
               
               
EFFICIENCY (rows: threads; columns: NEQ)
                   10        100       1000      10000     100000    1000000
2 threads        0.11       0.26       0.79       0.97       0.99       0.96
4 threads        0.02       0.08       0.49       0.89       0.97       0.92
8 threads        0.01       0.03       0.24       0.73       0.93       0.85
16 threads       0.00       0.01       0.06       0.36       0.66       0.60
32 threads       0.00       0.00       0.00       0.02       0.21       0.26
               
               
Table 1: First results obtained using Tpetra
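
The page does not state how the speedup and efficiency figures were derived. The sketch below assumes the usual definitions: speedup is the MIOPS at a given thread count divided by the 1-thread MIOPS for the same NEQ, and efficiency is that speedup divided by the number of threads. Under that assumption the printed values are reproduced to two decimals. The names NEQ, MIOPS, speedup and efficiency are chosen here for illustration only.

```python
# MIOPS values copied from Table 1.
# Keys are thread counts; list positions follow the NEQ columns.
NEQ = [10, 100, 1000, 10000, 100000, 1000000]
MIOPS = {
    1: [12.244, 34.5855, 31.8034, 32.2643, 26.2351, 25.1227],
    2: [2.81033, 17.9475, 50.2481, 62.7786, 52.0425, 48.2585],
    4: [1.2174, 11.4638, 62.9097, 115.496, 101.905, 92.4524],
    8: [0.766131, 7.80281, 61.1201, 187.29, 194.294, 169.859],
    16: [0.629272, 3.45052, 32.5254, 186.942, 278.437, 239.742],
    32: [0.627869, 0.240922, 2.50485, 24.0638, 173.12, 205.873],
}


def speedup(threads, col):
    """Throughput with `threads` threads relative to the 1-thread run (assumed definition)."""
    return MIOPS[threads][col] / MIOPS[1][col]


def efficiency(threads, col):
    """Speedup divided by the number of threads (assumed definition)."""
    return speedup(threads, col) / threads


if __name__ == "__main__":
    col = NEQ.index(100000)
    for t in (2, 4, 8, 16, 32):
        print(f"{t:2d} threads: speedup {speedup(t, col):5.2f}, "
              f"efficiency {efficiency(t, col):4.2f}")
```

A speedup below 1, as in the NEQ = 10 and NEQ = 100 columns, means the threaded run was slower than the serial one.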

Analysis

We obtained good speedups without much effort, which shows that the machine behaves like a parallel computer. Tpetra runs up to 10.6 times faster in parallel mode than in serial mode, using 16 threads on matrices with 100000 rows and columns.

For the two largest problem sizes, performance increased up to 16 threads and then decreased at 32 threads; the smaller sizes peak at 8 threads or fewer. We did not expect this result: we had assumed that the speedup would keep growing, or at least hold at the same level. This problem was solved later (see timeline).

Maximum SWaP was obtained at 16 threads and NEQ equal to 100000 (New!), considering a performance of 278.437 MIOPS, a space of 2 RU (rack units), and a power consumption of 300 watts.
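
The page gives the three inputs but not the resulting figure. As a rough sketch, assuming the usual definition of SWaP (Space, Watts and Performance) as performance divided by the product of space and power, the value can be computed as follows. The function name swap and its parameter names are illustrative, not taken from the page.

```python
def swap(performance, space_ru, power_watts):
    """SWaP under the assumed definition: performance / (space * power)."""
    if space_ru <= 0 or power_watts <= 0:
        raise ValueError("space and power must be positive")
    return performance / (space_ru * power_watts)


if __name__ == "__main__":
    # Inputs quoted above: 278.437 MIOPS, 2 RU, 300 W.
    value = swap(278.437, 2, 300)
    print(f"SWaP = {value:.3f} MIOPS per (RU * W)")
```

With the quoted inputs this comes to roughly 0.464 MIOPS per rack unit per watt. The figure is only meaningful when compared against other machines measured with the same benchmark.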