Results April 21, 2006

Dense Matrix Multiplication Using OpenMP

The code used in this test solves the system A*x = b for a dense matrix A, that is, one in which every entry is populated. Parallelization was done with OpenMP. The code used is the Teuchos package from Sandia National Laboratories. The results are shown in Table 3, with the most interesting numbers highlighted in red. The analysis is at the bottom of this page.

 

MIOPS (rows: NEQ, columns: number of threads)

 NEQ        1        2        4        8       16       32       64      128      256      512     1024     2048
  32  70.7039  129.444  201.283  244.909  187.967  27.8215  25.5386  25.2984  25.4565   28.265  27.1987  26.3786
  64  64.4648  126.876  242.889  444.411  553.656  175.209  175.339  170.509  165.263  178.959  192.797  163.113
 128  63.8348   127.19  252.011  477.465  741.955  545.474  702.764  822.704  749.806  595.638  682.043  964.883
 256  63.7332  127.153  247.775  493.825  867.528  1020.25  1013.72   1025.6  1152.31  779.991  849.307  1023.03
 512  62.8613  125.557  250.251  494.066  857.547  1222.63  1222.51  1203.83  1221.83  1152.75  1220.77  1221.63
1024  60.2081  120.488  240.937  480.051  832.174  1242.68  1252.92  1251.34  1253.52  1233.78  1212.34  1250.67
                           
                           
Speedup (rows: NEQ, columns: number of threads)

 NEQ      2      4      8     16     32     64    128    256    512   1024   2048
  32   1.83   2.85   3.46   2.66   0.39   0.36   0.36   0.36   0.40   0.38   0.37
  64   1.97   3.77   6.89   8.59   2.72   2.72   2.64   2.56   2.78   2.99   2.53
 128   1.99   3.95   7.48  11.62   8.55  11.01  12.89  11.75   9.33  10.68  15.12
 256   2.00   3.89   7.75  13.61  16.01  15.91  16.09  18.08  12.24  13.33  16.05
 512   2.00   3.98   7.86  13.64  19.45  19.45  19.15  19.44  18.34  19.42  19.43
1024   2.00   4.00   7.97  13.82  20.64  20.81  20.78  20.82  20.49  20.14  20.77
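
The speedup figures are consistent with dividing each MIOPS value by the single-thread MIOPS value in the same row. The short Python sketch below reproduces that calculation for the NEQ = 1024 row. The names `miops_1024`, `speedup`, and `efficiency` are made up for this illustration, and efficiency is assumed to follow the usual definition (speedup divided by the number of threads), which may differ from the one used on this page.

```python
# MIOPS for NEQ = 1024, copied from the MIOPS table (threads -> MIOPS).
miops_1024 = {
    1: 60.2081, 2: 120.488, 4: 240.937, 8: 480.051,
    16: 832.174, 32: 1242.68, 64: 1252.92, 128: 1251.34,
    256: 1253.52, 512: 1233.78, 1024: 1212.34, 2048: 1250.67,
}


def speedup(miops, threads):
    """Throughput with `threads` threads relative to the 1-thread run."""
    return miops[threads] / miops[1]


def efficiency(miops, threads):
    """Assumed definition: speedup divided by the number of threads."""
    return speedup(miops, threads) / threads


if __name__ == "__main__":
    # Print one line per thread count: threads, speedup, efficiency.
    for p in sorted(miops_1024):
        s = speedup(miops_1024, p)
        e = efficiency(miops_1024, p)
        print(f"{p:5d}  {s:6.2f}  {e:7.4f}")
```

Rounded to two decimals, the computed speedups match the NEQ = 1024 row of the speedup table, for example 2.00 at 2 threads and 20.64 at 32 threads.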
                           
                           
Efficiency Number of Threads