A multi-threaded version of Field II

A multi-threaded version of Field II has been developed, which automatically can use the multi-core capabilities of modern CPUs. The memory allocation routines were rewritten to minimize the number of dynamic allocations and to make pre-allocations possible for each thread. This ensures that the sim...

Full description

Saved in:
Bibliographic Details
Published in2014 IEEE International Ultrasonics Symposium pp. 2229 - 2232
Main Author Jensen, Jorgen Arendt
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.09.2014
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:A multi-threaded version of Field II has been developed, which automatically can use the multi-core capabilities of modern CPUs. The memory allocation routines were rewritten to minimize the number of dynamic allocations and to make pre-allocations possible for each thread. This ensures that the simulation job can be automatically partitioned and the interdependence between threads minimized. The new code has been compared to Field II version 3.22, October 27, 2013 (latest free-ware version). A 64 element 5 MHz focused array transducer was simulated. One million point scatterers randomly distributed in a plane of 20 × 50 mm (width × depth) with random Gaussian amplitudes were simulated using the command calc scat. Dual Intel Xeon CPU E5-2630 2.60 GHz CPUs were used under Ubuntu Linux 10.02 and Matlab version 2013b. Each CPU holds 6 cores with hyper-threading, corresponding to a total of 24 hyper-threading cores. The averaged simulation time for 10 realizations for the old version was 85.1 s. A single thread run for the new version took 27.7 s; a speed-up of 3.1. Employing all 24 cores gave a simulation time of 3.27 s for the one million scatterers corresponding to a speed-up factor of 26 times. The speed-up in general depends on the transducer, scatterers and simulation, and it varies across applications between 13 and 30. The program is fully compatible with older versions, and only a single command has been added for setting the number of threads to use. The division of labor is automatically handled by the program. For a phantom with 100,000 scatterers, it is now possible to simulate a full 128 line image in around 42 seconds with full precision.
ISSN:1051-0117
DOI:10.1109/ULTSYM.2014.0555