Algorithms and Architectures for Parallel Processing: 12th International Conference, ICA3PP 2012, Fukuoka, Japan, September 4-7, 2012, Proceedings, Part I
The two-volume set LNCS 7439 and 7440 comprises the proceedings of the 12th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2012, as well as some papers from the CDCN 2012 workshop, which was held in conjunction with this conference. The 40 regular pape...
Main Authors | , , , , , |
---|---|
Format | eBook |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin / Heidelberg, 2012 |
Edition | 1 |
Series | Lecture Notes in Computer Science |
Table of Contents:
- Budget Constrained Resource Allocation for Non-deterministic Workflows on an IaaS Cloud -- Introduction -- Related Work -- Problem Statement -- Platform and Application Models -- Metrics and Problem Statement -- Allocating a Non-deterministic Workflow -- Splitting the Workflow -- Distributing Budget to Sub-workflows -- Determining PTG Allocations -- Experimental Evaluation -- Platform Description -- Comparison of Allocation Times -- Simulation Results -- Conclusion and Future Work -- References -- On a Wideband Fast Fourier Transform Using Piecewise Linear Approximations: Application to a Radio Telescope Spectrometer -- Introduction -- Spectrometer on a Radio Telescope -- Requirement of Radio Telescope -- Related Works -- Contribution of This Paper -- Radix 2k Fast Fourier Transform (R2kFFT) -- N-Point Discrete Fourier Transform -- N-Point Fast Fourier Transform (FFT) -- Radix 2k Butterfly (R2kButterfly) -- Twiddle Factor Circuit Using Piecewise Linear Approximation -- Piecewise Linear Approximation Circuit -- Reduction of Approximation Error for Trigonometric Functions -- Analysis of the Approximation Error -- Experimental Results -- Implementation Environment -- Comparison with Other FFT Libraries -- Comparison with the SETI Spectrometer -- Comparison with GPUs -- Conclusion -- References -- A Verified Library of Algorithmic Skeletons on Evenly Distributed Arrays -- Introduction -- Related Work -- Specification of Algorithmic Skeletons -- An Overview of OSL Programming Model -- Specification in Coq -- Applications Using the Specification -- One Dimensional Heat Equation -- Maximum Segment Sum -- Bulk Synchronous Parallel ML -- Verified Implementation of Algorithmic Skeletons -- Conclusion and Perspectives -- References -- Performance Measurement of Parallel Vlasov Code for Space Plasma on Various Scalar-Type Supercomputer Systems
- Space Plasma Simulations -- Overview of Numerical Schemes -- Performance Evaluation -- Performance on Multi-core Systems -- Scaling on Massively Parallel Supercomputers -- Conclusion -- References -- Ultrasound Simulation on the Cell Broadband Engine Using the Westervelt Equation -- Introduction -- Westervelt Equation -- The Cell and the PS3 -- Approach -- SIMD -- Double Buffering -- Manual Loop Unrolling -- Multiple Timesteps in between Memory Access -- Ignore Empty Regions -- Results -- Conclusion and Future Research -- References -- Study on the Data Flow Balance in NFS Server with iSCSI -- Introduction -- Related Work -- Overview of iSCSI and NFS Processing -- SCSI Initiator Processing -- NFS Server Processing -- Comparison and Summary -- A Standard iSCSI-Based NFS Server's Data Flow -- Theoretical Model of the Data Flow within the Server -- Setup and Performance Evaluation -- System Setup -- Simulation Evaluation -- Real Evaluation -- Conclusion -- References -- Performance, Scalability, and Semantics of Concurrent FIFO Queues -- Introduction -- k-FIFO Queues -- Implementations -- Scal Queues -- Semantical Deviation -- Computing HL -- Related Work -- Experiments -- Microbenchmarks -- Macrobenchmarks -- Conclusions -- References -- Scalable Distributed Architecture for Media Transcoding -- Introduction -- Related Work -- Architecture and Implementation -- Centralized Architecture -- Distributed Architecture -- Task-Oriented Parallel Processing -- Worker Execution Workflow -- Modelling the Distributed Transcoding Problem -- Evaluation -- Sample Media -- Transcoding Speed -- Transcoding Quality -- Task Scheduling -- Conclusions and Future Work -- References -- GPU-Accelerated Restricted Boltzmann Machine for Collaborative Filtering -- Introduction -- Background -- Restricted Boltzmann Machine -- Restricted Boltzmann Machine for CF
- Introduction -- The LD Pattern of Selective Sweeps -- Fine- and Coarse-Grain Parallelizations -- Multi-grain Parallelization -- Underlying Idea -- Implementation -- Performance Evaluation -- Conclusion and Future Work -- References -- Vectorized Algorithms for Quadtree Construction and Descent -- Introduction -- Background and Related Work -- Quadtrees -- Vectorized Quadtrees -- Related Work -- The Hash Approach -- Algorithms -- Tree Construction -- Vectorized Tree Descent -- Complexity Analysis -- Conclusion -- References -- A Multi-level Monitoring Framework for Stream-Based Coordination Programs -- Introduction -- Definitions -- Stream-Based Coordination Programs -- Performance Metrics -- Timing Metrics -- Conceptions of the Monitoring Framework -- Monitoring the Runtime System -- Monitoring the Operating System -- Benefits of the Monitoring Framework -- Performance Metric Measurement -- Automatic Load Balancing -- Bottleneck Detection -- Instantiation of the Monitoring Framework for S-Net -- Stream-Processing with S-Net -- LPEL - A User-Mode Microkernel -- Operation Modes -- Evaluation of the Monitoring Framework -- Related Work -- Conclusion -- References -- An Optimal Parallel Prefix-Sums Algorithm on the Memory Machine Models for GPUs -- Introduction -- Parallel Memory Machines: DMM and UMM -- Contiguous Memory Access -- An Optimal Parallel Algorithm for Computing the Sum -- The Lower Bound of the Computing Time and the Latency Hiding -- A Naive Prefix-Sums Algorithm -- Our Optimal Prefix-Sums Algorithm -- Conclusion -- References -- A Multi-GPU Programming Library for Real-Time Applications -- Introduction -- MGPU -- Runtime Environment -- Memory Management -- Data Transfer -- Libraries -- Kernel Invocation and Synchronization -- Evaluation -- MRI Image Reconstruction -- Reconstruction Problem and Algorithm
- Matrix-Based Training Algorithm for RBM-CF
- Title -- Organization -- Table of Contents -- ICA3PP 2012 Regular Papers -- Accelerating the Dynamic Programming for the Optimal Polygon Triangulation on the GPU -- Introduction -- The Optimal Polygon Triangulation and the Dynamic Programming Approach -- GPU and CUDA Architectures -- Our Implementation of the Dynamic Programming Approach for the Optimal Polygon Triangulation -- Granularity Adjustment Technique -- Sliding and Mirroring Arrangement -- Our Algorithm for the Optimal Polygon Triangulation -- Experimental Results -- Concluding Remarks -- References -- Security Computing for the Resiliency of Protecting from Internal Attacks in Distributed Wireless Sensor Networks -- Introduction -- Background about Internal Attacks in WSNs -- Proposed Security Computing for Protecting WSNs -- Simulation and Discussion -- Conclusion -- References -- Parallel Algorithm for Nonlinear Network Optimization Problems and Real-Time Applications -- Introduction -- Statement of Nonlinear Network Optimization Problems -- Proposed Parallel Algorithm -- Combining Successive Quadratic Programming with Dual Method -- Complete Decomposition Effect of Proposed Parallel Algorithm -- Computational Efficiency of Proposed Parallel Algorithm -- Convergence of Proposed Parallel Algorithm -- Algorithmic Steps at Each Bus of Proposed Parallel Algorithm -- Power Flow Problem of Smart Grid -- Preliminaries -- Numerical Simulations -- Conclusions -- References -- Optimization of a Short-Range Proximity Effect Correction Algorithm in E-Beam Lithography Using GPGPUs -- Introduction -- Proximity Effect Correction in Detail -- Related Work -- Short-Range Proximity Effect Correction Algorithm -- Neighbor Search -- Influence Calculation -- Equation Solver -- Results -- Conclusion and Future Work -- References -- Exploiting Multi-grain Parallelism for Efficient Selective Sweep Detection
- Single- and Multi-GPU Implementation -- Results -- Related Work -- Conclusion -- References -- Optimal Linear Programming Solutions for Multiprocessor Scheduling with Communication Delays -- Introduction -- Task Scheduling Model -- Related Work -- Proposed Formulations -- ILP-RevisedBooleanLogic -- ILP-TransitivityClause -- Packing Formulation -- Computational Results -- Experimental Setup -- Result Table -- Conclusion -- References -- A Bitstream Relocation Technique to Improve Flexibility of Partial Reconfiguration -- Introduction -- Related Research -- Designing Uniformed Reconfigurable Regions -- Issues of PRB Relocation -- Uniforming Included Reconfigurable Resource -- Uniforming Placement of Proxy Logic -- Uniforming Interconnect between Proxy Logic and Static Module -- Excluding Crossing Wire -- Modifying Configuration Bitstream -- PRR Conjunction -- Verification of Bitstream Relocation -- Conclusion and Future Work -- References -- A Hybrid Heuristic-Genetic Algorithm for Task Scheduling in Heterogeneous Multi-core System -- Introduction -- Problem Description -- Related Work -- Heterogeneous Earliest Finish Time Algorithm -- Dynamic Level Scheduling Algorithm -- Hybrid Heuristic-Genetic Scheduling Algorithm -- The HSCGS Algorithm -- Successor Concerned List Scheduling Algorithm -- Improved Genetic Algorithm -- Experiment Results and Discussion -- Performance Metrics -- Generating Random DAGs -- Experimental Parameters -- Performance Results and Discussion -- Conclude -- References -- Efficient Task Assignment on Heterogeneous Multicore Systems Considering Communication Overhead -- Introduction -- Models -- Heterogeneous Cluster Model -- Task Model -- Problem Definition -- Motivational Example -- ILP Formulation for the HTAC Problem -- The Ratio Greedy Assign Algorithm for the HTAC Problem -- Experiments -- Conclusion -- References