A Scalable 0.128-1 Tb/s, 0.8-2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS

A scalable 64-lane chip-to-chip I/O, with per-lane data rate of 2-16 Gb/s is demonstrated in 32-nm low-power CMOS technology. At maximum aggregate bandwidth of 1.024 Tb/s across 50-cm channel length, the link consumes 2.7 W from a 1.08-V supply, corresponding to 2.6 pJ/bit. As bandwidth demand decre...

Full description

Saved in:
Bibliographic Details
Published inIEEE journal of solid-state circuits Vol. 48; no. 12; pp. 3229 - 3242
Main Authors Mansuri, Mozhgan, Jaussi, James E., Kennedy, Joseph T., Tzu-Chien Hsueh, Shekhar, Sudip, Balamurugan, Ganesh, O'Mahony, Frank, Roberts, Clark, Mooney, Randy, Casper, Bryan
Format Journal Article Conference Proceeding
LanguageEnglish
Published New York, NY IEEE 01.12.2013
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:A scalable 64-lane chip-to-chip I/O, with per-lane data rate of 2-16 Gb/s is demonstrated in 32-nm low-power CMOS technology. At maximum aggregate bandwidth of 1.024 Tb/s across 50-cm channel length, the link consumes 2.7 W from a 1.08-V supply, corresponding to 2.6 pJ/bit. As bandwidth demand decreases, scaling the per-lane data rate to 4 Gb/s and power supply to 0.65 V provides 1/4 of the maximum bandwidth while consuming 0.2 W. Across a 1-m channel, the link operates at a maximum per-lane data rate of 16 Gb/s; thus, providing up to 1.024 Tb/s of aggregate bandwidth with 3.2 pJ/bit power efficiency from a 1.15-V supply. A length-matched dense interconnect topology allows clocking to be shared across multiple lanes to reduce area and power. Reconfigurable current/voltage mode transmitter driver and CMOS clocking enable a highly scalable power-efficient link. Optional low-dropout regulators provide >22-dB supply noise rejection at the package resonance frequency of 200 MHz. System-level optimization of duty-cycle and quadrature error correctors across the clock hierarchy provides optimized clock phase placement and, thus, enhances link performance and power. A lane failover mechanism provides design robustness to mitigate channel or circuit defects. The active circuitry occupies 1.3 mm 2 .
ISSN:0018-9200
1558-173X
DOI:10.1109/JSSC.2013.2279052