# Future Technological Challenges for High Performance Computers

Dec. 6, 2005



**Tadashi Watanabe** 

# **History of High Performance Computers**





# The Faster the Speed, the More the Parallel

- **SX-3, largest configuration (1990):** 22 GFlops / 4 CPUs
- **The Earth Simulator (2002):** 40 TFlops / 5120 CPUs
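The two configurations above can be split into a per-CPU gain and a parallelism gain. The following is a minimal sketch that uses only the figures quoted on this slide; the dictionary and function names are illustrative and not from the original.

```python
# Figures quoted on the slide: peak performance and CPU count.
sx3 = {"flops": 22e9, "cpus": 4}               # SX-3 largest configuration, 1990
earth_sim = {"flops": 40e12, "cpus": 5120}     # The Earth Simulator, 2002


def per_cpu_flops(system):
    """Peak performance of one CPU, in Flops."""
    return system["flops"] / system["cpus"]


# Total speed-up factors into (more CPUs) x (faster CPUs).
total_gain = earth_sim["flops"] / sx3["flops"]
cpu_count_gain = earth_sim["cpus"] / sx3["cpus"]
per_cpu_gain = per_cpu_flops(earth_sim) / per_cpu_flops(sx3)

print(f"total gain:     {total_gain:8.1f}x")
print(f"CPU count gain: {cpu_count_gain:8.1f}x")
print(f"per-CPU gain:   {per_cpu_gain:8.2f}x")
```

On these numbers almost all of the speed increase comes from the CPU count rather than from the speed of a single CPU, which is the point of the slide title.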



# **History of SX Series**



# **Evolution of SX Series for 20 years**





Will this technological evolution continue?

Are there any problems or difficulties to overcome?

If so, what are they?



# Capacity Computing and Capability Computing











#### **PC Cluster / Blade Server**

### **Capacity Computing**

- Goals: Workload and throughput (many jobs per unit time)
- Many Small Problems
- Parallel or Cluster Machine based on Microprocessor

#### **Vector / SX**

### **Capability Computing**

- Goals: High Speed Execution of Single Job
- Large and Critical Problems (Grand Challenge)
- Powerful Processor and High-Bandwidth Network



# **Divergence Problem**





# Multi Core Technology for Capacity Computing



More Parallel

Increasing speed through more parallelism is not a solution to the divergence problem




# **Highly Efficient Capability Computing**





# Future Technologies for Capability Computing



### The Future of PetaFLOPS Computing

### Applications and Required Performance





# Technology Road Map





# Structure of LSI: UX6(CB90H)

### 9-layer Cu/Low-k (2.9) wiring



Tr (60 nm gate length)





Photo courtesy of NEC Electronics Corporation

# **Structure of Tr**



$$\text{Power} = \underbrace{f \cdot C_i \cdot V^2}_{\text{Op. Power (wiring)}} + \underbrace{f \cdot C_t \cdot V^2}_{\text{Op. Power (Tr)}} + \underbrace{(I_{\text{off}} + I_g) \cdot V}_{\text{Std-by Power}}$$
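The power expression above has two operating terms that scale with the square of the supply voltage and one stand-by term that scales linearly with it. The sketch below evaluates the three terms separately. The readings of the symbols (wiring capacitance, transistor capacitance, off-state and gate leakage currents) follow the slide labels, and every numeric value except the 0.7 V supply is a hypothetical placeholder, not data from the presentation.

```python
def power_breakdown(f_hz, c_wiring_f, c_tr_f, v_volts, i_off_a, i_gate_a):
    """Return (wiring operating power, transistor operating power, stand-by power) in watts."""
    op_wiring = f_hz * c_wiring_f * v_volts ** 2    # f * Ci * V^2
    op_tr = f_hz * c_tr_f * v_volts ** 2            # f * Ct * V^2
    standby = (i_off_a + i_gate_a) * v_volts        # (Ioff + Ig) * V
    return op_wiring, op_tr, standby


# Hypothetical chip parameters, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL = dict(f_hz=1e9, c_wiring_f=20e-9, c_tr_f=10e-9, i_off_a=5.0, i_gate_a=1.0)

nominal = power_breakdown(v_volts=1.0, **HYPOTHETICAL)
reduced = power_breakdown(v_volts=0.7, **HYPOTHETICAL)

for name, a, b in zip(("wiring", "transistor", "stand-by"), nominal, reduced):
    print(f"{name:10s}: {a:5.1f} W at 1.0 V, {b:5.1f} W at 0.7 V")
```

The leakage currents are held fixed here for simplicity. In a real device they also depend on voltage and temperature, so the stand-by term is only a first-order estimate.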



## **Power dissipation trend**

Problems of conventional scaling appear in the power dissipation





# **Power Dissipation and Cooling**





# **Technologies for Power Scaling**

- Operating Power
  - 1) Device Process
    - Low-k Material
    - Strained Si
    - SOI
  - 2) Circuit · Architecture
    - Skew Design
    - Self Clocking
    - Clock Gating
    - Multi Clock (ex.) 8 GHz, 16 cores, 0.035 µm, 0.7 V, 300 W

- Stand-by Power
  - 1) Device Process
    - Multi Vt
    - High-k Material
  - 2) Circuit · Architecture
    - Multiple Power
    - Bias Control





# **New Technologies**



# **Low Temperature CMOS**

- Advantages of Low Temperature
  - Tr
    - ⇒ Exponentially reduced $I_{\text{off}}$ (off-state leakage current)
    - ⇒ Higher Carrier Mobility (Faster Speed)
  - Wiring
    - ⇒ Reduction of Resistance
- Required Technologies
  - Highly Efficient Cooling
  - Packaging for Low Temperature CMOS (Board and Chip)



Current: 65°C



Future: ~0°C ??
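The exponential dependence of off-state leakage on temperature can be illustrated with the standard textbook subthreshold model, in which leakage falls by one decade for every subthreshold swing S = n · (kT/q) · ln 10 of threshold voltage. This is only a rough sketch, not the model used in the presentation. The threshold voltage `VT_VOLTS` and ideality factor `N_FACTOR` are hypothetical, and the threshold voltage is treated as independent of temperature, whereas in practice it rises as the device is cooled, which reduces leakage further.

```python
import math

K_OVER_Q = 8.617333262e-5   # Boltzmann constant / elementary charge, in V/K
VT_VOLTS = 0.3              # hypothetical threshold voltage
N_FACTOR = 1.5              # hypothetical subthreshold ideality factor


def subthreshold_swing(temp_c):
    """Subthreshold swing in volts per decade at the given temperature (Celsius)."""
    temp_k = temp_c + 273.15
    return N_FACTOR * K_OVER_Q * temp_k * math.log(10)


def relative_ioff(temp_c):
    """Off-state leakage relative to the on-threshold current: 10^(-Vt / S)."""
    return 10 ** (-VT_VOLTS / subthreshold_swing(temp_c))


# Compare the two operating points named on the slide.
reduction = relative_ioff(65.0) / relative_ioff(0.0)
print(f"swing at 65 C: {subthreshold_swing(65.0) * 1e3:.1f} mV/decade")
print(f"swing at  0 C: {subthreshold_swing(0.0) * 1e3:.1f} mV/decade")
print(f"Ioff(65 C) / Ioff(0 C) = {reduction:.1f}")
```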





# **FinFET**

- FinFET is a 3D device structure.
- Good channel controllability, thanks to the double-gate (DG) structure.
- New technologies are required to fabricate FinFETs.









### Carbon-Nanotube Field-Effect Transistors

Possible application: low-cost, low-power LSI, rf drivers

Position-controllable on-wafer growth (catalyst CVD)

Extremely high transconductance:

- CNT FET: $g_m$ = 8.7 µS/tube (5800 µS/µm)
- Si nFET: 1000~1200 µS/µm
- pFET: 400~600 µS/µm
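The per-tube and per-width transconductance figures above can be related by simple arithmetic. The sketch below assumes that the per-width value was obtained by dividing the per-tube value by the tube diameter, which the slide does not state, so the implied diameter is an inference rather than a quoted number. The variable names are illustrative.

```python
# Figures quoted on the slide.
GM_PER_TUBE_US = 8.7               # µS per tube
GM_PER_WIDTH_US_PER_UM = 5800.0    # µS per µm of width
SI_NFET_RANGE = (1000.0, 1200.0)   # µS/µm
PFET_RANGE = (400.0, 600.0)        # µS/µm

# Assumption: width normalization uses the tube diameter.
implied_diameter_nm = GM_PER_TUBE_US / GM_PER_WIDTH_US_PER_UM * 1000.0


def advantage(reference_range):
    """Range of CNT-to-reference transconductance ratios (low, high)."""
    low, high = reference_range
    return GM_PER_WIDTH_US_PER_UM / high, GM_PER_WIDTH_US_PER_UM / low


print(f"implied tube diameter: {implied_diameter_nm:.2f} nm")
print("ratio vs. Si nFET: %.1fx to %.1fx" % advantage(SI_NFET_RANGE))
print("ratio vs. pFET:    %.1fx to %.1fx" % advantage(PFET_RANGE))
```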












# Si Nano Photonics (Optical Interconnections)





# Signal Transmission (Chip to Chip)





# Optical Interconnection (Chip to Chip)

- High Density Optical Interconnection by Multi-Layer Wave Guide
- Optical Cross Interconnection





# **Cooling Technology**





# Future Technologies after Si

1. Single Flux Quantum Device (SFQ)

2. Quantum Computer



# 1. Single Flux Quantum Device (SFQ)



# Nb-based Superconducting SFQ (Single Flux Quantum) circuits

Unique device which can realize higher clock-speed LSIs than semiconductor devices

Ultra high speed and low power nature

cf. CMOS LSI: Big barrier against clock-speed higher than 10 GHz because of power and wiring problems

Compound semiconductors: LSI impossible because of large power consumption

Signal propagation at the speed of light in superconducting transmission lines





# Comparison with Semiconductor and SFQ high-speed gates



High-speed and large-scale integrated circuits can be realized by the SFQ technology



# **Examples of Operating SFQ LSIs**

#### Router



4x4 switch including 2,812 JJs operated up to 40 GHz.



4x4 switch scheduler including 3,071 JJs operated up to 40 GHz.

by SRL

### Microprocessor



Microprocessor including 7,220 JJs fully operated up to 21 GHz.

by Nagoya Univ. & Yokohama Nat'l Univ.



## Toward larger-scale SFQ integrated circuits



Cross section view of a Nb nine-layer structure



1 million SQUIDs (2 million JJs) on an 8 mm square chip.

Developed new fabrication process





# Estimation of a supercomputer network area reduction by using SFQ switches





SFQ is the key technology for saving area



Reduce network area to 1/10 by the SFQ technology

Network area for a 400 TFLOPS supercomputer, normalized by the Earth Simulator network area

# 2. Quantum Computer



# **Ultra-High Speed Computing**



Farming/Food/Chemistry
Oil/Energy/Environment

# **Principle of qubit state**

### **Usual bit information**





One of 2<sup>N</sup> combinations

### Quantum bit information





#### **N** bits

$$a_{1}|0000\ldots0\rangle + a_{2}|1100\ldots0\rangle + a_{3}|1110\ldots0\rangle + \cdots + a_{2^{N}}|1111\ldots1\rangle$$

Combination of "0" and "1"

combination of 2<sup>N</sup> states
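The contrast between the two slides can be made concrete: N classical bits hold one of the 2<sup>N</sup> bit patterns, while an N-qubit state carries one amplitude for each of the 2<sup>N</sup> patterns. The following is a minimal sketch in plain Python. The equal-weight superposition is chosen only for illustration, and the function names are not from the original.

```python
import math


def basis_labels(n_qubits):
    """All 2^N bit patterns, e.g. '00', '01', '10', '11' for N = 2."""
    return [format(i, f"0{n_qubits}b") for i in range(2 ** n_qubits)]


def uniform_superposition(n_qubits):
    """Amplitudes a_1 ... a_(2^N) of an equal-weight superposition."""
    dim = 2 ** n_qubits
    amplitude = 1.0 / math.sqrt(dim)
    return [amplitude] * dim


n = 3
labels = basis_labels(n)
amplitudes = uniform_superposition(n)

# A classical register stores just one of these labels at a time.
# The quantum state stores an amplitude for every label at once.
for label, a in zip(labels, amplitudes):
    print(f"|{label}>  amplitude {a:.4f}")

# The squared amplitudes are probabilities and must sum to 1.
total_probability = sum(a * a for a in amplitudes)
```

The list of amplitudes doubles in length with every added qubit, which is why simulating such a state on a conventional computer quickly becomes impractical.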

# Ultimate performance of quantum computing



### **Road Map of Quantum Computation**





# **Solid State Qubits by NEC**

Apr. 1999 Coherent control of macroscopic quantum states in a single-Cooper-pair box

(Control of Superposition)

(Nature Vol.398, '99)

Magazine Cover →



Feb. 2003 Quantum oscillations in two coupled charge qubits

(Entanglement Creation)

(Nature Vol.421, '03)

Oct. 2003 Demonstration of conditional gate operation using superconducting charge qubits (CNOT operation)

# What Can Quantum Computer Do?

1. Problems that modern computers cannot solve in practice, such as factoring large numbers.
2. Simulation of complex quantum systems, such as proteins and nano-materials.
3. Applications to quantum communication.



# Conclusion

- Technologies are in sight for processes down to 45 nm.
- Beyond 45 nm, there are several candidate technologies, but the way forward is not yet clear.
- After Si, new technologies are emerging, but they are still experimental.

It will take more than 10 years before they reach actual use.



# Finally,

Engineers tend to be pessimistic, because they constantly face technical barriers and difficulties and must take on risky challenges.

Looking back at the past, however, most technical barriers have been overcome, and what we once dreamed of has been realized.

I never imagined tera-flops or peta-flops computing when I started developing the first giga-flops computer. Tera-flops was a dream, but tera-flops computers are already in use, and peta-flops will surely arrive in 2010.

A dream becomes reality only when continuous efforts are made to realize it.

