

IT UNIVERSITY OF CPH



DASYA -- DASYA (itu.dk)

# CPU, GPU, FPGA, Accelerator

Ehsan Yousefzadeh-Asl-Miandoab (ehyo@itu.dk)

Data-Intensive Applications and Systems (DASYA)



- Computing
- Processors
- How do electrons work for us?!
- CPUs
- GPUs
- FPGAs
- Accelerators
- Tradeoff of processors

### Computing

- Definition
  1. Transforming input data into the desired output data.
  2. Gaining knowledge (insight).
  3. Using computer technology to complete a goal-oriented task.
- Examples:
  - Weather forecast
  - Market analysis, price prediction
  - Computer-aided medical diagnosis






### Processors

- Devices that make computing possible
- By making electrons run (work) for us
- Different types and vendors
  - CPUs
  - GPUs
  - FPGAs
  - Accelerators






### **Electrons I need your help!**

Can you apply a filter to my photo? Please 🙂

Electrons' reaction!









We need a translator who can speak the electrons' language.

#### Examples:

1. Calculating the first 100 numbers of the Fibonacci series
2. Training a machine learning model







    # Program to display the Fibonacci sequence up to n-th term
    nterms = int(input("How many terms? "))

    # first two terms
    n1, n2 = 0, 1
    count = 0

    # check if the number of terms is valid
    if nterms <= 0:
        print("Please enter a positive integer")
    # if there is only one term, return n1
    elif nterms == 1:
        print("Fibonacci sequence upto", nterms, ":")
        print(n1)
    # generate fibonacci sequence
    else:
        print("Fibonacci sequence:")
        while count < nterms:
            print(n1)
            nth = n1 + n2
            # update values
            n1 = n2
            n2 = nth
            count += 1

Levels of transformation, from the problem down to the electrons:

- **Computing Problem**
- Algorithm
- Program
- System Software
- Architecture
- Micro-Architecture
- Logic
- **Electronic Devices**
- Electrons

















| d1 | d2 | Sum | Carry |
|----|----|-----|-------|
| 0  | 0  | 0   | 0     |
| 0  | 1  | 1   | 0     |
| 1  | 0  | 1   | 0     |
| 1  | 1  | 0   | 1     |
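
The truth table above is that of a one-bit half adder: the sum column is the XOR of the two inputs and the carry column is their AND. A minimal Python sketch that reproduces it follows; the function name `half_adder` is our own label, not something from the slides.

```python
def half_adder(d1, d2):
    """Add two one-bit inputs and return (sum, carry)."""
    total = d1 ^ d2   # XOR: 1 when exactly one input is 1
    carry = d1 & d2   # AND: 1 only when both inputs are 1
    return total, carry


# Print every row of the truth table: d1, d2, sum, carry
for d1 in (0, 1):
    for d2 in (0, 1):
        s, c = half_adder(d1, d2)
        print(d1, d2, s, c)
```

In hardware, each of these two operators corresponds to a logic gate built from transistors, which is where the electrons do the work.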







#### Metal under normal conditions



#### Metal in an electric field













### GPU

- Parallel Processor
- Throughput-oriented
- Lower clock frequency than a CPU
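
To make "parallel" and "throughput-oriented" concrete, here is a rough Python sketch of the photo-filter request from earlier in the deck. Every pixel is processed independently, so the work can be spread over many workers. The names `brighten` and `apply_filter`, the brightness amount, and the worker count are all illustrative assumptions. Python threads only model the programming pattern; a real GPU would run a CUDA kernel on thousands of hardware threads.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, amount=40):
    """Hypothetical per-pixel 'kernel': brighten one grayscale value, clamped to 255."""
    return min(pixel + amount, 255)


def apply_filter(pixels, workers=4):
    """Apply the kernel to every pixel; no pixel depends on any other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever order the work ran in
        return list(pool.map(brighten, pixels))


image = [0, 100, 200, 250]   # assumed toy "image" of four grayscale pixels
print(apply_filter(image))   # [40, 140, 240, 255]
```

The point is the shape of the computation: one small function applied to a large number of independent data items, which is the kind of workload a GPU is built for.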

[Figure: CPU vs. GPU block diagram. CPU side: cores, each with control logic and an L1 cache, followed by L2 caches, an L3 cache, and DRAM. GPU side: L2 cache and DRAM.]






































































## Field Programmable Gate Array (FPGA)

Initially used for prototyping by electronics engineers.

**High-Level Language**

Very Low-Level Language (requires basic knowledge of digital electronics)





- Offers practitioners the highest parallelism
- Build a specific neural network in hardware
- Reconfigure it whenever you decide









## Accelerator

- Application Specific Integrated Circuit (ASIC)
- Everything is specified by the designers



- No flexibility is expected, though this depends on the designers!

















Largest GPU: 826 mm<sup>2</sup> of silicon, 80 billion transistors




### Wrap-up

| Processor                            | Programmability                                   | Design Goal         | Flexibility (Different Programs) | Promised Performance |
|--------------------------------------|---------------------------------------------------|---------------------|----------------------------------|----------------------|
| Central Processing Unit (CPU)        | Easy (use any programming language you know)      | Latency-oriented    | Super high                       | Fair                 |
| Graphics Processing Unit (GPU)       | Medium (learn CUDA!)                              | Throughput-oriented | High                             | Medium               |
| Field Programmable Gate Array (FPGA) | Hard (learn Verilog + digital electronics basics) | Both                | Low                              | High                 |
| Accelerator                          | Very hard (read manuals)                          | Both                | Super low                        | Super high           |

## Some useful links for you

- CUDA C++ Programming Guide (nvidia.com)
- CUDA Toolkit - Free Tools and Training | NVIDIA Developer
- NVIDIA Blog
- Deep Learning Institute and Training Solutions | NVIDIA
- DGX Platform | NVIDIA
- Intel Field Programmable Gate Arrays (FPGA) Technical Training | Intel
- Product - Chip - Cerebras
- Products | Coral
- Altera<sup>®</sup> FPGAs and Programmable Devices (intel.com)
- Reimagining the Data Center (amd.com)
- FPGAs & 3D ICs (xilinx.com)
- Intel<sup>®</sup> Core<sup>™</sup> Processors - View Latest Generation Core Processors
- AMD Processors | AMD

# Conclusion



# Questions?

Thanks for your attention!

## Backup, if somebody is curious!

#### Modern CPUs



- They fetch and execute more than one instruction at a time (a window of instructions)
  - Higher throughput
- Advanced hardware execution mechanisms to execute faster
- Employ a cache hierarchy to bridge the memory-processor performance gap
  - Temporal/spatial locality
- They have several cores (parallel computing)





## Cache Hierarchy

Levels closer to the processor have:

- Lower access latency
- More data locality
- Less storage capacity
- Higher cost per bit
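
Temporal locality, mentioned on the Modern CPUs slide, is what makes a small, fast cache worthwhile: data that was used recently is likely to be used again. The sketch below counts hits in a tiny cache for two access patterns. It is a toy model under loud assumptions: the function name `hit_rate`, the least-recently-used (LRU) replacement policy, and the capacity of four entries are all ours; real caches work on cache lines and sets.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Fraction of accesses served by a tiny LRU cache holding `capacity` addresses."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[addr] = True
    return hits / len(accesses)


reuse = [0, 1, 2, 3] * 4      # the same four addresses, visited four times
no_reuse = list(range(16))    # sixteen addresses, each touched once

print(hit_rate(reuse, capacity=4))     # 0.75 (4 cold misses, then 12 hits)
print(hit_rate(no_reuse, capacity=4))  # 0.0  (nothing is ever reused)
```

With reuse, most accesses are served from the small fast level. Without reuse, every access falls through to the slower levels below it.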





Overall time to execute three instructions without pipelining: 3 \* (1 ns + 1 ns + 2 ns + 1 ns + 2 ns) = 3 \* 7 ns = 21 ns

## Pipelining

- Pipelined basic processor





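Overall time to execute three instructions with pipelining: every stage is clocked at the pace of the slowest stage (2 ns), so the first instruction takes 5 \* 2 ns = 10 ns and each following instruction completes 2 ns later: 10 ns + 2 ns + 2 ns = 14 ns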
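
The two totals on these slides (21 ns without pipelining, 14 ns with it) can be reproduced with a few lines of Python. This is a sketch under stated assumptions: five stages with the latencies 1, 1, 2, 1, 2 ns from the slide, a pipeline clock equal to the slowest stage, and no stalls or hazards. The function names are our own.

```python
def sequential_time(stage_ns, n_instructions):
    """Each instruction runs through all stages before the next one starts."""
    return n_instructions * sum(stage_ns)


def pipelined_time(stage_ns, n_instructions):
    """Ideal pipeline: the clock period equals the slowest stage, with no stalls."""
    cycle = max(stage_ns)
    fill = len(stage_ns) * cycle           # first instruction passes every stage
    rest = (n_instructions - 1) * cycle    # then one instruction finishes per cycle
    return fill + rest


stages = [1, 1, 2, 1, 2]  # stage latencies in ns, taken from the slide

print(sequential_time(stages, 3))  # 21
print(pipelined_time(stages, 3))   # 14
```

For a single instruction the pipelined version is slower (10 ns instead of 7 ns), because every stage is stretched to the 2 ns clock. The gain comes from throughput as the number of instructions grows.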

