Introduction
Emir DAMERGI
INSAT 2017/18

INTRODUCTION
Many terms are related to embedded systems:
- GPP, DSP, ASIP, SPP, SOC, ASIC, FPGA, SISD, SIMD, CISC, RISC, etc.

DAMERGI Emir – INSAT 2018

INTRODUCTION
Problem (scientific or industrial) → Solution (algorithms, mathematical computing) → Processing
Two ways to implement the processing:
- Software approach (standard HW: instruction set processor): GPP, DSP, ASIP
- Dedicated hardware (design of tailored HW): SPP, « Single Purpose Processor »

Dedicated Hardware (SPP)
Design flow: Algorithm → Finite state machines, truth/transition tables → Program (VHDL, Verilog) → SPP: DataPath + control
SPP: Single Purpose Processor

Dedicated Hardware (SPP)
Block diagram:
- Memory
- Control Unit: predefined control sequence
- DataPath: data inputs → data outputs

Software approach
The target platform (processor) is chosen according to the application constraints: GPP, DSP or ASIP.
Design flow: Algorithm → Program (C, C++, ASM, …) → Binary file → Memory → Processing Unit (Control + DataPath)
GPP: General Purpose Processor

Software approach: Processing Unit
Block diagram:
- Memory: instructions
- Control Unit: the control sequence depends on the instructions
- DataPath (ALU): data inputs → data outputs
Code and data are generally placed in different memories:
• RAM: data variables.
• Flash: code and data constants

Software approach: Processing Unit
• When instructions and data are fetched through the same bus (the data bus): Von Neumann architecture
Block diagram: a single memory holds code and data. The Control Unit and the DataPath (ALU) reach it through one address bus and one data bus (data + code).

Software approach: Processing Unit
• When instructions and data are fetched through 2 different buses: Harvard architecture
Block diagram: two memories, one for code (instructions) and one for data. Each has its own address bus; the code bus feeds the Control Unit and the data bus feeds the DataPath (ALU).

Selection Criteria
Criterion 1: Final cost per product

$FC = \frac{NRE}{Q} + UC$

- FC: Final Cost per product
- NRE: Non-Recurring Engineering cost
- Q: Quantity
- UC: Unit Cost

Criterion 2: Time To Market
Figure (revenue versus time): a product released on time (t = 0) reaches its peak revenue R at t = P and ends its life at t = 2P. A product released with a delay D only reaches the delayed peak $R \cdot \frac{P - D}{P}$.

Selection Criteria
Criterion 3: Performance
• Number of instructions (resp. floating-point operations) executed per time unit for a 1 MHz clock frequency: MIPS/MHz (resp. MFLOPS/MHz)
• Clock frequency
  Absolute performance = Clock frequency × MIPS/MHz
• Memory access bandwidth (bytes/s)

  ARM Cortex-M0       0.9  MIPS/MHz
  ARM Cortex-M3/M4    1.25 MIPS/MHz
  ARM Cortex-A5       1.57 MIPS/MHz
  ARM Cortex-A7       1.9  MIPS/MHz

Selection Criteria
Criterion 4: Power (energy consumption)
• Power is expressed in watts [W]
• Energy is expressed in joules [J] = Power × Time
Consequences on:
• Battery lifetime (capacity in Ah) → autonomy
• Heat dissipation → system size and weight
Examples:
- PCs: tens of watts
- Smartphones: watts
- MCUs: microwatts to milliwatts

Selection Criteria
Criterion 5: Flexibility
How hard is it to make the system evolve, or to change its behavior completely?
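The quantitative criteria above (final cost, time to market, absolute performance, energy) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, the revenue-loss formula assumes the triangular revenue curves suggested by the time-to-market figure, and all numeric values in the usage part are made up for the example.

```python
def final_cost(nre, unit_cost, quantity):
    """Criterion 1: FC = NRE / Q + UC."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return nre / quantity + unit_cost


def delayed_peak(on_time_peak, peak_time, delay):
    """Criterion 2: peak revenue after a delay D, i.e. R * (P - D) / P."""
    return on_time_peak * (peak_time - delay) / peak_time


def revenue_loss_fraction(peak_time, delay):
    """Fraction of lifetime revenue lost because of the delay.

    Assumption: both revenue curves are triangles ending at t = 2P.
      on-time area = P * R                          (base 2P, height R)
      delayed area = (2P - D) * R * (P - D) / (2P)  (base 2P - D)
    so the loss is 1 - delayed/on-time = D * (3P - D) / (2 * P^2).
    """
    p, d = peak_time, delay
    return d * (3 * p - d) / (2 * p * p)


def absolute_mips(clock_mhz, mips_per_mhz):
    """Criterion 3: absolute performance = clock frequency * MIPS/MHz."""
    return clock_mhz * mips_per_mhz


def energy_joules(power_watts, seconds):
    """Criterion 4: energy [J] = power [W] * time [s]."""
    return power_watts * seconds


if __name__ == "__main__":
    # Hypothetical figures, chosen only to exercise the formulas.
    print(final_cost(nre=100_000, unit_cost=5.0, quantity=10_000))
    print(revenue_loss_fraction(peak_time=10, delay=2))
    # MIPS/MHz value taken from the table above; the 72 MHz clock is assumed.
    print(absolute_mips(clock_mhz=72, mips_per_mhz=1.25))
    print(energy_joules(power_watts=0.010, seconds=3600))
```

The cost formula shows why the production volume matters: the NRE share per product shrinks as Q grows, so a design with a high NRE only pays off at large quantities.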

Example
An expression in the form of a sum of products is present in a large number of mathematical models:
- Convolution (filter): y(k)
- Correlation: c(m)
- Matrix computation

Example: Execution on GPP

$\sum_{i=0}^{3} a_i x_{3-i} = (a_0 x_3) + (a_1 x_2) + (a_2 x_1) + (a_3 x_0)$

Block diagram: Memory, Control Unit, and a Processing Unit with registers Ri, Rj, Rk, a result register Rres and one ALU (+, *, -, /, ...).
Instruction sequence, with $q_0 = a_0 x_3$, $q_1 = a_1 x_2$, $q_2 = a_2 x_1$, $q_3 = a_3 x_0$:
  Ri ← a0
  Rj ← x3
  Rres = Rj * Ri        (q0)
  Rk ← Rres
  Ri ← a1
  Rj ← x2
  Rres = Rj * Ri        (q1)
  Ri ← Rres
  Rres = Ri + Rk        (q0 + q1)
  Rk ← Rres
  ……                    (q0 + q1 + q2)
  ……                    (q0 + q1 + q2 + q3)

Example: Execution on GPP, instruction count
22 instructions:
- 22 instruction fetches from memory
- 8 memory-to-register transfers
- 7 register-to-register transfers
- 4 multiplications
- 4 additions

Example: Execution on Dedicated Hardware
Memory (only data, no instructions): a0, a1, a2, a3 and x0, x1, x2, x3
Block diagram: a Control Unit and a datapath made of four multipliers feeding one adder.
- 8 memory-to-register transfers (in parallel)
- 4 two-term multiplications (executed in parallel)
- 1 four-term addition

Example: Performance
                                 GPP    Dedicated Hardware
  Instruction fetch cycles        22     0
  Mem-to-reg transfer cycles      8      1
  Reg-to-reg transfer cycles      7      0
  Multiplication cycles           4      1
  Addition cycles                 4      1
  Overall                         ---    +++

Example: Energy
                                 GPP    Dedicated Hardware
  Instruction fetches             22     0
  Mem-to-reg transfers            8      8
  Reg-to-reg transfers            7      0
  Multiplications                 4      4
  Additions                       4      4
  Overall                         ---    +++
Furthermore, the GPP controller is more complex and consumes more energy.

Example: Flexibility / Time To Market
Suppose the processing must change to a different expression:
- GPP: very easy, rewrite the software and compile (+++)
- Dedicated Hardware: impossible, a new HW design must be realized (---)

Example: Criteria comparison
- Flexibility / Time To Market: best (+++) with the software approach (GPP)
- Performance / Power: best (+++) with dedicated HW (SPP)

Example: Tradeoff GPP/SPP
A tradeoff between flexibility and performance?
Instructions (processor) + specialized hardware → a processor with instructions and a specialized DataPath (ALU)

Example: Tradeoff GPP/SPP
Block diagram: Memory (instructions), Control Unit, specialized DataPath (ALU) with data inputs and data outputs.
An instruction processor with support for:
- Hardwired application-specific instructions
- Parallelization

Example: Tradeoff GPP/SPP, multiple ALUs
Block diagram: Control Unit and a Processing Unit with registers Ri, Rj, Rk, Rl, two ALUs, an adder (ADD) and an accumulator (ACC).
- Parallel processing with 2 ALUs: ACC ← Ri * Rj + Rk * Rl
- SIMD: Single Instruction Multiple Data
- MAC (Multiply ACcumulate): ACC ← ACC + (Ri * Rj + Rk * Rl)

Example: Tradeoff GPP/SPP

$(a_0 x_3) + (a_1 x_2) + (a_2 x_1) + (a_3 x_0)$

Instruction sequence with 2 ALUs:
  Ri ← a0
  Rj ← x3
  Rk ← a1
  Rl ← x2
  ACC ← a0 x3 + a1 x2
  Ri ← a2
  Rj ← x1
  Rk ← a3
  Rl ← x0
  ACC ← ACC + a2 x1 + a3 x0

Example: Tradeoff GPP/SPP, instruction count
10 instructions:
- 10 instruction fetches from memory
- 8 memory-to-register transfers
- 4 multiplications (2 multiply cycles)
- 2 additions (2 addition cycles)
- 1 MAC operation
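The two instruction sequences above (one ALU, then two ALUs with a MAC) can be modelled as a small instruction counter. This is only a sketch under stated assumptions: the function names are hypothetical, each register transfer or ALU operation is counted as one instruction, and the split used for the single-ALU case (8 loads, 7 register moves, 4 multiplications, 3 additions) is this sketch's own choice, made because it reproduces the stated total of 22.

```python
def gpp_sum_of_products(a, x):
    """Single-ALU model: returns (sum of a[i] * x[n-1-i], instruction count)."""
    n = len(a)
    count = 0
    rk = 0
    for i in range(n):
        ri = a[i]              # memory -> Ri
        rj = x[n - 1 - i]      # memory -> Rj
        rres = rj * ri         # multiply
        count += 3
        if i == 0:
            rk = rres          # Rk <- Rres (first product)
            count += 1
        else:
            ri = rres          # Ri <- Rres
            rres = ri + rk     # add to the running sum
            rk = rres          # Rk <- Rres
            count += 3
    return rk, count


def dual_mac_sum_of_products(a, x):
    """Two-ALU + MAC model: two products are accumulated per instruction."""
    n = len(a)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of terms")
    count = 0
    acc = 0
    for i in range(0, n, 2):
        ri, rj = a[i], x[n - 1 - i]        # two memory -> register loads
        rk, rl = a[i + 1], x[n - 2 - i]    # two more loads
        count += 4
        acc = acc + (ri * rj + rk * rl)    # one MAC-style instruction
        count += 1
    return acc, count


if __name__ == "__main__":
    a = [1, 2, 3, 4]   # hypothetical coefficients a0..a3
    x = [5, 6, 7, 8]   # hypothetical samples x0..x3
    print(gpp_sum_of_products(a, x))
    print(dual_mac_sum_of_products(a, x))
```

Both functions compute the same value; only the number of counted instructions differs, which is the point of the comparison between the two datapaths.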

Example: Tradeoff GPP/SPP
Block diagram: Control Unit and a Processing Unit with registers (data) and ALU, ALU, ……, ALU.
- The number of ALUs can be higher (4, 8, 16, …)
- All ALUs are controlled by the same control signal
- All ALUs execute the same instruction
→ SIMD architecture: Single Instruction Multiple Data
SIMD architecture + MAC are found in DSPs (Digital Signal Processors).

Example: Criteria comparison
From best (+++) flexibility / time to market to best (+++) performance / power:
GPP → DSP → dedicated HW (SPP)

Flynn Taxonomy
- SISD architecture: Single Instruction Single Data (Cortex-M3)
- SIMD architecture: Single Instruction Multiple Data (Cortex-M4)
PU: Processing Unit (ALU)

Flynn Taxonomy
- MISD architecture: Multiple Instruction Single Data
- MIMD architecture: Multiple Instruction Multiple Data
PU: Processing Unit

What else?
Block diagram: Memory (instructions), Control Unit, application-specific DataPath (ALU) with data inputs and data outputs.
• Deep specialization of the core (ALU)
• Instruction set tailored for a specific application
→ ASIP: Application-Specific Instruction-set Processor

Example: Criteria comparison
From best (+++) flexibility / time to market to best (+++) performance / power:
GPP → DSP → ASIP → dedicated HW (SPP)

SPP: Implementation
- SPP on silicon: transistors, fixed architecture (HW) → ASIC: Application Specific Integrated Circuit
- SPP on FPGA: logic blocks, programmable architecture (HW) → SPP on FPGA

Example: Criteria comparison
From best (+++) flexibility / time to market to best (+++) performance / power:
GPP → DSP → ASIP → SPP on FPGA → ASIC

Choice dilemma: …?
Examples: satellite receiver, smartphone
• Flexibility: update the GUI, applications…
• Performance: decryption, sound & image decoding, 3G/4G communications...
GPP, DSP, ASIP, ASIC or FPGA?

Choice dilemma: solution
Examples: satellite receiver, smartphone
Combine several components on one chip: GPP, SPP (HW), ASIP, FPGA
→ SOC: System On Chip
The choice of the SOC components is guided by the application (single-purpose SOC).

SOCs: Set-top Box
STi5518 OMEGA: ASIP, SPP, FPGA + embedded processor (GPP)

SOCs: 4G SOC « Snapdragon »
ASIP, DSP, SPP (ASIC, FPGA) + embedded processor (GPP)

General Purpose SOCs: Microcontrollers (MCU)
Applications:
• IoT
• Home automation
• Home appliances
• Automotive
Block diagram: a Processor, a Memory (RAM) and a Program Memory (Flash, EEPROM) connected by a bus to the peripherals:
- Basic peripherals: timer(s), A/D and D/A converter(s), parallel inputs/outputs, serial communication
- Advanced peripherals
Tens of MCU references per manufacturer:
- CPU: 8 bits to 32 bits, from a few MHz to hundreds of MHz
- Memory: Kbytes to Mbytes
- Peripherals: basic (GPIO, timers, USARTs) to advanced (Ethernet, crypto, ...)