

# AndeStar<sup>™</sup> Instruction Set Architecture

# Manual

Document Number : MA0100-013 Issued : May 2009 Copyright© 2007-2009 Andes Technology Corporation. All rights reserved



# **Revision History**

| Revision | Revision   | Author    | Revised           |      | Description                     |
|----------|------------|-----------|-------------------|------|---------------------------------|
| Number   | Date       |           | Chapters/Sections |      |                                 |
| V1.3     | 5/5/2009   | CH/Wilson |                   | 1.   | Added LBUP/SBUP                 |
|          |            |           |                   |      | instructions.                   |
| V1.2     | 12/31/2008 | CH/Wilson |                   | 1.   | Added Baseline V2 instructions. |
| V1.1     | 6/13/2008  | CH/Wilson |                   | 1.   | Added N10 latency information.  |
|          |            |           |                   | 2.   | Added Reduced Register          |
|          |            |           |                   |      | configuration description.      |
| V1.0RB   | 2/25/2008  | CH/Wilson |                   | Ad   | ded DIV/DIVS instructions       |
| V 1.0    | 6/5/2007   | CH/Wilson |                   | Firs | st version                      |



# **Table of Contents**

| Revision Histo                           | Dry                                      | I   |
|------------------------------------------|------------------------------------------|-----|
| Table of Conte                           | ents                                     | II  |
| List of Tables.                          |                                          | IX  |
| List of Figures                          | 5                                        | XI  |
| Preface                                  |                                          | XII |
| Chapter 1                                | Introduction                             | 1   |
| 1.1                                      | 32/16-Bit ISA                            | 2   |
| 1.2                                      | Data Types                               | 2   |
| 1.3                                      | Registers                                | 2   |
| 1.3.1                                    | Reduced Register configuration option    | 5   |
| 1.4                                      | Instruction Classes                      | 6   |
| 1.5                                      | Instruction Encoding                     | 6   |
| 1.5.1                                    | 32-Bit Instruction Format                | 6   |
| 1.5.2                                    | 2 16-Bit Instruction Format              | 8   |
| 1.6                                      | Miscellaneous                            | 8   |
| Chapter 2                                | 32-Bit Baseline Instruction              | 9   |
| 2.1                                      | Data-processing Instructions             | 11  |
| 2.2                                      | Load and Store Instructions              | 13  |
| 2.3                                      | Jump and Branch Instructions             | 16  |
| 2.4                                      | Privilege Resource Access Instructions   | 18  |
| 2.5                                      | Miscellaneous Instructions               | 19  |
| Chapter 3                                | 16-Bit Baseline Instruction              | 22  |
| 3.1                                      | 32-bit Instruction Mapping               | 23  |
| Chapter 4                                | 16/32-Bit Baseline Version 2 Instruction | 27  |
| 4.1                                      | 16-bit Baseline V2 instructions          | 28  |
| 4.2                                      | 32-bit Baseline V2 instructions          | 29  |
| Chapter 5                                | 32-bit ISA Extensions                    | 31  |
| 5.1                                      | Performance Extension V1 Instructions    | 32  |
| 5.2                                      | Performance Extension V2 Instructions    | 33  |
| 5.3                                      | 32-bit String Extension                  | 34  |
| Chapter 6                                | Coprocessor Instructions                 | 35  |
| Chapter 7 Detail Instruction Description |                                          | 37  |
| 7.1                                      | 32-bit Baseline instructions             | 38  |



| ADD (Addition)                                            | 39  |
|-----------------------------------------------------------|-----|
| ADDI (Add Immediate)                                      | 40  |
| AND (Bit-wise Logical And)                                |     |
| ANDI (And Immediate)                                      |     |
| BEQ (Branch on Equal)                                     |     |
| BEQZ (Branch on Equal Zero)                               |     |
| BGEZ (Branch on Greater than or Equal to Zero)            | 45  |
| BGEZAL (Branch on Greater than or Equal to Zero and Link) |     |
| BGTZ (Branch on Greater than Zero)                        |     |
| BLEZ (Branch on Less than or Equal to Zero)               |     |
| BLTZ (Branch on Less than Zero)                           |     |
| BLTZAL (Branch on Less than Zero and Link)                | 50  |
| BNE (Branch on Not Equal)                                 |     |
| BNEZ (Branch on Not Equal Zero)                           |     |
| BREAK (Breakpoint)                                        | 53  |
| CCTL (Cache Control)                                      |     |
| CMOVN (Conditional Move on Not Zero)                      |     |
| CMOVZ (Conditional Move on Zero)                          |     |
| DIV (Unsigned Integer Divide)                             |     |
| DIVS (Signed Integer Divide)                              |     |
| DPREF/DPREFI (Data Prefetch)                              | 69  |
| DSB (Data Serialization Barrier)                          |     |
| IRET (Interruption Return)                                |     |
| ISB (Instruction Serialization Barrier)                   |     |
| ISYNC (Instruction Data Coherence Synchronization)        | 80  |
| J (Jump)                                                  |     |
| JAL (Jump and Link)                                       |     |
| JR (Jump Register)                                        |     |
| JR.xTOFF (Jump Register and Translation OFF)              |     |
| JRAL (Jump Register and Link)                             | 89  |
| JRAL.xTON (Jump Register and Link and Translation ON)     |     |
| LB (Load Byte)                                            |     |
| LBI (Load Byte Immediate)                                 |     |
| LBS (Load Byte Signed)                                    |     |
| LBSI (Load Byte Signed Immediate)                         |     |
| LH (Load Halfword)                                        | 100 |
|                                                           |     |



| LHI (Load Halfword Immediate)                        | 102 |
|------------------------------------------------------|-----|
| LHS (Load Halfword Signed)                           | 104 |
| LHSI (Load Halfword Signed Immediate)                | 106 |
| LLW (Load Locked Word)                               | 108 |
| LMW (Load Multiple Word)                             | 111 |
| LW (Load Word)                                       | 115 |
| LWI (Load Word Immediate)                            | 117 |
| LWUP (Load Word with User Privilege Translation)     | 119 |
| MADD32 (Multiply and Add to Data Low)                | 121 |
| MADD64 (Multiply and Add Unsigned)                   | 122 |
| MADDS64 (Multiply and Add Signed)                    | 123 |
| MFSR (Move From System Register)                     | 124 |
| MFUSR (Move From User Special Register)              | 125 |
| MOVI (Move Immediate)                                | 128 |
| MSUB32 (Multiply and Subtract to Data Low)           | 129 |
| MSUB64 (Multiply and Subtract Unsigned)              | 130 |
| MSUBS64 (Multiply and Subtract Signed)               | 131 |
| MSYNC (Memory Data Coherence Synchronization)        | 132 |
| MTSR (Move To System Register)                       | 135 |
| MTUSR (Move To User Special Register)                | 136 |
| MUL (Multiply Word to Register)                      | 139 |
| MULT32 (Multiply Word to Data Low)                   | 140 |
| MULT64 (Multiply Word Unsigned)                      | 141 |
| MULTS64 (Multiply Word Signed)                       | 142 |
| NOP (No Operation)                                   | 143 |
| NOR (Bit-wise Logical Nor)                           | 144 |
| OR (Bit-wise Logical Or)                             | 145 |
| ORI (Or Immediate)                                   | 146 |
| RET (Return from Register)                           | 147 |
| RET.xTOFF (Return from Register and Translation OFF) | 148 |
| ROTR (Rotate Right)                                  | 150 |
| ROTRI (Rotate Right Immediate)                       | 151 |
| SB (Store Byte)                                      | 152 |
| SBI (Store Byte Immediate)                           | 154 |
| SCW (Store Conditional Word)                         | 156 |
| SEB (Sign Extend Byte)                               | 160 |
|                                                      |     |



|     | SEH (Sign Extend Halfword)                        | 161 |
|-----|---------------------------------------------------|-----|
|     | SETEND (Set data endian)                          | 162 |
|     | SETGIE (Set global interrupt enable)              | 163 |
|     | SETHI (Set High Immediate)                        | 164 |
|     | SH (Store Halfword)                               | 165 |
|     | SHI (Store Halfword Immediate)                    | 167 |
|     | SLL (Shift Left Logical)                          | 169 |
|     | SLLI (Shift Left Logical Immediate)               | 170 |
|     | SLT (Set on Less Than)                            | 171 |
|     | SLTI (Set on Less Than Immediate)                 | 172 |
|     | SLTS (Set on Less Than Signed)                    | 173 |
|     | SLTSI (Set on Less Than Signed Immediate)         | 174 |
|     | SMW (Store Multiple Word)                         | 175 |
|     | SRA (Shift Right Arithmetic)                      | 179 |
|     | SRAI (Shift Right Arithmetic Immediate)           | 180 |
|     | SRL (Shift Right Logical)                         | 181 |
|     | SRLI (Shift Right Logical Immediate)              | 182 |
|     | STANDBY (Wait For External Event)                 | 183 |
|     | SUB (Subtraction)                                 | 185 |
|     | SUBRI (Subtract Reverse Immediate)                | 186 |
|     | SVA (Set on Overflow Add)                         | 187 |
|     | SVS (Set on Overflow Subtract)                    | 188 |
|     | SW (Store Word)                                   | 189 |
|     | SWI (Store Word Immediate)                        | 191 |
|     | SWUP (Store Word with User Privilege Translation) | 193 |
|     | SYSCALL (System Call)                             | 195 |
|     | TEQZ (Trap if equal 0)                            | 196 |
|     | TNEZ (Trap if not equal 0)                        | 197 |
|     | TLBOP (TLB Operation)                             | 198 |
|     | TRAP (Trap exception)                             | 205 |
|     | WSBH (Word Swap Byte within Halfword)             | 206 |
|     | XOR (Bit-wise Logical Exclusive Or)               | 207 |
|     | XORI (Exclusive Or Immediate)                     | 208 |
|     | ZEB (Zero Extend Byte)                            | 209 |
|     | ZEH (Zero Extend Halfword)                        | 210 |
| 7.2 | 32-bit Performance Extension instructions         | 211 |



|     | ABS (Absolute)                                          | . 212 |
|-----|---------------------------------------------------------|-------|
|     | AVE (Average with Rounding)                             | . 213 |
|     | BCLR (Bit Clear)                                        | . 214 |
|     | BSET (Bit Set)                                          | . 215 |
|     | BTGL (Bit Toggle)                                       | . 216 |
|     | BTST (Bit Test)                                         | . 217 |
|     | CLIP (Clip Value)                                       | . 218 |
|     | CLIPS (Clip Value Signed)                               | . 219 |
|     | CLO (Count Leading Ones)                                | . 220 |
|     | CLZ (Count Leading Zeros)                               | . 221 |
|     | MAX (Maximum)                                           | . 222 |
|     | MIN (Minimum)                                           | . 223 |
| 7.3 | 32-bit Performance Extension Version 2 instructions     | . 224 |
|     | BSE (Bit Stream Extraction)                             | . 225 |
|     | BSP (Bit Stream Packing)                                | . 231 |
|     | PBSAD (Parallel Byte Sum of Absolute Difference)        | . 237 |
|     | PBSADA (Parallel Byte Sum of Absolute Difference Accum) | . 238 |
| 7.4 | 32-bit STRING Extension instructions                    | . 239 |
| 7.5 | 16-bit Baseline instructions                            | . 240 |
|     | ADD (Add Register)                                      | . 241 |
|     | ADDI (Add Immediate)                                    | . 242 |
|     | BEQS38 (Branch on Equal Implied R5)                     | . 243 |
|     | BEQZ38 (Branch on Equal Zero)                           | . 244 |
|     | BEQZS8 (Branch on Equal Zero Implied R15)               | . 245 |
|     | BNES38 (Branch on Not Equal Implied R5)                 | . 246 |
|     | BNEZ38 (Branch on Not Equal Zero)                       | . 247 |
|     | BNEZS8 (Branch on Not Equal Zero Implied R15)           | . 248 |
|     | BREAK16 (Breakpoint)                                    | . 249 |
|     | J8 (Jump Immediate)                                     | . 250 |
|     | JR5 (Jump Register)                                     | . 251 |
|     | JRAL5 (Jump Register and Link)                          | . 252 |
|     | LBI333 (Load Byte Immediate Unsigned)                   | . 253 |
|     | LHI333 (Load Halfword Immediate Unsigned)               | . 254 |
|     | LWI333 (Load Word Immediate)                            | . 256 |
|     | LWI37 (Load Word Immediate with Implied FP)             | . 258 |
|     | LWI450 (Load Word Immediate)                            | . 260 |



|     | MOV55 (Move Register)                               | . 261 |
|-----|-----------------------------------------------------|-------|
|     | MOVI55 (Move Immediate)                             | . 262 |
|     | NOP16 (No Operation)                                | . 263 |
|     | RET5 (Return from Register)                         | . 264 |
|     | SBI333 (Store Byte Immediate)                       | . 265 |
|     | SEB33 (Sign Extend Byte)                            | . 266 |
|     | SEH33 (Sign Extend Halfword)                        | . 267 |
|     | SHI333 (Store Halfword Immediate)                   | . 268 |
|     | SLLI333 (Shift Left Logical Immediate)              | . 270 |
|     | SLT45 (Set on Less Than Unsigned)                   | . 271 |
|     | SLTI45 (Set on Less Than Unsigned Immediate)        | . 272 |
|     | SLTS45 (Set on Less Than Signed)                    | . 273 |
|     | SLTSI45 (Set on Less Than Signed Immediate)         | . 274 |
|     | SRAI45 (Shift Right Arithmetic Immediate)           | . 275 |
|     | SRLI45 (Shift Right Logical Immediate)              | . 276 |
|     | SUB (Subtract Register)                             | . 277 |
|     | SUBI (Subtract Immediate)                           | . 278 |
|     | SWI333 (Store Word Immediate)                       | . 279 |
|     | SWI37 (Store Word Immediate with Implied FP)        | . 281 |
|     | SWI450 (Store Word Immediate)                       | . 283 |
|     | X11B33 (Extract the Least 11 Bits)                  | . 284 |
|     | XLSB33 (Extract LSB)                                | . 285 |
|     | ZEB33 (Zero Extend Byte)                            | . 286 |
|     | ZEH33 (Zero Extend Halfword)                        | . 287 |
| 7.6 | 16-bit and 32-bit Baseline Version 2 instructions   | . 288 |
|     | ADDI10S (Add Immediate with Implied Stack Pointer)  | . 289 |
|     | LWI37SP (Load Word Immediate with Implied SP)       | . 290 |
|     | SWI37SP (Store Word Immediate with Implied SP)      | . 292 |
|     | ADDI.gp (GP-implied Add Immediate)                  | . 294 |
|     | DIVR (Unsigned Integer Divide to Registers)         | . 295 |
|     | DIVSR (Signed Integer Divide to Registers)          | . 297 |
|     | LBI.gp (GP-implied Load Byte Immediate)             | . 299 |
|     | LBSI.gp (GP-implied Load Byte Signed Immediate)     | . 300 |
|     | LBUP (Load Byte with User Privilege Translation)    | . 301 |
|     | LHI.gp (GP-implied Load Halfword Immediate)         | . 302 |
|     | LHSI.gp (GP-implied Load Signed Halfword Immediate) | . 303 |
|     | /                                                   |       |



|                        | LMWA (Load Multiple Word with Alignment Check)       |                                                            |     |  |  |
|------------------------|------------------------------------------------------|------------------------------------------------------------|-----|--|--|
|                        | LWI.gp (GP-implied Load Word Immediate)              |                                                            |     |  |  |
|                        | MADDR32 (Multiply and Add to 32-bit Register)        |                                                            |     |  |  |
|                        | MSUBR32 (Multiply and Subtract from 32-bit Register) |                                                            |     |  |  |
|                        | MULR64 (Multiply Word Unsigned to Registers)         |                                                            |     |  |  |
|                        | MU                                                   | LSR64 (Multiply Word Signed to Registers)                  | 313 |  |  |
|                        | SBI.                                                 | gp (GP-implied Store Byte Immediate)                       | 315 |  |  |
|                        | SBU                                                  | P (Store Byte with User Privilege Translation)             | 316 |  |  |
|                        | SHI.                                                 | gp (GP-implied Store Halfword Immediate)                   | 317 |  |  |
|                        | SMV                                                  | VA (Store Multiple Word with Alignment Check)              | 318 |  |  |
|                        | SWI                                                  | .gp (GP-implied Store Word Immediate)                      | 323 |  |  |
| Chapter                | 8                                                    | Instruction Latency for AndesCore Implementations          | 324 |  |  |
| 8.1 N12 Implementation |                                                      | N12 Implementation                                         | 325 |  |  |
|                        | 8.1.1                                                | Instruction Latency due to Resource Dependency             | 325 |  |  |
|                        | 8.1.2                                                | Cycle Penalty due to N12 Pipeline Control Mishaps Recovery | 329 |  |  |
| 8.2                    |                                                      | N10 Implementation                                         | 330 |  |  |
|                        | 8.2.1                                                | Dependency-related Instruction Latency                     | 330 |  |  |
|                        | 8.2.2                                                | Self-stall-related Instruction Latency                     | 333 |  |  |
|                        | 8.2.3                                                | Cycle Penalty due to N10 Pipeline Control Mishap Recover   | 334 |  |  |
|                        | 8.2.4                                                | Cycle Penalty due to Resource Contention                   | 336 |  |  |
| Chapter 9 An           |                                                      | AndesCore N12 implementation                               | 337 |  |  |
| 9.1 C                  |                                                      | CCTL Instruction                                           |     |  |  |
| 9.2                    |                                                      | STANDBY Instruction                                        |     |  |  |
| Chapter 10             |                                                      | AndesCore N1213 Hardcore Implementation Restriction        |     |  |  |
| 10.1                   |                                                      | Instruction Restriction                                    |     |  |  |
| 10.2                   |                                                      | ISYNC Instruction Note                                     |     |  |  |



# **List of Tables**

| Table 1  | Andes General Purpose Registers                        | 3  |
|----------|--------------------------------------------------------|----|
| Table 2  | Andes User Special Registers                           | 4  |
| Table 3  | Andes Status Registers                                 | 4  |
| Table 4  | Registers for Reduced Register Configuration           | 5  |
| Table 5  | ALU Instruction with Immediate (Baseline)              | 11 |
| Table 6  | ALU Instruction (Baseline)                             | 11 |
| Table 7  | Shifter Instruction (Baseline)                         | 12 |
| Table 8  | Multiply Instruction (Baseline)                        | 12 |
| Table 9  | Divide Instructions                                    | 13 |
| Table 10 | Load / Store Addressing Mode                           | 13 |
| Table 11 | Load / Store Instruction (Baseline)                    | 13 |
| Table 12 | Load / Store Instruction (Baseline)                    | 14 |
| Table 13 | Load / Store Instruction (Baseline)                    | 15 |
| Table 14 | Load / Store Instruction (Baseline)                    | 15 |
| Table 15 | Load / Store Multiple Word Instruction (Baseline)      | 16 |
| Table 16 | Load / Store Instruction for Atomic Updates (Baseline) | 16 |
| Table 17 | Load / Store Instructions with User-mode Privilege     | 16 |
| Table 18 | Jump Instruction (Baseline)                            | 16 |
| Table 19 | Branch Instruction (Baseline)                          | 17 |
| Table 20 | Branch with link Instruction (Baseline)                | 17 |
| Table 21 | Read/Write System Registers (Baseline)                 | 18 |
| Table 22 | Jump Register with System Register Update (Baseline)   | 18 |
| Table 23 | MMU Instruction (Baseline)                             | 18 |
| Table 24 | Conditional Move (Baseline)                            | 19 |
| Table 25 | Synchronization Instruction (Baseline)                 | 19 |
| Table 26 | Prefetch Instruction (Baseline)                        | 19 |
| Table 27 | NOP Instruction (Baseline)                             | 20 |
| Table 28 | Serialization Instruction (Baseline)                   | 20 |
| Table 29 | Exception Generation Instruction (Baseline)            | 20 |
| Table 30 | Special Return Instruction (Baseline)                  | 20 |
| Table 31 | Cache Control Instruction (Baseline)                   | 20 |
| Table 32 | Miscellaneous Instructions (Baseline)                  | 20 |
| Table 33 | Move Instruction (16-Bit)                              | 23 |



| Table 34 | Add/Sub Instruction with Immediate (16-Bit)        |     |
|----------|----------------------------------------------------|-----|
| Table 35 | Add/Sub Instruction (16-Bit)                       |     |
| Table 36 | Shift Instruction with Immediate (16-Bit)          |     |
| Table 37 | Bit Field Mask Instruction with Immediate (16-Bit) |     |
| Table 38 | Load / Store Instruction (16-Bit)                  |     |
| Table 39 | Load/Store Instruction with Implied FP (16-Bit)    |     |
| Table 40 | Branch and Jump Instruction (16-Bit)               |     |
| Table 41 | Compare and Branch Instruction (16-Bit)            |     |
| Table 42 | Misc. Instruction (16-Bit)                         |     |
| Table 43 | ALU Instructions                                   |     |
| Table 44 | Load/Store Instruction                             |     |
| Table 45 | ALU Instructions                                   |     |
| Table 46 | Multiply and Divide Instructions                   |     |
| Table 47 | Load/Store Instructions                            |     |
| Table 48 | ALU Instruction (Extension)                        |     |
| Table 49 | Performance Extension V2 Instructions              | 33  |
| Table 51 | CCTL SubType Encoding                              | 54  |
| Table 52 | CCTL SubType Definitions                           | 55  |
| Table 53 | Group 0 MFUSR definitions                          | 125 |
| Table 54 | Group 1 MFUSR definitions                          | 125 |
| Table 55 | Group 2 MFUSR definitions                          | 126 |
| Table 56 | MSYNC SubType definitions                          | 132 |
| Table 57 | Group 0 MTUSR definitions                          | 136 |
| Table 58 | Group 1 MTUSR definitions                          | 136 |
| Table 59 | Group 2 MTUSR definitions                          | 137 |
| Table 60 | STANDBY instruction SubType definitions            | 183 |
| Table 61 | TLBOP SubType Definitions                          | 198 |
| Table 62 | N12 Implementation of CCTL instruction             | 338 |



# **List of Figures**

| Figure 1.  | Index type format for Ra of CCTL instruction                      | 58  |
|------------|-------------------------------------------------------------------|-----|
| Figure 2.  | State diagram of the lock flag of a cache line controlled by CCTL | 63  |
| Figure 3.  | Basic BSE operation with $Rb(30) == 0$                            | 226 |
| Figure 4.  | Basic BSE operation with $Rb(30) == 1$                            | 227 |
| Figure 5.  | BSE operation extracting all remaining bits with $Rb(30) == 0$    | 227 |
| Figure 6.  | BSE operation with the "underflow" condition with $Rb(30) == 0$   | 228 |
| Figure 7.  | Basic BSP operation                                               | 232 |
| Figure 8.  | BSP operation with $Rb(30) == 1$                                  | 233 |
| Figure 9.  | BSP operation filling up Rt                                       | 233 |
| Figure 10. | BSP operation with the "overflow" condition.                      | 234 |



# Preface

This preface describe the contents of this manual

### About this manual

This manual provide the information about AndeStar™ instruction set architecture and all information contains in this document is subject to change without notice

### Version of AndeStar<sup>TM</sup> ISA manual

This manual is version 1.3.

### **Contact information**

Please contact Andes Technology Corporation by email at support@andestech.com or on the Internet at www.andestech.com for support.

### **Copyright notice**

© 2007-2009 Andes Technology Corporation. All rights reserved.

AndeStar<sup>™</sup> ISA contains certain confidential information of Andes Technology Corporation. Use of this copyright notice is precautionary and does not imply publication or disclosure. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of Andes Technology Corporation.



# Chapter 1 Introduction

This chapter describes overview of AndeStar instruction set and contains the following sections

- 1.1 32/16-Bit ISA on page 2
- 1.2 Data Types on page 2
- 1.3 Registers on page 2
- 1.4 Instruction Classes on page 6
- 1.5 Instruction Encoding on page 6
- 1.6 Miscellaneous on page 8



# 1.1 32/16-Bit ISA

In order to achieve optimal system performance, code density and power efficiency, a set of mixed-length (32/16-bit) instructions has been implemented for Andes ISA.

The Andes 32/16-bit mixed-length ISA has the following features:

- 1. The 32-bit and 16-bit instructions can be freely mixed in a program.
- 2. The 16-bit instructions are a frequently used subset of 32-bit instructions.
- 3. No 32/16-bit mode switching performance penalty when executing mixed 32-bit and 16-bit instructions.
- 4. The 32/16-bit mixed-length ISA is in a big-endian format.
- 5. The ISA is a RISC-style register-based instruction set.
- 6. 5-bit register index in 32-bit instruction format.
- 7. 5/4/3-bit register index in 16-bit instruction format.
- 8. The ISA provides hardware acceleration for a mixed-endian environment.

# **1.2 Data Types**

Andes ISA supports the following data types:

- 1. Integer
  - Bit (1-bit, b)
  - Byte (8-bit, B)
  - Halfword (16-bit, H)
  - Word (32-bit, W)

# 1.3 Registers

As a whole, the Andes instructions can access thirty-two 32-bit General Purpose Registers (GPR) and four 32-bit User Special Registers (USR). The four 32-bit USRs can be combined into two 64-bit registers and used to store 32-bit multiplication results. The

#### Introduction



GPRs will be named from r0 to r31. And the USRs will be named d0.lo, d0.hi, d1.lo, and d1.hi.

For the Andes 16-bit instructions, a register index can be 5-bit, 4-bit, or 3-bit. So the 3-bit and 4-bit register operand field can only access a subset of the thirty-two GPRs. The 3-bit and 4-bit register subsets and its index-number to register-number mappings are listed in the following table (Table 1) along with the software usage convention used by Andes tool chains.

| Register | 32/16-bit (5) | 16-bit (4) | 16-bit (3) | Comments                         |
|----------|---------------|------------|------------|----------------------------------|
| r0       | a0            | h0         | 00         |                                  |
| r1       | a1            | h1         | 01         |                                  |
| r2       | a2            | h2         | o2         |                                  |
| r3       | a3            | h3         | о3         |                                  |
| r4       | a4            | h4         | 04         |                                  |
| r5       | a5            | h5         | 05         | Implied register for beqs38 and  |
|          |               |            |            | bnes38                           |
| r6       | s0            | h6         | 06         | Saved by callee                  |
| r7       | s1            | h7         | о7         | Saved by callee                  |
| r8       | s2            | h8         |            | Saved by callee                  |
| r9       | s3            | h9         |            | Saved by callee                  |
| r10      | s4            | h10        |            | Saved by callee                  |
| r11      | s5            | h11        |            | Saved by callee                  |
| r12      | s6            |            |            | Saved by callee                  |
| r13      | s7            |            |            | Saved by callee                  |
| r14      | s8            |            |            | Saved by callee                  |
| r15      | ta            |            |            | Temporary register for assembler |
|          |               |            |            | Implied register for slt(s i)45, |
|          |               |            |            | b[eq ne]zs8                      |
| r16      | tO            | h12        |            | Saved by caller                  |
| r17      | t1            | h13        |            | Saved by caller                  |
| r18      | t2            | h14        |            | Saved by caller                  |
| r19      | t3            | h15        |            | Saved by caller                  |
| r20      | t4            |            |            | Saved by caller                  |

| Table 1 A | ndes General Pu | rpose Registers |
|-----------|-----------------|-----------------|
|-----------|-----------------|-----------------|



| r21 | t5    |  | Saved by caller                   |
|-----|-------|--|-----------------------------------|
| r22 | t6    |  | Saved by caller                   |
| r23 | t7    |  | Saved by caller                   |
| r24 | t8    |  | Saved by caller                   |
| r25 | t9    |  | Saved by caller                   |
| r26 | p0    |  | Reserved for Privileged-mode use. |
| r27 | p1    |  | Reserved for Privileged-mode use  |
| r28 | s9/fp |  | Frame Point / Saved by callee     |
| r29 | gp    |  | Global Pointer                    |
| r30 | lp    |  | link pointer                      |
| r31 | sp    |  | Stack Pointer                     |

The four USRs can only be accessed by the 32-bit instructions.

|          |               | 1             | 8                               |
|----------|---------------|---------------|---------------------------------|
| Register | 32-bit Instr. | 16-bit Instr. | Comments                        |
| D0       | d0.{hi, lo}   | N/A           | For multiplication and division |
|          |               |               | related instructions (64-bit)   |
| D1       | d1.{hi, lo}   | N/A           | For multiplication and division |
|          |               |               | related instructions (64-bit)   |

Table 2Andes User Special Registers

The Andes ISA assumes an implied Program Counter (PC) which records the address of the currently executing instruction. And the value of this implied PC can not be read but can be changed by using the control flow instructions. To provide hardware acceleration for accessing mixed-endian data in memory, the Andes ISA assumes an implied endian mode bit (PSW.BE) which affects the endian behavior of the load/store instructions. This implied endian mode bit can not be read but can be changed by using the SETEND instruction.

Table 3Andes Status Registers

| Register | 32-bit Instr. | 16-bit Instr. | Comments                        |
|----------|---------------|---------------|---------------------------------|
| PC       | implied       | implied       | Affected by control flow        |
|          |               |               | instructions                    |
| PSW.BE   | implied       | implied       | Affected by SETEND instruction. |



### **1.3.1** Reduced Register configuration option

For small systems which are more sensitive to cost, AndeStar ISA architecture provides a configuration option to reduce the total number of general registers to 16. In this "Reduced Register" configuration, the general registers that can be used by instructions are defined in Table 4. Notice that in the table, r11-r14, and r16-r27 have been removed.

In this Reduced Register configuration, if any of the unimplemented register is used in an instruction, a Reserved Instruction exception will be generated.

| Register | 32/16-bit (5) | 16-bit (4) | 16-bit (3) | Comments                         |
|----------|---------------|------------|------------|----------------------------------|
| r0       | a0            | h0         | 00         |                                  |
| r1       | a1            | h1         | 01         |                                  |
| r2       | a2            | h2         | o2         |                                  |
| r3       | a3            | h3         | о3         |                                  |
| r4       | a4            | h4         | 04         |                                  |
| r5       | a5            | h5         | 05         | Implied register for beqs38 and  |
|          |               |            |            | bnes38                           |
| r6       | s0            | h6         | 06         | Saved by callee                  |
| r7       | s1            | h7         | о7         | Saved by callee                  |
| r8       | s2            | h8         |            | Saved by callee                  |
| r9       | s3            | h9         |            | Saved by callee                  |
| r10      | s4            | h10        |            | Saved by callee                  |
| r15      | ta            |            |            | Temporary register for assembler |
|          |               |            |            | Implied register for slt(s i)45, |
|          |               |            |            | b[eq ne]zs8                      |
| r28      | fp            |            |            | Frame pointer / Saved by callee  |
| r29      | gp            |            |            | Global pointer                   |
| r30      | lp            |            |            | Link pointer                     |
| r31      | sp            |            |            | Stack pointer                    |

Table 4 Registers for Reduced Register Configuration



### **1.4 Instruction Classes**

The Andes ISA can be classified into the following classes:

- 1. Baseline Instructions
  - 32-Bit Baseline
  - 16-Bit Baseline
- 2. Extension Instructions
  - Performance Extension

# **1.5** Instruction Encoding

Bit [31] of a 32-bit instruction and Bit [15] of a 16-bit instruction is used to distinguish between 32-bit and 16-bit instructions:

Bit  $[31] = 0 \Rightarrow 32$ -bit instructions

Bit  $[15] = 1 \Rightarrow 16$ -bit instructions

### **1.5.1 32-Bit Instruction Format**

A typical 32-bit instruction contains three fields:

0 30:25 (6)

24:0 (25)

Bit [31] = 0 => 32-bit ISA Bit [30:25] => OPC [5:0] Bit [24:0] => Operand / Immediate / Sub-op

The 32-bit instruction formats and the meaning of each filed are described below:

| Type-0 Instruction Form |
|-------------------------|
|-------------------------|

| 0  opc  6  sub  1,  imm  24 |
|-----------------------------|
|-----------------------------|

Type-1 Instruction Format

| 0 | opc_6 | rt_5 | imm_20 |        |
|---|-------|------|--------|--------|
| 0 | opc_6 | rt_5 | sub_4  | imm_16 |



### ➢ Type-2 Instruction Format

| 0 | opc_6 | rt_5 | ra_5 | imm_15          |
|---|-------|------|------|-----------------|
| 0 | opc_6 | rt_5 | ra_5 | {sub_1, imm_14} |

### Type-3 Instruction Format

| 0 | opc_6 | rt_5 | ra_5 | rb_5  | sub_10 |
|---|-------|------|------|-------|--------|
| 0 | opc_6 | rt_5 | ra_5 | imm_5 | sub_10 |

### ➢ Type-4 Instruction Format

| 0 | opc_6 | rt_5 | ra_5 | rb_5   | rd_5   | sub_5 |
|---|-------|------|------|--------|--------|-------|
| 0 | opc_6 | rt_5 | ra_5 | imm1_5 | imm2_5 | sub_5 |

- opc\_6: 6-bit opcode
- rt\_5: target register in 5-bit index register set
- ra\_5: source register in 5-bit index register set
- rb\_5: source register in 5-bit index register set
- rd\_5: destination register in 5-bit index register set
- sub\_10: 10-bit sub-opcode
- sub\_5: 5-bit sub-opcode
- sub\_4: 4-bit sub-opcode
- sub\_2: 2-bit sub-opcode
- sub\_1: 1-bit sub-opcode
- imm\_24: 24-bit immediate value, for unconditional jump instructions (J, JAL). The immediate value is used as the lower 24-bit offset of same 32MB memory block (new PC[31:0] = {current PC[31:25], imm\_24, 1'b0}
- imm\_20: 20-bit immediate value. Sign-extended to 32-bit for MOVI operations.
- imm\_16: signed PC relative address displacement for branch instructions.
- imm\_15: 15-bit immediate value. Zero extended to 32-bit for unsigned operations, while sign extended to 32-bit for signed operations.
- imm\_14: signed PC relative address displacement for branch instructions.
- imm\_5, imm1\_5, imm2\_5: 5-bit unsigned count value or index value

For the detailed decoding information, please refer to the individual chapters.



### **1.5.2 16-Bit Instruction Format**

Please see the "16-Bit Baseline Instructions" chapter for more information.

### 1.6 Miscellaneous

- Instruction addressing is Halfword aligned
- Integer branch instructions will NOT use conditional codes (use GPR instead)
- Branch/Jump types
  - Branch on Registers
  - Branch destination
    - PC + Immediate Offset
  - Jump destination
    - PC + Immediate Offset
    - Register
  - Branch/Jump and link
- > The immediate values of load/store word/halfword are shifted.
  - lw rt, imm(ra), addr=ra+(imm << 2)
  - lh rt, imm(ra), addr=ra+(imm << 1)
- Register Remapping for 16-bit instructions
  - 32 registers (32-bit instructions) -> 32/16/8 registers (16-bit instructions)



# Chapter 2 32-Bit Baseline Instruction

This chapter describes 32bit baseline instructions, including Data-processing instructions, Load and Store instructions, Jump and Branch Instructions, Privileged Resource Access Instructions and Miscellaneous Instructions and contains the following sections

- 2.1 Data-processing instructions on page 11
- 2.2 Load and Store Instructions on page 13
- 2.3 Jump and Branch Instructions on page 16
- 2.4 Privileged Resource Access Instructions on page 18
- 2.5 Miscellaneous Instructions on page 19



The 32-bit baseline instruction set can be cataloged as follows:

- 1. Data-processing: (with Register or Immediate)
  - Basic ALU : ADD, SUB, AND, NOR, OR, XOR...
  - Shifter : ROTR, SLL, SRL, SRA...
  - Compare : SLT, SLTS
  - Set constant : SETHI, MOVI
  - Multiply (& Add) : MUL, MULT64, MADD64...
  - Divide : DIV, DIVS
- 2. Load and Store:
  - 5 types of load element:
    - Word
    - Unsigned Halfword and Signed Halfword
    - Unsigned Byte and Signed Byte
  - 3 types of store element:
    - Word
    - Halfword
    - Byte
  - 2 types of address calculation:
    - Register + Immediate
    - $\blacksquare \quad \text{Register} + (\text{Register} << \text{shift})$
  - 2 types of addressing mode:
    - Regular (After-Increment) and no base register upate
    - Before-Increment and base register update
  - Additional features:
    - Load/Store multiple word : LMW, SMW
    - Load/Store word for atomic operation : LLW, SCW
    - Load/Store word with user mode privilege : LWUP, SWUP

### 3. Jump and Branch:

| • Unconditional jump | : J, JR |
|----------------------|---------|
|----------------------|---------|

- Unconditional function call jump : JAL, JRAL
- Unconditional function return : RET
- Conditional branch : BEQ, BNE, BEQZ, BNEZ
- Conditional function call branch : BGEZAL, BLTZAL



4. Privileged Resource Access:

- System control : MFSR, MTSR
- TLB management : TLBOP
- 5. Miscellaneous:
  - Conditional move : CMOVZ, CMOVN
  - Synchronization : MSYNC, ISYNC
  - Prefetch : DPREF, DPREFI
  - No operation : NOP
  - Serialization : ISB, DSB
  - Exception generation : SYSCALL, BREAK...
  - Special return : IRET, RET.TOFF, RET.ITOFF
  - Cache control : CCTL
  - Miscellaneous : SETEND, SETGIE, STANDBY

# 2.1 Data-processing Instructions

|       | Mnemonic         | Instruction                | Operation                                   |
|-------|------------------|----------------------------|---------------------------------------------|
| ADDI  | rt5, ra5, imm15s | Add Immediate              | rt5 = ra5 + SE(imm15s)                      |
| SUBRI | rt5, ra5, imm15s | Subtract Reverse           | rt5 = SE(imm15s) - ra5                      |
|       |                  | Immediate                  |                                             |
| ANDI  | rt5, ra5, imm15u | And Immediate              | rt5 = ra5 && ZE(imm15u)                     |
| ORI   | rt5, ra5, imm15u | Or Immediate               | $rt5 = ra5 \parallel ZE(imm15u)$            |
| XORI  | rt5, ra5, imm15u | Exclusive Or Immediate     | $rt5 = ra5 \wedge ZE(imm15u)$               |
| SLTI  | rt5, ra5, imm15s | Set on Less Than Immediate | rt5 = (ra5 (unsigned) < SE(imm15s)) ? 1 : 0 |
| SLTSI | rt5, ra5, imm15s | Set on Less Than Signed    | rt5 = (ra5 (signed) < SE(imm15s)) ? 1 : 0   |
|       |                  | Immediate                  |                                             |
| MOVI  | rt5, imm20s      | Move Immediate             | rt5 = SE(imm20s)                            |
| SETHI | rt5, imm20u      | Set High Immediate         | $rt5 = {imm20u, 12'b0}$                     |

 Table 5
 ALU Instruction with Immediate (Baseline)

| Table 6 | ALU | Instruction | (Baseline) |
|---------|-----|-------------|------------|
|---------|-----|-------------|------------|

|     | Mnemonic      | Instruction | Operation       |
|-----|---------------|-------------|-----------------|
| ADD | rt5, ra5, rb5 | Add         | rt5 = ra5 + rb5 |

#### **32-Bit Baseline Instructions**



| SUB       | rt5, ra5, rb5  | Subtract                 | rt5 = ra5 - rb5                          |
|-----------|----------------|--------------------------|------------------------------------------|
| AND       | rt5, ra5, rb5  | And                      | rt5 = ra5 && rb5                         |
| NOR       | rt5, ra5, rb5  | Nor                      | $rt5 = \sim (ra5 \parallel rb5)$         |
| OR        | rt5, ra5, rb5  | Or                       | $rt5 = ra5 \parallel rb5$                |
| XOR       | rt5, ra5, rb5  | Exclusive Or             | $rt5 = ra5 \wedge rb5$                   |
| SLT       | rt5, ra5, rb5  | Set on Less Than         | rt5 = (ra5 (unsigned) < rb5) ? 1 : 0     |
| SLTS      | rt5, ra5, rb5  | Set on Less Than Signed  | rt5 = (ra5 (signed) < rb5) ? 1 : 0       |
| SVA       | rt5, ra5, rb5  | Set on Overflow Add      | rt5 = ((ra5 + rb5)  overflow)? 1:0       |
| SVS       | rt5, ra5, rb5  | Set on Overflow Subtract | rt5 = ((ra5 - rb5) overflow))? 1 : 0     |
| SEB       | rt5, ra5       | Sign Extend Byte         | rt5 = SE(ra5[7:0])                       |
| SEH       | rt5, ra5       | Sign Extend Halfword     | rt5 = SE(ra5[15:0])                      |
| ZEB       | rt5, ra5       | Zero Extend Byte         | rt5 = ZE(ra5[7:0])                       |
| (alias of | ANDI rt5, ra5, |                          |                                          |
| 0xFF)     |                |                          |                                          |
| ZEH       | rt5, ra5       | Zero Extend Halfword     | rt5 = ZE(ra5[15:0])                      |
| WSBH      | rt5, ra5       | Word Swap Byte within    | rt5 = {ra5[23:16], ra5[31:24], ra5[7:0], |
|           |                | Halfword                 | ra5[15:8]}                               |

| Table 7 | Shifter | Instruction | (Baseline) |
|---------|---------|-------------|------------|
|---------|---------|-------------|------------|

|       | Mnemonic        | Instruction            | Operation                    |
|-------|-----------------|------------------------|------------------------------|
| SLLI  | rt5, ra5, imm5u | Shift Left Logical     | rt5 = ra5 << imm5u           |
|       |                 | Immediate              |                              |
| SRLI  | rt5, ra5, imm5u | Shift Right Logical    | rt5 = ra5 (logic)>> imm5u    |
|       |                 | Immediate              |                              |
| SRAI  | rt5, ra5, imm5u | Shift Right Arithmetic | rt5 = ra5 (arith)>> imm5u    |
|       |                 | Immediate              |                              |
| ROTRI | rt5, ra5, imm5u | Rotate Right Immediate | rt5 = ra5 >>  imm5u          |
| SLL   | rt5, ra5, rb5   | Shift Left Logical     | rt5 = ra5 << rb5(4,0)        |
| SRL   | rt5, ra5, rb5   | Shift Right Logical    | rt5 = ra5 (logic)>> rb5(4,0) |
| SRA   | rt5, ra5, rb5   | Shift Right Arithmetic | rt5 = ra5 (arith)>> rb5(4,0) |
| ROTR  | rt5, ra5, rb5   | Rotate Right           | rt5 = ra5 >>  rb5(4,0)       |

| Table 8 | Multiply | Instruction | (Baseline) |
|---------|----------|-------------|------------|
| 14010 0 | manupij  | mounderen   | (Dasenne)  |

|     | Mnemonic      | Instruction               | Operation       |
|-----|---------------|---------------------------|-----------------|
| MUL | rt5, ra5, rb5 | Multiply Word to Register | rt5 = ra5 * rb5 |

#### **32-Bit Baseline Instructions**



| MULTS64 d1, ra5, rb5 | Multiply Word Signed            | d1 = ra5 (signed) * rb5       |
|----------------------|---------------------------------|-------------------------------|
| MULT64 d1, ra5, rb5  | Multiply Word                   | d1 = ra5 (unsigned)* rb5      |
| MADDS64 d1, ra5, rb5 | Multiply and Add Signed         | d1 = d1 + ra5 (signed)* rb5   |
| MADD64 d1, ra5, rb5  | Multiply and Add                | d1 = d1 + ra5 (unsigned)* rb5 |
| MSUBS64 d1, ra5, rb5 | Multiply and Subtract Signed    | d1 = d1 - ra5 (signed)* rb5   |
| MSUB64 d1, ra5, rb5  | Multiply and Subtract           | d1 = d1 - ra5 (unsigned)* rb5 |
| MULT32 d1, ra5, rb5  | Multiply Word                   | d1.LO = ra5 * rb5             |
| MADD32 d1, ra5, rb5  | Multiply and Add                | d1.LO = d1.LO + ra5 * rb5     |
| MSUB32 d1, ra5, rb5  | Multiply and Subtract           | d1.LO = d1.LO - ra5 * rb5     |
| MFUSR rt5, USR       | Move From User Special Register | rt5 = USReg[USR]              |
| MTUSR rt5, USR       | Move To User Special Register   | USReg[USR] = rt5              |

Table 9Divide Instructions

|      | Mnemonic     | Instruction             | Operation                        |  |
|------|--------------|-------------------------|----------------------------------|--|
| DIV  | Dt, ra5, rb5 | Unsigned Integer Divide | Dt.L = ra5 (unsigned) / rb5;     |  |
|      |              |                         | Dt.H = ra5 (unsigned) mod / rb5; |  |
| DIVS | Dt, ra5, rb5 | Signed Integer Divide   | Dt.L = ra5 (signed) / rb5;       |  |
|      |              |                         | Dt.H = ra5 (signed) / rb5;       |  |

### 2.2 Load and Store Instructions

| Mode | Operand Type              | Index Left Shift (0-3 bits) | Before Increment with Base Update |
|------|---------------------------|-----------------------------|-----------------------------------|
| 1    | Base Register + Immediate | No                          | No                                |
| 2    | Base Register + Immediate | No                          | Yes                               |
| 3    | Base Register + Register  | Yes                         | No                                |
| 4    | Base Register + Register  | Yes                         | Yes                               |

 Table 10
 Load / Store Addressing Mode

| Table 11 | Load / Store | Instruction | (Baseline) |
|----------|--------------|-------------|------------|
|----------|--------------|-------------|------------|

|     | Mnemonic               | Instruction         | Operation                          |
|-----|------------------------|---------------------|------------------------------------|
| LWI | rt5, [ra5 + (imm15s << | Load Word Immediate | address = $ra5 + SE(imm15s \ll 2)$ |



| 2)]    |                        |                          | rt5 = Word-memory(address)           |
|--------|------------------------|--------------------------|--------------------------------------|
| LHI    | rt5, [ra5 + (imm15s << | Load Halfword Immediate  | address = ra5 + SE(imm15s << 1)      |
| 1)]    |                        |                          | rt5 = ZE(Halfword-memory(address))   |
| LHSI   | rt5, [ra5 + (imm15s << | Load Halfword Signed     | address = ra5 + SE(imm15s << 1)      |
| 1)]    |                        | Immediate                | rt5 = SE(Halfword-memory(address))   |
| LBI    | rt5, [ra5 + imm15s]    | Load Byte Immediate      | address = ra5 + SE(imm15s)           |
|        |                        |                          | rt5 = ZE(Byte-memory(address))       |
| LBSI   | rt5, [ra5 + imm15s]    | Load Byte Signed         | address = ra5 + SE(imm15s)           |
|        |                        | Immediate                | rt5 = SE(Byte-memory(address))       |
| SWI    | rt5, [ra5 + (imm15s    | Store Word Immediate     | address = ra5 + SE(imm15s << 2)      |
| << 2)] |                        |                          | Word-memory(address) = rt5           |
| SHI    | rt5, [ra5 + (imm15s << | Store Halfword Immediate | address = ra5 + SE(imm15s << 1)      |
| 1)]    |                        |                          | Halfword-memory(address) = rt5[15:0] |
| SBI    | rt5, [ra5 + imm15s]    | Store Byte Immediate     | address = ra5 + SE(imm15s)           |
|        |                        |                          | Byte-memory(address) = rt5[7:0]      |

|         | Mnemonic      | Instruction               | Operation                        |
|---------|---------------|---------------------------|----------------------------------|
| LWI.bi  | rt5, [ra5],   | Load Word Immediate with  | rt5 = Word-memory(ra5)           |
|         | (imm15s << 2) | Post Increment            | $ra5 = ra5 + SE(imm15s \ll 2)$   |
| LHI.bi  | rt5, [ra5],   | Load Halfword Immediate   | rt5 = ZE(Halfword-memory(ra5))   |
|         | (imm15s << 1) | with Post Increment       | ra5 = ra5 + SE(imm15s << 1)      |
| LHSI.bi | rt5, [ra5],   | Load Halfword Signed      | rt5 = SE(Halfword-memory(ra5))   |
|         | (imm15s << 1) | Immediate with Post       | $ra5 = ra5 + SE(imm15s \ll 1)$   |
|         |               | Increment                 |                                  |
| LBI.bi  | rt5, [ra5],   | Load Byte Immediate with  | rt5 = ZE(Byte-memory(ra5))       |
|         | imm15s        | Post Increment            | ra5 = ra5 + SE(imm15s)           |
| LBSI.bi | rt5, [ra5],   | Load Byte Signed          | rt5 = SE(Byte-memory(ra5))       |
|         | imm15s        | Immediate with Post       | ra5 = ra5 + SE(imm15s)           |
|         |               | Increment                 |                                  |
| SWI.bi  | rt5, [ra5],   | Store Word Immediate with | Word-memory(ra5) = rt5           |
|         | (imm15s << 2) | Post Increment            | ra5 = ra5 + SE(imm15s << 2)      |
| SHI.bi  | rt5, [ra5],   | Store Halfword Immediate  | Halfword-memory(ra5) = rt5[15:0] |
|         | (imm15s << 1) | with Post Increment       | $ra5 = ra5 + SE(imm15s \ll 1)$   |
| SBI.bi  | rt5, [ra5],   | Store Byte Immediate with | Byte-memory(ra5) = $rt5[7:0]$    |



| imm15s | Post Increment | ra5 = ra5 + SE(imm15s) |
|--------|----------------|------------------------|
|        |                |                        |

| Mnemonic |                          | Instruction          | Operation                            |
|----------|--------------------------|----------------------|--------------------------------------|
| LW       | rt5, [ra5 + (rb5 << sv)] | Load Word            | $address = ra5 + (rb5 \ll sv)$       |
|          |                          |                      | rt5 = Word-memory(address)           |
| LH       | rt5, [ra5 + (rb5 << sv)] | Load Halfword        | $address = ra5 + (rb5 \ll sv)$       |
|          |                          |                      | rt5 = ZE(Halfword-memory(address))   |
| LHS      | rt5, [ra5 + (rb5 << sv)] | Load Halfword Signed | $address = ra5 + (rb5 \ll sv)$       |
|          |                          |                      | rt5 = SE(Halfword-memory(address))   |
| LB       | rt5, [ra5 + (rb5 << sv)] | Load Byte            | $address = ra5 + (rb5 \ll sv)$       |
|          |                          |                      | rt5 = ZE(Byte-memory(address))       |
| LBS      | rt5, [ra5 + (rb5 << sv)] | Load Byte Signed     | $address = ra5 + (rb5 \ll sv)$       |
|          |                          |                      | rt5 = SE(Byte-memory(address))       |
| SW       | rt5, [ra5 + (rb5 << sv)] | Store Word           | $address = ra5 + (rb5 \ll sv)$       |
|          |                          |                      | Word-memory(address) = rt5           |
| SH       | rt5, [ra5 + (rb5 << sv)] | Store Halfword       | $address = ra5 + (rb5 \ll sv)$       |
|          |                          |                      | Halfword-memory(address) = rt5[15:0] |
| SB       | rt5, [ra5 + (rb5 << sv)] | Store Byte           | address = ra5 + (rb5 << sv)          |
|          |                          |                      | Byte-memory(address) = rt5[7:0]      |

### Table 13 Load / Store Instruction (Baseline)

 Table 14
 Load / Store Instruction (Baseline)

|        | Mnemonic                                                                                    | Instruction                | Operation                      |
|--------|---------------------------------------------------------------------------------------------|----------------------------|--------------------------------|
| LW.bi  | rt5, [ra5],                                                                                 | Load Word with Post        | rt5 = Word-memory(ra5)         |
|        | rb5< <sv< td=""><td>Increment</td><td><math>ra5 = ra5 + (rb5 \ll sv)</math></td></sv<>      | Increment                  | $ra5 = ra5 + (rb5 \ll sv)$     |
| LH.bi  | rt5, [ra5],                                                                                 | Load Halfword with Post    | rt5 = ZE(Halfword-memory(ra5)) |
|        | rb5< <sv< td=""><td>Increment</td><td>ra5 = ra5 + (rb5 &lt;&lt; sv)</td></sv<>              | Increment                  | ra5 = ra5 + (rb5 << sv)        |
| LHS.bi | rt5, [ra5],                                                                                 | Load Halfword Signed with  | rt5 = SE(Halfword-memory(ra5)) |
|        | rb5< <sv< td=""><td>Post Increment</td><td><math>ra5 = ra5 + (rb5 \ll sv)</math></td></sv<> | Post Increment             | $ra5 = ra5 + (rb5 \ll sv)$     |
| LB.bi  | rt5, [ra5],                                                                                 | Load Byte with Post        | rt5 = ZE(Byte-memory(ra5))     |
|        | rb5< <sv< td=""><td>Increment</td><td>ra5 = ra5 + (rb5 &lt;&lt; sv)</td></sv<>              | Increment                  | ra5 = ra5 + (rb5 << sv)        |
| LBS.bi | rt5, [ra5],                                                                                 | Load Byte Signed with Post | rt5 = SE(Byte-memory(ra5))     |
|        | rb5< <sv< td=""><td>Increment</td><td><math>ra5 = ra5 + (rb5 \ll sv)</math></td></sv<>      | Increment                  | $ra5 = ra5 + (rb5 \ll sv)$     |
| SW.bi  | rt5, [ra5],                                                                                 | Store Word with Post       | Word-memory(ra5) = rt5         |
|        | rb5< <sv< td=""><td>Increment</td><td><math>ra5 = ra5 + (rb5 \ll sv)</math></td></sv<>      | Increment                  | $ra5 = ra5 + (rb5 \ll sv)$     |



| SH.bi | rt5, [ra5],                                                                    | Store Halfword with Post | Halfword-memory( $ra5$ ) = $rt5[15:0]$ |
|-------|--------------------------------------------------------------------------------|--------------------------|----------------------------------------|
|       | rb5< <sv< td=""><td>Increment</td><td>ra5 = ra5 + (rb5 &lt;&lt; sv)</td></sv<> | Increment                | ra5 = ra5 + (rb5 << sv)                |
| SB.bi | rt5, [ra5],                                                                    | Store Byte with Post     | Byte-memory( $ra5$ ) = $rt5[7:0]$      |
|       | rb5< <sv< td=""><td>Increment</td><td>ra5 = ra5 + (rb5 &lt;&lt; sv)</td></sv<> | Increment                | ra5 = ra5 + (rb5 << sv)                |

### Table 15 Load / Store Multiple Word Instruction (Baseline)

| Mnemonic                                               | Instruction                          | Operation                 |
|--------------------------------------------------------|--------------------------------------|---------------------------|
| LMW. $\{b a\} \{i d\} \{m?\} Rb5$ ,                    | Load Multiple Word (before/after;    | See page 111 for details. |
| [Ra5], Re5, Enable4                                    | in/decrement; update/no-update base) |                           |
| SMW. $\{b a\} \{i d\} \{m?\} Rb5$ ,                    | Store Multiple Word (before/after;   | See page 175 for details. |
| [Ra5], Re5, Enable4 in/decrement; update/no-update bas |                                      |                           |

### Table 16 Load / Store Instruction for Atomic Updates (Baseline)

| Mnemonic                     | Instruction          | Operation |
|------------------------------|----------------------|-----------|
| LLW rt5, [ra5 + (rb5 << sv)] | Load Locked Word     |           |
| SCW rt5, [ra5 + (rb5 << sv)] | Store Condition Word |           |

Table 17 Load / Store Instructions with User-mode Privilege

| Mnemonic                      | Instruction                                          | Operation                                 |
|-------------------------------|------------------------------------------------------|-------------------------------------------|
| LWUP rt5, [ra5 + (rb5 << sv)] | Load Word with Equavalent to LW instruction but with |                                           |
|                               | User-mode Privilege                                  | user mode privilege address translation.  |
|                               | Translation                                          | See page 119 for details.                 |
| SWUP rt5, [ra5 + (rb5 << sv)] | Store Word with                                      | Equavalent to SW instruction but with the |
|                               | User-mode Privilege                                  | user mode privilege address translation.  |
|                               | Translation                                          | See page 193 for details.                 |

# **2.3 Jump and Branch Instructions**

|   | Mnemonic | Instruction | Operation                    |
|---|----------|-------------|------------------------------|
| J | imm24s   | Jump        | $PC = PC + SE(imm24s \ll 1)$ |

| Table 18 | Jump | Instruction | (Baseline) |
|----------|------|-------------|------------|
|----------|------|-------------|------------|



| JAL imm24s    | Jump and Link          | LP = next sequential PC (PC + 4); |
|---------------|------------------------|-----------------------------------|
|               |                        | PC = PC + SE(imm24s << 1)         |
| JR rb5        | Jump Register          | PC = rb5                          |
| RET rb5       | Return from Register   | PC = rb5                          |
| JRAL rb5      | Jump Register and Link | jaddr = rb5;                      |
| JRAL rt5, rb5 |                        | LP = PC + 4; or $rt5 = PC + 4$ ;  |
|               |                        | PC = jaddr;                       |

| Mnemonic             | Instruction               | Operation                                  |
|----------------------|---------------------------|--------------------------------------------|
| BEQ rt5, ra5, imm14s | Branch on Equal ( 2       | PC = (rt5 == ra5)? (PC + SE(imm14s <<      |
|                      | Register)                 | 1)) : (PC + 4)                             |
| BNE rt5, ra5, imm14s | Branch on Not Equal ( 2   | PC = (rt5 =! ra5)? (PC + SE(imm14s <<      |
|                      | Register)                 | 1)) : (PC + 4)                             |
| BEQZ rt5, imm16s     | Branch on Equal Zero      | PC = (rt5==0)? (PC + SE(imm16s << 1)) :    |
|                      |                           | (PC + 4)                                   |
| BNEZ rt5, imm16s     | Branch on Not Equal Zero  | PC = (rt5 =! 0)? (PC + SE(imm16s << 1)) :  |
|                      |                           | (PC + 4)                                   |
| BGEZ rt5, imm16s     | Branch on Greater than or | PC = (rt5 (signed)>= 0)? (PC +             |
|                      | Equal to Zero             | SE(imm16s << 1)) : (PC + 4)                |
| BLTZ rt5, imm16s     | Branch on Less than Zero  | PC = (rt5 (signed)< 0)? (PC +              |
|                      |                           | sign-ext(imm16s << 1)) : (PC + 4)          |
| BGTZ rt5, imm16s     | Branch on Greater than    | PC = (rt5 (signed)> 0)? (PC + SE(imm16s    |
|                      | Zero                      | << 1)) : (PC + 4)                          |
| BLEZ rt5, imm16s     | Branch on Less than or    | $PC = (rt5 \text{ (signed)} \le 0)? (PC +$ |
|                      | Equal to Zero             | SE(imm16s << 1)) : (PC + 4)                |

| Table 19 | Branch | Instruction | (Baseline) |
|----------|--------|-------------|------------|
|          |        |             |            |

| Table 20 | Branch with | link Instruction | (Baseline) |
|----------|-------------|------------------|------------|
|----------|-------------|------------------|------------|

| Mnemonic           | Instruction               | Operation                                  |
|--------------------|---------------------------|--------------------------------------------|
|                    |                           |                                            |
| BGEZAL rt5, imm16s | Branch on Greater than or | LP = next sequential PC (PC + 4);          |
|                    | Equal to Zero and Link    | $PC = (rt5 (signed) \ge 0)? (PC +$         |
|                    |                           | SE(imm16s << 1)), (PC + 4);                |
| BLTZAL rt5, imm16s | Branch on Less than Zero  | LP = next sequential PC (PC + 4);          |
|                    | and Link                  | PC = (rt5 (signed) < 0)? (PC + SE(imm16s)) |



# 2.4 Privilege Resource Access Instructions

|      | Tuble 21 Roud, while System Registers (Dusenne) |                         |                 |  |
|------|-------------------------------------------------|-------------------------|-----------------|--|
|      | Mnemonic                                        | Instruction             | Operation       |  |
| MFSR | rt5, SRIDX                                      | Move from System        | rt5 = SR[SRIDX] |  |
|      |                                                 | Register                |                 |  |
| MTSR | rt5, SRIDX                                      | Move to System Register | SR[SRIDX] = rt5 |  |

 Table 21
 Read/Write System Registers (Baseline)

 Table 22
 Jump Register with System Register Update (Baseline)

| Mnemonic           | Instruction                   | Operation                   |
|--------------------|-------------------------------|-----------------------------|
| JR.ITOFF rb5       | Jump Register and Instruction | PC = rb5;                   |
|                    | Translation OFF               | PSW.IT = 0;                 |
| JR.TOFF rb5        | Jump Register and Translation | PC = rb5;                   |
|                    | OFF                           | PSW.IT = 0, $PSW.DT = 0$ ;  |
| JRAL.ITON rb5      | Jump Register and Link and    | jaddr = rb5;                |
| JRAL.ITON rt5, rb5 | Instruction Translation ON    | LP = PC+4 or $rt5 = PC+4$ ; |
|                    |                               | PC = jaddr;                 |
|                    |                               | PSW.IT = 1;                 |
| JRAL.TON rb5       | Jump Register and Link and    | jaddr = rb5;                |
| JRAL.TON rt5, rb5  | Translation ON                | LP = PC+4 or $rt5 = PC+4$ ; |
|                    |                               | PC = jaddr;                 |
|                    |                               | PSW.IT = 1, PSW.DT = 1;     |

| Table 23 | MMU | Instruction | (Baseline) |
|----------|-----|-------------|------------|
|----------|-----|-------------|------------|

|       | Mnemonic          | Instruction                | Operation |
|-------|-------------------|----------------------------|-----------|
| TLBOP | Ra,               | Read targeted TLB entry    |           |
|       | TargetRead (TRD)  |                            |           |
| TLBOP | Ra,               | Write targeted TLB entry   |           |
|       | TargetWrite (TWR) |                            |           |
| TLBOP | Ra,               | Write PTE into a TLB entry |           |



|         | RWrite (RWR)     |                              |                                          |
|---------|------------------|------------------------------|------------------------------------------|
| TLBOP   | Ra,              | Write PTE into a TLB entry   |                                          |
| R       | WriteLock (RWLK) | and lock                     |                                          |
| TLBOP   | Ra,              | Unlock a TLB entry           |                                          |
|         | Unlock (UNLK)    |                              |                                          |
| TLBOP   | Rt, Ra,          | Probe TLB entry              |                                          |
|         | Probe (PB)       |                              |                                          |
| TLBOP   | Ra,              | Invalidate TLB entries       | Invalidate the TLB entry containing VA   |
|         | Invalidate (INV) |                              | stored in Rx.                            |
| TLBOP   | FlushAll (FLUA)  | Flush all TLB entries except |                                          |
|         |                  | locked entries               |                                          |
| LD_VLP7 | ſ                | Load VLPT page table         | Load VLPT page table which always goes   |
|         |                  | (optional instruction)       | through data TLB translation. On TLB     |
|         |                  |                              | miss, generate Double TLB miss exception |

# 2.5 Miscellaneous Instructions

| Table 24 ( | Conditional | Move ( | Baseline) | ) |
|------------|-------------|--------|-----------|---|
|------------|-------------|--------|-----------|---|

| Mnemonic            | Instruction              | Operation               |
|---------------------|--------------------------|-------------------------|
| CMOVZ rt5, ra5, rb5 | Conditional Move on Zero | rt5 = ra5 if (rb5 == 0) |
| CMOVN rt5, ra5, rb5 | Conditional Move on Not  | rt5 = ra5 if (rb5 != 0) |
|                     | Zero                     |                         |

Table 25Synchronization Instruction (Baseline)

| Mnemonic | Instruction             | Operation                              |
|----------|-------------------------|----------------------------------------|
| MSYNC    | Memory Synchronize      | Synchronize Shared Memory              |
| ISYNC    | Instruction Synchronize | Synchronize Caches to Make Instruction |
|          |                         | Stream Writes Effective                |

| Table 26 Prefetch Instruction (H | Baseline) |
|----------------------------------|-----------|
|----------------------------------|-----------|

|        | Mnemonic            | Instruction             | Operation |
|--------|---------------------|-------------------------|-----------|
| DPREFI | [ra5 + imm15s]      | Data Prefetch Immediate |           |
| DPREF  | [ra5 + (rb5 << si)] | Data Prefetch           |           |



| Mnemonic                  | Instruction  | Operation    |
|---------------------------|--------------|--------------|
| NOP                       | No Operation | No Operation |
| (alias of SRLI R0, R0, 0) |              |              |

### Table 27NOP Instruction (Baseline)

### Table 28Serialization Instruction (Baseline)

| Mnemonic | Instruction                | Operation |
|----------|----------------------------|-----------|
| DSB      | Data Serialization Barrier |           |
| ISB      | Instruction Serialization  |           |
|          | Barrier                    |           |

### Table 29 Exception Generation Instruction (Baseline)

| Mnemonic | Instruction            | Operation |
|----------|------------------------|-----------|
| BREAK    | Breakpoint             |           |
| SYSCALL  | System Call            |           |
| TRAP     | Trap Always            |           |
| TEQZ     | Trap on Equal Zero     |           |
| TNEZ     | Trap on Not Equal Zero |           |

### Table 30 Special Return Instruction (Baseline)

| Mnemonio   | ic  | Instruction                     | Operation                              |
|------------|-----|---------------------------------|----------------------------------------|
| IRET       |     | Interruption Return             | Return from Interruption (exception or |
|            |     |                                 | interrupt). Please see page 76.        |
| RET.ITOFF  | Rb5 | Return and turn off instruction | PC = Rb5, $PSW.IT = 0$                 |
|            |     | address translation             | (page 148)                             |
| RET.TOFF R | Rb5 | Return and turn off address     | PC = Rb5, $PSW.IT = 0$ , $PSW.DT = 0$  |
|            |     | translation (instruction/data)  | (page 148)                             |

### Table 31 Cache Control Instruction (Baseline)

| Mnemonic | Instruction   | Operation                              |
|----------|---------------|----------------------------------------|
| CCTL     | Cache Control | Read, write, and control cache states. |
|          |               | Please see page 54.                    |

### Table 32Miscellaneous Instructions (Baseline)

| Mnemonic | Instruction | Operation |
|----------|-------------|-----------|

#### **32-Bit Baseline Instructions**



| SETEND.B | Atomic set or clear of  | PSW.BE = 1; // SETEND.B                   |
|----------|-------------------------|-------------------------------------------|
| SETEND.L | PSW.BE bit              | PSW.BE = 0; // SETEND.L                   |
| SETGIE.E | Atomic set or clear of  | PSW.GIE = 1; // SETGIE.E                  |
| SETGIE.D | PSW.GIE bit             | PSW.GIE = 0; // SETGIE.D                  |
| STANDBY  | Wait for External Event | Enter standby state and wait for external |
|          |                         | event. Please see page 183.               |



# Chapter 3 16-Bit Baseline Instruction

This chapter describes 16bit baseline instructions and contains the following sections

3.1 32-bit instruction Mapping on page 23



The Andes 16-bit instruction set is a subset of the 32-bit instruction set. So every 16-bit instruction can be properly mapped onto a corresponding 32-bit instruction. The mappings are listed in the next section.

# 3.1 32-bit Instruction Mapping

| Mnemonic          | Instruction    | 32-Bit Operation     |  |
|-------------------|----------------|----------------------|--|
| MOVI55 rt5, imm5s | Move Immediate | MOVI rt5, SE(imm5s)  |  |
| MOV55 rt5, ra5    | Move           | ADDI/ORI rt5, ra5, 0 |  |

### Table 33Move Instruction (16-Bit)

Table 34 Add/Sub Instruction with Immediate (16-Bit)

|         | Mnemonic        | Instruction             | 32-Bit Operation |                      |
|---------|-----------------|-------------------------|------------------|----------------------|
| ADDI45  | rt4, imm5u      | Add Word Immediate      | ADDI             | rt5, rt5, ZE(imm5u)  |
| ADDI333 | rt3, ra3, imm3u | Add Word Immediate      | ADDI             | rt5, ra5, ZE(imm3u)  |
| SUBI45  | rt4, imm5u      | Subtract Word Immediate | ADDI             | rt5, rt5, NEG(imm5u) |
| SUBI333 | rt3, ra3, imm3u | Subtract Word Immediate | ADDI             | rt5, ra5, NEG(imm3u) |

Table 35 Add/Sub Instruction (16-Bit)

| Mnemonic             | Instruction   |     | 32-Bit Operation |
|----------------------|---------------|-----|------------------|
| ADD45 rt4, rb5       | Add Word      | ADD | rt5, rt5, rb5    |
| ADD333 rt3, ra3, rb3 | Add Word      | ADD | rt5, ra5, rb5    |
| SUB45 rt4, rb5       | Subtract Word | SUB | rt5, rt5, rb5    |
| SUB333 rt3, ra3, rb3 | Subtract Word | SUB | rt5, ra5, rb5    |

Table 36Shift Instruction with Immediate (16-Bit)

|         | Mnemonic        | Instruction                   |      | 32-Bit Operation    |
|---------|-----------------|-------------------------------|------|---------------------|
| SRAI45  | rt4, imm5u      | Shift Right Arithmetic        | SRAI | rt5, rt5, imm5u     |
|         |                 | Immediate                     |      |                     |
| SRLI45  | rt4, imm5u      | Shift Right Logical Immediate | SRLI | rt5, rt5, imm5u     |
| SLLI333 | rt3, ra3, imm3u | Shift Left Logical Immediate  | SLLI | rt5, ra5, ZE(imm3u) |

 Table 37
 Bit Field Mask Instruction with Immediate (16-Bit)

| Mnemonic Instruction 32-Bit Operation |
|---------------------------------------|
|---------------------------------------|


| BFMI333 rt3, ra3, imm3u | Bit Field Mask Immediate  |                      |
|-------------------------|---------------------------|----------------------|
|                         |                           |                      |
| ZEB33 rt3, ra3          | Zero Extend Byte          | ZEB rt5, ra5         |
| (BFMI333 rt3, ra3, 0)   |                           |                      |
| ZEH33 rt3, ra3          | Zero Extend Halfword      | ZEH rt5, ra5         |
| (BFMI333 rt3, ra3, 1)   |                           |                      |
| SEB33 rt3, ra3          | Sign Extend Byte          | SEB rt5, ra5         |
| (BFMI333 rt3, ra3, 2)   |                           |                      |
| SEH33 rt3, ra3          | Sign Extend Halfword      | SEH rt5, ra5         |
| (BFMI333 rt3, ra3, 3)   |                           |                      |
|                         | Extension Set (Non-Basel  | ine)                 |
| XLSB33 rt3, ra3         | Extract LSB               | ANDI rt5, ra5, 1     |
| (BFMI333 rt3, ra3, 4)   |                           |                      |
| X11B33 rt3, ra3         | Extract the least 11 Bits | ANDI rt5, ra5, 0x7ff |
| (BFMI333 rt3, ra3, 5)   |                           |                      |

#### Table 38Load / Store Instruction (16-Bit)

| Ν         | Inemonic     | Instruction               |        | 32-Bit Operation      |
|-----------|--------------|---------------------------|--------|-----------------------|
| LWI450    | rt4, [ra5]   | Load Word Immediate       | LWI    | rt5, [ra5+0]          |
| LWI333    | rt3, [ra_3+  | Load Word Immediate       | LWI    | rt5, [ra5+ZE(imm3u)]  |
|           | imm3u]       |                           |        |                       |
| LWI333.bi | rt3, [ra_3], | Load Word Immediate with  | LWI.bi | rt5, [ra5], ZE(imm3u) |
|           | imm3u        | Post-increment            |        |                       |
| LHI333    | rt3, [ra_3+  | Load Halfword Immediate   | LHI    | rt5, [ra5+ZE(imm3u)]  |
|           | imm3u]       |                           |        |                       |
| LBI333    | rt3, [ra_3+  | Load Byte Immediate       | LBI    | rt5, [ra5+ZE(imm3u)]  |
|           | imm3u]       |                           |        |                       |
| SWI450    | rt4, [ra5]   | Store Word Immediate      | SWI    | rt5, [ra5+0]          |
| SWI333    | rt3, [ra_3+  | Store Word Immediate      | SWI    | rt5, [ra5+ZE(imm3u)]  |
|           | imm3u]       |                           |        |                       |
| SWI333.bi | rt3, [ra_3], | Store Word Immediate with | SWI.bi | rt5, [ra5], ZE(imm3u) |
|           | imm3u        | Post-increment            |        |                       |
| SHI333    | rt3, [ra_3+  | Store Halfword Immediate  | SHI    | rt5, [ra5+ZE(imm3u)]  |
|           | imm3u]       |                           |        |                       |
| SBI333    | rt3, [ra_3+  | Store Byte Immediate      | SBI    | rt5, [ra5+ZE(imm3u)]  |



imm3u]

Table 39 Load/Store Instruction with Implied FP (16-Bit)

|       | Mnemonic        | Instruction                |     | 32-Bit Operation    |
|-------|-----------------|----------------------------|-----|---------------------|
| LWI37 | rt3, [fp+imm7u] | Load Word with Implied FP  | LWI | rt5, [fp+ZE(imm7u)] |
| SWI37 | rt3, [fp+imm7u] | Store Word with Implied FP | SWI | rt5, [fp+ZE(imm7u)] |

| Ν      | Inemonic   | Instruction              |               | 32-Bit Operation     |
|--------|------------|--------------------------|---------------|----------------------|
|        |            |                          |               |                      |
| BEQS38 | rt3, imm8s | Branch on Equal (implied | BEQ           | rt5, r5, SE(imm8s)   |
|        |            | r5)                      | (next sequent | tial $PC = PC + 2$ ) |
| BNES38 | rt3, imm8s | Branch on Not Equal      | BNE           | rt5, r5, SE(imm8s)   |
|        |            | (implied r5)             | (next sequent | tial $PC = PC + 2$ ) |
| BEQZ38 | rt3, imm8s | Branch on Equal Zero     | BEQZ          | rt5, SE(imm8s)       |
|        |            |                          | (next sequent | tial $PC = PC + 2$ ) |
| BNEZ38 | rt3, imm8s | Branch on Not Equal Zero | BNEZ          | rt5, SE(imm8s)       |
|        |            |                          | (next sequent | tial $PC = PC + 2$ ) |
| J8     | imm8s      | Jump Immediate           | J             | SE(imm8s)            |
| JR5    | rb5        | Jump Register            | JR            | rb5                  |
| RET5   | rb5        | Return from Register     | RET           | rb5                  |
| JRAL5  | rb5        | Jump Register and Link   | JRAL          | rb5                  |

#### Table 40Branch and Jump Instruction (16-Bit)

Table 41Compare and Branch Instruction (16-Bit)

| М       | nemonic    | Instruction               |             | 32-Bit Operation      |
|---------|------------|---------------------------|-------------|-----------------------|
| SLTI45  | ra4, imm5u | Set on Less Than Unsigned | SLTI        | r15, ra5, ZE(imm5u)   |
|         |            | Immediate                 |             |                       |
| SLTSI45 | ra4, imm5u | Set on Less Than Signed   | SLTSI       | r15, ra5, ZE(imm5u)   |
|         |            | Immediate                 |             |                       |
| SLT45   | ra4, rb5   | Set on Less Than Unsigned | SLT         | r15, ra5, rb5         |
| SLTS45  | ra4, rb5   | Set on Less Than Signed   | SLTS        | r15, ra5, rb5         |
| BEQZS8  | imm8s      | Branch on Equal Zero      | BEQZ        | r15, SE(imm8s)        |
|         |            | (implied r15)             | (next seque | ntial $PC = PC + 2$ ) |
| BNEZS8  | imm8s      | Branch on Not Equal Zero  | BNEZ        | r15, SE(imm8s)        |
|         |            | (implied r15)             | (next seque | ntial $PC = PC + 2$ ) |



| Mnemonic               | Instruction  | 32-Bit Operation |
|------------------------|--------------|------------------|
| BREAK16                | Breakpoint   | BREAK            |
| NOP16                  | No Operation | NOP              |
| (alias of SRLI45 R0,0) |              |                  |

Table 42Misc. Instruction (16-Bit)



# Chapter 4 16/32-Bit Baseline Version 2 Instruction

Several new 16/32-bit instructions are added to the Baseline instruction set. This new set of Baseline instructions is defined as Baseline version 2 instruction set to distinguish with the original Baseline instruction set.

These new instructions are summarized in the following sections.

- 4.1 16-bit Baseline V2 instruction on page 28
- 4.2 32-bit Baseline V2 instruction on page 29



## 4.1 16-bit Baseline V2 instructions

| Mnemonic         | Instruction           |      | 32-Bit Operation   |
|------------------|-----------------------|------|--------------------|
| ADDI10.sp imm10s | Add Immediate with    | ADDI | sp, sp, SE(imm10s) |
|                  | implied stack pointer |      |                    |

#### Table 43ALU Instructions

| Mnemonic             |        | Instruction               | 32-Bit Operation                    |
|----------------------|--------|---------------------------|-------------------------------------|
| LWI37.sp rt3, [+ (im | m7u << | Load word Immediate with  | LWI 3T5(Rt3), [SP + ZE(imm7u << 2)] |
| 2)]                  |        | implied SP                |                                     |
| SWI37.sp rt3, [+ (im | m7u << | Store word Immediate with | SWI 3T5(Rt3), [SP + ZE(imm7u << 2)] |
| 2)]                  |        | implied SP                |                                     |

Mnemonic

ADDI.gp rt5, imm19s



Operation

rt5 = gp + SE(imm19s)

res = ra5 \* rb5;

rt5 = rt5 - res(31,0);

rt5 = Floor(ra5 (unsigned) / rb5);

rs5 = ra5 (unsigned) mod rb5;

rt5 = Floor(ra5 (signed) / rb5);

rs5 = ra5 (signed) mod rb5;

## 4.2 **32-bit Baseline V2 instructions**

Instruction

GP-implied Add Immediate

| Table 46    Multiply and Divide Instructions |                            |                            |  |
|----------------------------------------------|----------------------------|----------------------------|--|
| Mnemonic Instruction                         |                            | Operation                  |  |
| MULR64 rt5, ra5, rb5                         | Multiply unsigned word to  | res = ra5 (unsigned)* rb5; |  |
|                                              | registers                  | if (PSW.BE == 1) {         |  |
|                                              |                            | (rt5_even, rt5_odd) = res; |  |
|                                              |                            | } else {                   |  |
|                                              |                            | (rt5_odd, rt5_even) = res; |  |
|                                              |                            | }                          |  |
| MULSR64 rt5, ra5, rb5                        | Multiply signed word to    | res = ra5 (signed)* rb5;   |  |
|                                              | registers                  | if (PSW.BE == 1) {         |  |
|                                              |                            | (rt5_even, rt5_odd) = res; |  |
|                                              |                            | } else {                   |  |
|                                              |                            | (rt5_odd, rt5_even) = res; |  |
|                                              |                            | }                          |  |
| MADDR32 rt5, ra5, rb5                        | Multiply and add to 32-bit | res = ra5 + rb5;           |  |
|                                              | register                   | rt5 = rt5 + res(31.0):     |  |

Multiply and subtract from

Unsigned integer divide to

Signed integer divide to

32-bit register

registers

registers

### Table 45 ALU Instructions

MSUBR32 rt5, ra5, rb5

DIVR rt5, rs5, ra5, rb5

DIVSR rt5, rs5, ra5, rb5



| Mnemonic                                 | Instruction               | 32-Bit Operation                          |
|------------------------------------------|---------------------------|-------------------------------------------|
| LBI.gp rt5, [+ imm19s]                   | GP-implied Load unsigned  | address = gp + SE(imm19s)                 |
|                                          | Byte Immediate            | rt5 = ZE(Byte-memory(address))            |
| LBSI.gp rt5, [+ imm19s]                  | GP-implied Load signed    | address = gp + SE(imm19s)                 |
|                                          | Byte Immediate            | rt5 = SE(Byte-memory(address))            |
| LHI.gp rt5, [+ (imm18s <<                | GP-implied Load unsigned  | address = gp + SE(imm18s << 1)            |
| 1)]                                      | Halfword Immediate        | rt5 = ZE(Halfword-memory(address))        |
| LHSI.gp rt5, [+ (imm18s                  | GP-implied Load signed    | address = gp + SE(imm18s << 1)            |
| << 1)]                                   | Halfword Immediate        | rt5 = SE(Halfword-memory(address))        |
| LWI.gp rt5, [+ (imm17s <<                | GP-implied Load Word      | address = $gp + SE(imm17s \ll 2)$         |
| 2)]                                      | Immediate                 | rt5 = Word-memory(address)                |
| SBI.gp rt5, [+ imm19s]                   | GP-implied Store Byte     | address = gp + SE(imm19s)                 |
|                                          | Immediate                 | Byte-memory(address) = rt5[7:0]           |
| SHI.gp rt5, [+ (imm18s <<                | GP-implied Store Halfword | address = gp + SE(imm18s << 1)            |
| 1)]                                      | Immediate                 | Halfword-memory(address) = $rt5[15:0]$    |
| SWI.gp rt5, [+ (imm17s <<                | GP-implied Store Word     | address = gp + SE(imm17s << 2)            |
| 2)]                                      | Immediate                 | Word-memory(address) = rt5                |
| LMWA. $\{b   a\} \{i   d\} \{m?\} rb5$ , | Load multiple word with   | Please see page 304 for details.          |
| [ra5], re5, Enable4                      | alignment check           |                                           |
| SMWA. $\{b   a\} \{i   d\} \{m?\} rb5$ , | Store multiple word with  | Please see page 318 for details.          |
| [ra5], re5, Enable4                      | alignment check           |                                           |
| LBUP Rt, [Ra + (Rb << sv)]               | Load Byte with User       | Equivalent to LB instruction but with the |
|                                          | Privilege                 | user mode privilege address translation.  |
|                                          |                           | See page 301 for details.                 |
| SBUP Rt, $[Ra + (Rb \ll sv)]$            | Store Byte with User      | Equivalent to SB instruction but with the |
|                                          | Privilege                 | user mode privilege address translation.  |
|                                          |                           | See page 316 for details.                 |

Table 47 Load/Store Instructions



# Chapter 5 32-bit ISA Extensions

This chapter describes instructions of various 32-bit ISA extensions and contains the following sections

- 5.1 Performance Extension V1 instructions on page 32
- 5.2 Performance Extension V2 instructions on page 33
- 5.3 String Extension instructions on page 34



# **5.1** Performance Extension V1 Instructions

The performance extension version 1 instructions are used to optimize high level language (C / C++) performance. The instructions in this extension are summarized in the following sections.

|       | Mnemonic        | Instruction               | Operation                                                |
|-------|-----------------|---------------------------|----------------------------------------------------------|
| ABS   | rt5, ra5        | Absolute with Register    | rt5 =   ra5                                              |
| AVE   | rt5, ra5, rb5   | Average two signed        | rt5 = (ra5 + rb5 + 1) (arith) >> 1 (page                 |
|       |                 | integers with rounding    | 213)                                                     |
| MAX   | rt5, ra5, rb5   | Return the Larger         | rt5 = signed-max (ra5, rb5)                              |
| MIN   | rt5, ra5, rb5   | Return the Smaller        | rt5 = signed-min (ra5, rb5)                              |
| BSET  | rt5, ra5, imm_5 | Bit Set                   | $rt5 = ra5 \parallel (1 << imm_5)$                       |
| BCLR  | rt5, ra5, imm_5 | Bit Clear                 | rt5 = (ra5 && ~(1 << imm_5))                             |
| BTGL  | rt5, ra5, imm_5 | Bit Toggle                | $rt5 = ra5 \land (1 << imm_5)$                           |
| BTST  | rt5, ra5, imm_5 | Bit Test                  | $rt5 = (ra5 \&\& (1 \le imm_5))? 1 : 0$                  |
| CLIPS | rt5, ra5, imm_5 | Clip Value Signed         | $rt5 = (ra5 > 2^{imm5}-1)? 2^{imm5}-1 : ((ra5 < $        |
|       |                 |                           | $-2^{imm5}$ )? $-2^{imm5}$ : ra5)                        |
| CLIP  | rt5, ra5, imm_5 | Clip Value                | $rt5 = (ra5 > 2^{imm5} - 1)? 2^{imm5} - 1 : ((ra5 < 0)?$ |
|       |                 |                           | 0 : ra5)                                                 |
| CLZ   | rt5, ra5        | Counting Leading Zeros in | rt5 = COUNT_ZERO_FROM_MSB(ra5)                           |
|       |                 | Word                      |                                                          |
| CLO   | rt5, ra5        | Counting Leading Ones in  | rt5 = COUNT_ONE_FROM_MSB(ra5)                            |
|       |                 | Word                      |                                                          |

 Table 48
 ALU Instruction (Extension)



# **5.2 Performance Extension V2 Instructions**

The performance extension version 2 instructions are used to optimize multimedia encoding/decoding processing related applications. The instructions in this extension are summarized in the following sections.

|        | Mnemonic        | Instruction          | Operation                         |
|--------|-----------------|----------------------|-----------------------------------|
| BSE    | rt5, ra5, rb5   | Bitstream Extraction | rt5 = Bitstream_Extract(ra5, rb5) |
|        |                 |                      | Please see page 225 for details.  |
| BSP    | rt5, ra5, rb5   | Bitstream Packing    | rt5 = Bitstream_Packing(ra5, rb5) |
|        |                 |                      | Please see page 231 for details.  |
| PBSAD  | rt5, ra5, rb5   | Parallel Byte Sum of | a = ABS(ra5(7,0) - rb5(7,0));     |
|        |                 | Absolute Difference  | b = ABS(ra5(15,8) - rb5(15,8));   |
|        |                 |                      | c = ABS(ra5(23,16) - rb5(23,16)); |
|        |                 |                      | d = ABS(ra5(31,24) - rb5(31,24)); |
|        |                 |                      | rt5 = a + b + c + d;              |
| PBSADA | A rt5, ra5, rb5 | Parallel Byte Sum of | a = ABS(ra5(7,0) - rb5(7,0));     |
|        |                 | Absolute Difference  | b = ABS(ra5(15,8) - rb5(15,8));   |
|        |                 | Accumulate           | c = ABS(ra5(23,16) - rb5(23,16)); |
|        |                 |                      | d = ABS(ra5(31,24) - rb5(31,24)); |
|        |                 |                      | rt5 = rt5 + a + b + c + d;        |

| Table 49 | Performance  | Extension | V2 I1  | nstructions   |
|----------|--------------|-----------|--------|---------------|
|          | 1 childhilde | LAtension | V 4 11 | isti uctions. |



## 5.3 32-bit String Extension

The String Extension instructions are use to optimize text and string processing related algorithms. These instructions will be released in the future.



# Chapter 6 Coprocessor Instructions

The Coprocessor ISA extension will be released in the future.

#### **Coprocessor Instructions**



Coprocessors may perform the following instructions:

- 1. Move General Purpose to Coprocessor Register
- 2. Move Coprocessor to General Purpose Register
- 3. Load Coprocessor Register
- 4. Store Coprocessor Register
- 5. Coprocessor Data Operation

Current version does not support Coprocessor instructions



# **Chapter 7 Detail Instruction Description**

This chapter describes the detail description of each AndeStar instruction and contains the following sections

- 7.1 32-bit Baseline instructions on page 38
- 7.2 32-bit Performance extension instructions on page 211
- 7.3 32-bit String extension instructions on page 239
- 7.4 16-bit Baseline instructions on page 240



# 7.1 **32-bit Baseline instructions**



## ADD (Addition)

Type: 32-Bit Baseline Format:

| 31 | 30    | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|-------|-----|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | ALU_1 |     | F  | Rt | R  | la | R  | b  | 000 | 000 | AI  | DD  |
|    | 100   | 000 |    |    |    |    |    |    |     |     | 000 | 000 |

Syntax: ADD Rt, Ra, Rb

**Purpose:** Add the content of two registers.

**Description:** The content of Ra is added with the content of Rb. And the result is written to Rt.

**Operations:** 

Rt = Ra + Rb;

Exceptions: None

Privilege level: All

Note:



## ADDI (Add Immediate)

Type: 32-Bit Baseline Format:

| 31 | 30 25  | 24 20 | 19 15 | 14 0   |
|----|--------|-------|-------|--------|
| 0  | ADDI   | Rt    | Ra    | imm15s |
|    | 101000 |       |       |        |

Syntax: ADDI Rt, Ra, imm15s

**Purpose:** Add the content of a register with a signed constant.

**Description:** The content of Ra is added with the sign-extended imm15s. And the result is written to Rt.

#### **Operations:**

Rt = Ra + SE(imm15s);

Exceptions: None Privilege level: All

Note:



## AND (Bit-wise Logical And)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30    | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|-------|-----|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | ALU_1 |     | F  | Rt | R  | la | R  | b  | 000 | 000 | Al  | ٧D  |
|    | 100   | 000 |    |    |    |    |    |    |     |     | 000 | 010 |

Syntax: AND Rt, Ra, Rb

**Purpose:** Doing a bit-wise logical AND operation on the content of two registers. **Description:** The content of Ra is combined with the content of Rb using a bit-wise logical AND operation. And the result is written to Rt.

#### **Operations:**

Rt = Ra & Rb;



## ANDI (And Immediate)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30 25  | 24 20 | 19 15 | 14 0   |
|----|--------|-------|-------|--------|
| 0  | ANDI   | Rt    | Ra    | imm15u |
|    | 101010 |       |       |        |

Syntax: ANDI Rt, Ra, imm15u

**Purpose:** Bit-wise AND of the content of a register with an unsigned constant.

**Description:** The content of Ra is bit-wise ANDed with the zero-extended imm15u. And the result is written to Rt.

#### **Operations:**

Rt = Ra & ZE(imm15u);



## **BEQ (Branch on Equal)**

Type: 32-Bit Baseline Format:

| 31 | 30 25  | 24 20 | 19 15 | 14  | 13     | 0 |
|----|--------|-------|-------|-----|--------|---|
| 0  | BR1    | Rt    | Ra    | BEQ | imm14s |   |
|    | 100110 |       |       | 0   |        |   |

Syntax: BEQ Rt, Ra, imm14s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the contents of two registers.

**Description:** If the content of Rt is equal to the content of Ra, then branch to the target address of adding the current instruction address with the sign-extended (imm14s  $\ll$  1) value. The branch range is  $\pm$  16K bytes.

#### **Operations:**

```
if (Rt == Ra) {
    PC = PC + SE(imml4s << 1);
}</pre>
```



## **BEQZ (Branch on Equal Zero)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25 | 24     | 20 | 19 | 16 | 15 | 0     |
|----|--------|----|--------|----|----|----|----|-------|
| 0  | BR2    |    | BR2 Rt |    | BE | QZ | i  | mm16s |
|    | 100111 |    |        |    | 00 | 10 |    |       |

#### Syntax: BEQZ Rt, imm16s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt is equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm $16s \ll 1$ ) value. The branch range is  $\pm 64$ K bytes.

#### **Operations:**

```
if (Rt == 0) {
    PC = PC + SE(imml6s << 1);
}</pre>
```



## **BGEZ (Branch on Greater than or Equal to Zero)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 16  | 15 | 0    |
|----|--------|----|----|----|----|-----|----|------|
| 0  | BR2    |    | R  | Rt | BG | θEZ | im | m16s |
|    | 100111 |    |    |    | 01 | 00  |    |      |

Syntax: BGEZ Rt, imm16s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt, treated as a signed integer, is greater than or equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm16s  $\ll$  1) value. The branch range is  $\pm$  64K bytes.

#### **Operations:**

```
if (Rt >= 0) {
    PC = PC + SE(imml6s << 1);
}</pre>
```



## **BGEZAL (Branch on Greater than or Equal to Zero and Link)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19  | 16   | 15 | 0    |
|----|--------|----|----|----|-----|------|----|------|
| 0  | BR2    |    | Rt |    | BGE | EZAL | im | m16s |
|    | 100111 |    |    |    | 11  | 00   |    |      |

Syntax: BGEZAL Rt, imm16s

**Purpose:** It is used for conditional PC-relative function call branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt, treated as a signed integer, is greater than or equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm16s << 1) value. The branch range is  $\pm$  64K bytes. The program address of the next sequential instruction (PC+4) is written to R30 (Link Pointer register) unconditionally for function call return purpose.

#### **Operations:**

```
R30 = PC + 4;
if (Rt >= 0) {
    PC = PC + SE(imm16s << 1);
}
```



## **BGTZ (Branch on Greater than Zero)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 16 | 15 | 0     |
|----|--------|----|----|----|----|----|----|-------|
| 0  | BR2    |    | R  | Rt | BG | ΤΖ | ir | nm16s |
|    | 100111 |    |    |    | 01 | 10 |    |       |

Syntax: BGTZ Rt, imm16s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt, treated as a signed integer, is greater than zero, then branch to the target address of adding the current instruction address with the sign-extended (imm $16s \ll 1$ ) value. The branch range is  $\pm 64$ K bytes.

#### **Operations:**

```
if (Rt > 0) {
    PC = PC + SE(imml6s << 1);
}</pre>
```



## BLEZ (Branch on Less than or Equal to Zero)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 16  | 15 | 0    |
|----|--------|----|----|----|----|-----|----|------|
| 0  | BR2    |    | R  | Rt | BL | ĿΕΖ | im | m16s |
|    | 100111 |    |    |    | 01 | 11  |    |      |

#### Syntax: BLEZ Rt, imm16s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt, treated as a signed integer, is less than or equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm $16s \ll 1$ ) value. The branch range is  $\pm 64$ K bytes.

#### **Operations:**

```
if (Rt <= 0) {
    PC = PC + SE(imml6s << 1);
}</pre>
```



### **BLTZ (Branch on Less than Zero)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 16 | 15 | 0     |
|----|--------|----|----|----|----|----|----|-------|
| 0  | BR2    |    | F  | Rt | BL | ΤZ | in | nm16s |
|    | 100111 |    |    |    | 01 | 01 |    |       |

#### Syntax: BLTZ Rt, imm16s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt, treated as a signed integer, is less than zero, then branch to the target address of adding the current instruction address with the sign-extended (imm $16s \ll 1$ ) value. The branch range is  $\pm 64$ K bytes.

#### **Operations:**

```
if (Rt < 0) {
    PC = PC + SE(imml6s << 1);
}</pre>
```



## **BLTZAL (Branch on Less than Zero and Link)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19     | 16 | 15     | 0 |
|----|--------|----|----|----|--------|----|--------|---|
| 0  | BR2    |    | Rt |    | BLTZAL |    | imm16s |   |
|    | 100111 |    |    |    | 11     | 01 |        |   |

Syntax: BLTZAL Rt, imm16s

**Purpose:** It is used for conditional PC-relative function call branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt, treated as a signed integer, is less than zero, then branch to the target address of adding the current instruction address with the sign-extended (imm16s << 1) value. The branch range is  $\pm$  64K bytes. The program address of the next sequential instruction (PC+4) is written to R30 (Link Pointer register) unconditionally for function call return purpose.

#### **Operations:**

```
R30 = PC + 4;
if (Rt < 0) {
    PC = PC + SE(imml6s << 1);
}
```



## **BNE (Branch on Not Equal)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30 2   | 5 | 24 | 20 | 19 | 15 | 14  | 13 | 0     |
|----|--------|---|----|----|----|----|-----|----|-------|
| 0  | BR1    |   | R  | lt | R  | a  | BNE | iı | mm14s |
|    | 100110 |   |    |    |    |    | 1   |    |       |

Syntax: BNE Rt, Ra, imm14s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the contents of two registers.

**Description:** If the content of Rt is not equal to the content of Ra, then branch to the target address of adding the current instruction address with the sign-extended (imm14s  $\ll$  1) value. The branch range is  $\pm$  16K bytes.

#### **Operations:**

```
if (Rt != Ra) {
    PC = PC + SE(imml4s << 1);
}</pre>
```



## **BNEZ (Branch on Not Equal Zero)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25 | 24     | 20 | 19   | 16 | 15     | 0 |
|----|--------|----|--------|----|------|----|--------|---|
| 0  | BR2    |    | BR2 Rt |    | BNEZ |    | imm16s |   |
|    | 100111 |    |        |    | 00   | 11 |        |   |

#### Syntax: BNEZ Rt, imm16s

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt is not equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm $16s \ll 1$ ) value. The branch range is  $\pm 64$ K bytes.

#### **Operations:**

```
if (Rt != 0) {
    PC = PC + SE(imml6s << 1);
}</pre>
```



## **BREAK (Breakpoint)**

#### Format:

| 31 | 30 25  | 24 20 | 19   | 5 | 4    | 0  |
|----|--------|-------|------|---|------|----|
| 0  | MISC   | 00000 | SWID |   | BREA | AK |
|    | 110010 |       |      |   | 010  | 10 |

#### Syntax: BREAK SWID

Purpose: It is used to generate a Breakpoint exception.

#### **Description:**

BREAK instruction will unconditionally generate a Breakpoint exception and transfer control to the Breakpoint exception handler. The 15-bits SWID is used by software as a parameter to distinguish different breakpoint features and usages.

#### **Operations:**

Generate\_Exception(Breakpoint);

#### **Exceptions:** Breakpoint

Privilege level: All Note:



## CCTL (Cache Control)

#### Format:

| 31 | 30  | 25  | 24  | 20  | 19 | 15 | 14 | 11 | 10    | 9   | 5    | 4   | 0   |
|----|-----|-----|-----|-----|----|----|----|----|-------|-----|------|-----|-----|
| 0  | MI  | SC  | Rt/ | ′Rb | R  | a  | 00 | 00 | level | Sub | Туре | CC  | CTL |
|    | 110 | 010 |     |     |    |    |    |    |       |     |      | 000 | 001 |

Syntax: CCTL Ra, SubType

CCTL Ra, SubType, level CCTL Rt, Ra, SubType CCTL Rb, Ra, SubType CCTL L1D\_INVALALL (VA writeback or invalidate operation)(Cache read operation)(Cache write operation)(Cache invalidate all operation)

**Purpose:** Perform various operations on processor caches. This instruction is typically used by software to maintain cache coherence for shared memory.

#### **Description:**

This instruction is used to perform cache control operations based on the SubType field. The definition and encoding for the SubType field are listed in the following tables. Some CCTL operations can be used with user privilege to assist in coherence and synchronization management. Some CCTL cache read operations has an additional destination register Rt. And some CCTL write operation has an additional source register Rb. Although defined, certain SubType encodings of the cache control instruction are optional to implement. And execution of an unimplemented optional cache control operation will cause a Reserved Instruction exception. Similarly, execution of an undefined SubType encoding of this instruction will cause a Reserved Instruction exception as well. For the "level" field, please see "multi-level cache management operation" in page 60.

| Su | ıbType | bit 4-3      |              |              |              |  |  |  |  |  |
|----|--------|--------------|--------------|--------------|--------------|--|--|--|--|--|
|    |        | 0            | 1            | 2            | 3            |  |  |  |  |  |
| b  | it 2-0 | 00           | 01           | 10           | 11           |  |  |  |  |  |
|    |        | L1D_IX       | L1D_VA       | L1I_IX       | L1I_VA       |  |  |  |  |  |
| 0  | 000    | L1D_IX_INVAL | L1D_VA_INVAL | L1I_IX_INVAL | L1I_VA_INVAL |  |  |  |  |  |

#### Table 50CCTL SubType Encoding

#### **Detail Instruction Description**



| 1 | 001 | L1D_IX_WB      | L1D_VA_WB      | -           | -             |
|---|-----|----------------|----------------|-------------|---------------|
| 2 | 010 | L1D_IX_WBINVAL | L1D_VA_WBINVAL | -           | -             |
| 3 | 011 | L1D_IX_RTAG    | L1D_VA_FILLCK  | L1I_IX_RTAG | L1I_VA_FILLCK |
| 4 | 100 | L1D_IX_RWD     | L1D_VA_ULCK    | L1I_IX_RWD  | L1I_VA_ULCK   |
| 5 | 101 | L1D_IX_WTAG    | -              | L1I_IX_WTAG | -             |
| 6 | 110 | L1D_IX_WWD     | -              | L1I_IX_WWD  | -             |
| 7 | 111 | L1D_INVALALL   | -              | -           | _             |

#### Table 51CCTL SubType Definitions

| Mnemonics      | Operation     | Ra Type | Rt?/Rb? | User      | Compliance |
|----------------|---------------|---------|---------|-----------|------------|
|                | (Category)    |         |         | Privilege |            |
| L1D_IX_INVAL   | Invalidate    | Index   | _/_     | -         | Required   |
|                | L1D cache     |         |         |           |            |
|                | (A)           |         |         |           |            |
| L1D_VA_INVAL   | Invalidate    | VA      | _/_     | Yes       | Required   |
|                | L1D cache     |         |         |           |            |
|                | (A)           |         |         |           |            |
| L1D_IX_WB      | Write Back    | Index   | _/_     | -         | Required   |
|                | L1D cache     |         |         |           |            |
|                | (B)           |         |         |           |            |
| L1D_VA_WB      | Write Back    | VA      | -/-     | Yes       | Required   |
|                | L1D cache     |         |         |           |            |
|                | (B)           |         |         |           |            |
| L1D_IX_WBINVAL | Write Back &  | Index   | _/_     | -         | Optional   |
|                | Invalidate    |         |         |           |            |
|                | L1D cache     |         |         |           |            |
|                | (C)           |         |         |           |            |
| L1D_VA_WBINVAL | Write Back &  | VA      | _/_     | Yes       | Optional   |
|                | Invalidate    |         |         |           |            |
|                | L1D cache     |         |         |           |            |
|                | (C)           |         |         |           |            |
| L1D_VA_FILLCK  | Fill and Lock | VA      | _/_     | -         | Optional   |
|                | L1D cache     |         |         |           |            |
|                | (D)           |         |         |           |            |
| L1D_VA_ULCK    | unlock L1D    | VA      | _/_     | -         | Optional   |



|               | cache          |         |       |     |          |
|---------------|----------------|---------|-------|-----|----------|
|               | (E)            |         |       |     |          |
| L1D_IX_RTAG   | Read tag L1D   | Index   | Yes/- | -   | Optional |
|               | cache          |         |       |     |          |
|               | (F)            |         |       |     |          |
| L1D_IX_RWD    | Read word      | Index/w | Yes/- |     | Optional |
|               | data L1D       |         |       |     |          |
|               | cache          |         |       |     |          |
|               | (G)            |         |       |     |          |
| L1D_IX_WTAG   | Write tag      | Index   | -/Yes | -   | Optional |
|               | L1D cache*     |         |       |     |          |
|               | (H)            |         |       |     |          |
| L1D_IX_WWD    | Write word     | Index/w | -/Yes | -   | Optional |
|               | data L1D       |         |       |     |          |
|               | cache*         |         |       |     |          |
|               | (I)            |         |       |     |          |
| L1D_INVALALL  | Invalidate All | N/A     | _/_   | -   | Optional |
|               | L1D cache      |         |       |     |          |
|               | (J)            |         |       |     |          |
| L1I_VA_FILLCK | Fill and Lock  | VA      | -/-   | -   | Optional |
|               | L1I cache      |         |       |     |          |
|               | (D)            |         |       |     |          |
| L1I_VA_ULCK   | unlock L1I     | VA      | -/-   | -   | Optional |
|               | cache          |         |       |     |          |
|               | (E)            |         |       |     |          |
| L1I_IX_INVAL  | Invalidate L1I | Index   | _/_   | -   | Required |
|               | cache          |         |       |     |          |
|               | (A)            |         |       |     |          |
| L1I_VA_INVAL  | Invalidate L1I | VA      | _/_   | Yes | Required |
|               | cache          |         |       |     |          |
|               | (A)            |         |       |     |          |
| L1I_IX_RTAG   | Read tag L1I   | Index   | Yes/- | -   | Optional |
|               | cache          |         |       |     |          |
|               | (F)            |         |       |     |          |
| L1I_IX_RWD    | Read word      | Index/w | Yes/- |     | Optional |



|             | data L1I      |         |       |   |          |
|-------------|---------------|---------|-------|---|----------|
|             | cache         |         |       |   |          |
|             | (G)           |         |       |   |          |
| L1I_IX_WTAG | Write tag L1I | Index   | -/Yes | - | Optional |
|             | cache         |         |       |   |          |
|             | (H)           |         |       |   |          |
| L1I_IX_WWD  | Write word    | Index/w | -/Yes | - | Optional |
|             | data L1I      |         |       |   |          |
|             | cache         |         |       |   |          |
|             | (I)           |         |       |   |          |

\*Note: The "Write word data L1D cache" operation may be omitted by an implementation since its effect can be achieved using store instructions after the CCTL "Write tag L1D cache" operation and a "Data Serialization Barrier" instruction.

The content of the register Ra is used to find the correct cache location for the cache operation. It is used in the following two ways:

| Ra Type | Usage                                                                       |
|---------|-----------------------------------------------------------------------------|
| Index   | The content of Ra is used directly as a (Index, Way) pair to access the     |
|         | cache location without going through any translation mechanism. The         |
|         | real format for the (Index, Way) pair in the content of Ra is per cache     |
|         | implementation-dependent. However, it should have a general form            |
|         | shown in Figure 1. Software could discover the real format by consulting    |
|         | the ICM CFG (cr1) and DCM_CFG (cr2) Configuration Registers in the          |
|         | Andes processor core.                                                       |
| Index/w | The content of Ra is used directly as a (Index, Way, Word) triple to access |
|         | the cache location without going through any translation. The "Word"        |
|         | means a 4-bytes word in the cache line pointed to by the (Index, Way)       |
|         | pair. The remaining description is similar to the "Index" type above.       |
| VA      | The content of Ra is used as a virtual address to access the cache. The     |
|         | address goes through the same address translation mechanism in the          |
|         | processor pipeline as the address of a load/store instruction for D cache   |
|         | or an instruction fetch for I cache. And the specified operation is only    |
|         | performed if the address hits in the corresponding cache. If the cache is   |



missed, no specific operation is performed.

#### Figure 1. Index type format for Ra of CCTL instruction



The operations of the cache control instruction can be grouped and described in the following categories:

A. Invalidate cache block

This operation unlocks the cache line and sets the state of the cache line to invalid. It is implementation-dependent on how this instruction affects other states of the cache line such as "way selection".

- B. Write back cache block
   If the cache line state is valid and dirty in a write-back cache, this operation writes the cache line back to memory. It does not affect the lock state of the cache line.
- C. Write back & invalidate cache block

If the cache line state is valid and dirty in a write-back cache, this operation writes the cache line back to memory, and then performs a cache block invalidating operation described in (A).

D. Fill and Lock cache block

If the desired cache line is not present in the cache, this operation fills the cache line into the cache and then sets the lock state of the cache line. If the desired cache line is present in the cache, only the lock state will be set. A locked cache line will not be replaced when a cache miss/fill event happens. A locked cache line can only be unlocked using a cache invalidate or unlock operation. Since this cache line lock operation is defined under the implementation assumption of a multi-way set-associative cache, the support of this operation is implementation dependent. And if this instruction is supported for a multi-way set-associative cache, only up to



"way-1" cache lines in a cache set can be locked using this instruction. If software wants to get a predictable behavior, care must be taken to avoid locking more than "way-1" cache lines in a cache set. If "way" cache lines in a cache set are all locked by software, and if ICALCK/DCALCK of Cache Control system register is 0, then it is IMPLEMENTATION-DEPENDENT on when an exception will be generated.

E. Unlock cache block

This operation clears the lock state of the cache if the desired cache line is present in the cache.

F. Read tag from cache

This operation reads the contents of a cache line tag into a general register Rt. The tag format is implementation-dependent. A reference format is illustrated as follows.

| 31   | 23  | 22    | 21        | 2 | 1     | 0    |
|------|-----|-------|-----------|---|-------|------|
| igno | red | dirty | PA(31,12) | ) | valid | lock |

The content of Ra specifies the index and way of the target cache line.

G. Read data word from cache

This operation reads a 4-bytes word from a cache line into a general register Rt. The endian format of this operation should depend on the value of PSW.BE. The content of Ra specifies the index, way, and word of the target cache line.

H. Write tag to cache

This operation writes the cache line tag from a general register Rb. The tag format is implementation-dependent. A reference format is illustrated as follows.

| 31 |         | 23 | 22    | 21 |           | 2 | 1     | 0    | _ |
|----|---------|----|-------|----|-----------|---|-------|------|---|
|    | ignored |    | dirty |    | PA(31,12) |   | valid | lock |   |

The content of Ra specifies the index and way of the target cache line.

I. Write word data to cache

This operation writes a 4-byte word in a general register Rb into a target cache line. The endian format of this operation should depend on the value of PSW.BE. The content of Ra specifies the index, way, and word of the target cache line.

J. Invalidate All cache block

This operation unlocks all of the cache lines and sets the state of all of the cache lines to invalid. It is implementation-dependent on how this instruction affects other states of the cache line such as "way selection".


#### • Multi-level cache management operation

For a system with multiple levels of caches, it would be more convenient for software to manage a cache block if the software can control whether a CCTL write-back or invalidate operation is applied to just the first level cache or to other higher levels of caches as well. This has the benefit that software can use just one CCTL instruction to affect all levels of caches without using separate instructions to manage each level of cache individually.

To enable this, all of the CCTL instructions with VA writeback or VA invalidate related operations have two flavors to indicate if the operation is applied to all levels of caches or only one level. The two flavors are as follows:

All level operation: "CCTL Ra, SubType, alevel" One level operation: "CCTL Ra, SubType, 1level"

The following SubTypes have these two flavors.

| L1D_VA_INVAL   |
|----------------|
| L1D_VA_WB      |
| L1D_VA_WBINVAL |
| L1I_VA_INVAL   |

For all other CCTL SubTypes, the "level" encoding will be a "Don't care" field.

For software to make sure the completeness of the "all level" operations, a "MSYNC all" instruction needs to be used after the "all level" operations to guarantee that any instructions after the "MSYNC all" will see the effects completed by the "all level" operations.

For a system with one level of caches, the "all level" flavor behaves the same as the "one level" operation.

### **Programming Constraints:**

1. CCTL Ra, L1D\_VA\_WB, alevel:



This instruction guarantees to write back a dirty cache line of L1 cache to memory.

2. CCTL Ra, L1D\_VA\_INVAL, alevel:

This instruction will invalidate cache lines of different size in different levels of a cache hierarchy. So to use this instruction, a programmer needs to first make sure writing back all data in a cache line unit aligned to the largest cache line size in the cache hierarchy unless the programmer is sure that portion of the cache unit covering the largest cache line size, when not written back, contain no dirty data.

### **Operations:**

```
If (SubType is not supported)
  Exception (Reserved Instruction);
If (RaType(SubType) == VA) {
  VA = (Ra);
  PA = AddressTranslation(VA);
  VA cache control(SubType, PA, level);
} else {
  \{ Idx, way, word \} = (Ra)
  If (Op(SubType) == CacheTagRead) {
      Rt = Idx cache tag read(SubType, Idx, way);
  } else if (Op(SubType) == CacheDataRead) {
      Rt = Idx cache data read(SubType, Idx, way, word);
  } else if (Op(SubType) == CacheWrite) {
      Idx cache write(SubType, Rb, Idx, way, word);
  } else {
      Idx cache control(SubType, Idx, way);
  }
}
```

#### **Exceptions:**

TLB fill exception, Non-Leaf PTE not present exception, Leaf PTE not present exception, Read protection violation exception, TLB VLPT miss exception, Imprecise bus error exception, Reserved instruction exception, Privileged Instruction.



When executed under the *user operating mode*, the following four CCTL subtypes will generate exceptions if the page access permission is not setup correctly.

| CCTL Subtypes  | Required Permission | Generated exception             |
|----------------|---------------------|---------------------------------|
|                | under user mode     |                                 |
| L1D_VA_INVAL   | Read                | Data write protection violation |
| L1D_VA_WB      | Write               | Data write protection violation |
| L1D_VA_WBINVAL | Write               | Data write protection violation |
| L1I_VA_INVAL   | Read                | Data read protection violation  |

When executed under the *superuser operating mode*, all the CCTL subtypes will not generate exceptions due to the page read/write/execute permission.

As for the checking and generation of page modified exception and access bit exception, the following table lists the expected behaviors:

| CCTL     | Page modified  | Access bit     | Non-executable page      |
|----------|----------------|----------------|--------------------------|
| Subtypes | exception      | exception      | exception under all mode |
|          | under all mode | under all mode |                          |
| L1D_*    | Ignore         | Ignore         | N/A                      |
| L1I_*    | Ignore         | Ignore         | Ignore                   |

Privilege level: Depends on operation types.

Note:

- (1) A CCTL instruction should operate on the specified cache regardless of the cache enable/disable control bits in the Cache Control register provided the specified cache is implemented.
- (2) All non-instruction-fetch-related exceptions generated by a CCTL instruction should have the INST field of the ITYPE register set to 0.
- (3) For normal run time operations (thus excluding CCTL write tag operation), the state transition of the lock flag of a cache line affected by a CCTL instruction is illustrated in the following diagram:









## **CMOVN (Conditional Move on Not Zero)**

Type: 32-Bit Baseline Format:

| 3 | 1 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 10 | 9 | 5     | 4   | 0     |  |
|---|---|--------|----|----|----|----|----|----|----|---|-------|-----|-------|--|
| ( | ) | ALU_1  |    | F  | Rt |    | Ra |    | Rb |   | 00000 |     | CMOVN |  |
|   |   | 100000 |    |    |    |    |    |    |    |   |       | 110 | 011   |  |

Syntax: CMOVN Rt, Ra, Rb

Purpose: Move the content of a register based on a condition stored in a register.

**Description:** If the content of Rb is not equal to zero, then move the content of Ra into Rt. **Operations:** 

```
if (Rb != 0) {
    Rt = Ra;
}
```

Exceptions: None Privilege level: All Note:



## **CMOVZ (Conditional Move on Zero)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24   | 20 | 19 | 15 | 14 | 10 | 9 | 5     | 4   | 0     |  |
|----|--------|----|------|----|----|----|----|----|---|-------|-----|-------|--|
| 0  | ALU_1  |    | 1 Rt |    | Ra |    | R  | Rb |   | 00000 |     | CMOVZ |  |
|    | 100000 |    |      |    |    |    |    |    |   |       | 11( | 010   |  |

Syntax: CMOVZ Rt, Ra, Rb

**Purpose:** Move the content of a register based on a condition stored in a register.

**Description:** If the content of Rb is equal to zero, then move the content of Ra into Rt. **Operations:** 

```
if (Rb == 0) {
    Rt = Ra;
}
```

Exceptions: None Privilege level: All Note:



## **DIV (Unsigned Integer Divide)**

**Type:** 32-Bit Baseline Optional **Format:** 

| 31 | 30      | 25  | 24 | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9 | 6    | 5   | 0   |  |
|----|---------|-----|----|----|----|----|----|----|----|----|---|------|-----|-----|--|
| 0  | ) ALU_2 |     | 00 | )0 | Dt | 0  | R  | Ra |    | Rb |   | 0000 |     | DIV |  |
|    | 1000    | 001 |    |    |    |    |    |    |    |    |   |      | 101 | 111 |  |

Syntax: DIV Dt, Ra, Rb

**Purpose:** Divide the unsigned integer contents of two 32-bit registers.

**Description:** Divide the 32-bit content of Ra with the 32-bit content of Rb. The 32-bit quotient result is written to Dt.LO register and the 32-bit remainder result is written to Dt.HI register. The contents of Ra and Rb are treated as unsigned integers.

If the content of Rb is zero, an Arithmetic exception will be generated if the IDIVZE bit of the INT\_MASK register is 1, which enables exception generation for the "Divide-By-Zero" condition.

### **Operations:**

```
If (Rb != 0) {
   quotient = Floor(CONCAT(1`b0,Ra) / CONCAT(1`b0,Rb));
   remainder = CONCAT(1`b0,Ra) mod CONCAT(1`b0,Rb);
   Dt.LO = quotient;
   Dt.HI = remainder;
} else if (INT_MASK.IDIVZE == 0) {
   Dt.LO = 0;
   Dt.HI = 0;
} else {
   Generate_Exception(Arithmetic);
}
```

Exceptions: Arithmetic Privilege level: All Note:



## **DIVS (Signed Integer Divide)**

Type: 32-Bit Baseline Optional Format:

| 31 | 30      | 25  | 24    | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5    | 0   |
|----|---------|-----|-------|----|----|----|----|----|----|----|------|---|------|-----|
| 0  | 0 ALU_2 |     | 2 000 |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | DIVS |     |
|    | 100     | 001 |       |    |    |    |    |    |    |    |      |   | 101  | 110 |

Syntax: DIVS Dt, Ra, Rb

Purpose: Divide the signed integer contents of two 32-bit registers.

**Description:** Divide the 32-bit content of Ra with the 32-bit content of Rb. The 32-bit quotient result is written to Dt.LO register and the 32-bit remainder result is written to Dt.HI register. The contents of Ra and Rb are treated as signed integers.

If the content of Rb is zero, an Arithmetic exception will be generated if the IDIVZE bit of the INT\_MASK register is 1, which enables exception generation for the "Divide-By-Zero" condition. If the quotient overflows, an Arithmetic exception will always be generated. The overflow condition is as follows:

Positive quotient > 0x7FFF FFFF (When Ra = 0x80000000 and Rb = 0xFFFFFFFF)

### **Operations:**

```
If (Rb != 0) {
   quotient = Floor(Ra / Rb);
   if (IsPositive(quotient) && quotient > 0x7FFFFFF) {
      Generate_Exception(Arithmetic);
   }
   remainder = Ra mod Rb
   Dt.LO = quotient;
   Dt.HI = remainder;
} else if (INT_MASK.IDIVZE == 0) {
   Dt.LO = 0;
   Dt.HI = 0;
} else {
   Generate Exception(Arithmetic);
```



}

Exceptions: Arithmetic Privilege level: All Note:

# **DPREF/DPREFI (Data Prefetch)**

|    | DPREF  |     |    |     |      |    |    |    |    |    |     |       |
|----|--------|-----|----|-----|------|----|----|----|----|----|-----|-------|
| 31 | 30     | 25  | 24 | 23  | 20   | 19 | 15 | 14 | 10 | 98 | 7   | 0     |
| 0  | MF     | EM  | 0  | Sub | Гуре | R  | Ra |    | Rb |    | DP  | REF   |
|    | 011    | 100 |    |     |      |    |    |    |    |    | 000 | 10011 |
|    |        |     |    |     |      |    |    |    |    |    |     |       |
|    | DPREFI |     |    |     |      |    |    |    |    |    |     |       |
| 31 | 30     | 25  | 24 | 23  | 20   | 19 | 15 |    | 14 |    |     | 0     |
|    |        |     |    |     |      |    |    |    |    |    |     |       |

| 31 | 30 25  | 24 | 23 20   | 19 15 | 14     | 0 |
|----|--------|----|---------|-------|--------|---|
| 0  | DPREFI | d  | SubType | Ra    | imm15s |   |
|    | 010011 |    |         |       |        |   |

Syntax: DPREF SubType, [Ra + (Rb << shift2)] DPREFI.d SubType, [Ra + (imm15s.000)] DPREFI.w SubType, [Ra + (imm15s.00)]

**Purpose:** Hint to move data from memory into data caches in advance before the actual load or store operations to reduce memory access latency.

## **Description:**

The effective byte address calculated from the data prefetch instruction (DPREF/DPREFI) is used by an implementation to take an implementation-dependent action which is expected to increase performance by moving the memory line containing the address from memory into data caches in advance. However, an implementation may decide to do nothing at any stage by treating this instruction as a NOP. As a hint, this instruction should not generate any observable results or any exceptions which alter the behavior of a program (e.g. an address exception which leads to OS aborting the program.) Note that the "cache line write" hint subtype may be an exception from the above statement. Please see the detailed subtype descriptions followed.

For this instruction to be effective in increasing performance, the data prefetch instruction should be implemented non-blocking to overlap with other instructions. And the actual number of cache lines and memory hierarchies affected are implementation-dependent.

The effective byte address for DPREF is Ra+(Rb<<shift2) where shift2 is a 2-bits shift amount for Rb.

The effective byte address for DPREFI is defined in two flavors, one for word and one for double word. The double word flavor has a byte address of Ra+SignExtend(imm15s.000) where imm15s represents an 8-byte-double-word offset. This coarser-grain immediate offset will not sacrifice the data prefetching performance since most of the cache block moves between the memory and data cache is greater than 8 bytes. The word flavor has a byte address of Ra+SignExtend(imm15s.00) where imm15s represents a 4-byte-word offset. This finer-grain immediate offset may give hardware a chance to optimize the data prefetch instruction by sub-blocking the data cache and filling the critical word of a prefetched cache line first. These two flavors are distinguished by the "d" bit (bit 24) in the instruction encoding as follows:

| "d" bit | Meaning  |
|---------|----------|
| 0       | DPREFI.w |
| 1       | DPREFI.d |

The SubType field of this instruction is used as a hint to tell hardware the intended use of the prefetched data, so that the hardware implementation may use different prefetch schemes to optimize the performance. The definition of the SubType field is listed in the following table.

| Sub  | Mnemonics | Hint           | Descripton                                      |
|------|-----------|----------------|-------------------------------------------------|
| Туре |           |                |                                                 |
| 0    | SRD       | Single Read    | The data will be read for only once. (i.e. no   |
|      |           |                | temporal locality)                              |
| 1    | MRD       | Multiple Read  | The data will be read multiple times. (i.e.     |
|      |           |                | has temporal locality)                          |
| 2    | SWR       | Single Write   | The data will be written for only once. (i.e.   |
|      |           |                | no temporal locality)                           |
| 3    | MWR       | Multiple Write | The data will be written multiple times.        |
|      |           |                | (i.e. has temporal locality)                    |
| 4    | РТЕ       | PTE Preload    | Preload page translation information into       |
|      |           |                | TLB.                                            |
| 5    | CLWR      | Cache Line     | The whole cache line will be written by         |
|      |           | Write          | store instructions. If the cache line is in the |
|      |           |                | data cache and is permitted for writing, the    |
|      |           |                | processor will do nothing. If the cache line    |



|      |   |         | is in the data cache but is not permitted for |
|------|---|---------|-----------------------------------------------|
|      |   |         | writing, the processor may request the        |
|      |   |         | write permission for the cache line. If the   |
|      |   |         | cache line is not in the data cache, the      |
|      |   |         | cache line may be allocated in the data       |
|      |   |         | cache with all zero data and write            |
|      |   |         | permission. This instruction may alter        |
|      |   |         | memory states of the cache line if the        |
|      |   |         | program fails to write the whole cache line   |
|      |   |         | with new data later. And the final memory     |
|      |   |         | states will be UNPREDICTABLE in this          |
|      |   |         | case depending on cache hit or miss when      |
|      |   |         | this instruction is issued. This memory       |
|      |   |         | states altering characteristic is             |
|      |   |         | implementation-dependent.                     |
| 6-15 | - | Imp-dep | Implementation dependent                      |

The data prefetch instruction will not prefetch memory locations with uncached attribute.

#### **Operations:**

```
VA = Ra+(Rb << shift2); // DPREF
VA = Ra+(imm15s.000); // DPREFI.d
VA = Ra+(imm15s.00); // DPREFI.w
PA_found = MMU_Search(VA);
If (PA_found) {
    (PA, attribute) = MMU_Translate(VA);
    Data_Prefetch(PA, attribute, hint);
} else if (hint == "TLB preload")
    TLB_Preload(VA);
```

#### **Exceptions:**

None, (Implementation-dependent: imprecise Cache error or imprecise Bus error)

### Privilege level: All

#### Note:

(1) The effective address of this data prefetch instruction does not need to be aligned to



any data type boundary (e.g. word-aligned) since the address is used to indicate a cache line rather than a specific data type.

- (2) This data prefetch instruction does not generate any exception. And if a hardware page table walker is implemented, it is suggested that this instruction should not generate any hardware page table walk memory access as well.
- (3) This data prefetch instruction may be used ahead of the real memory access instruction and be generated unguarded inside a loop. So it is very likely that a data prefetch instruction may generate an address which is outside of legal data range. Generating hardware page table walk memory access on a data prefetch instruction with such data address will cause unnecessary execution penalty which deter the usability of this instruction.



# **DSB (Data Serialization Barrier)**

#### Format:

| 31 | 30 25  | 24 20 | 19                                      | 5 | 4    | 0 |
|----|--------|-------|-----------------------------------------|---|------|---|
| 0  | MISC   | 00000 | 000000000000000000000000000000000000000 | 0 | DSE  | 3 |
|    | 110010 |       |                                         |   | 0100 | 0 |

### Syntax: DSB

**Purpose:** It is used to serialize a read-after-write data dependency for certain architecture/hardware states updates which affect data processing related operations. It guarantees a modified architecture or hardware state can be seen by any following dependent data operations.

#### **Description:**

This instruction blocks the execution of any subsequent instructions until all previously modified architecture/hardware states can be observed by subsequent dependent data operations in a pipelined processor. This instruction is not needed to serialize general register states which are serialized by hardware. For user programs, an endian mode change operation (SETEND) requires a DSB instruction to make sure the endian change is seen by any following load/store instructions. Other than this case, all uses of DSB are in privileged programs for managing system states.

## **Operations:**

Serialize\_Data\_States()

Exceptions: None Privilege level: All Note:

1. The following table lists some of the state writers and readers where DSB is needed between them in order for the readers to get the expected new value/behaviors.



| State           | Writer               | Reader       | Value              |
|-----------------|----------------------|--------------|--------------------|
| System register | MTSR                 | MFSR         | New SR value       |
| PSW.BE          | SETEND               | load/store   | New endian state   |
| PSW.DT          | MTSR                 | load/store   | New translation    |
|                 |                      |              | behavior           |
| PSW.GIE/ SIM/   | MTSR                 | following    | Interrupt behavior |
| H5IM-H0IM       |                      | instructions |                    |
| PSW.GIE         | SETGIE               | following    | New global         |
|                 |                      | instructions | interrupt enable   |
|                 |                      |              | state              |
| PSW.INTL        | MTSR                 | following    | Interruption stack |
|                 |                      | instructions | behavior           |
| D\$ line valid  | CCTL L1D (sub=inval, | load/store   | D\$ hit/miss       |
|                 | WB&inval, WR tag)    |              |                    |
|                 |                      | CCTL L1D     | Tag valid          |
|                 |                      | (sub=RD tag) |                    |
| DTLB            | TLBOP RWR/ RWLK/     | load/store   | Data PA            |
|                 | TWR                  |              |                    |
|                 | TLBOP FLUA/INV       | load/store   | TLB miss           |
|                 | TLBOP TWR            | TLBOP TRD    | TLB entry data     |
| CACHE_CTL       | MTSR                 | load/store   | D\$ enable/disable |
| (DC_ENA)        |                      |              | behavior           |
| DLMB (EN)       | MTSR                 | load/store   | Data local memory  |
|                 |                      |              | access behavior    |
| L1_PPTB         | MTSR                 | load/store   | HPTWK behavior     |
| TLB_ACC_XXX     | TLBOP TRD            | MFSR         | TLB read out       |
| DMA channel     | MTSR DMA_CHNSEL      | MTSR/MFSR    | New DMA channel    |
| selection       |                      | DMA channel  | number             |
| (DMA_CHNSEL)    |                      | registers    |                    |

2. PSW.GIE is defined to control if an asynchronous interrupt can be inserted in between two instructions or not. So when we say that an instruction will observe an updated PSW.GIE value, we mean that the updated PSW.GIE value will affect if the instruction and its next instruction can be interrupted or not. Based on this definition, the DSB instruction after a PSW.GIE update operation will not guaranteed itself to



observe the updated PSW.GIE value and may observe the old PSW.GIE value. So if the PSW.GIE is being updated to 0 to disable interrupt, an interrupt can still be inserted between the DSB and the immediate following instruction (inst1). The DSB instruction only guarantees that the following instructions after it (inst1, inst2, etc.) will observe the updated PSW.GIE value to be 0, thus preventing any interrupt inserted between the immediate following instruction (inst1) and its next instruction (inst2), and onward.

```
setgie.d
dsb
inst1
inst2
```

In addition to PSW.GIE, the same definition also works for the following interrupt masking fields:

INT\_MASK.SIM INT\_MASK.H5IM-H0IM



## **IRET (Interruption Return)**

#### Format:

| 31 | 30 25  | 24 20 | 19 5           | 4 0   |  |
|----|--------|-------|----------------|-------|--|
| 0  | MISC   | 00000 | 00000000000000 | IRET  |  |
|    | 110010 |       |                | 00100 |  |

### Syntax: IRET

**Purpose:** It is used to return from interruption to the instruction and states when the processor was being interrupted.

**Description:** The main function of the IRET instruction is to conditionally update (pop) the system register and program counter stack such that the processor behavior after the IRET instruction will return to a state when the processor was being interrupted. To be more specific, the following states will be updated based on interruption level (INTL) conditions:

- For INTL=0, 1 PC  $\leftarrow$  IPC PSW  $\leftarrow$  IPSW
- For INTL=2

 $PC \leftarrow IPC$  $IPC \leftarrow P_IPC$  $PSW \leftarrow IPSW$  $IPSW \leftarrow P_IPSW$  $EVA \leftarrow P_EVA$  $P0 \leftarrow P_P0$  $P1 \leftarrow P_P1$  $ITYPE \leftarrow P_ITYPE$  $EXCI \leftarrow P_EXCI$ 

• For INTL=3

 $PC \leftarrow OIPC$  $PSW.INTL \leftarrow 2$ 



Since these processor states update will affect processor fetching and data processing behaviors, in addition to the register stack update, instruction and data serialization operations are also included in the IRET instruction such that any instruction or instruction fetching after the IRET will guarantee to see the newly updated processor states caused by the IRET instruction.

### **Operations:**

```
If (PSW.INTL==0 or PSW.INTL ==1 or PSW.INTL==2) {
    PC = IPC;
    PSW = IPSW;
    If (PSW.INTL==2) {
      IPC = P_IPC;
      IPSW = P IPSW;
      EVA = P_EVA;
      PO = P PO;
      P1 = P P1;
      ITYPE = P ITYPE;
      EXCI = P EXCI;
    }
} else {
    PC = OIPC;
    PSW.INTL = 2;
}
Serialize Data States()
Serialize Instruction States()
```

**Exceptions:** Privileged Instruction **Privilege level:** Superuser and above **Note:** 



## **ISB (Instruction Serialization Barrier)**

#### Format:

| 31 | 30 25  | 24 20 | 19 5           | 4 0   |  |
|----|--------|-------|----------------|-------|--|
| 0  | MISC   | 00000 | 00000000000000 | ISB   |  |
|    | 110010 |       |                | 01001 |  |

### Syntax: ISB

**Purpose:** It is used to serialize a read-after-write data dependency for certain architecture/hardware states updates which affect instruction fetching related operations. It also serializes all architecture/hardware states updates that are serialized by a DSB instruction. It guarantees a modified architecture or hardware state can be seen by any following dependent instruction fetching operations and data processing operations.

### **Description:**

This instruction blocks the execution of any subsequent instructions until all previously modified architecture/hardware states can be observed by subsequent dependent (1) instruction fetching operations and (2) data processing operations that are serialized by a DSB instruction in a pipelined processor. This instruction is not needed to serialize general register states which are serialized by hardware.

The interruption return instruction (IRET) includes implicit ISB operation (including data serialization). So using either IRET or ISB instruction will achieve the expected instruction and data serialization behavior.

### **Operations:**

```
Serialize_Data_States()
Serialize_Instruction_States()
```

Exceptions: None Privilege level: All Note:



The following table lists some of the state writers and readers where ISB is needed between them in order for the readers to get the expected new value/behaviors.

| State              | Writer        | Reader            | Value                    |
|--------------------|---------------|-------------------|--------------------------|
| PSW.IT             | MTSR          | Instruction fetch | Instruction translation  |
|                    |               |                   | behavior                 |
| ITLB               | TLBOP RWR/    | Instruction fetch | Instruction PA           |
|                    | RWLK/ TWR     |                   |                          |
|                    | TLBOP FLUA/   | Instruction fetch | TLB miss                 |
|                    | INV           |                   |                          |
| I\$ line           | ISYNC         | Instruction fetch | New instruction data     |
| valid/data         |               |                   |                          |
| I\$ line valid     | CCTL L1I      | Instruction fetch | I\$ miss                 |
|                    | (sub=inval)   |                   |                          |
| I\$ line valid/tag | CCTL L1I      | CCTL L1I (sub=Rd  | I\$ hit                  |
|                    | (sub=WR tag)  | tag)              |                          |
| I\$ line data      | CCTL L1I      | CCTL L1I (sub=RD  | Instruction data         |
|                    | (sub=WR word) | word)             |                          |
| Memory             | MSYNC         | Instruction fetch | New instruction data     |
| CACHE_CTL          | MTSR          | Instruction fetch | I\$ enable/disable       |
| (IC_ENA)           |               |                   | behavior                 |
| ILMB (EN)          | MTSR          | Instruction fetch | Instruction local memory |
|                    |               |                   | access behavior          |
| L1_PPTB            | MTSR          | Instruction fetch | HPTWK behavior           |
| IVB                | MTSR          | Instruction fetch | Interruption behavior    |



## **ISYNC (Instruction Data Coherence Synchronization)**

#### Format:

| 31 | 30 25   | 24 20 | 19 5             | 4 0   |
|----|---------|-------|------------------|-------|
| 0  | MISC Ra |       | 0000000000000000 | ISYNC |
|    | 110010  |       |                  | 01101 |

### Syntax: ISYNC Ra

**Purpose:** Make sure instruction data updated in the cache line address specified by Ra can be observed by the instruction fetch event performed after an instruction serialization barrier instruction (ISB or IRET).

#### **Description:**

This instruction is used to guarantee that any instruction data updated before this instruction can be properly observed by an instruction fetch event after an instruction serialization barrier instruction. This instruction alleviates the Andes architecture implementation from implementing coherence logic between instruction and data caches in order to handle self-modifying code.

Software is required to use this instruction and the correct instruction sequence, after generating new instruction data, to synchronize the updated data between I and D caches in order to correctly execute the newly modified instruction later defined after an instruction serialization barrier instruction. If this instruction and the correct instruction sequence are not used in the above situation, the Andes implementation may still fetch and execute the old instruction data after any period of time.

After updating instruction data, this instruction initiates a data cache write-back operation and an instruction cache invalidating operation. And particularly, this instruction guarantees that the data write back operation has an address space access order (relative to this processor) BEFORE any subsequent instruction fetch operation (including speculative fetches) to the same cache-line address. The subsequent instruction serialization barrier instruction ensures that all instructions following it are re-fetched into the processor execution unit.



The content of Ra specifies the virtual address of the data cache line which contains the instruction data. It goes through the same address mapping mechanism as those associated with load and store instructions. Thus, address translation and protection exceptions could happen for this instruction.

### **Operation:**

```
VA = (Ra)
PA = AddressTranslation(VA)
SynchronizeAllCaches(PA)
```

### **Exceptions:**

TLB fill exception, Non-Leaf PTE not present exception, Leaf PTE not present exception, Read protection violation exception, TLB VLPT miss exception, Imprecise bus error exception, Machine check exception

### Privilege level: All

#### Notes:

- 1. If this instruction affects a locked cache line in the instruction cache, the affected cache line will be unlocked.
- 2. The processor behavior is **UNPREDICTABLE** if the VA of this instruction points to any cache line that contains instructions between this instruction and the next instruction serialization barrier instruction.
- **3.** The effective address of this instruction does not need to be aligned to any data type boundary (e.g. word-aligned) since the address is used to indicate a cache line rather than a specific data type. Thus, no Data Alignment Check exception is generated for this instruction.
- 4. The correct instruction sequence for writing or updating any code data that will be executed afterwards is as follows, except for AndesCore N1213 hardcore: N1213\_43U1H

```
UPD_LOOP:
    // preparing new code data in Ra
    .....
    // preparing new code address in Rb, Rc
    .....
```



```
// writing new code data
store Ra,[Rb,Rc]
// looping control
.....
bne UPD_LOOP
ISYNC_LOOP:
// preparing new code address in Rd
isync Rd
// looping control
bne ISYNC_LOOP
isb
// execution of new code data can be started from here
.....
```

For AndesCore N12 hardcore (N1213\_43U1H), the correct instruction sequence is as follows:

```
UPD LOOP:
   // preparing new code data in Ra
   . . . . . .
   // preparing new code address in Rb, Rc
   . . . . . .
   // writing new code data
   store Ra,[Rb,Rc]
   // looping control
   . . . . . .
   bne Rb, Re, UPD_LOOP
WB LOOP:
   // preparing new code address in Rd
   isync Rd (or cctl Rd, L1D_VA_WB)
   // looping control
   bne Rd, Re, WB LOOP
   msync
   isb
ICACHE_INV_LOOP:
   // preparing new code address in Rf
```



cctl Rf, L1I\_VA\_INVAL
// looping control
bne Rf,Re,ICACHE\_INV\_LOOP
isb
// execution of new code data can be started from here
.....



# J (Jump)

Type: 32-Bit Baseline Format:

| 31 | 30       | 25 | 24 | 23 |        | 0 |
|----|----------|----|----|----|--------|---|
| 0  | JI J     |    | J  |    | imm24s |   |
|    | 100100 0 |    |    |    |        |   |

### Syntax: J imm24s

**Purpose:** Unconditional branch relative to current instruction.

**Description:** Branch unconditionally to a PC-relative region defined by sign-extended (imm24s <<1) where the final branch address is half-word aligned. The branch range is ± 16M bytes.

#### **Operations:**

PC = PC + SE(imm24s << 1);

Exceptions: None Privilege level: All Note:

The assembled/disassembled instruction format displayed by tools may be different than the encoding syntax shown here, please consult the assembly programming guide or disassembler manual to get the correct meaning of the displayed syntax.



# JAL (Jump and Link)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30 25  | 24  | 23 |        | 0 |
|----|--------|-----|----|--------|---|
| 0  | JI     | JAL |    | imm24s |   |
|    | 100100 | 1   |    |        |   |

### Syntax: JAL imm24s

Purpose: Unconditional function call relative to current instruction.

**Description:** Branch unconditionally to a PC-relative region defined by sign-extended (imm24s <<1) where the final branch address is half-word aligned. The branch range is  $\pm$  16M bytes. The program address of the next sequential instruction (PC+4) is written to R30 (Link Pointer register) for function call return purpose.

### **Operations:**

R30 = PC + 4; PC = PC + SE(imm24s << 1);

## Exceptions: None

### Privilege level: All

#### Note:

The assembled/disassembled instruction format displayed by tools may be different than the encoding syntax shown here, please consult the assembly programming guide or disassembler manual to get the correct meaning of the displayed syntax.



# JR (Jump Register)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25 | 24    | 15          | 14 | 10    | 9 | 8  | 7 | 6       | 5 | 4   | 0   |
|----|--------|----|-------|-------------|----|-------|---|----|---|---------|---|-----|-----|
| 0  | JREG   |    | 00000 | 00000000 Rb |    | DT/IT |   | 00 |   | JR hint | J | R   |     |
|    | 100101 |    |       |             |    |       | 0 | 0  |   |         | 0 | 000 | 000 |

Syntax: JR Rb

**Purpose:** Unconditional branch to an instruction address stored in a register.

**Description:** Branch unconditionally to an instruction address stored in Rb. The JR hint field is used to distinguish this instruction from the RET instruction which has the same architecture behavior but different software usages.

#### **Operations:**

PC = Rb;

Exceptions: None Privilege level: All Note:



## JR.xTOFF (Jump Register and Translation OFF)

**Type:** 32-Bit Baseline (with MMU configuration) **Format:** 

| 31 | 30   | 25 | 24    | 15           | 14 | 10 | 9        | 8 | 7       | 6 | 5 | 4   | 0   |
|----|------|----|-------|--------------|----|----|----------|---|---------|---|---|-----|-----|
| 0  | JRE  | G  | 00000 | 000000000 Rb |    | DT | DT/IT 00 |   | JR hint | J | R |     |     |
|    | 1001 | 01 |       |              |    |    | 0        | 1 |         |   | 0 | 000 | 000 |

**IR ITOFF** 

| 31 | 30     | 25 | 24           | 15 | 14 | 10 | 9        | 8 | 7 | 6       | 5 | 4   | 0   |
|----|--------|----|--------------|----|----|----|----------|---|---|---------|---|-----|-----|
| 0  | JREG   |    | 000000000 Rb |    | b  | DT | DT/IT 00 |   | 0 | JR hint | J | R   |     |
|    | 100101 |    |              |    |    |    | 1        | 1 |   |         | 0 | 000 | 000 |

**IR TOFF** 

## Syntax: JR.[T | IT]OFF Rb

**Purpose:** Unconditional branch to an instruction address stored in a register and turn off address translation for the target instruction.

**Description:** Branch unconditionally to an instruction address stored in Rb and also clears the IT (and DT if included) field of the Processor Status Word (PSW) system register to turn off the instruction (and data if included) address translation process in the memory management unit. This instruction guarantees that fetching of the target instruction will see PSW.IT as 0 (and PSW.DT as 0 if included), thus will not go through the address translation process. The JR hint field is used to distinguish this instruction from the RET.xTOFF instruction which has the same architecture behavior but different software usages.

### **Operations:**

```
PC = Rb;
PSW.IT = 0;
if (INST(9) == 1) {
    PSW.DT = 0;
}
```

**Exceptions:** Privileged Instruction, Reserved Instruction (for non-MMU configuration) **Privilege level:** Superuser and above

**Note:** This instruction is used in an interruption handler or privileged code in a translated address space to return to a place which is in a non-translated address space. Please see



JRAL.xTON instruction for the reverse process of coming from a non-translated address space to a translated address space.



## JRAL (Jump Register and Link)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30   | 25  | 24 | 20 | 19    | 15 | 14 | 10 | 98    | 7  | 5  | 4   | 0   |
|----|------|-----|----|----|-------|----|----|----|-------|----|----|-----|-----|
| 0  | JREG |     | Rt |    | 00000 |    | Rb |    | DT/IT | 00 | 00 | JR  | AL  |
|    | 100  | 101 |    |    |       |    |    |    | 00    |    |    | 000 | 001 |

**Syntax:** JRAL Rb (implied Rt == R30, Link Pointer register)

JRAL Rt, Rb

**Purpose:** Unconditional function call to an instruction address stored in a register. **Description:** Branch unconditionally to an instruction address stored in Rb. The program address of the next sequential instruction (PC+4) is written to Rt for function call return purpose.

### **Operations:**

```
jaddr = Rb;
Rt = PC + 4;
PC = jaddr;
```

Exceptions: None Privilege level: All Note:



## JRAL.xTON (Jump Register and Link and Translation ON)

**Type:** 32-Bit Baseline (with MMU configuration) **Format:** 

|    |        |    |    |    | و     |    |    |    |       |   |    |     |     |
|----|--------|----|----|----|-------|----|----|----|-------|---|----|-----|-----|
| 31 | 30 2   | 25 | 24 | 20 | 19    | 15 | 14 | 10 | 98    | 7 | 5  | 4   | 0   |
| 0  | JREG   |    | Rt |    | 00000 |    | Rb |    | DT/IT | 0 | 00 | JR  | AL  |
|    | 100101 |    |    |    |       |    |    |    | 01    |   |    | 000 | 001 |
|    |        |    |    |    |       |    |    |    |       |   |    |     |     |

IDAL ITON

#### JRAL.TON

| 31 | 30   | 25  | 24 | 20 | 19    | 15 | 14 | 10 | 98    | 7  | 5  | 4   | 0   |
|----|------|-----|----|----|-------|----|----|----|-------|----|----|-----|-----|
| 0  | JREG |     | Rt |    | 00000 |    | Rb |    | DT/IT | 00 | 00 | JR  | AL  |
|    | 000  | 101 |    |    |       |    |    |    | 11    |    |    | 000 | 001 |

Syntax: JRAL.[T | IT]ON Rb (implied Rt == R30, Link Pointer register) JRAL.[T | IT]ON Rt, Rb

**Purpose:** Unconditional function call to an instruction address stored in a register and turn on address translation for the target instruction.

**Description:** Branch unconditionally to an instruction address stored in Rb and also sets the IT (and DT if included) fields of the Processor Status Word (PSW) system register to turn on the instruction (and data if included) address translation process in the memory management unit. The program address of the next sequential instruction (PC+4) is written to Rt for function call return purpose. This instruction guarantees that fetching of the target instruction will see PSW.IT as 1 (and PSW.DT as 1 if included), thus will go through the address translation process.

#### **Operations:**

```
jaddr = Rb
Rt = PC + 4;
PC = jaddr;
PSW.IT = 1;
if (INST(9) == 1) {
    PSW.DT = 1;
}
```



**Exceptions:** Privileged Instruction, Reserved Instruction (for non-MMU configuration) **Privilege level:** Superuser and above

**Note:** This instruction is used in an interruption handler or privileged code in a non-translated address space to jump to a function which is in a translated address space. Please see JR.xTOFF/RET.xTOFF instruction for the reverse process of returning from a translated address space to a non-translated address space.

# LB (Load Byte)

**Type:** 32-Bit Baseline **Format:** 

|    |     | LB     |    |      |    |    |    |    |    |      |       |
|----|-----|--------|----|------|----|----|----|----|----|------|-------|
| 31 | 30  | 25     | 24 | 20   | 19 | 15 | 14 | 10 | 98 | 7    | 0     |
| 0  | MI  | MEM Rt |    | Ra   |    | Rb |    | SV | LB |      |       |
|    | 011 | 100    |    |      |    |    |    |    |    | 0000 | 00000 |
|    |     |        |    |      |    |    |    |    |    |      |       |
|    |     | LB.    | bi |      |    |    |    |    |    |      |       |
| 31 | 30  | 25     | 24 | 20   | 19 | 15 | 14 | 10 | 98 | 7    | 0     |
| 0  | MI  | EM     | F  | Lt . | R  | a  | R  | b  | SV | LF   | 3.bi  |
|    | 011 | 100    |    |      |    |    |    |    |    | 0000 | 00100 |

| Syntax: | LB    | Rt, [Ra + (Rb << sv)] |
|---------|-------|-----------------------|
|         | LB.bi | Rt, [Ra], (Rb << sv)  |

**Purpose:** To load a zero-extended 8-bit byte from memory into a general register. **Description:** This instruction loads a zero-extended byte from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses  $Ra + (Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the  $Ra + (Rb \ll sv)$  value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format.

## **Operations:**

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
```



```
Bdata(7,0) = Load_Memory(PAddr, Byte, Attributes);
Rt = Zero_Extend(Bdata(7,0));
If (.bi form) { Ra = Addr; }
} else {
Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



# LBI (Load Byte Immediate)

**Type:** 32-Bit Baseline **Format:** 

000100

|        |     | LBI  |    |    |    |    |    |        |   |
|--------|-----|------|----|----|----|----|----|--------|---|
| 31     | 30  | 25   | 24 | 20 | 19 | 15 | 14 |        | 0 |
| 0      | LBI |      | Rt |    | Ra |    |    | imm15s |   |
|        | 000 | 000  |    |    |    |    |    |        |   |
|        |     |      |    |    |    |    |    |        |   |
| LBI.bi |     |      |    |    |    |    |    |        |   |
| 31     | 30  | 25   | 24 | 20 | 19 | 15 | 14 |        | 0 |
| 0      | LB  | I.bi | F  | Rt | R  | a  |    | imm15s |   |

| Syntax: | LBI    | Rt, [Ra + imm15s] |
|---------|--------|-------------------|
|         | LBI.bi | Rt, [Ra], imm15s  |

**Purpose:** To load a zero-extended 8-bit byte from memory into a general register. **Description:** This instruction loads a zero-extended byte from the memory into the general

register Rt. Two different forms are used to specify the memory address. The regular form uses Ra + SE(imm15s) as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + SE(imm15s) value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format. Note that imm15s is treated as a signed integer.

### **Operations:**

```
Addr = Ra + Sign_Extend(imm15s);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
```



```
Bdata(7,0) = Load_Memory(PAddr, BYTE, Attributes);
Rt = Zero_Extend(Bdata(7,0));
If (.bi form) { Ra = Addr; }
} else {
Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error.

## Privilege level: All

### Note:

 "LBI R0, [R0+0]" baseline version 2 special behavior: This instruction will become a Reserved instruction when the INT\_MASK.ALZ (INT\_MASK[29]) is set to one. INT\_MASK is also named as ir14. This special behavior can be used to debug a system. When this special behavior is used, compiler and assembler should avoid generating this instruction.


# LBS (Load Byte Signed)

**Type:** 32-Bit Baseline **Format:** 

|    |        | LBS    | 5    |    |    |    |    |        |      |      |      |
|----|--------|--------|------|----|----|----|----|--------|------|------|------|
| 31 | 30     | 25     | 24   | 20 | 19 | 15 | 14 | 10     | 98   | 7    | 0    |
| 0  | Mł     | MEM Rt |      | Rt | Ra |    | Rb |        | SV   | LBS  |      |
|    | 011100 |        |      |    |    |    |    | 0001   | 0000 |      |      |
|    |        |        |      |    |    |    |    |        |      |      |      |
|    |        | LBS    | S.bi |    |    |    |    |        |      |      |      |
| 31 | 30     | 25     | 24   | 20 | 19 | 15 | 14 | 10     | 98   | 7    | 0    |
| 0  | MEM Rt |        | Ra   |    | R  | b  | SV | LBS.bi |      |      |      |
|    | 011    | 100    |      |    |    |    |    |        |      | 0001 | 0100 |

| Syntax: | LBS    | Rt, [Ra + (Rb << sv)] |
|---------|--------|-----------------------|
|         | LBS.bi | Rt, [Ra], (Rb << sv)  |

**Purpose:** To load a sign-extended 8-bit byte from memory into a general register. **Description:** This instruction loads a sign-extended byte from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses  $Ra + (Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the  $Ra + (Rb \ll sv)$  value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format.

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
```



```
Bdata(7,0) = Load_Memory(PAddr, Byte, Attributes);
Rt = Sign_Extend(Bdata(7,0));
If (.bi form) { Ra = Addr; }
} else {
Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



## LBSI (Load Byte Signed Immediate)

**Type:** 32-Bit Baseline **Format:** 

|    |        | LBI |    |    |    |    |        |  |   |
|----|--------|-----|----|----|----|----|--------|--|---|
| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14     |  | 0 |
| 0  | LBSI   |     | F  | Rt | Ra |    | imm15s |  |   |
|    | 010    | 000 |    |    |    |    |        |  |   |
|    |        |     |    |    |    |    |        |  |   |
|    | LBI.bi |     |    |    |    |    |        |  |   |
| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14     |  | 0 |
|    |        |     |    |    |    |    |        |  |   |

| 51 | 50 25   | 2. 20 | 19 10 | 11 0   |   |
|----|---------|-------|-------|--------|---|
| 0  | LBSI.bi | Rt    | Ra    | imm15s |   |
|    | 010100  |       |       |        |   |
|    |         |       |       |        | 1 |

| Syntax: | LBSI    | Rt, [Ra + imm15s] |  |  |  |  |
|---------|---------|-------------------|--|--|--|--|
|         | LBSI.bi | Rt, [Ra], imm15s  |  |  |  |  |

**Purpose:** To load a sign-extended 8-bit byte from memory into a general register. **Description:** This instruction loads a sign-extended byte from the memory into the general

register Rt. Two different forms are used to specify the memory address. The regular form uses Ra + SE(imm15s) as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + SE(imm15s) value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format. Note that imm15 is treated as a signed integer.

```
Addr = Ra + Sign_Extend(imm15s);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep status == NO EXCEPTION) {
```



```
Bdata(7,0) = Load_Memory(PAddr, BYTE, Attributes);
Rt = Sign_Extend(Bdata(7,0));
If (.bi form) { Ra = Addr; }
} else {
Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



# LH (Load Halfword)

**Type:** 32-Bit Baseline **Format:** 

|    |        | LH     |    |    |    |    |    |      |       |      |      |
|----|--------|--------|----|----|----|----|----|------|-------|------|------|
| 31 | 30     | 25     | 24 | 20 | 19 | 15 | 14 | 10   | 98    | 7    | 0    |
| 0  | MEI    | M      | F  | lt | Ra |    | Rb |      | SV    | LH   |      |
|    | 011100 |        |    |    |    |    |    | 0000 | 00001 |      |      |
|    |        |        |    |    |    |    |    |      |       |      |      |
|    | LH.bi  |        |    |    |    |    |    |      |       |      |      |
| 31 | 30     | 25     | 24 | 20 | 19 | 15 | 14 | 10   | 98    | 7    | 0    |
| 0  | ME     | MEM Rt |    | R  | Ra |    | b  | SV   | LH.bi |      |      |
|    | 0111   | 00     |    |    |    |    |    |      |       | 0000 | 0101 |

| Syntax: | LH    | Rt, [Ra + (Rb << sv)] |
|---------|-------|-----------------------|
|         | LH.bi | Rt, [Ra], (Rb << sv)  |

**Purpose:** To load a zero-extended 16-bit halfword from memory into a general register. **Description:** This instruction loads a zero-extended halfword from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses  $Ra + (Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the  $Ra + (Rb \ll sv)$  value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format.

The memory address has to be halfword-aligned. Otherwise, a Data Alignment Check exception will be generated.

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
```



```
if (!Halfword_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Hdata(15,0) = Load_Memory(PAddr, HALFWORD, Attributes);
   Rt = Zero_Extend(Hdata(15,0));
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



# LHI (Load Halfword Immediate)

**Type:** 32-Bit Baseline **Format:** 

|    |     | LHI |     |    |    |    |    |        |   |
|----|-----|-----|-----|----|----|----|----|--------|---|
| 31 | 30  | 25  | 24  | 20 | 19 | 15 | 14 |        | 0 |
| 0  | LHI |     | F   | Rt | Ra |    |    | imm15s |   |
|    | 000 | 001 |     |    |    |    |    |        |   |
|    |     | LHI | .bi |    |    |    |    |        |   |
| 31 | 30  | 25  | 24  | 20 | 19 | 15 | 14 |        | 0 |

| 51 | 50 25  | 24 20 | 17 15 | 14     | 0 |
|----|--------|-------|-------|--------|---|
| 0  | LHI.bi | Rt    | Ra    | imm15s |   |
|    | 000101 |       |       |        |   |

| Syntax: | LHI    | Rt, [Ra + (imm15s << 1)] |
|---------|--------|--------------------------|
|         | LHI.bi | Rt, [Ra], (imm15s << 1)  |

(imm15s is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** To load a zero-extended 16-bit halfword from memory into a general register. **Description:** This instruction loads a zero-extended halfword from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses Ra + SE(imm15s << 1) as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + SE(imm15s << 1) value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format. Note that imm15s is treated as a signed integer.

The memory address has to be half-word-aligned. Otherwise, a Data Alignment Check exception will be generated.

```
Addr = Ra + Sign_Extend(imm15s << 1);
If (.bi form) {
    Vaddr = Ra;
} else {</pre>
```



```
Vaddr = Addr;
}
if (!Halfword_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Hdata(15,0) = Load_Memory(PAddr, HALFWORD, Attributes);
   Rt = Zero_Extend(Hdata(15,0));
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



# LHS (Load Halfword Signed)

**Type:** 32-Bit Baseline **Format:** 

|    |        | LHS    | 5  |    |    |    |    |    |        |      |      |
|----|--------|--------|----|----|----|----|----|----|--------|------|------|
| 31 | 30     | 25     | 24 | 20 | 19 | 15 | 14 | 10 | 98     | 7    | 0    |
| 0  | MEM    | MEM Rt |    | Ra |    | Rb |    | sv | LHS    |      |      |
|    | 01110  | 00     |    |    |    |    |    |    |        | 0001 | 0001 |
|    |        |        |    |    |    |    |    |    |        |      |      |
|    | LHS.bi |        |    |    |    |    |    |    |        |      |      |
| 31 | 30     | 25     | 24 | 20 | 19 | 15 | 14 | 10 | 98     | 7    | 0    |
| 0  | MEM Rt |        | Rt | Ra |    | R  | Rb |    | LHS.bi |      |      |
|    | 01110  | 00     |    |    |    |    |    |    |        | 0001 | 0101 |

| Syntax: | LHS    | Rt, [Ra + (Rb << sv)] |
|---------|--------|-----------------------|
|         | LHS.bi | Rt, [Ra], (Rb << sv)  |

**Purpose:** To load a sign-extended 16-bit halfword from memory into a general register. **Description:** This instruction loads a sign-extended halfword from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses  $Ra + (Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the  $Ra + (Rb \ll sv)$  value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format.

The memory address has to be halfword-aligned. Otherwise, a Data Alignment Check exception will be generated.

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
```



```
if (!Halfword_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Hdata(15,0) = Load_Memory(PAddr, HALFWORD, Attributes);
   Rt = Sign_Extend(Hdata(15,0));
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



## LHSI (Load Halfword Signed Immediate)

**Type:** 32-Bit Baseline **Format:** 

|    |         | LHS | I    |    |    |    |    |        |   |
|----|---------|-----|------|----|----|----|----|--------|---|
| 31 | 30      | 25  | 24   | 20 | 19 | 15 | 14 |        | 0 |
| 0  | LH      | ISI | R    | Rt | Ra |    |    | imm15s |   |
|    | 010     | 001 |      |    |    |    |    |        |   |
|    |         |     |      |    |    |    |    |        |   |
|    |         | LHS | I.bi |    |    |    |    |        |   |
| 31 | 30 25   |     | 24   | 20 | 19 | 15 | 14 |        | 0 |
| 0  | LHSI.bi |     | R    | Rt | Ra |    |    | imm15s |   |
|    | 010     | 101 |      |    |    |    |    |        |   |

| Syntax: | LHSI    | Rt, [Ra + (imm15s << 1)] |
|---------|---------|--------------------------|
|         | LHSI.bi | Rt, [Ra], (imm15s << 1)  |

(imm15s is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** To load a sign-extended 16-bit halfword from memory into a general register. **Description:** This instruction loads a sign-extended halfword from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses Ra + SE(imm15s << 1) as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + SE(imm15s << 1) value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format. Note that imm15s is treated as a signed integer.

The memory address has to be half-word-aligned. Otherwise, a Data Alignment Check exception will be generated.

```
Addr = Ra + Sign_Extend(imm15s << 1);
If (.bi form) {
    Vaddr = Ra;
} else {</pre>
```



```
Vaddr = Addr;
}
if (!Halfword_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Hdata(15,0) = Load_Memory(PAddr, HALFWORD, Attributes);
   Rt = Sign_Extend(Hdata(15,0));
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



## LLW (Load Locked Word)

#### Format:

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 98 | 7 0      |
|----|--------|-------|-------|-------|----|----------|
| 0  | MEM    | Rt    | Ra    | Rb    | SV | LLW      |
|    | 011100 |       |       |       |    | 00011000 |

Syntax: LLW Rt, [Ra + (Rb << sv)]

**Purpose:** It is used as a primitive to perform atomic read-modify-write operations. **Description:** The LLW and SCW instructions are basic interlocking primitives to perform an atomic read (load-locked), modify, and write (store-conditional) sequence.

> LLW Rx ... Modifying Rx SCW Rx BEQZ Rx

A LLW instruction begins the sequence and a SCW instruction completes the sequence. If this sequence can be performed without any intervening interruption or an interfering write from another processor or I/O module, then the SCW instruction succeeds. Otherwise the SCW instruction fails and the program has to retry the sequence. There can only be one such active read-modify-write sequence exists per processor at any one time. And if a new LLW instruction is issued before an active sequence is completed by a SCW instruction, the new LLW instruction will start a new sequence which replaces the previous sequence.

The LLW instruction loads an aligned 32-bit word from a word-aligned memory address calculated by adding Ra and ( $Rb \ll sv$ ). The word from memory is loaded into register Rt. When a LLW instruction is executed without generating any exceptions, the processor remembers the loaded physical word address (Locked Physical Address) and sets a per-processor lock flag.

If the lock flag is still set when a SCW instruction is executed and the stored physical address is the same as the aligned address of the remembered LLW physical address, the store happens; otherwise, the store does not occur. And the success or failure status is stored back into the source register (Please see SCW instruction description for detailed



### definition.)

The per-processor lock flag is cleared if the following events happen:

- Any execution of an IRET instruction.
- A coherent store is completed by another processor or coherent I/O module to the "Lock Region" containing the Locked Physical Address. The definition of the "Lock Region" is an aligned power-of-2 bytes memory region and its exact size is implementation-dependent, but within the range of at least 4-byte and at most the default minimum page size. The coherency is enforced either by hardware coherent mechanisms or by software using CCTL instructions on this processor through an interrupt mechanism. The coherent store event can be caused by a regular store, a store\_conditional, and DPREF/Cache-Line-Write instructions.
- The completion of a SCW instruction on all success or fail conditions.

If there is a memory access or CCTL instruction between the execution of LLW and SCW, the SCW may fail or success. Portable software should avoid putting memory access or CCTL instructions between the execution of LLW and SCW instructions. (For example, a store word operation to the same physical address of the Locked Physical Address.)

### **Operations:**

```
VA = Ra + Rb;
If (VA(1,0) != 0) {
   Generate_Exception(Data_alignment_check);
}
(PA, Attributes) = Address_Translation(VA, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Rt = Load_Memory(PA, WORD, Attributes);
   Locked_Physical_Address = PA;
   Lock_Flag = 1;
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: Alignment check, TLB fill, Non-leaf PTE not present, Leaf PTE not present,



Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Usage (Note):

A very long instruction sequence between LLW and SCW may always fail the SCW instruction due to periodic timer interrupt. Software should take this into consideration when constructing LLW and SCW instruction sequences.

### Additional Software Constraints:

For N1213 hardcore N1213\_43U1HA0 (CPU\_VER==0x0C010003), additional software constraints must be followed to ensure correct LLW/SCW operations.

- If LLW/SCW are used in an interruption handler, it must be followed that
  - Execution of a LLW instruction must lead to execution of a SCW instruction. UPREDICTABLE result may happen if execution of a LLW instruction eventually leads to execution of an IRET instruction without going through a SCW instruction.



## LMW (Load Multiple Word)

#### Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 10 | 9   | 6    | 5   | 4   | 3   | 2 | 10 |
|----|--------|----|----|----|----|----|-------|-----|------|-----|-----|-----|---|----|
| 0  | LSMW   |    | R  | b  | R  | a  | Re    | Ena | ble4 | LMW | b:0 | i:0 | m | 00 |
|    | 011101 |    |    |    |    |    |       |     |      | 0   | a:1 | d:1 |   |    |

**Syntax:** LMW. {b| a} {i | d} {m?} Rb, [Ra], Re, Enable4

- **Purpose:** Load multiple 32-bit words from sequential memory locations into multiple registers.
- **Description:** load multiple 32-bit words from sequential memory addresses specified by the base address register Ra and the {b | a} {i | d} options into a continuous range or a subset of general-purpose registers specified by a registers list formed by Rb, Re, and the four-bit Enable4 field as follows.

<Registers List> = a range from [Rb, Re] and a list from <Enable4>

- {i | d} option specifies the direction of the address change. {i} generates increasing addresses from Ra and {d} generates decreasing addresses from Ra.
- {b | a} option specifies the way how the first address is generated. {b} use the contents of Ra as the first memory load address. {a} use either Ra+4 or Ra-4 for the {i | d} option respectively as the first memory load address.
- {m?} option, if it is specified, indicates that the base address register will be updated to the value computed in the following formula at the completion of this instruction.

TNReg = Total number of registers loaded Updated value = Ra + (4 \* TNReg) for {i} option Updated value = Ra - (4 \* TNReg) for {d} option

 [Rb, Re] specifies a range of registers which will be loaded by this instruction. Rb(4,0) specifies the first register number in the continuous register range and Re(4,0) specifies the last register number in this register range. In addition to the range of registers, <Enable4(3,0)> specifies the load of 4 individual registers from R28 to R31 (s9/fp, gp, lp, sp) which have special calling convention usage. The exact mapping of Enable4(3,0) bits and registers is as follows:



| Bits      | Enable4(3) | Enable4(2) | Enable4(1) | Enable4(0) |
|-----------|------------|------------|------------|------------|
|           | Format(9)  | Format(8)  | Format(7)  | Format(6)  |
| Registers | R28        | R29        | R30        | R31        |

- Several constraints are imposed for the <Registers List>:
  - If [Rb(4,0), Re(4,0)] specifies at least one register:
    - $Rb(4,0) \le Re(4,0) AND$
    - $0 \le Rb(4,0), Re(4,0) \le 28$
  - If [Rb(4,0), Re(4,0)] specifies no register at all:
    - Rb(4,0) == Re(4,0) = 0b11111 AND
    - Enable4(3,0) != 0b0000
  - If these constraints are not met, UNPREDICTABLE result will happen to the contents of all registers after this instruction.
- The registers are loaded in sequence from matching memory locations. That is, the lowest-numbered register is loaded from the lowest memory address while the highest-numbered register is loaded from the highest memory address.
- If the base address register update {m?} option is specified while the base address register Ra is also specified in the <Register Specification>, there are two source values for the final content of the base address register Ra. In this case, the final value of Ra is UNPREDICTABLE. And the rest of the loaded registers should have values as if the base address register update {m?} option is not specified.
- This instruction can handle aligned/unaligned memory address.

```
TNReg = Count_Registers(register_list);
if ("bi") {
    B_addr = Ra;
    E_addr = Ra + (TNReg * 4) - 4;
} elseif ("ai") {
    B_addr = Ra + 4;
    E_addr = Ra + (TNReg * 4);
} elseif ("bd") {
    B_addr = Ra - (TNReg * 4) + 4;
    E_addr = Ra;
} else { // "ad"
```



```
B addr = Ra - (TNReg * 4);
   E addr = Ra - 4
}
VA = B_addr;
for (i = 0 \text{ to } 31) {
   if (register list[i] == 1) {
      (PA, Attributes) = Address Translation(VA, PSW.DT);
      Excep status = Page Exception(Attributes, UserMode, LOAD);
      If (Excep status == NO EXCEPTION) {
         Ri = Load Memory(PA, Word, Attributes);
         VA = VA + 4;
      } else {
         Generate Exception(Excep status);
      }
   }
}
if ("im") {
   Ra = Ra + (TNReg * 4);
} else { // "dm"
  Ra = Ra - (TNReg * 4);
}
```

**Exception:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error.

• If the base register update is not specified, the base register value is unchanged. This applies even if the instruction loaded its own base register and the memory access to load the base register occurred earlier than the exception event. For example, suppose the instruction is

LMW.bd R2, [R4], R4, 0b0000

And the implementation loads R4, then R3, and finally R2. If an exception occurs on any of the accesses, the value in the base register R4 of the instruction is unchanged.

- If the base register update is specified, the value left in the base register is unchanged.
- If the instruction loads only one general-purpose register, the value in that register is unchanged.
- If the instruction loads more than one general-purpose register,



UNPREDICTABLE values are left in destination registers which are not the base register of the instruction.

**Interruption:** Whether this instruction is interruptible or not is implementation-dependent.

#### Privilege Level: all

#### Note:

- (1) LMW and SMW instructions do not guarantee atomicity among individual memory access operations. And they do not guarantee single access to a memory location during the execution either. Any I/O access that has side-effects other than simple stable memory-like access behavior should not use these two instructions.
- (2) The memory access order among the words accessed by LMW/SMW is not defined here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:
  - For LMW/SMW.i : increasing memory addresses from base address.
  - For LMW/SMW.d: decreasing memory addresses from base address.
- (3) The memory access order within an un-aligned word accessed is not defined here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:
  - For LMW/SMW.i: the aligned low address of the word and then the aligned high address of the word. If an interruption occurs, the EVA register will contain the starting low address of the un-aligned word or .
  - For LMW/SMW.d: the aligned high address of the word and then the aligned low address of the word. If an interruption occurs, the EVA register will contain "base un-aligned address + 4" of the first word or the starting low address of the remaining decreasing memory word.
- (4) Based on the more likely access order of (2) and (3), upon interruption, the EVA register for un-aligned LMW/SMW will more likely have the following value:
  - For LMW/SMW.i: the starting low addresses of the accessed words or "Ra + (TNReg \* 4)" where TNReg represents the total number of registers loaded or stored.
  - For LMW/SMW.d: the starting low addresses of the accessed words or "Ra + 4".

# LW (Load Word)

**Type:** 32-Bit Baseline **Format:** 

|    |        | LW  |     |    |    |    |    |    |    |      |       |
|----|--------|-----|-----|----|----|----|----|----|----|------|-------|
| 31 | 30     | 25  | 24  | 20 | 19 | 15 | 14 | 10 | 98 | 7    | 0     |
| 0  | MI     | EM  | Rt  |    | Ra |    | R  | Rb |    | LW   |       |
|    | 011    | 100 |     |    |    |    |    |    |    | 0000 | 00010 |
|    |        |     |     |    |    |    |    |    |    |      |       |
|    |        | LW  | .bi |    |    |    |    |    |    |      |       |
| 31 | 30 25  |     | 24  | 20 | 19 | 15 | 14 | 10 | 98 | 7    | 0     |
| 0  | MEM    |     | F   | Rt | Ra |    | R  | Rb |    | LV   | V.bi  |
|    | 011100 |     |     |    |    |    |    |    |    | 0000 | 0110  |

| Syntax: | LW    | Rt, [Ra + (Rb << sv)] |
|---------|-------|-----------------------|
|         | LW.bi | Rt, [Ra], (Rb << sv)  |

Purpose: To load a 32-bit word from memory into a general register.

**Description:** This instruction loads a word from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses Ra +  $(Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra +  $(Rb \ll sv)$  value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
```



```
if (!Word_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
   Rt = Wdata(31,0);
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



## LWI (Load Word Immediate)

**Type:** 32-Bit Baseline **Format:** 

000110

|    |               | LWI              |     |    |    |    |    |        |   |  |  |
|----|---------------|------------------|-----|----|----|----|----|--------|---|--|--|
| 31 | 30            | 25               | 24  | 20 | 19 | 15 | 14 |        | 0 |  |  |
| 0  | LWI<br>000010 |                  | F   | Rt |    | a  |    | imm15s |   |  |  |
|    | 000           | LWI Kt<br>000010 |     |    |    |    |    |        |   |  |  |
|    |               |                  |     |    |    |    |    |        |   |  |  |
|    |               | LWI              | .bi |    |    |    |    |        |   |  |  |
| 31 | 30            | 25               | 24  | 20 | 19 | 15 | 14 |        | 0 |  |  |
| 0  | LW            | I.bi             | F   | Rt | R  | a  |    | imm15s |   |  |  |

| Syntax: | LWI    | Rt, [Ra + (imm15s << 2)] |
|---------|--------|--------------------------|
|         | LWI.bi | Rt, [Ra], (imm15s << 2)  |

(imm15s is a word offset. In assembly programming, always write a byte offset.)

Purpose: To load a 32-bit word from memory into a general register.

**Description:** This instruction loads a word from the memory into the general register Rt. Two different forms are used to specify the memory address. The regular form uses Ra +  $SE(imm15s \ll 2)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra +  $SE(imm15s \ll 2)$  value after the memory load operation. And UNPREDICTABLE result will be written to Rt if Rt is specified as equal to Ra in the instruction format. Note that imm15s is treated as a signed integer.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

```
Addr = Ra + Sign_Extend(imm15s << 2);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;</pre>
```



```
}
if (!Word_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
   Rt = Wdata(31,0);
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. **Privilege level:** All **Note:** 



## LWUP (Load Word with User Privilege Translation)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 98 | 7 0      |
|----|--------|-------|-------|-------|----|----------|
| 0  | MEM    | Rt    | Ra    | Rb    | SV | LWUP     |
|    | 011100 |       |       |       |    | 00100010 |

**Syntax:** LWUP Rt,  $[Ra + (Rb \ll sv)]$ 

**Purpose:** To load a 32-bit word from memory into a general register with the user mode privilege address translation.

**Description:** This instruction loads a word from the memory address Ra + (Rb << sv) into the general register Rt with the user mode privilege address translation regardless of the current processor operation mode (i.e. PSW.POM) and the current data address translation state (i.e. PSW.DT). The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

### **Operations:**

```
Vaddr = Ra + (Rb << sv);
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, TRANSLATE);
Excep_status = Page_Exception(Attributes, UserMode, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
    Rt = Wdata(31,0);
} else {
    Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. **Privilege level:** All

Note:





## MADD32 (Multiply and Add to Data Low)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24     | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5      | 0   |
|----|--------|----|--------|----|----|----|----|----|----|----|------|---|--------|-----|
| 0  | ALU_2  |    | _2 000 |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | MADD32 |     |
|    | 100001 |    |        |    |    |    |    |    |    |    |      |   | 110    | 011 |

Syntax: MADD32 Dt, Ra, Rb

**Purpose:** Multiply the contents of two 32-bit registers and add the lower 32-bit multiplication result with the lower 32-bit content of a 64-bit data register. The final result is written back to the lower 32-bit of the 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The lower 32-bit multiplication result is added with the content of Dt.LO 32-bit data register. And the final result is written back to Dt.LO data register. The contents of Ra and Rb can be either signed or unsigned integers.

### **Operations:**

Mresult = Ra \* Rb; Dt.LO = Dt.LO + Mresult(31,0);



## MADD64 (Multiply and Add Unsigned)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24  | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5      | 0   |
|----|--------|----|-----|----|----|----|----|----|----|----|------|---|--------|-----|
| 0  | ALU_2  |    | 000 |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | MADD64 |     |
|    | 100001 |    |     |    |    |    |    |    |    |    |      |   | 101    | 011 |

Syntax: MADD64 Dt, Ra, Rb

**Purpose:** Multiply the unsigned integer contents of two 32-bit registers and add the multiplication result with the content of a 64-bit data register. The final result is written back to the 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is added with the content of Dt data register. And the final result is written back to Dt data register. The contents of Ra and Rb are treated as unsigned integers.

### **Operations:**

```
Mresult = CONCAT(1`b0,Ra) * CONCAT(1`b0,Rb);
Dt = Dt + Mresult(63,0);
```



## MADDS64 (Multiply and Add Signed)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24  | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5       | 0   |
|----|--------|----|-----|----|----|----|----|----|----|----|------|---|---------|-----|
| 0  | ALU_2  |    | 000 |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | MADDS64 |     |
|    | 100001 |    |     |    |    |    |    |    |    |    |      |   | 101     | 010 |

Syntax: MADDS64 Dt, Ra, Rb

**Purpose:** Multiply the signed integer contents of two 32-bit registers and add the multiplication result with the content of a 64-bit data register. The final result is written back to the 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is added with the content of Dt data register. And the final result is written back to Dt data register. The contents of Ra and Rb are treated as signed integers.

#### **Operations:**

Mresult = Ra \* Rb; Dt = Dt + Mresult(63,0);



### MFSR (Move From System Register)

#### Format:

| 31 | 30 25  | 24 20 | 19    | 10  | 9   | 5    | 4   | 0   |
|----|--------|-------|-------|-----|-----|------|-----|-----|
| 0  | MISC   | Rt    | SRIDX | 000 | 000 | MFSR |     |     |
|    | 110010 |       |       |     |     |      | 000 | 010 |

### **Syntax:** MFSR Rt, SRIDX

**Purpose:** It is used to move the content of a system register into a general register.

#### **Description:**

The content of the system register specified by the SRIDX will be moved into the general register Rt.

### **Operations:**

GR[Rt] = SR[SRIDX];

Exceptions: Privileged Instruction

**Privilege level:** Superuser and above **Note:** 



### **MFUSR (Move From User Special Register)**

Type: 32-Bit Baseline Format:

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 96   | 5 0    |  |
|----|--------|-------|-------|-------|------|--------|--|
| 0  | ALU_2  | Rt    | USR   | Group | 0000 | MFUSR  |  |
|    | 100001 |       |       |       |      | 100000 |  |

**Syntax:** MFUSR Rt, USR\_Name (= USR, Group)

**Purpose:** Move the content of a User Special Register to a general register.

**Description:** The content of a User Special Register specified by USR and Group is moved into a general register Rt. The USR definition is defined in the following tables.

| Group | USR value | User Special Register              |
|-------|-----------|------------------------------------|
| 0     | 0         | D0.LO                              |
| 0     | 1         | D0.HI                              |
| 0     | 2         | D1.LO                              |
| 0     | 3         | D1.HI                              |
| 0     | 30-4      | Reserved                           |
| 0     | 31        | PC                                 |
|       |           | (The PC value of this instruction) |

Table 52Group 0 MFUSR definitions

Reading of group 1 registers in USER mode requires permission from PRIVILEGED mode resources (i.e. PRUSR\_ACC\_CTL register). If the reading permission is not enabled in USER mode, reading such a register will generate Privileged Instruction exception. Privileged software should provide means for a user mode program to request such access permission. Reading of reserved registers will cause Reserved Instruction exception. Please check Andes Privileged Architecture specification for detailed definitions of Group 1 USR registers.

Table 53Group 1 MFUSR definitions

| Group | USR value | User Special Register |
|-------|-----------|-----------------------|
| 1     | 0         | DMA_CFG               |



| 1 | 1     | DMA_GCSW                            |
|---|-------|-------------------------------------|
| 1 | 2     | DMA_CHNSEL                          |
| 1 | 3     | DMA_ACT                             |
|   |       | (Write only register, Read As Zero) |
| 1 | 4     | DMA_SETUP                           |
| 1 | 5     | DMA_ISADDR                          |
| 1 | 6     | DMA_ESADDR                          |
| 1 | 7     | DMA_TCNT                            |
| 1 | 8     | DMA_STATUS                          |
| 1 | 9     | DMA_2DSET                           |
| 1 | 10-24 | Reserved                            |
| 1 | 25    | DMA_2DSCTL                          |
| 1 | 26-31 | Reserved                            |

Reading of group 2 registers in USER mode requires permission from PRIVILEGED mode resources (i.e. PRUSR\_ACC\_CTL register). If the reading permission is not enabled in USER mode, reading such a register will generate Privileged Instruction exception. Privileged software should provide means for a user mode program to request such access permission. Reading of reserved registers will cause Reserved Instruction exception. Please check Andes Privileged Architecture specification for detailed definitions of Group 2 USR registers.

|       | 1         |                       |
|-------|-----------|-----------------------|
| Group | USR value | User Special Register |
| 2     | 0         | PFMC0                 |
| 2     | 1         | PFMC1                 |
| 2     | 2         | PFMC2                 |
| 2     | 3         | Reserved              |
| 2     | 4         | PFM_CTL               |
| 2     | 5-31      | Reserved              |

Table 54Group 2 MFUSR definitions

#### **Operations:**

Rt = User Special Register[Group][USR];



### Exceptions: Privileged Instruction, Reserved Instruction

#### Privilege level: All

#### Note:

- 1. For PC register, there is no corresponding "MTUSR Rt, PC" instruction.
- 2. PC-relative memory access operation can be synthesized using the following code sequences:

```
L1: mfusr Ra, PC
L2: lwi Rs, [Ra + (offset relative to L1)]
```



## **MOVI (Move Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30 25  | 24 20 | 19 0   |
|----|--------|-------|--------|
| 0  | MOVI   | Rt    | imm20s |
|    | 100010 |       |        |

Syntax: MOVI Rt, imm20s

**Purpose:** To initialize a register with a constant.

**Description:** Move the sign-extended imm20s into general register Rt.

#### **Operations:**

Rt = SE(imm20s);



## MSUB32 (Multiply and Subtract to Data Low)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24  | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5      | 0   |
|----|--------|----|-----|----|----|----|----|----|----|----|------|---|--------|-----|
| 0  | ALU_2  |    | 000 |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | MSUB32 |     |
|    | 100001 |    |     |    |    |    |    |    |    |    |      |   | 110    | 101 |

Syntax: MSUB32 Dt, Ra, Rb

**Purpose:** Multiply the contents of two 32-bit registers and subtract the lower 32-bit multiplication result from the lower 32-bit content of a 64-bit data register. The final result is written back to the lower 32-bit of the 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The lower 32-bit multiplication result is subtracted from the content of Dt.LO 32-bit data register. And the final result is written back to Dt.LO data register. The contents of Ra and Rb can be either signed or unsigned integers.

#### **Operations:**

Mresult = Ra \* Rb; Dt.LO = Dt.LO - Mresult(31,0);



## MSUB64 (Multiply and Subtract Unsigned)

Type: 32-Bit Baseline Format:

| 31 | 30    | 25  | 24  | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5      | 0   |
|----|-------|-----|-----|----|----|----|----|----|----|----|------|---|--------|-----|
| 0  | ALU_2 |     | 000 |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | MSUB64 |     |
|    | 100   | 001 |     |    |    |    |    |    |    |    |      |   | 101    | 101 |

Syntax: MSUB64 Dt, Ra, Rb

**Purpose:** Multiply the unsigned integer contents of two 32-bit registers and subtract the multiplication result from the content of a 64-bit data register. The final result is written back to the 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is subtracted from the content of Dt data register. And the final result is written back to Dt data register. The contents of Ra and Rb are treated as unsigned integers.

### **Operations:**

```
Mresult = CONCAT(1`b0,Ra) * CONCAT(1`b0,Rb);
Dt = Dt - Mresult(63,0);
```



## MSUBS64 (Multiply and Subtract Signed)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24  | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5       | 0   |
|----|--------|----|-----|----|----|----|----|----|----|----|------|---|---------|-----|
| 0  | ALU_2  |    | 000 |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | MSUBS64 |     |
|    | 100001 |    |     |    |    |    |    |    |    |    |      |   | 101     | 100 |

Syntax: MSUBS64 Dt, Ra, Rb

**Purpose:** Multiply the signed integer contents of two 32-bit registers and subtract the multiplication result from the content of a 64-bit data register. The final result is written back to the 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is subtracted from the content of Dt data register. And the final result is written back to Dt data register. The contents of Ra and Rb are treated as signed integers.

#### **Operations:**

Mresult = Ra \* Rb; Dt = Dt - Mresult(63,0);


## **MSYNC (Memory Data Coherence Synchronization)**

#### Format:

| 31 | 30   | 25  | 24    | 20 | 19     | 8            | 7 | 5       | 4  | 0     |  |
|----|------|-----|-------|----|--------|--------------|---|---------|----|-------|--|
| 0  | MISC |     | 00000 |    | 000000 | 000000000000 |   | SubType |    | MSYNC |  |
|    | 110  | 010 |       |    |        |              |   |         | 01 | 100   |  |

### Syntax: MSYNC SubType

**Purpose:** This is a collection of Memory Barrier operations to ensure completion (locally or globally) of certain phase of memory load/store operations. This instruction is used to order loads and stores for synchronizing memory accesses between two and more Andes cores or an Andes core and the other Direct Memory Access agent.

### **Description:**

This instruction is used for any non-strongly ordered load and store operations where software wants to ensure certain memory access order from these operations assuming that the hardware implementation will not automatically ensure the required ordering behavior. Hardware implementation is free to enforce more ordering behavior. And in such a case, this instruction becomes a NOP.

The following table lists the MSYNC SubType definitions:

| SubType | Name     |
|---------|----------|
| 0       | All      |
| 1       | Store    |
| 2 - 7   | Reserved |

| Fable 55MSYN | NC SubType | definitions |
|--------------|------------|-------------|
|--------------|------------|-------------|

If a Reserved SubType is used in this instruction, a Reserved Instruction Exception will be generated.

### 1. SubType 0, All:

For an implementation which does not support coherent caches, this operation ensures that all loads and *non-dirty* stores before this instruction completes before all loads and stores



after this instruction can start. *Non-dirty* store includes non-cacheable store, and cacheable store which has been written back implicitly or explicitly.

For an implementation which support coherent caches, this operation ensures that all loads and stores before this instruction completes before all loads and stores after this instruction can start.

Completeness for a load means the destination register is written. Completeness for a store means the stored value is visible to all memory access agents in the system.

This operation does not enforce any ordering between load and store instructions and instruction fetches.

2. SubType 1, Store:

For an implementation which does not support coherent caches, this operation ensures that all *non-dirty* stores before this instruction completes before all loads and stores after this instruction can start. *Non-dirty* store includes non-cacheable store, and cacheable store which has been written back implicitly or explicitly.

For an implementation which support coherent caches, this operation ensures that all stores before this instruction completes before all loads and stores after this instruction can start.

Completeness for a store means the stored value is visible to all memory access agents in the system.

This operation does not enforce any ordering between load and store instructions and instruction fetches.

### **Operation:**

```
If (SubType is not supported) {
   Reserved_Instruction_Exception()
} else {
   MemoryDataSynchronization(SubType)
}
```

#### **Exceptions:**

**Detail Instruction Description** 



Reserved instruction exception

Privilege level: All

Notes:



## MTSR (Move To System Register)

### Format:

| 31 | 30 25  | 24 20 | 19    | 10 | 9   | 5   | 4   | 0   |
|----|--------|-------|-------|----|-----|-----|-----|-----|
| 0  | MISC   | Ra    | SRIDX |    | 000 | 000 | MT  | SR  |
|    | 110010 |       |       |    |     |     | 000 | )11 |

### **Syntax:** MTSR Ra, SRIDX

**Purpose:** It is used to move the content of a general register into a system register.

### **Description:**

The content of the general register Ra will be moved into the system register specified by the SRIDX.

#### **Operations:**

SR[SRIDX] = GR[Ra];

Exceptions: Privileged Instruction

**Privilege level:** Superuser and above **Note:** 



## MTUSR (Move To User Special Register)

Type: 32-Bit Baseline Format:

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 96   | 5 0    |  |
|----|--------|-------|-------|-------|------|--------|--|
| 0  | ALU_2  | Rt    | USR   | Group | 0000 | MFUSR  |  |
|    | 100001 |       |       |       |      | 100001 |  |

**Syntax:** MTUSR Rt, USR\_Name (= USR, Group)

**Purpose:** Move the content of a general register to a User Special Register.

**Description:** The content of a general register Rt is moved into a User Special Register specified by USR and Group. The USR definition is defined in the following tables.

| Group | USR value | User Special Register |
|-------|-----------|-----------------------|
| 0     | 0         | D0.LO                 |
| 0     | 1         | D0.HI                 |
| 0     | 2         | D1.LO                 |
| 0     | 3         | D1.HI                 |
| 0     | 4-31      | Reserved              |

Table 56 Group 0 MTUSR definitions

Writing of group 1 registers in USER mode requires permission from PRIVILEGED mode resources (i.e. PRUSR\_ACC\_CTL register). If the writing permission is not enabled in USER mode, writing such a register will generate Privileged Instruction exception. Privileged software should provide means for a user mode program to request such access permission. Writing of reserved registers will cause Reserved Instruction exception. Please check Andes Privileged Architecture specification for detailed definitions of Group 1 USR registers. The data dependency serializations of group 1 registers between MTUSR DMA\_CHNSEL and MTUSR

Table 57Group 1 MTUSR definitionsGroupUSR valueUser Special Register



| 1 | 0     | DMA_CFG                    |
|---|-------|----------------------------|
|   |       | (Read only, Write ignored) |
| 1 | 1     | DMA_GCSW                   |
|   |       | (Read only, Write ignored) |
| 1 | 2     | DMA_CHNSEL                 |
| 1 | 3     | DMA_ACT                    |
| 1 | 4     | DMA_SETUP                  |
| 1 | 5     | DMA_ISADDR                 |
| 1 | 6     | DMA_ESADDR                 |
| 1 | 7     | DMA_TCNT                   |
| 1 | 8     | DMA_STATUS                 |
|   |       | (Read only, Write ignored) |
| 1 | 9     | DMA_2DSET                  |
| 1 | 10-24 | Reserved                   |
| 1 | 25    | DMA_2DSCTL                 |
| 1 | 26-31 | Reserved                   |

Writing of group 2 registers in USER mode requires permission from PRIVILEGED mode resources (i.e. PRUSR\_ACC\_CTL register). If the writing permission is not enabled in USER mode, writing such a register will generate Privileged Instruction exception. Privileged software should provide means for a user mode program to request such access permission. Writing of reserved registers will cause Reserved Instruction exception. Please check Andes Privileged Architecture specification for detailed definitions of Group 2 USR registers. The data dependency serializations of group 2 registers between MTUSR PFM\_CTL and MTUSR <PFMCx register> or between MTUSR and MFUSR requires DSB instruction inserted in the middle. Please see Andes Privileged Architecture specification for more details.

| Group | USR value | User Special Register |
|-------|-----------|-----------------------|
| 2     | 0         | PFMC0                 |
| 2     | 1         | PFMC1                 |
| 2     | 2         | PFMC2                 |
| 2     | 3         | Reserved              |
| 2     | 4         | PFM_CTL               |

Table 58Group 2 MTUSR definitions



| 2 5-31 | Reserved |
|--------|----------|
|--------|----------|

### **Operations:**

User\_Special\_Register[Group][USR] = Rt;

Exceptions: Privileged Instruction, Reserved Instruction Privilege level: All Note:



## MUL (Multiply Word to Register)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24       | 20 | 19 | 15 | 14 | 10 | 9 | 6    | 5   | 0   |  |
|----|--------|----|----------|----|----|----|----|----|---|------|-----|-----|--|
| 0  | ALU_2  |    | ALU_2 Rt |    | Ra |    | R  | Rb |   | 0000 |     | MUL |  |
|    | 100001 |    |          |    |    |    |    |    |   |      | 100 | 100 |  |

Syntax: MUL Rt, Ra, Rb

**Purpose:** Multiply the contents of two registers and write the result to a register. **Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The lower 32-bit of the multiplication result is written to Rt. The contents of Ra and Rb can be signed or unsigned numbers.

#### **Operations:**

Mresult = Ra \* Rb; Rt = Mresult(31,0);



## MULT32 (Multiply Word to Data Low)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24         | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9    | 6 | 5      | 0    |
|----|--------|----|------------|----|----|----|----|----|----|----|------|---|--------|------|
| 0  | ALU_2  |    | LU_2 000 I |    | Dt | 0  | Ra |    | Rb |    | 0000 |   | MULT32 |      |
|    | 100001 |    |            |    |    |    |    |    |    |    |      |   | 110    | 0001 |

Syntax: MULT32 Dt, Ra, Rb

**Purpose:** Multiply the contents of two 32-bit registers and write the result to the lower 32-bit of a 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The lower 32-bit multiplication result is written to Dt.LO 32-bit data register. The contents of Ra and Rb can be either signed or unsigned integers.

### **Operations:**

Mresult = Ra \* Rb; Dt.LO = Mresult(31,0);



## MULT64 (Multiply Word Unsigned)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24        | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9  | 6  | 5   | 0    |
|----|--------|----|-----------|----|----|----|----|----|----|----|----|----|-----|------|
| 0  | ALU_2  |    | ALU_2 000 |    | Dt | 0  | R  | a  | R  | b  | 00 | 00 | MU  | LT64 |
|    | 100001 |    |           |    |    |    |    |    |    |    |    |    | 101 | 001  |

Syntax: MULT64 Dt, Ra, Rb

**Purpose:** Multiply the unsigned integer contents of two 32-bit registers and write the result to a 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is written to Dt data register. The contents of Ra and Rb are treated as unsigned integers.

#### **Operations:**

```
Mresult = CONCAT(1`b0,Ra) * CONCAT(1`b0,Rb);
Dt = Mresult(63,0);
```



## MULTS64 (Multiply Word Signed)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 22 | 21 | 20 | 19 | 15 | 14 | 10 | 9  | 6  | 5   | 0    |
|----|--------|----|----|----|----|----|----|----|----|----|----|----|-----|------|
| 0  | ALU_2  |    | 00 | 00 | Dt | 0  | R  | a  | R  | b  | 00 | 00 | MUI | TS64 |
|    | 100001 |    |    |    |    |    |    |    |    |    |    |    | 101 | 000  |

Syntax: MULTS64 Dt, Ra, Rb

**Purpose:** Multiply the signed integer contents of two 32-bit registers and write the result to a 64-bit data register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is written to Dt data register. The contents of Ra and Rb are treated as signed integers.

### **Operations:**

Mresult = Ra \* Rb; Dt = Mresult(63,0);



# NOP (No Operation)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24  | 20  | 19    | 15 | 14    | 10 | 9   | 5   | 4   | 0   |
|----|--------|----|-----|-----|-------|----|-------|----|-----|-----|-----|-----|
| 0  | ALU_1  |    | NO  | OP  | NO    | OP | NO    | OP | 000 | 000 | SR  | LI  |
|    | 100000 |    | 000 | 000 | 00000 |    | 00000 |    |     |     | 010 | 001 |

Syntax: NOP

Purpose: Perform no operation.

**Description:** Do nothing. This instruction is aliased to "SRLI RO, RO, O", but will be treated by an implementation as a true NOP.

**Operations:** 

;



## NOR (Bit-wise Logical Nor)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|--------|----|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | ALU_1  |    | F  | Rt | R  | la | R  | b  | 000 | 000 | NC  | OR  |
|    | 100000 |    |    |    |    |    |    |    |     |     | 001 | 101 |

Syntax: NOR Rt, Ra, Rb

**Purpose:** Doing a bit-wise logical NOR operation on the content of two registers. **Description:** The content of Ra is combined with the content of Rb using a bit-wise logical NOR operation. And the result is written to Rt.

### **Operations:**

Rt =  $\sim$  (Ra | Rb);



## **OR (Bit-wise Logical Or)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|--------|----|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | ALU_1  |    | F  | Rt | R  | la | R  | b  | 000 | 000 | 0   | R   |
|    | 100000 |    |    |    |    |    |    |    |     |     | 001 | 100 |

Syntax: OR Rt, Ra, Rb

Purpose: Doing a bit-wise logical OR operation on the content of two registers.

**Description:** The content of Ra is combined with the content of Rb using a bit-wise logical OR operation. And the result is written to Rt.

**Operations:** 

Rt = Ra | Rb;



# **ORI (Or Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 |        | 0 |
|----|--------|----|----|----|----|----|----|--------|---|
| 0  | OR     | Ι  | R  | Rt | R  | la |    | imm15u |   |
|    | 101100 |    |    |    |    |    |    |        |   |

Syntax: ORI Rt, Ra, imm15u

Purpose: Bit-wise OR of the content of a register with an unsigned constant.

**Description:** The content of Ra is bit-wise ORed with the zero-extended imm15u. And the result is written to Rt.

#### **Operations:**

 $Rt = Ra \mid ZE(imm15u);$ 



## **RET (Return from Register)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30  | 25  | 24    | 15    | 14 | 10 | 9  | 8    | 7 | 6 | 5   |   | 4   | 0   |
|----|-----|-----|-------|-------|----|----|----|------|---|---|-----|---|-----|-----|
| 0  | JR  | EG  | 00000 | 00000 | R  | b  | DT | '/IT | 0 | 0 | RET | Г | JI  | R   |
|    | 100 | 101 |       |       |    |    | 0  | 0    |   |   | 1   |   | 000 | 000 |

### Syntax: RET Rb

**Purpose:** Unconditional function call return to an instruction address stored in a register. **Description:** Branch unconditionally to an instruction address stored in Rb. Note that the architecture behavior of this instruction is the same as the JR instruction. But software will use this instruction instead of JR for function call return purpose. This facilitates software's need to distinguish the two different usages which is helpful in call stack backtracing applications. Distinguishing a function return jump from a regular jump will also help on implementation performance (e.g. return address prediction).

#### **Operations:**

PC = Rb;



## **RET.xTOFF (Return from Register and Translation OFF)**

**Type:** 32-Bit Baseline (with MMU configuration) **Format:** 

|    |      |    |       |       |    |    | -  |      |   |   |     |     |     |
|----|------|----|-------|-------|----|----|----|------|---|---|-----|-----|-----|
| 31 | 30   | 25 | 24    | 15    | 14 | 10 | 9  | 8    | 7 | 6 | 5   | 4   | 0   |
| 0  | JRF  | EG | 00000 | 00000 | R  | b  | DT | T/IT | 0 | 0 | RET | JI  | R   |
|    | 1001 | 01 |       |       |    |    | 0  | 1    |   |   | 1   | 000 | 000 |

**RET ITOFF** 

#### **RET.TOFF**

| 31 | 30  | 25  | 24    | 15    | 14 | 10 | 9  | 8    | 7 | 6 | 5   | 4   | 0   |
|----|-----|-----|-------|-------|----|----|----|------|---|---|-----|-----|-----|
| 0  | JRI | EG  | 00000 | 00000 | R  | b  | DT | '/IT | 0 | 0 | RET | J   | R   |
|    | 100 | 101 |       |       |    |    | 1  | 1    |   |   | 1   | 000 | 000 |

### Syntax: RET.[T | IT]OFF Rb

**Purpose:** Unconditional function call return to an instruction address stored in a register and turn off address translation for the target instruction.

**Description:** Branch unconditionally to an instruction address stored in Rb and also clears the IT (and DT if included) field of the Processor Status Word (PSW) system register to turn off the instruction (and data if included) address translation process in the memory management unit. This instruction guarantees that fetching of the target instruction will see PSW.IT as 0 (and PSW.DT as 0 if included), thus will not go through the address translation process. Note that the architecture behavior of this instruction is the same as the JR.xTOFF instruction. But software will use this instruction instead of JR.xTOFF for function call return purpose. This facilitates software's need to distinguish the two different usages which is helpful in call stack backtracing applications. Distinguishing a function return jump from a regular jump will also help on implementation performance (e.g. return address prediction).

#### **Operations:**

```
PC = Rb;
PSW.IT = 0;
if (INST(9) == 1) {
    PSW.DT = 0;
}
```

Exceptions: Privileged Instruction, Reserved Instruction (for non-MMU configuration)



### Privilege level: Superuser and above

**Note:** This instruction is used in an interruption handler or privileged code in a translated address space to return to a place which is in a non-translated address space. Please see JRAL.xTON instruction for the reverse process of coming from a non-translated address space to a translated address space.



# **ROTR (Rotate Right)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4  | 0   |
|----|--------|----|----|----|----|----|----|----|-----|-----|----|-----|
| 0  | ALU_1  |    | F  | Rt | R  | la | R  | b  | 000 | 000 | RO | TR  |
|    | 100000 |    |    |    |    |    |    |    |     |     | 01 | 111 |

Syntax: ROTR Rt, Ra, Rb

Purpose: Perform right rotation operation on the content of a register.

**Description:** The content of Ra is right-rotated. The rotation amount is specified by the low-order 5-bits of the Rb register. And the result is written to Rt.

### **Operations:**

ra = Rb(4,0); Rt = CONCAT(Ra(ra-1,0), Ra(31,ra));

Exceptions: None

Privilege level: All Note:



## **ROTRI (Rotate Right Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14  | 10  | 9  | 5   | 4  | 0   |
|----|--------|----|----|----|----|----|-----|-----|----|-----|----|-----|
| 0  | ALU_1  |    | F  | Rt | R  | la | imr | n5u | 00 | 000 | RO | TRI |
|    | 100000 |    |    |    |    |    |     |     |    |     | 01 | 011 |

Syntax: ROTRI Rt, Ra, imm5u

Purpose: Perform right rotation operation on the content of a register.

**Description:** The content of Ra is right-rotated. The rotation amount is specified by the imm5u constant. And the result is written to Rt.

#### **Operations:**

Rt = CONCAT(Ra(imm5u-1,0), Ra(31,imm5u));

Exceptions: None Privilege level: All

Note:

# **SB (Store Byte)**

**Type:** 32-Bit Baseline **Format:** 

|    | SB     |        |       |       |    |          |
|----|--------|--------|-------|-------|----|----------|
| 31 | 30 25  | 24 20  | 19 15 | 14 10 | 98 | 7 0      |
| 0  | MEM    | Rt     | Ra    | Rb    | SV | SB       |
|    | 011100 |        |       |       |    | 00001000 |
|    | SB     | .bi    |       |       |    |          |
| 31 | 30 25  | 24 20  | 19 15 | 14 10 | 98 | 7 0      |
| 0  | MEM    | MEM Rt |       | Rb    | SV | SB.bi    |
|    | 011100 |        |       |       |    | 00001100 |

| Syntax: | SB    | Rt, [Ra + (Rb << sv)] |
|---------|-------|-----------------------|
|         | SB.bi | Rt, [Ra], (Rb << sv)  |

Purpose: To store an 8-bit byte from a general register into memory.

**Description:** The least-significant 8-bit byte in the general register Rt is stored to the memory. Two different forms are used to specify the memory address. The regular form uses  $Ra + (Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + (Rb  $\ll$  sv) value after the memory store operation.

### **Operations:**

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
    Store Memory(PAddr, BYTE, Attributes, Rt(7,0));
```



```
If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error Privilege level: All Note:



## **SBI (Store Byte Immediate)**

**Type:** 32-Bit Baseline **Format:** 

|    |        | SBI  |    |    |    |    |        |        |   |
|----|--------|------|----|----|----|----|--------|--------|---|
| 31 | 30     | 25   | 24 | 20 | 19 | 15 | 14     |        | 0 |
| 0  | SBI R  |      | Rt | Ra |    |    | imm15s |        |   |
|    | 001    | 000  |    |    |    |    |        |        |   |
|    |        |      |    |    |    |    |        |        |   |
|    |        | SBI. | bi |    |    |    |        |        |   |
| 31 | 30     | 25   | 24 | 20 | 19 | 15 | 14     |        | 0 |
| 0  | SBI.bi |      | F  | Rt | R  | a  |        | imm15s |   |
|    | 001100 |      |    |    |    |    |        |        |   |

| Syntax: | SBI    | Rt, [Ra + imm15s] |
|---------|--------|-------------------|
|         | SBI.bi | Rt, [Ra], imm15s  |

**Purpose:** To store an 8-bit byte from a general register into a memory location. **Description:** The least-significant 8-bit byte in the general register Rt is stored to the memory location. Two different forms are used to specify the memory address. The regular form uses Ra + SE(imm15s) as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + SE(imm15s) value after the memory store operation. Note that imm15s is treated as a signed integer.

### **Operations:**

```
Addr = Ra + Sign_Extend(imm15s);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
    Store Memory(PAddr, BYTE, Attributes, Rt(7,0));
```



```
If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



## **SCW (Store Conditional Word)**

#### Format:

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 98 | 7 0      |
|----|--------|-------|-------|-------|----|----------|
| 0  | MEM    | Rt    | Ra    | Rb    | SV | SCW      |
|    | 011100 |       |       |       |    | 00011001 |

**Syntax:** SCW Rt, [Ra + (Rb << sv)]

**Purpose:** It is used as a primitive to perform atomic read-modify-write operations. **Description:** The LLW and SCW instructions are basic interlocking primitives to perform an atomic read (load-locked), modify, and write (store-conditional) sequence.

> LLW Rx Modify Rx SCW Rx BEQZ Rx

A LLW instruction begins the sequence and A SCW instruction completes the sequence. If this sequence can be performed without any intervening interruption or an interfering write from another processor or I/O module, then the SCW instruction succeeds. Otherwise the SCW instruction fails and the program has to retry the sequence. There can only be one such active read-modify-write sequence exists per processor at any one time. And if a new LLW instruction is issued before an active sequence is completed by a SCW instruction, the new LLW instruction will start a new sequence which replaces the previous sequence.

The SCW instruction conditionally stores a 32-bit word from register Rt to a word-aligned memory address calculated by adding Ra and ( $Rb \ll sv$ ). If all the following conditions are true, then the memory store operation happens. And a result of 1 is written to the general register Rt to indicate a success status.

- The Lock\_Flag is 1 (Note: any exception generated by the SCW instruction will clear the Lock\_Flag)
- The word-aligned store physical address is the same as the aligned address of the Locked\_Physical\_Address generated by the previous LLW instruction.



If the Lock\_Flag is 0, then the store operation will not be performed. And a result of 0 is written to the general register Rt to indicate a fail status.

If the word-aligned store physical address is not the same as the aligned address of the Locked\_Physical\_Address generated by the previous LLW instruction, then whether the store operation will be performed or not is implementation-dependent (i.e. UNPREDICTABLE). However, the final success or fail status will be consistent with the implementation's action.

The per-processor lock flag is cleared if the following events happen:

- Any execution of an IRET instruction.
- A coherent store is completed by another processor or coherent I/O module to the "Lock Region" containing the Locked Physical Address. The definition of the "Lock Region" is an aligned power-of-2 bytes memory region and its exact size is implementation-dependent, but within the range of at least 4-byte and at most the default minimum page size. The coherency is enforced either by hardware coherent mechanisms or by software using CCTL instructions on this processor through an interrupt mechanism. The coherent store event can be caused by a regular store, a store\_conditional, and DPREF/Cache-Line-Write instructions.
- The completion of a SCW instruction on all success or fail conditions.

If there is a memory access or CCTL instruction between the execution of LLW and SCW, the SCW may fail or success. Portable software should avoid putting memory access or CCTL instructions between the execution of LLW and SCW instructions. (For example, a store word operation to the same physical address of the Locked Physical Address.)

### **Operations:**

```
VA = Ra + Rb;
If (VA(1,0) != 0) {
    Generate_Exception(Data_alignment_check);
}
(PA, Attributes) = Address_Translation(VA, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
    If (Lock Flag == 1)
```



```
If (PA == Locked_Physical_Address) {
    Store_Memory(PA, Attributes, Rt);
    Rt = 1;
    } else {
    Implementation-dependent for success or fail;
    }
    } else {
    Rt = 0;
    }
    Lock_Flag = 0;
} else {
    Generate_Exception(Excep_status);
}
```

Exceptions: Alignment check, TLB fill, Non-leaf PTE not present, Leaf PTE not present, Write protection, Page modified, Access bit, TLB VLPT miss. Privilege level: All Usage Note:

A very long instruction sequence between LLW and SCW may always fail the SCW instruction due to periodic timer interrupt. Software should take this into consideration when constructing LLW and SCW instruction sequences.

**Implementation Note:** Since SCW instruction is defined to conditionally execute in the memory system, depending on the cache coherence design in the memory system, an implementation may need to prevent SCW from being invalidated by any interruption during the period after the SCW request has entered the memory system until the success/fail status has returned from the memory system to complete the SCW instruction.

For an implementation with a non-coherent cache system, an internal and an external flags may be needed to implement the "lock" state. For such a system, the success or fail of SCW instruction is determined by either both flags or internal flag alone. The conditions based on different memory attributes are summarized as follows.



|               | Non-cacheable  | Cach          | eable         |
|---------------|----------------|---------------|---------------|
|               |                | Write-back    | Write-through |
| Success/fail  | Internal and   | Internal flag | Internal flag |
| determined by | external flags |               |               |

Notice that if the memory location that SCW operates on is cacheable, only the internal flag is used and the external flag will be ignored.

### Additional Software Constraints:

For N1213 hardcore N1213\_43U1HA0 (CPU\_VER==0x0C010003), additional software constraints must be followed to ensure correct LLW/SCW operations.

- If LLW/SCW are used in an interruption handler, it must be followed that
  - Execution of a LLW instruction must lead to execution of a SCW instruction. UPREDICTABLE result may happen if execution of a LLW instruction eventually leads to execution of an IRET instruction without going through a SCW instruction.



## **SEB (Sign Extend Byte)**

Type: 32-Bit Baseline Format:

| 31 | 30  | 25     | 24 | 20 | 19 | 15 | 14   | 5      | 4   | 0   |
|----|-----|--------|----|----|----|----|------|--------|-----|-----|
| 0  | ALU | J_1    | F  | Rt | R  | la | 0000 | 000000 | SI  | EB  |
|    | 100 | 100000 |    |    |    |    |      |        | 100 | 000 |

Syntax: SEB Rt, Ra

**Purpose:** Sign-extend the least-significant-byte of a register.

**Description:** The least-significant-byte of Ra is sign-extended. And the result is written to Rt.

**Operations:** 

Rt = SE(Ra(7, 0));

Exceptions: None

Privilege level: All

Note:



## SEH (Sign Extend Halfword)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30    | 25  | 24       | 20 | 19 | 15 | 14        | 5 | 4   | 0   |
|----|-------|-----|----------|----|----|----|-----------|---|-----|-----|
| 0  | ALU_1 |     | ALU_1 Rt |    | Ra |    | 000000000 |   | SE  | ΕH  |
|    | 100   | 000 |          |    |    |    |           |   | 100 | 001 |

Syntax: SEH Rt, Ra

**Purpose:** Sign-extend the least-significant-halfword of a register.

**Description:** The least-significant-halfword of Ra is sign-extended. And the result is written to Rt.

### **Operations:**

Rt = SE(Ra(15, 0));

Exceptions: None Privilege level: All

Note:



## **SETEND (Set data endian)**

#### Format:

| 31 | 30 25  | 24 21 | 20 | 19 10     | 9 5    | 4 0   |
|----|--------|-------|----|-----------|--------|-------|
| 0  | MISC   | 0000  | BE | PSW_IDX   | SETEND | MTSR  |
|    | 110010 |       |    | 001000000 | 00001  | 00011 |

Syntax: SETEND.B (Set data endian to big endian)

SETEND.L (Set data endian to little endian)

Purpose: It is used to control the data endian mode in the PSW register.

#### **Description:**

This instruction has two flavors. The SETEND.B will set the data endian mode to big-endian while the SETEND.L will set the data endian mode to little-endian. Note that this instruction can be used in user mode. The BE bit in this instruction encoding distinguishes these two flavors.

| BE | Flavor   |
|----|----------|
| 0  | SETEND.L |
| 1  | SETEND.B |

#### **Operations:**

SR[PSW].BE = 1; // SETEND.B SR[PSW].BE = 0; // SETEND.L

### Exceptions: None

#### Privilege level: All

#### Note:

• A DSB instruction must follow a SETEND instruction to guarantee that any subsequent load/store instruction can observe the just updated PSW.BE value.



## **SETGIE (Set global interrupt enable)**

#### Format:

| 31 | 30 25  | 24 21 | 20 | 19 10     | 9 5    | 4 0   |
|----|--------|-------|----|-----------|--------|-------|
| 0  | MISC   | 0000  | EN | PSW_IDX   | SETGIE | MTSR  |
|    | 110010 |       |    | 001000000 | 00010  | 00011 |

Syntax: SETGIE.E (Enable global interrupt)

SETGIE.D (Disable global interrupt)

Purpose: It is used to control the global interrupt enable bit in the PSW register.

#### **Description:**

This instruction has two flavors. The SETGIE.E will enable the global interrupt while the SETGIE.D will disable the global interrupt. The EN bit in this instruction encoding distinguishes these two flavors.

| EN | Flavor   |
|----|----------|
| 0  | SETGIE.D |
| 1  | SETGIE.E |

#### **Operations:**

SR[PSW].GIE = 1; // SETGIE.E
SR[PSW].GIE = 0; // SETGIE.D

#### Exceptions: None

### Privilege level: Superuser and above

#### Note:

• A DSB instruction must follow a SETGIE instruction to guarantee that any subsequent instruction can observe the just updated PSW.GIE value for the external interrupt interruption behavior.



## **SETHI (Set High Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30 25  | 24 20 | 19 0   |
|----|--------|-------|--------|
| 0  | SETHI  | Rt    | imm20u |
|    | 100011 |       |        |

Syntax: SETHI Rt, imm20u

**Purpose:** To initialize the high portion of a register with a constant.

**Description:** Move the imm20u into the upper 20-bits of general register Rt. The lower 12-bits of Rt will be filled with 0.

### **Operations:**

Rt = imm20u << 12;

00001101

# SH (Store Halfword)

**Type:** 32-Bit Baseline **Format:** 

011100

|       |        | SH |    |      |    |    |    |    |    |      |      |
|-------|--------|----|----|------|----|----|----|----|----|------|------|
| 31    | 30     | 25 | 24 | 20   | 19 | 15 | 14 | 10 | 98 | 7    | 0    |
| 0     | MEM    |    | R  | Lt . | Ra |    | Rb |    | SV | SH   |      |
|       | 011100 |    |    |      |    |    |    |    |    | 0000 | 1001 |
|       |        |    |    |      |    |    |    |    |    |      |      |
| SH.bi |        |    |    |      |    |    |    |    |    |      |      |
| 31    | 30     | 25 | 24 | 20   | 19 | 15 | 14 | 10 | 98 | 7    | 0    |
| 0     | MEM    |    | R  | Lt . | R  | la | R  | b  | SV | SH   | I.bi |

| Syntax: | SH    | Rt, [Ra + (Rb << sv)] |  |  |
|---------|-------|-----------------------|--|--|
|         | SH.bi | Rt, [Ra], (Rb << sv)  |  |  |

Purpose: To store a 16-bit halfword from a general register into memory.

**Description:** The least-significant 16-bit halfword in the general register Rt is stored to the memory. Two different forms are used to specify the memory address. The regular form uses  $Ra + (Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + (Rb  $\ll$  sv) value after the memory store operation.

The memory address has to be halfword-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
if (!Halfword_Aligned(Vaddr)) {
    Generate Exception(Data alignment check);
```



```
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, HALFWORD, Attributes, Rt(15,0));
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



## SHI (Store Halfword Immediate)

**Type:** 32-Bit Baseline **Format:** 

|    |        | SHI  |    |    |    |    |    |        |  |
|----|--------|------|----|----|----|----|----|--------|--|
| 31 | 30     | 25   | 24 | 20 | 19 | 15 | 14 | 0      |  |
| 0  | SHI    |      | Rt |    | Ra |    |    | imm15s |  |
|    | 001001 |      |    |    |    |    |    |        |  |
|    |        | OT U | 1. |    |    |    |    |        |  |
|    | SHI.bi |      |    |    |    |    |    |        |  |

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 0      |
|----|--------|----|----|----|----|----|----|--------|
| 0  | SHI.bi |    | Rt |    | Ra |    |    | imm15s |
|    | 0011   | 01 |    |    |    |    |    |        |

| Syntax: | SHI    | Rt, [Ra + (imm15s << 1)] |  |  |
|---------|--------|--------------------------|--|--|
|         | SHI.bi | Rt, [Ra], (imm15s << 1)  |  |  |

(imm15s is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** To store a 16-bit halfword from a general register into memory.

**Description:** The least-significant 16-bit halfword in the general register Rt is stored to the memory. Two different forms are used to specify the memory address. The regular form uses  $Ra + SE(imm15s \ll 1)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the  $Ra + SE(imm15s \ll 1)$  value after the memory store operation. Note that imm15s is treated as a signed integer. The memory address has to be half-word-aligned. Otherwise, a Data Alignment Check exception will be generated.

### **Operations:**

```
Addr = Ra + Sign_Extend(imm15s << 1);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
if (!Halfword_Aligned(Vaddr)) {
```


```
Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, WORD, Attributes, Rt(15,0));
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



## **SLL (Shift Left Logical)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25    | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|--------|-------|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | AL     | ALU_1 |    | Rt | Ra |    | R  | b  | 000 | 000 | SLL |     |
|    | 100000 |       |    |    |    |    |    |    |     |     | 011 | 100 |

Syntax: SLL Rt, Ra, Rb

Purpose: Perform logical left shift operation on the content of a register.

**Description:** The content of Ra is left-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the low-order 5-bits of the Rb register. And the result is written to Rt.

#### **Operations:**

sa = Rb(4,0); Rt = CONCAT(Ra(31-sa,0), sa`b0);

Exceptions: None

Privilege level: All

Note:



## SLLI (Shift Left Logical Immediate)

Type: 32-Bit Baseline Format:

| 31 | 30       | 25 | 24 | 20 | 19 | 15  | 14  | 10 | 9   | 5  | 4   | 0   |
|----|----------|----|----|----|----|-----|-----|----|-----|----|-----|-----|
| 0  | ALU_1 Rt |    | Rt | R  | la | imr | n5u | 00 | 000 | SL | LI  |     |
|    | 100000   |    |    |    |    |     |     |    |     |    | 010 | 000 |

Syntax: SLLI Rt, Ra, imm5u

Purpose: Perform logical left shift operation on the content of a register.

**Description:** The content of Ra is left-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the imm5u constant. And the result is written to Rt.

**Operations:** 

Rt = CONCAT(Ra(31-imm5u,0), imm5u`b0);

Exceptions: None Privilege level: All Note:



## SLT (Set on Less Than)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25    | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4  | 0   |
|----|--------|-------|----|----|----|----|----|----|-----|-----|----|-----|
| 0  | AL     | ALU_1 |    | Rt | R  | la | R  | b  | 000 | 000 | SI | LT  |
|    | 100000 |       |    |    |    |    |    |    |     |     | 00 | 110 |

Syntax: SLT Rt, Ra, Rb

Purpose: Unsigned comparison between the contents of two registers.

**Description:** If the content of Ra is less than (unsigned comparison) the content of Rb, a result of 1 is written to Rt; otherwise, a result of 0 is written to Rt.

### **Operations:**

```
if (CONCAT(1`b0, Ra) < CONCAT(1`b0, Rb)) {
    Rt = 1;
    else {
        Rt = 0;
     }
Exceptions: None
</pre>
```



## **SLTI (Set on Less Than Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30    | 25 | 24 | 20 | 19 | 15 | 14 |        | 0 |
|----|-------|----|----|----|----|----|----|--------|---|
| 0  | SLT   | [  | R  | lt | R  | Ra |    | imm15s |   |
|    | 10111 | 0  |    |    |    |    |    |        |   |

Syntax: SLTI Rt, Ra, imm15s

**Purpose:** Unsigned comparison between the content of a register and a signed constant. **Description:** The content of Ra is unsigned-compared with a sign-extended imm15s. If the content of Ra is less than the sign-extended imm15s, the result of 1 will be written to Rt; otherwise, the result of 0 will be written to Rt. The sign-extended imm15s will generate an unsigned constant in the following range:

$$[2^{32}-1, 2^{32}-2^{14}]$$
 and  $[2^{14}-1, 0]$ 

**Operations:** 

Rt = (Ra (unsigned) < SE(imm15s)) ? 1 : 0;</pre>

Exceptions: None Privilege level: All Note:



## **SLTS (Set on Less Than Signed)**

Type: 32-Bit Baseline Format:

|   | 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4  | 0   |
|---|----|-----|-----|----|----|----|----|----|----|-----|-----|----|-----|
|   | 0  | AL  | U_1 | F  | Rt | R  | la | R  | b  | 000 | 000 | SL | TS  |
| ļ |    | 100 | 000 |    |    |    |    |    |    |     |     | 00 | 111 |

Syntax: SLTS Rt, Ra, Rb

Purpose: Signed comparison between the contents of two registers.

**Description:** If the content of Ra is less than (signed comparison) the content of Rb, a result of 1 is written to Rt; otherwise, a result of 0 is written to Rt.

## **Operations:**

```
if (Ra < Rb) {
    Rt = 1;
    else {
        Rt = 0;
     }
Exceptions: None</pre>
```



## **SLTSI (Set on Less Than Signed Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30 25  | 24 20 | 19 15 | 14 0   |
|----|--------|-------|-------|--------|
| 0  | SLTSI  | Rt    | Ra    | imm15s |
|    | 101111 |       |       |        |

Syntax: SLTSI Rt, Ra, imm15s

**Purpose:** Signed comparison between the content of a register and a constant.

**Description:** The content of Ra is signed-compared with the sign-extended imm15s. If the content of Ra is less than the sign-extended imm15s, the result of 1 will be written to Rt; otherwise, the result of 0 will be written to Rt. The sign-extended imm15s will generate a signed constant in the following range:

```
[2<sup>14</sup>-1, 0] and [-1, -2<sup>14</sup>]
```

**Operations:** 

Rt = (Ra (signed) < SE(imm15s)) ? 1 : 0;</pre>

Exceptions: None Privilege level: All Note:



## **SMW (Store Multiple Word)**

### Format:

| 31 | 30 2  | 25 | 24 | 20 | 19 | 15 | 14 10 | 9   | 6    | 5   | 4   | 3   | 2 | 10 |
|----|-------|----|----|----|----|----|-------|-----|------|-----|-----|-----|---|----|
| 0  | LSMV  | V  | R  | b  | R  | a  | Re    | Ena | ble4 | SMW | b:0 | i:0 | m | 00 |
|    | 01110 | 1  |    |    |    |    |       |     |      | 1   | a:1 | d:1 |   |    |

**Syntax:** SMW. {b | a} {i | d} {m?} Rb, [Ra], Re, Enable4

**Purpose:** Store multiple 32-bit words from multiple registers into sequential memory locations.

Description: Store multiple 32-bit words from a range or a subset of source general-purpose registers to sequential memory addresses specified by the base address register Ra and the {b | a} {i | d} options. The source registers are specified by a registers list formed by Rb, Re, and the four-bit Enable4 field as follows. <Registers List> = a range from {Rb, Re} and a list from <Enable4>

- {i | d} option specifies the direction of the address change. {i} generates increasing addresses from Ra and {d} generates decreasing addresses from Ra.
- {b | a} option specifies the way how the first address is generated. {b} use the contents of Ra as the first memory store address. {a} use either Ra+4 or Ra-4 for the {i | d} option respectively as the first memory store address.
- {m?} option, if it is specified, indicates that the base address register will be updated to the value computed in the following formula at the completion of this instruction.

TNReg = Total number of registers stored Updated value = Ra + (4 \* TNReg) for {i} option Updated value = Ra - (4 \* TNReg) for {d} option

• [Rb, Re] specifies a range of registers whose contents will be stored by this instruction. Rb(4,0) specifies the first register number in the continuous register range and Re(4,0) specifies the last register number in this register range. In addition to the range of registers, <Enable4(3,0)> specifies the store of 4 individual registers from R28 to R31 (s9/fp, gp, lp, sp) which have special calling convention usage. The exact mapping of Enable4(3,0) bits and registers is as follows:



| Bits      | Enable4(3) | Enable4(2) | Enable4(1) | Enable4(0) |
|-----------|------------|------------|------------|------------|
|           | Format(9)  | Format(8)  | Format(7)  | Format(6)  |
| Registers | R28        | R29        | R30        | R31        |

•

- Several constraints are imposed for the <Registers List>:
  - If [Rb, Re] specifies at least one register:
    - $Rb(4,0) \le Re(4,0) AND$
    - $0 \le Rb(4,0), Re(4,0) \le 28$
  - If [Rb, Re] specifies no register at all:
    - Rb(4,0) == Re(4,0) = 0b11111 AND
    - Enable4(3,0) != 0b0000
  - If these constraints are not met, UNPREDICTABLE result will happen to the contents of the memory range pointed to by the base register and the base register itself if the {m?} option is specified after this instruction.
- The register is stored in sequence to matching memory locations. That is, the lowest-numbered register is stored to the lowest memory address while the highest-numbered register is stored to the highest memory address.
- If the base address register Ra is specified in the <Registers Specification>, the value stored to the memory from the register Ra is the Ra value before this instruction is executed.
- This instruction can handle aligned/unaligned memory address.

#### **Operation:**

```
TNReg = Count_Registers(register_list);
if ("bi") {
    B_addr = Ra;
    E_addr = Ra + (TNReg * 4) - 4;
} elseif ("ai") {
    B_addr = Ra + 4;
    E_addr = Ra + (TNReg * 4);
} elseif ("bd") {
    B_addr = Ra - (TNReg * 4) + 4;
    E_addr = Ra;
} else { // "ad"
    B_addr = Ra - (TNReg * 4);
```



```
E addr = Ra - 4
}
VA = B addr;
for (i = 0 to 31) {
   if (register list[i] == 1) {
      (PA, Attributes) = Address Translation(VA, PSW.DT);
      Excep status = Page Exception(Attributes, UserMode, STORE);
      If (Excep status == NO EXCEPTION) {
         Store Memory(PA, Word, Attributes, Ri);
         VA = VA + 4;
      } else {
         Generate Exception (Excep status);
      }
   }
}
if ("im") {
   Ra = Ra + (TNReg * 4);
} else { // "dm"
  Ra = Ra - (TNReg * 4);
}
```

**Exception:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error.

• The base register value is left unchanged on an exception event, no matter whether the base register update is specified or not.

**Interruption:** Whether this instruction is interruptible or not is implementation-dependent.

### Privilege Level: all

### Note:

- (1) LMW and SMW instructions do not guarantee atomicity among individual memory access operations. And they do not guarantee single access to a memory location during the execution either. Any I/O access that has side-effects other than simple stable memory-like access behavior should not use these two instructions.
- (2) The memory access order among the words accessed by LMW/SMW is not defined



here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:

- For LMW/SMW.i : increasing memory addresses from base address.
- For LMW/SMW.d: decreasing memory addresses from base address.
- (3) The memory access order within an un-aligned word accessed is not defined here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:
  - For LMW/SMW.i: the aligned low address of the word and then the aligned high address of the word. If an interruption occurs, the EVA register will contain the starting low address of the un-aligned word or .
  - For LMW/SMW.d: the aligned high address of the word and then the aligned low address of the word. If an interruption occurs, the EVA register will contain "base un-aligned address + 4" of the first word or the starting low address of the remaining decreasing memory word.
- (4) Based on the more likely access order of (2) and (3), upon interruption, the EVA register for un-aligned LMW/SMW will more likely have the following value:
  - For LMW/SMW.i: the starting low addresses of the accessed words or "Ra + (TNReg \* 4)" where TNReg represents the total number of registers loaded or stored.
  - For LMW/SMW.d: the starting low addresses of the accessed words or "Ra + 4".



## **SRA (Shift Right Arithmetic)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25    | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|--------|-------|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | AL     | ALU_1 |    | Rt | R  | a  | R  | b  | 000 | 000 | SRA |     |
|    | 100000 |       |    |    |    |    |    |    |     |     | 011 | 110 |

Syntax: SRA Rt, Ra, Rb

Purpose: Perform arithmetic right shift operation on the content of a register.

**Description:** The content of Ra is right-shifted arithmetically, that is, the shifted out bits are filled with the sign-bit Ra(31). The shift amount is specified by the low-order 5-bits of the Rb register. And the result is written to Rt.

### **Operations:**

sa = Rb(4, 0);

Rt = CONCAT(sa`bRa(31), Ra(31, sa));

Exceptions: None

Privilege level: All

Note:



## **SRAI (Shift Right Arithmetic Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30     | 25       | 24 | 20 | 19 | 15 | 14  | 10  | 9  | 5   | 4    | 0   |
|----|--------|----------|----|----|----|----|-----|-----|----|-----|------|-----|
| 0  | AL     | ALU_1 Rt |    | Rt | R  | la | imr | n5u | 00 | 000 | SRAI |     |
|    | 100000 |          |    |    |    |    |     |     |    |     | 010  | 010 |

Syntax: SRAI Rt, Ra, imm5u

Purpose: Perform arithmetic right shift operation on the content of a register.

**Description:** The content of Ra is right-shifted arithmetically, that is, the shifted out bits are filled with sign-bit Ra(31). The shift amount is specified by the imm5u constant. And the result is written to Rt.

#### **Operations:**

```
Rt = CONCAT(imm5u`bRa(31), Ra(31,imm5u));
```

Exceptions: None Privilege level: All Note:



## **SRL (Shift Right Logical)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30     | 25    | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|--------|-------|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | AL     | ALU_1 |    | Rt | R  | a  | R  | b  | 000 | 000 | SRL |     |
|    | 100000 |       |    |    |    |    |    |    |     |     | 01  | 101 |

Syntax: SRL Rt, Ra, Rb

Purpose: Perform logical right shift operation on the content of a register.

**Description:** The content of Ra is right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the low-order 5-bits of the Rb register. And the result is written to Rt.

### **Operations:**

sa = Rb(4,0); Rt = CONCAT(sa`b0, Ra(31,sa));

Exceptions: None

Privilege level: All

Note:



## SRLI (Shift Right Logical Immediate)

Type: 32-Bit Baseline Format:

| 31 | 30     | 25       | 24 | 20 | 19 | 15 | 14  | 10  | 9  | 5   | 4   | 0   |
|----|--------|----------|----|----|----|----|-----|-----|----|-----|-----|-----|
| 0  | ALU    | ALU_1 Rt |    | Rt | R  | la | imr | n5u | 00 | 000 | SR  | RLI |
|    | 100000 |          |    |    |    |    |     |     |    |     | 010 | 001 |

Syntax: SRLI Rt, Ra, imm5u

Purpose: Perform logical right shift operation on the content of a register.

**Description:** The content of Ra is right-shifted logically, that is, the shifted out bits are filled with zero. The shift amount is specified by the imm5u constant. And the result is written to Rt.

### **Operations:**

Rt = CONCAT(imm5u`b0, Ra(31,imm5u));

Exceptions: None

#### Privilege level: All

**Note:** "SRLI R0, R0, 0" will be aliased to NOP and treated by an implementation as NOP.



## **STANDBY (Wait For External Event)**

#### Format:

| 31 | 30   | 25 | 24  | 20  | 19 |            | 10 | 9  | 7 | 6   | 5    | 4   | 0    |
|----|------|----|-----|-----|----|------------|----|----|---|-----|------|-----|------|
| 0  | MIS  | С  | 000 | 000 |    | 0000000000 |    | 00 | 0 | Sub | Туре | STA | NDBY |
|    | 1100 | 10 |     |     |    |            |    |    |   |     |      | 0   | 0000 |

\* N12 implementation for this instruction, please refer to chapter 9.2

**Syntax:** STANDBY SubType (= no\_wake\_grant, wake\_grant)

**Purpose:** It is used for a core to enter a standby state while waiting for external events to happen.

**Description:** This instruction puts the core and its associating structures into an implementation-dependent low power standby mode where the instruction execution stops and most of the pipeline clocks can be disabled. The core has to enter the standby mode after all external memory and I/O accesses have been completed.

In general, the core leaves the standby mode when an external event happens that needs the core's attention. However, to facilitate the need for an external power manager to control the clock frequency and voltage, the wakeup action may need the external power manager's consent. Thus two flavors of STANDBY instruction are defined to distinguish the different usages. The SubType field definitions are listed as follows.

| SubType | Mnemonic      | Wakeup Condition                                             |
|---------|---------------|--------------------------------------------------------------|
| 0       | no_wake_grant | The STANDBY instruction immediately monitors and             |
|         |               | accepts a wakeup event (e.g external interrupt) to leave the |
|         |               | standby mode without waiting for a wakeup_consent            |
|         |               | notification from an external agent.                         |
| 1       | wake_grant    | The STANDBY instruction waits for a wakeup_consent           |
|         |               | notification from an external agent (e.g. power              |
|         |               | management unit) before monitoring and accepting a           |
|         |               | wakeup_event (e.g. external interrupt) to leave the standby  |
|         |               | mode.                                                        |
| 2       | wait_done     | The STANDBY instruction waits for a wakeup_consent           |

 Table 59
 STANDBY instruction SubType definitions



|  | notification from an external agent (e.g. power        |
|--|--------------------------------------------------------|
|  | management unit). When the wakeup_consent notification |
|  | arrives, the core leaves the standby mode immediately. |

The wakeup external events include interrupt (regardless of masking condition), debug request, wakeup signal, reset etc. And the instruction execution restarts either from the instruction following the STANDBY instruction or from the enabled interrupt handler which cause the core to leave the standby mode. When entering an interrupt handler, the IPC system register will have the address of the instruction following the STANDBY instruction.

An implementation can export the standby state to an external agent such as a power/energy controller to further regulate the clock or the voltage of the processor core for maximum energy savings. However, if any such clock or voltage regulation causes any core/memory state loss, software is responsible to preserve the needed states before the core enters standby mode. And if such state loss has happened, the only sensible way to bring the core into action is through a reset event.

### **Operations:**

Enter\_Standby();

#### Exceptions: None

**Privilege level:** The behaviors of STANDBY under different processor operating modes are listed in the following table.

| Privilege Level | SubType encoding | SubType behavior |
|-----------------|------------------|------------------|
| User            | 0                | 0                |
|                 | 1                | 0                |
|                 | 2                | 0                |
| Superuser       | 0                | 0                |
|                 | 1                | 1                |
|                 | 2                | 2                |

Note:



# SUB (Subtraction)

Type: 32-Bit Baseline Format:

| 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4  | 0   |
|----|-----|-----|----|----|----|----|----|----|-----|-----|----|-----|
| 0  | AL  | U_1 | F  | Rt | R  | la | R  | b  | 000 | 000 | SU | JB  |
|    | 100 | 000 |    |    |    |    |    |    |     |     | 00 | 001 |

Syntax: SUB Rt, Ra, Rb

**Purpose:** Subtract the content of two registers.

**Description:** The content of Rb is subtracted from the content of Ra. And the result is written to Rt.

### **Operations:**

Rt = Ra - Rb;

Exceptions: None Privilege level: All Note:



## **SUBRI (Subtract Reverse Immediate)**

Type: 32-Bit Baseline Format:

| 31 | 30   | 25  | 24 | 20 | 19 | 15 | 14 |        | 0 |
|----|------|-----|----|----|----|----|----|--------|---|
| 0  | SUE  | BRI | F  | Rt | R  | la |    | imm15s |   |
|    | 1010 | 001 |    |    |    |    |    |        |   |

Syntax: SUBRI Rt, Ra, imm15s

Purpose: Subtract the content of a register from a signed constant.

**Description:** The content of Ra is subtracted from the sign-extended imm15s. And the result is written to Rt.

#### **Operations:**

Rt = SE(imm15s) - Ra;

Exceptions: None Privilege level: All Note:



## SVA (Set on Overflow Add)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|-----|-----|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | ALU | U_1 | F  | Rt | R  | la | R  | lb | 000 | 000 | SV  | VA  |
|    | 100 | 000 |    |    |    |    |    |    |     |     | 11( | 000 |

Syntax: SVA Rt, Ra, Rb

Purpose: Generate overflow status on adding the contents of two registers.

**Description:** If adding the contents of Ra and Rb results in 32-bit 2's complement arithmetic overflow condition, a result of 1 is written to Rt; otherwise, a result of 0 is written to Rt.

### **Operations:**

```
value = CONCAT(Ra(31), Ra(31,0)) + CONCAT(Rb(31), Rb(31,0));
if (value(32) != value(31)) {
    Rt = 1;
    else {
        Rt = 0;
        }
Exceptions: None
```



## SVS (Set on Overflow Subtract)

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|-----|-----|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | ALU | J_1 | F  | Rt | R  | la | R  | b  | 000 | 000 | SV  | VS  |
|    | 100 | 000 |    |    |    |    |    |    |     |     | 110 | 001 |

### Syntax: SVS Rt, Ra, Rb

**Purpose:** Generate overflow status on subtracting the contents of two registers. **Description:** If subtracting the contents of Ra and Rb results in 32-bit 2's complement arithmetic overflow condition, a result of 1 is written to Rt; otherwise, a result of 0 is written to Rt.

### **Operations:**

```
value = CONCAT(Ra(31), Ra(31,0)) - CONCAT(Rb(31), Rb(31,0));
if (value(32) != value(31)) {
    Rt = 1;
    else {
        Rt = 0;
      }
Exceptions: None
```

00001110

# SW (Store Word)

**Type:** 32-Bit Baseline **Format:** 

011100

|    |     | SW  |     |    |    |    |    |    |    |      |      |
|----|-----|-----|-----|----|----|----|----|----|----|------|------|
| 31 | 30  | 25  | 24  | 20 | 19 | 15 | 14 | 10 | 98 | 7    | 0    |
| 0  | Mł  | EM  | R   | lt | R  | a  | R  | b  | SV | S    | W    |
|    | 011 | 100 |     |    |    |    |    |    |    | 0000 | 1010 |
|    |     |     |     |    |    |    |    |    |    |      |      |
|    |     | SW  | .bi |    |    |    |    |    |    |      |      |
| 31 | 30  | 25  | 24  | 20 | 19 | 15 | 14 | 10 | 98 | 7    | 0    |
| 0  | MI  | EM  | R   | Lt | R  | a  | R  | b  | sv | SW   | V.bi |

| Syntax: | SW    | Rt, [Ra + (Rb << sv)] |
|---------|-------|-----------------------|
|         | SW.bi | Rt, [Ra], (Rb << sv)  |

Purpose: To store a 32-bit word from a general register into memory.

**Description:** This instruction stores a word from the the general register Rt into the memory. Two different forms are used to specify the memory address. The regular form uses  $Ra + (Rb \ll sv)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra + (Rb  $\ll$  sv) value after the memory store operation.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

### **Operations:**

```
Addr = Ra + (Rb << sv);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
if (!Word_Aligned(Vaddr)) {
    Generate Exception(Data alignment check);
```



```
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, WORD, Attributes, Rt);
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



## SWI (Store Word Immediate)

**Type:** 32-Bit Baseline **Format:** 

|    |     | SWI | -   |    |    |    |    |        |   |
|----|-----|-----|-----|----|----|----|----|--------|---|
| 31 | 30  | 25  | 24  | 20 | 19 | 15 | 14 |        | 0 |
| 0  | SV  | WI  | F   | Rt | R  | la |    | imm15s |   |
|    | 001 | 010 |     |    |    |    |    |        |   |
|    |     |     |     |    |    |    |    |        |   |
|    |     | SWI | .bi |    |    |    |    |        |   |
| 31 | 30  | 25  | 24  | 20 | 19 | 15 | 14 |        | 0 |
|    |     |     |     |    |    |    |    |        |   |

| 0 | SWI.bi | Rt | Ra | imm15s |
|---|--------|----|----|--------|
|   | 001110 |    |    |        |
|   |        |    |    |        |

| Syntax: | SWI    | Rt, [Ra + (imm15s << 2)] |
|---------|--------|--------------------------|
|         | SWI.bi | Rt, [Ra], (imm15s << 2)  |

(imm15s is a word offset. In assembly programming, always write a byte offset.)

Purpose: To store a 32-bit word from a general register into memory.

**Description:** This instruction stores a word from the general register Rt into the memory. Two different forms are used to specify the memory address. The regular form uses Ra +  $SE(imm15s \ll 2)$  as its memory address while the .bi form uses Ra. For the .bi form, the Ra register will be updated with the Ra +  $SE(imm15s \ll 2)$  value after the memory store operation. Note that imm15s is treated as a signed integer.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

### **Operations:**

```
Addr = Ra + Sign_Extend(imm15s << 2);
If (.bi form) {
    Vaddr = Ra;
} else {
    Vaddr = Addr;
}
if (!Word Aligned(Vaddr)) {
```



```
Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, WORD, Attributes, Rt);
   If (.bi form) { Ra = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



## **SWUP (Store Word with User Privilege Translation)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 98 | 7 0      |
|----|--------|-------|-------|-------|----|----------|
| 0  | MEM    | Rt    | Ra    | Rb    | SV | SWUP     |
|    | 011100 |       |       |       |    | 00101010 |

**Syntax:** SWUP Rt,  $[Ra + (Rb \le sv)]$ 

**Purpose:** To store a 32-bit word from a general register into memory with the user mode privilege address translation.

**Description:** This instruction stores a word from the general register Rt into the memory address Ra + (Rb << sv) with the user mode privilege address translation regardless of the current processor operation mode (i.e. PSW.POM) and the current data address translation state (i.e. PSW.DT). The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

### **Operations:**

```
Vaddr = Ra + (Rb << sv);
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, TRANSLATE);
Excep_status = Page_Exception(Attributes, UserMode, STORE);
If (Excep_status == NO_EXCEPTION) {
    Store_Memory(PAddr, WORD, Attributes, Rt);
} else {
    Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:





## SYSCALL (System Call)

### Format:

| 31 | 30 25  | 24 20 | 19   | 5 | 4 0     |  |
|----|--------|-------|------|---|---------|--|
| 0  | MISC   | 00000 | SWID |   | SYSCALL |  |
|    | 110010 |       |      |   | 01011   |  |

### Syntax: SYSCALL SWID

Purpose: It is used to generate a System Call exception.

### **Description:**

SYSCALL instruction will unconditionally generate a System Call exception and transfer control to the System Call exception handler. The 15-bits SWID is used by software as a parameter to distinguish different system call services.

#### **Operations:**

Generate\_Exception(System\_Call);

Exceptions: System Call



# **TEQZ (Trap if equal 0)**

## Format:

| 31 | 30 25  | 24 20 | 19   | 5 | 4     | 0 |
|----|--------|-------|------|---|-------|---|
| 0  | MISC   | Ra    | SWID |   | TEQZ  | Z |
|    | 110010 |       |      |   | 00110 | 0 |

## **Syntax:** TEQZ Ra, SWID

**Purpose:** It is used to generate a conditional Trap exception.

### **Description:**

TEQZ instruction will generate a Trap exception and transfer control to the Trap exception handler if the content of Ra is equal to 0. The 15-bits SWID is used by software as a parameter to distinguish different trap features and usages.

### **Operations:**

If (GR[Ra] == 0)
Generate\_Exception(Trap);

### **Exceptions:** Trap



# TNEZ (Trap if not equal 0)

### Format:

| 31 | 30 25  | 24 20 | 19 5 | 4 0   |
|----|--------|-------|------|-------|
| 0  | MISC   | Ra    | SWID | TNEZ  |
|    | 110010 |       |      | 00111 |

## Syntax: TNEZ Ra, SWID

**Purpose:** It is used to generate a conditional Trap exception.

## **Description:**

TNEZ instruction will generate a Trap exception and transfer control to the Trap exception handler if the content of Ra is not equal to 0. The 15-bits SWID is used by software as a parameter to distinguish different trap features and usages.

### **Operations:**

```
If (GR[Ra] != 0)
Generate_Exception(Trap);
```

### **Exceptions:** Trap

# **TLBOP (TLB Operation)**

#### Format:

| 31 | 30  | 25     | 24 | 20 | 19 | 15 | 14  | 10  | 9   | 5    | 4   | 0   |
|----|-----|--------|----|----|----|----|-----|-----|-----|------|-----|-----|
| 0  | MI  | SC     | F  | Rt | R  | la | 000 | 000 | Sub | Туре | TLI | BOP |
|    | 110 | 110010 |    |    |    |    |     |     | 01  | 110  |     |     |

Syntax: TLBOPRa, SubTypeTLBOPRt, Ra, PB (TLB probe operation)TLBOPFLUA (TLB flush all operation)

**Purpose:** Perform various operations on processor TLB. This instruction is typically used by software to manage page table entry (PTE) information in TLB.

#### **Description:**

This instruction is used to perform TLB control operations based on the SubType field. The definition and encoding for the SubType field is listed in the following table. Depending on the SubType, different TLBOP instructions have different number of operands from 0 to 2.

| SubType | Mnemonics Operation |                                           | Rt?/Ra? |
|---------|---------------------|-------------------------------------------|---------|
| 0       | TargetRead (TRD)    | Read targeted TLB entry                   | -/Ra    |
| 1       | TargetWrite (TWR)   | TargetWrite (TWR)Write targeted TLB entry |         |
| 2       | RWrite (RWR)        | Write a hardware-determined TLB entry     | -/Ra    |
| 3       | RWriteLock (RWLK)   | Write a hardware-determined TLB entry     | -/Ra    |
|         |                     | and lock                                  |         |
| 4       | Unlock (UNLK)       | Unlock a TLB entry                        | -/Ra    |
| 5       | Probe (PB)          | Probe TLB entry information               | Rt/Ra   |
| 6       | Invalidate (INV)    | Invalidate TLB entries except locked      | -/Ra    |
|         |                     | entries                                   |         |
| 7       | FlushAll (FLUA)     | Flush all TLB entries except locked       | _/_     |
|         |                     | entries                                   |         |

Table 60TLBOP SubType Definitions

The operations of the cache control instruction can be grouped and described in the



following categories:

### a. TLB Target Read

Syntax: TLBOP Ra, TargetRead

This operation reads a specified entry in the software-visible portion of the TLB structure. The specified entry is indicated by the Ra register. The read result is placed in the TLB\_VPN, TLB\_DATA, and TLB\_MISC registers.

The TLB entry number for a non-fully-associative N sets K ways TLB cache is as follows:

| 31      | log2(N*K) | Log2(N*K)-1 | log2(N) | Log2(N)-1  | 0 |
|---------|-----------|-------------|---------|------------|---|
| Ignored |           | Way num     | nber    | Set number |   |

The normal instruction sequence of performing the TLB Target Read operation is as follows:

```
movi Ra, TLB_rd_entry // prepare read entry number
tlbop Ra, TRD // read TLB
dsb // data serialization barrier
mfsr Ry, TLB_VPN // move read result to reg
```

**Important:** Since the TLB\_MISC register contains the current process's Context ID and Access Page Size information, any use of this instruction is required to save/restore the TLB\_MISC register if you want the current process to run correctly immediately after this operation.

## b. TLB Target Write

#### Syntax: TLBOP Ra, TargetWrite

This operation writes a specified entry in the software-visible portion of the TLB structure. The specified entry is indicated by the Ra register. The other write operands are in the TLB\_VPN, TLB\_DATA, and TLB\_MISC registers. The TLB entry number for the non-fully-associative N sets K ways TLB cache is as follows:

| 31      | log2(N*K) | Log2(N*K)-1 | log2(N) | Log2(N)-1  | 0 |
|---------|-----------|-------------|---------|------------|---|
| Ignored |           | Way num     | nber    | Set number |   |



If the selected target entry is locked, this instruction will overwrite the locked entry and clear the locked flag.

The normal instruction sequence of performing the TLB Target Write operation is as follows:

```
// may not needed
mtsr
       Ra, TLB VPN
                      // prepare PPN, etc.
       Rb, TLB DATA
mtsr
mtsr
       Rc, TLB MISC
                      // may not needed
                          // data serialization barrier
dsb
                          // may not needed (imp-dep)
                        // prepare write index
movi
       Rd, TLB wr entry
       Rd, TWR
                          // idx TLB write
tlbop
                          // inst serialization barrier
isb
```

## c. TLB Random Write (HW-determined way in a set)

Syntax: TLBOP Ra, RWrite

This operation writes a hardware-determined random TLB way in a set determined by the VA (in TLB\_VPN) and page size (in TLB\_MISC) in the software-visible portion of the TLB structure. The general register Ra contains the data that is going to write into the TLB\_DATA portion of the TLB structure. The other write operands are in the TLB\_VPN and TLB\_MISC registers.

If all the ways in the specified set is all locked during the write operation of this instruction, depending on the setting in the TBALCK field of the MMU Control system register (MMU\_CTL), this operation may generate a precise or an imprecise "Data Machine Error" exception. Note that the default value of the TBALCK is to generate the exception.

The normal instruction sequence of performing the TLB Random Write operation is as follows:



### d. TLB Random Write and Lock

Syntax: TLBOP Ra, RWriteLock

This operation is similar to TLB Random Write operation to write a hardware-determined random TLB way in a set determined by the VA (in TLB\_VPN) and page size (in TLB\_MISC) in the software-visible portion of the TLB structure. In addition to the write operation, this instruction also locks the TLB entry.

If all the ways in the specified set is all locked during the write operation of this instruction, depending on the setting in the TBALCK field of the MMU Control system register (MMU\_CTL), this operation may generate a precise or an imprecise "Data Machine Error" exception. Note that the default value of the TBALCK is to generate the exception.

#### e. TLB Unlock

Syntax: TLBOP Ra, Unlock

This operation unlocks a TLB entry if the VA in the general register Ra matches the VPN of a set determined by the VA (in Ra) and page size (in TLB\_MISC).

### f. TLB Probe

#### Syntax: TLBOP Rt,Ra,Probe

This operation searches all TLB structures (software-visible and software-invisible) for a specified VA and generates an entry number where the VA matches the VPN in that entry. The search result is written into the general register Rt. The search VA is specified in the general register Ra. The result that stored in Rt has the following format:





If the VA can be found in the software-visible part of the TLB, the "sw" bit will be set. If the VA can be found in the software-invisible part of the TLB, the "hw" bit will be set. And if the VA cannot be found in either the software-visible or software-invisible part of the TLB, the "nf" bit will be set.

The TLB entry number for the non-fully-associative N sets K ways TLB cache is as follows:

Log2(N\*K)-1log2(N)Log2(N)-10Way numberSet number

The normal instruction sequence of performing the TLB probe operation is as follows:

```
tlbop Rt, Ra, PB // VA is in Ra
<Examine> Rt
```

If this instruction encounters a multiple match condition when searching the TLB, a precise "Data Machine Error" exception will be generated.

### g. TLB Invalidate VA

Syntax: TLBOP Ra, Invalidate

This operation flushes the TLB entry that contains the VA in the Ra register and the page size specified in the TLB\_MISC register (software-visible and software-invisible) except the locked TLB entries. The match condition also involves the "G" bit of a PTE entry and the CID field of the TLB\_MISC register.

Their matching logic is as follows:

- If "G" is asserted, CID *is not* part of the match condition.
- If "G" is not asserted, CID *is* part of the match condition.



The normal instruction sequence for this operation is as follows:

```
// prepare VA in Ra
...
tlbop Ra,INV // Invalidate TLB entries containing VA
isb // inst serialization barrier
```

Note that this TLB invalidate operation may flush more pages than the exact number of pages which contain this VA, up to flushing the entire TLB structure. And the exact behavior is implementation-dependent.

If this instruction encounters a multiple match condition when searching the TLB, all matched entries should be invalidated and no "Data Machine Error" exception will be generated.

#### h. TLB Invalidate All

```
Syntax: TLBOP FlushAll
```

This operation invalidates all TLB entries (software-visible and software-invisible) except the locked TLB entries.

The normal instruction sequence for this operation is as follows:

```
tlbop FLUA // TLB invalidate All
isb // Inst serialization barrier
```

### **Operations:**

```
If (SubType is not supported)
Exception(Reserved Instruction);
If (Op(SubType) == TargetRead) {
Entry_Addr = Ra;
{TLB_VPN, TLB_DATA, TLB_MISC} =
TLB_Entry_Read(Entry_Addr);
} else if (OP(SubType) == TargetWrite) {
Entry_Addr = Ra
TLB_Entry_Write(Entry_Addr, TLB_VPN, TLB_DATA, TLB_MISC);
```


```
} else if (OP(SubType) == RWrite) {
 TLB_Write(TLB_VPN, Ra, TLB_MISC, nolock);
} else if (OP(SubType) == RWriteLock) {
 TLB_Write(TLB_VPN, Ra, TLB_MISC, lock);
} else if (OP(SubType) == Unlock) {
 {found, Entry} = TLB Search(Ra);
 if (found) {
    TLB Unlock (Entry);
 }
} else if (OP(SubType) == Probe) {
 {found, Entry} = TLB_Search(Ra);
 Rt = {found, Entry};
} else if (OP(SubType) == Invalidate) {
 {found, Entry} = TLB Search(Ra);
 if (found) {
  TLB Invalidate (Entry);
 }
} else if (OP(SubType) == FlushAll) {
 Foreach TLB Entry {
  if (Is_Not_Locked(TLB_Entry)) {
    TLB Invalidate(TLB Entry);
   }
 }
}
```

#### **Exceptions:**

Privileged Instruction, Imprecise Machine Error

#### Privilege level: Superuser and above

#### Note:

(1) All non-instruction-fetch-related exceptions generated by a TLBOP instruction should have the INST field of the ITYPE register set to 0.



## **TRAP (Trap exception)**

#### Format:

| 31 | 30 25  | 24 20 | 19   | 5 | 4    | 0 |
|----|--------|-------|------|---|------|---|
| 0  | MISC   | 00000 | SWID |   | TRA  | Р |
|    | 110010 |       |      |   | 0010 | 1 |

#### Syntax: TRAP SWID

**Purpose:** It is used to generate an unconditional Trap exception.

#### **Description:**

TRAP instruction will unconditionally generate a Trap exception and transfer control to the Trap exception handler. The 15-bits SWID is used by software as a parameter to distinguish different trap features and usages.

#### **Operations:**

Generate\_Exception(Trap);

**Exceptions:** Trap

Privilege level: All Note:



## WSBH (Word Swap Byte within Halfword)

Type: 32-Bit Baseline Format:

| 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14   | 5      | 4  | 0   |
|----|-----|-----|----|----|----|----|------|--------|----|-----|
| 0  | ALU | J_1 | R  | Rt | R  | la | 0000 | 000000 | WS | BH  |
|    | 100 | 000 |    |    |    |    |      |        | 10 | 100 |

#### Syntax: WSBH Rt, Ra

**Purpose:** Swap the bytes within each halfword of a register.

**Description:** The bytes within each halfword of Ra is swapped. And the result is written to Rt.



#### **Operations:**

Rt = CONCAT (Ra (23,16), Ra (31,24), Ra (7,0), Ra (15,8));

Exceptions: None

Privilege level: All

Note:



## XOR (Bit-wise Logical Exclusive Or)

Type: 32-Bit Baseline Format:

| 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9   | 5   | 4   | 0   |
|----|-----|-----|----|----|----|----|----|----|-----|-----|-----|-----|
| 0  | ALU | U_1 | F  | Rt | R  | la | R  | .b | 000 | 000 | X   | OR  |
|    | 100 | 000 |    |    |    |    |    |    |     |     | 000 | JII |

Syntax: XOR Rt, Ra, Rb

**Purpose:** Doing a bit-wise logical Exclusive OR operation on the content of two registers. **Description:** The content of Ra is combined with the content of Rb using a bit-wise logical exclusive OR operation. And the result is written to Rt.

**Operations:** 

Rt = Ra ^ Rb;



## **XORI (Exclusive Or Immediate)**

**Type:** 32-Bit Baseline **Format:** 

| 31 | 30 25  | 24 20 | 19 15 | 14 0   |
|----|--------|-------|-------|--------|
| 0  | XORI   | Rt    | Ra    | imm15u |
|    | 101011 |       |       |        |

Syntax: XORI Rt, Ra, imm15u

**Purpose:** Bit-wise exclusive OR of the content of a register with an unsigned constant. **Description:** The content of Ra is bit-wise exclusive ORed with the zero-extended imm15u. And the result is written to Rt.

#### **Operations:**

 $Rt = Ra \wedge ZE(imm15u);$ 



## ZEB (Zero Extend Byte)

Type: 32-Bit Baseline Pseudo OP

Alias of: ANDI Rt, Ra, 0xFF

Syntax: ZEB Rt, Ra

Purpose: Zero-extend the least-significant-byte of a register.

**Description:** The least-significant-byte of Ra is zero-extended. And the result is written to Rt.

**Operations:** 

Rt = ZE(Ra(7,0));



## ZEH (Zero Extend Halfword)

Type: 32-Bit Baseline Format:

| 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14    | 5      | 4   | 0   |
|----|-----|-----|----|----|----|----|-------|--------|-----|-----|
| 0  | ALU | J_1 | F  | Rt | R  | la | 00000 | 000000 | ZE  | EH  |
|    | 100 | 000 |    |    |    |    |       |        | 100 | 011 |

Syntax: ZEH Rt, Ra

**Purpose:** Zero-extend the least-significant-halfword of a register.

**Description:** The least-significant-halfword of Ra is zero-extended. And the result is written to Rt.

#### **Operations:**

Rt = ZE(Ra(15, 0));

Exceptions: None Privilege level: All

Note:



# 7.2 32-bit Performance Extension instructions

## ABS (Absolute)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14  | 6      | 5   | 0   |
|----|--------|-----|----|----|----|----|-----|--------|-----|-----|
| 0  | ALU    | J_2 | R  | Rt | R  | la | 000 | 000000 | A   | BS  |
|    | 100001 |     |    |    |    |    |     |        | 000 | 011 |

#### Syntax: ABS Rt, Ra

**Purpose:** Get the absolute value of a signed integer in a general register.

**Description:** This instruction calculates the absolute value of a signed integer stored in Ra. The result is written to Rt.

#### **Operations:**

```
if (Ra >= 0) {
    Rt = Ra;
} else {
    Rt = -Ra;
}
```



## AVE (Average with Rounding)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9  | 6   | 5   | 0   |
|----|--------|-----|----|----|----|----|----|----|----|-----|-----|-----|
| 0  | ALU    | J_2 | F  | Rt | R  | la | R  | .b | 00 | 000 | AV  | √Е  |
|    | 100001 |     |    |    |    |    |    |    |    |     | 000 | 010 |

```
Syntax: AVE Rt, Ra, Rb
```

Purpose: Calculate the average of the contents of two general registers.

**Description:** This instruction calculates the average value of two signed integers stored in Ra and Rb and rounds up a half-integer result to the nearest integer. The result is written to Rt.

**Operations:** 

```
Sum = CONCAT(Ra(31), Ra(31,0)) + CONCAT(Rb(31), Rb(31,0)) + 1;
Rt = Sum(32,1);
```



## BCLR (Bit Clear)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 30         25         24         20           ALU_2         Rt           100001 |   | 19 | 15 | 14 | 10 | 9   | 6  | 5 0 |     |     |
|----|--------|---------------------------------------------------------------------------------|---|----|----|----|----|-----|----|-----|-----|-----|
| 0  | ALU    | J_2                                                                             | F | Rt | R  | Ra | im | n5u | 00 | 000 | BC  | LR  |
|    | 100001 |                                                                                 |   |    |    |    |    |     |    |     | 001 | 001 |

Syntax: BCLR Rt, Ra, imm5u

**Purpose:** Clear an individual one bit from the content of a general register **Description:** This instruction clears an individual one bit from the value stored in Ra. The bit position is specified by the imm5u value. The cleared result is written to Rt.

**Operations:** 

onehot = 1 << imm5u; Rt = Ra & ~onehot;



## **BSET (Bit Set)**

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14  | 10  | 9  | 6   | 5   | 0   |
|----|--------|-----|----|----|----|----|-----|-----|----|-----|-----|-----|
| 0  | ALU    | J_2 | F  | Rt | R  | la | imr | n5u | 00 | 000 | BS  | ET  |
|    | 100001 |     |    |    |    |    |     |     |    |     | 001 | 000 |

Syntax: BSET Rt, Ra, imm5u

Purpose: Set an individual one bit from the content of a general register

**Description:** This instruction sets an individual one bit from the value stored in Ra. The bit position is specified by the imm5u value. The modified result is written to Rt.

**Operations:** 

```
onehot = 1 << imm5u;
Rt = Ra | onehot;
```



## BTGL (Bit Toggle)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14 | 10  | 9  | 6   | 5   | 0   |
|----|--------|-----|----|----|----|----|----|-----|----|-----|-----|-----|
| 0  | ALU    | J_2 | F  | Rt | R  | la | im | n5u | 00 | 000 | BT  | GL  |
|    | 100001 |     |    |    |    |    |    |     |    |     | 001 | 010 |

Syntax: BTGL Rt, Ra, imm5u

**Purpose:** Toggle an individual one bit from the content of a general register **Description:** This instruction toggles an individual one bit from the value stored in Ra. The bit position is specified by the imm5u value. The modified result is written to Rt.

**Operations:** 

```
onehot = 1 << imm5u;
Rt = Ra ^ onehot;
```

## **BTST (Bit Test)**

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14 | 10  | 9  | 6   | 5   | 0   |
|----|--------|-----|----|----|----|----|----|-----|----|-----|-----|-----|
| 0  | ALU    | J_2 | F  | Rt | R  | la | im | n5u | 00 | 000 | BT  | ЪТ  |
|    | 100001 |     |    |    |    |    |    |     |    |     | 001 | 011 |

```
Syntax: BTST Rt, Ra, imm5u
```

**Purpose:** Test an individual one bit from the content of a general register

**Description:** This instruction tests an individual one bit from the value stored in Ra. The bit position is specified by the imm5u value. If the bit is set, the result of one is written to Rt. If the bit is cleared, the result of zero is written to Rt.

#### **Operations:**

```
onehot = 1 << imm5u;
Rt = Ra & onehot;
if (Rt == 0) {
    Rt = 0;
} else {
    Rt = 1;
}
```



## CLIP (Clip Value)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14  | 10  | 9  | 6   | 5   | 0   |
|----|--------|-----|----|----|----|----|-----|-----|----|-----|-----|-----|
| 0  | ALU    | J_2 | F  | Rt | R  | la | imr | n5u | 00 | 000 | CI  | LIP |
|    | 100001 |     |    |    |    |    |     |     |    |     | 000 | 101 |

```
Syntax: CLIP Rt, Ra, imm5u
```

**Purpose:** Limit the signed integer of a register into an unsigned range.

**Description:** This instruction limits the signed integer stored in Ra into an unsigned integer range between  $2^{imm5u}$ -1 and 0. The limited result is written to Rt. For example, if imm5u is 0, the result should be always 0. If Ra is negative, then the result is 0 as well.

#### **Operations:**

```
if (Ra > 2<sup>imm5u</sup>-1) {
    Rt = 2<sup>imm5u</sup>-1;
} else if (Ra < 0) {
    Rt = 0;
} else {
    Rt = Ra;
}</pre>
```



## CLIPS (Clip Value Signed)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14  | 10    | 9 | 6    | 5   | 0     |  |
|----|--------|----|----|----|----|----|-----|-------|---|------|-----|-------|--|
| 0  | ALU_2  |    | F  | Rt | R  | la | imr | imm5u |   | 0000 |     | CLIPS |  |
|    | 100001 |    |    |    |    |    |     |       |   |      | 000 | 100   |  |

Syntax: CLIPS Rt, Ra, imm5u

**Purpose:** Limit the signed integer of a register into a signed range.

**Description:** This instruction limits the signed integer stored in Ra into a signed integer range between  $2^{imm5u}$ -1 and  $-2^{imm5u}$ . The limited result is written to Rt. For example, if imm5u is 3, the result should be between 7 and -8.

#### **Operations:**

```
if (Ra > 2<sup>imm5u</sup>-1) {
    Rt = 2<sup>imm5u</sup>-1;
} else if (Ra < -2<sup>imm5u</sup>) {
    Rt = -2<sup>imm5u</sup>;
} else {
    Rt = Ra;
}
```



## CLO (Count Leading Ones)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14  | 6      | 5   | 0    |  |
|----|--------|----|----|----|----|----|-----|--------|-----|------|--|
| 0  | ALU_2  |    | R  | lt | R  | la | 000 | 000000 | CLO |      |  |
|    | 100001 |    |    |    |    |    |     |        | 000 | 0110 |  |

#### Syntax: CLO Rt, Ra

**Purpose:** Count the number of successive ones leading from the most significant bit of a general register.

**Description:** Starting from the most significant bit (bit 31) of Ra, count the number of successive ones. The result is written to Rt. For example, if bit 31 of Ra is 0, the result is 0. If Ra has a value of 0xFFFFFFF, then the result should be 32.

#### **Operations:**

```
cnt = 0;
for (i = 31 to 0) {
    if (Ra(i) == 0) {
        break;
    } else {
        cnt = cnt + 1;
    }
}
Rt = cnt;
```



## CLZ (Count Leading Zeros)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14  | 6      | 5   | 0   |  |
|----|--------|----|----|----|----|----|-----|--------|-----|-----|--|
| 0  | ALU_2  |    | R  | lt | R  | la | 000 | 000000 | CLZ |     |  |
|    | 100001 |    |    |    |    |    |     |        | 000 | 111 |  |

#### Syntax: CLZ Rt, Ra

**Purpose:** Count the number of successive zeros leading from the most significant bit of a general register.

**Description:** Starting from the most significant bit (bit 31) of Ra, count the number of successive zeros. The result is written to Rt. For example, if bit 31 of Ra is 1, the result is 0. If Ra has a value of 0, then the result should be 32.

#### **Operations:**

```
cnt = 0;
for (i = 31 to 0) {
    if (Ra(i) == 1) {
        break;
    } else {
        cnt = cnt + 1;
    }
}
Rt = cnt;
```



## MAX (Maximum)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 10 | 9 | 6   | 5   | 0   |
|----|--------|----|----|----|----|----|----|----|---|-----|-----|-----|
| 0  | ALU_2  |    | F  | Rt | R  | la | R  | Rb |   | 000 | MAX |     |
|    | 100001 |    |    |    |    |    |    |    |   |     | 000 | 000 |

Syntax: MAX Rt, Ra, Rb

Purpose: Get the larger value of the contents of two general registers.

**Description:** This instruction compares two signed integers stored in Ra and Rb and picks the larger value as the result. The result is written to Rt.

#### **Operations:**

```
if (Ra >= Rb) {
    Rt = Ra;
} else {
    Rt = Rb;
}
```



## MIN (Minimum)

**Type:** 32-Bit Performance Extension **Format:** 

| 31 | 30   | 25          | 24 | 20 | 19 | 15 | 14 | 10 | 9  | 6   | 5        | 0         |
|----|------|-------------|----|----|----|----|----|----|----|-----|----------|-----------|
| 0  | ALU  | J_2         | F  | Rt | R  | Ra | R  | b  | 00 | 000 | M<br>000 | IN<br>001 |
|    | 1000 | <i>J</i> 01 |    |    |    |    |    |    |    |     | 000      | 001       |

#### Syntax: MIN Rt, Ra, Rb

**Purpose:** Get the smaller value of the contents of two general registers.

**Description:** This instruction compares two signed integers stored in Ra and Rb and picks the smaller value as the result. The result is written to Rt.

#### **Operations:**

```
if (Ra >= Rb) {
    Rt = Rb;
} else {
    Rt = Ra;
}
```



## 7.3 32-bit Performance Extension Version 2 instructions



## **BSE (Bit Stream Extraction)**

**Type:** 32-bit Performance Extension Version 2 **Format:** 

| 31 | 30 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9 | 6    | 5 |       | 0 |
|----|--------|----|----|----|----|----|----|---|------|---|-------|---|
| 0  | ALU_2  |    | Rt |    | Ra |    | Rb |   | 0000 |   | BSE   |   |
|    | 100001 |    |    |    |    |    |    |   |      | C | 01100 |   |

#### Syntax: BSE Rt, Ra, Rb

**Purpose:** It is used to extract a number of bits from a register for bit stream extraction. **Description:** 

BSE instruction extracts a number of bits (1 to 32) from register Ra into lower bits of register Rt. If Rb(30), acted as an "underflow" flag, is 0 the non-occupied bits in Rt will be filled with 0, otherwise the non-occupied bits in Rt will be untouched. The number of bits extracted is specified in Rb(12,8)+1, and the *distance* between Ra(31) and the starting MSB bit position of the extracted bits in Ra is specified in Rb(4,0). After the bits are extracted, Rb(4,0) is incremented to be the *distance* between Ra(31) and the (LSB-1) bit position of the just extracted bits in Ra, making successive BSE extractions flowing from bit-31 to bit-0 in Ra. You can view Rb(4,0) as the number of bits that has already been extracted from Ra starting from Ra(31).

Although the non-occupied bits in Rt is untouched if Rb(30) is equal to 1, the Rb(12,8) will be updated with Rb(20,16) which should contain the old Rb(12,8) before the underflow condition. (Please see description below.)

The extraction operation with Rb(30) equal to 0 is illustrated in Figure 3 and with Rb(30) eaual to 1 is illustrated in Figure 4.

If the sum of Rb(4,0) (=N) and Rb(12,8) (=M) is equal to 31, then all remaining M+1 bits in Ra will be extracted, and Rb(5,0) will be set to 0x20 in preparation for the next bit stream BSE extraction. In addition, Rb(31), acted as a bitstream register "refill" flag, will be set to 1. This is illustrated in Figure 5. Note that setting Rb(5) to 1 is used for re-adjusting the Rb(4,0) if some bits that has been extracted will be re-extracted again.



If the sum of Rb(4,0) (=N) and Rb(12,8) (=M) is greater than 31, then all remaining 32-N bits in Ra will be extracted to Rt(M,M+N-31), and Rt(M+N-32,0) will be filled with 0. In this case, Rb(5,0) will be set to 0x20 and Rb(12,8) will be set to M+N-32, in preparation for the next bit stream BSE extraction. In addition, Rb(31), acted as a bitstream register "refill" flag, will be set to 1, Rb(30), acted as an "underflow" flag, will be set to 1 indicating that not all required bits are extracted, and the old Rb(12,8) will be saved in Rb(20,16) for recovery after the underflow processing. This is illustrated in Figure 6.

If register number Rt is equal to register number Rb, since both registers will be updated to different data, UNPREDICTABLE result will be written to the register.



Figure 3. Basic BSE operation with Rb(30) == 0





Figure 4. Basic BSE operation with Rb(30) == 1

Figure 5. BSE operation extracting all remaining bits with Rb(30) == 0.









Operations:

```
M = Rb[12:8];
if (Rb[30] == 0) {
   Len = M + 1;
   Rt[31:Len] = 0;
}
N = Rb[4:0];
D = M + N;
Rb[7:5] = 1;
                          // normal condition
if (31 > D) {
   Rb[4:0] = D + 1;
  Rb[31] = 0;
   Rt[M:0] = Ra[31-N:31-N-M];
   if (Rb[30] == 1) {
     Rb[12:8] = Rb[20:16];
     Rb[15:13] = 0;
   }
   Rb[30] = 0;
else if (31 == D) {
                      // empty condition
   Rb[4:0] = 0;
   Rb[30] = 0;
```



```
Rb[31] = 1;
Rt[M:0] = Ra[M:0];
} else if (31 < D) { // underflow condition
Rb[20:16] = M;
Rb[12:8] = D - 32;
Rb[4:0] = 0;
Rb[30] = 1;
Rb[31] = 1;
Rt[M:M+N-31] = Ra[31-N:0];
Rt[M+N-32:0] = 0;
}
```

#### Exceptions: None

#### Privilege level: All

#### Note:

- 1. A normal multiple-bits extraction that follows can be performed using several baseline instructions:
  - R4 is the bit stream register (i.e. Ra).
  - R2 is the number of bits to be extracted (i.e. Rb(12,8)+1).
  - R1 is the distance from MSB of the bit stream register to the starting bit position of the extraction (i.e. Rb(4,0)).

```
sll r3, r4, r1 // squeez out msb
subri r5, r2, 32 // calculate lsb squeeze count
srl r3, r3, r5 // squeeze out lsb
add r1, r1, r2 // move pointer to starting bit
```

So this instruction will save at least 3 instructions when compared to a normal extraction flow.

Following is an example Huffman decoding loop code using the BSE instruction:

```
LW Ra,[BitStream],4 # load 32b from BitStream
L1: Prepare Rb
L2: BSE Rt,Ra,Rb
```



```
BGTZ Rb,LOOKUP  # branch if no refill
LW Ra,[BitStream],4  # load 32b from BitStream
BTST Rk,Rb,30  # 1 if underflow
BNEZ Rk, L2  # branch if underflow
LOOKUP: Load Huffman table with index Rt
If end of code word {
Get the decoded data
Re-adjust Rb(4,0) if necessary
}
Adjust Rb(12,8) if necessary
J L2
```

- 2. BSP instructions can be used to update the Rb(12,8) and Rb(4,0) values.
- 3. Adjustment of Rb(4,0) value can be performed using basic addition and subtraction.



## **BSP (Bit Stream Packing)**

**Type:** 32-bit Performance Extension Version 2 **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 10 | 9 | 6   | 5   | (     | 0 |
|----|--------|----|----|----|----|----|----|----|---|-----|-----|-------|---|
| 0  | ALU_2  |    | Rt |    | R  | Ra |    | Rb |   | 000 | BSP |       |   |
|    | 100001 |    |    |    |    |    |    |    |   |     | (   | 01101 |   |

#### Syntax: BSP Rt, Ra, Rb

Purpose: It is used to insert a number of bits to a register for bit stream packing.

#### **Description:**

BSP instruction inserts a number of bits (1 to 32) from lower bits of register Ra (LSB side) into register Rt. The number of bits inserted is specified in Rb(12,8)+1, and the *distance* from Rt(31) to the MSB bit position of the inserted bits in Rt is specified in Rb(4,0). After the bits are inserted, Rb(4,0) is incremented to be the *distance* from Rt(31) to the (LSB-1) bit position of the just inserted bits in Rt, making successive BSP insertions flowing from bit-31 to bit-0 in Rt. This is illustrated in Figure 7.

If the sum of Rb(4,0) (=N) and Rb(12,8) (=M) is smaller than 31, and the Rb(30) is 1, indicating that a previous BSP operation is "overflowed", in addition to the usual packing operation, Rb(12,8) will be updated with Rb(20,16) which stores the original packing length before the overflow condition. (See description below.) This is illustrated in Figure 8.

If the sum of Rb(4,0) (=N) and Rb(12,8) (=M) is equal to 31, then all M+1 bits in Ra will be inserted to Rt and fill up all bits in Rt, and Rb(5,0) will be set to 0x20 in preparation for the next bit stream BSP packing operation. In addition, Rb(31), acted as a bitstream register "output" flag, will be set to 1. This is illustrated in Figure 9.

If the sum of Rb(4,0) (=N) and Rb(12,8) (=M) is greater than 31, the remaining number of bits in Rt is not enough to accommodate the number of bits needed packing in Ra. In this case, only 32-N bits will be inserted from Ra to Rt and the remaining M+N-31 bits in Ra will need to be inserted in the next BSP instruction. In this case, Rb(5,0) will be set to



0x20 and Rb(12,8) will be set to M+N-32, in preparation for the next bit stream BSP packing operation. In addition, Rb(31), acted as a bitstream register "output" flag, will be set to 1, R(30), acted as an "overflow" flag, will be set to 1 indicating that not all required bits are packed; and the old Rb(12,8) will be saved in Rb(20,16) for recovery after the overflow condition has been resolved. This is illustrated in Figure 10.

If register number Rt is equal to register number Rb, since both registers will be updated to different data, UNPREDICTABLE result will be written to the register.



Figure 7. Basic BSP operation





Figure 8. BSP operation with Rb(30) == 1

Figure 9. BSP operation filling up Rt.









Operations:

```
M = Rb[12:8];
N = Rb[4:0];
D = M + N;
Rb[7:5] = 1;
if (31 > D) {
                          // normal condition
   Rb[4:0] = D + 1;
   Rb[31] = 0;
   Rt[31-N:31-N-M] = Ra[M:0];
   if (Rb[30] == 1) {
      Rb[12:8] = Rb[20:16];
      Rb[15:13] = 0;
   }
   Rb[30] = 0;
} else if (31 == D) { // full condition
   Rb[4:0] = 0;
   Rb[30] = 0;
   Rb[31] = 1;
   Rt[M:0] = Ra[M:0];
} else if (31 < D) {
                           // overflow condition
   Rb[20:16] = M;
```



```
Rb[12:8] = D - 32;
Rb[4:0] = 0;
Rb[30] = 1;
Rb[31] = 1;
Rt[31-N:0] = Ra[M:M+N-31];
```

#### Exceptions: None

#### Privilege level: All

#### Note:

- 1. A normal multiple-bit packing operation that follows can be performed using several baseline instructions:
  - R4 contains the bits for packing (i.e. Ra).
  - R5 is the bit stream register (i.e. Rt)
  - R2 contains the number of bits for packing (i.e. Rb(12,8)+1).
  - R1 is the distance from MSB of the bit stream register to the starting bit position of the packing operation (i.e. Rb(4,0)).

```
subri r3, r2, 32
sll r3, r4, r3
srl r3, r3, r1
or r5, r5, r3
```

So this instruction will save at least 3 instructions when compared to a normal bit-packing flow.

Following is an example Huffman encoding loop code using the BSP instruction:

| L1: | Prepare Rb                                   |
|-----|----------------------------------------------|
|     | Prepare code word in Ra                      |
| L2: | BSP Rt,Ra,Rb                                 |
|     | BGTZ Rb,L1 # branch if no output             |
|     | SW Rt,[BitStream],4 # store 32b to BitStream |
|     | BTST Rt,Rb,30 # 1 if overflow                |
|     | BNEZ Rt,L2                                   |



J L1

- 2. BSP instructions can be used to update the Rb(12,8) and Rb(4,0) values.
- 3. Adjustment of Rb(4,0) value can be performed using basic addition and subtraction.



## **PBSAD (Parallel Byte Sum of Absolute Difference)**

**Type:** 32-bit Performance Extension Version 2 **Format:** 

| 31 | 30   | 25     | 24 | 20 | 19 | 15 | 14 | 10 | 9 | 5     | 4   | 0     |  |
|----|------|--------|----|----|----|----|----|----|---|-------|-----|-------|--|
| 0  | SIMD |        | F  | Rt | Ra |    | R  | Rb |   | 00000 |     | PBSAD |  |
|    | 111  | 111000 |    |    |    |    |    |    |   | 00    | 000 |       |  |

Syntax: PBSAD Rt, Ra, Rb

**Purpose:** Calculate the sum of absolute difference of four unsigned 8-bit data elements. **Description:** The four un-signed 8-bit elements of Ra are subtracted from the four unsigned 8-bit elements of Rb. The absolute value of each difference is added together and the result is written to Rt.

#### **Operations:**

a(7,0) = ABS(Ra(7,0) - Rb(7,0)); b(7,0) = ABS(Ra(15,8) - Rb(15,8)); c(7,0) = ABS(Ra(23,16) - Rb(23,16)); d(7,0) = ABS(Ra(31,24) - Rb(31,24)); Rt = a(7,0) + b(7,0) + c(7,0) + d(7,0);



### **PBSADA (Parallel Byte Sum of Absolute Difference Accum)**

**Type:** 32-bit Performance Extension Version 2 **Format:** 

| 31 | 30     | 25 | 24 | 20  | 19 | 15 | 14 | 10 | 9 | 5     | 4  | 0      |  |
|----|--------|----|----|-----|----|----|----|----|---|-------|----|--------|--|
| 0  | SIMD   |    | R  | lt. | Ra |    | R  | Rb |   | 00000 |    | PBSADA |  |
|    | 111000 |    |    |     |    |    |    |    |   |       | 00 | 001    |  |

#### Syntax: PBSADA Rt, Ra, Rb

**Purpose:** Calculate the sum of absolute difference of four unsigned 8-bit data elements and accumulate it into a register.

**Description:** The four un-signed 8-bit elements of Ra are subtracted from the four unsigned 8-bit elements of Rb. The absolute value of each difference is added together along with the content of Rt. The accumulated result is written back to Rt.

#### **Operations:**

a(7,0) = ABS(Ra(7,0) - Rb(7,0)); b(7,0) = ABS(Ra(15,8) - Rb(15,8)); c(7,0) = ABS(Ra(23,16) - Rb(23,16)); d(7,0) = ABS(Ra(31,24) - Rb(31,24)); Rt = Rt + a(7,0) + b(7,0) + c(7,0) + d(7,0);



## 7.4 32-bit STRING Extension instructions

This STRING extension is currently reserved for Andes Technology internal use. It will be released in the future.


# 7.5 16-bit Baseline instructions



# ADD (Add Register)

**Type:** 16-Bit Baseline **Format:** 

ADD333

| 15 | 14   | 9   | 8 | 6  | 5 | 3  | 2 | 0   |
|----|------|-----|---|----|---|----|---|-----|
| 1  | ADD  | 333 | R | t3 | R | a3 | R | .b3 |
|    | 0011 | 00  |   |    |   |    |   |     |

ADD45

| 15 | 14  | 9   | 8 | 5  | 4 | 0   |
|----|-----|-----|---|----|---|-----|
| 1  | ADI | 045 | R | t4 | ] | Rb5 |
|    | 000 | 100 |   |    |   |     |

**Syntax:** ADD333 Rt3, Ra3, Rb3 ADD45 Rt4, Rb5

**32-bit Equivalent:** ADD
 3T5(Rt3), 3T5(Ra3), 3T5(Rb3)
 // ADD333

 ADD
 4T5(Rt4), 4T5(Rt4), Rb5
 // ADD45

Purpose: It is used to add the contents of two registers.

**Description:** For ADD333, the contents of Ra3 and Rb3 are added. And the result is written to Rt3. For ADD45, the contents of Rt4 and Rb5 are added. And the result is written to the source register Rt4.

#### **Operations:**

```
Rt3 = Ra3 + Rb3; // ADD333
Rt4 = Rt4 + Rb5; // ADD45
```

Exceptions: None

Privilege level: All

Note:



### ADDI (Add Immediate)

**Type:** 16-Bit Baseline **Format:** 

15

# ADDI333 14 9 8 6

|   |         |     |     | - •   |
|---|---------|-----|-----|-------|
| 1 | ADDI333 | Rt3 | Ra3 | imm3u |
|   | 001110  |     |     |       |

#### ADDI45

| 15 | 14  | 9    | 8 | 5  | 4  | 0    |
|----|-----|------|---|----|----|------|
| 1  | ADE | DI45 | R | t4 | in | nm5u |
|    | 000 | 110  |   |    |    |      |

Syntax: ADDI333 Rt3, Ra3, imm3u ADDI45 Rt4, imm5u

**32-bit Equivalent:** ADDI 3T5(Rt3), 3T5(Ra3), ZE(imm3u) // ADDI333 ADDI 4T5(Rt4), 4T5(Rt4), ZE(imm5u) // ADDI45

**Purpose:** It is used to add a zero-extended immediate into the content of a register. **Description:** For ADDI333, the zero-extended 3-bit immediate "imm3u" is added to the content of Ra3. And the result is written to Rt3. For ADDI45, the zero-extended 5-bit immediate is added to the content of Rt4. And the result is written to the source register Rt4.

2 0

5 3

#### **Operations:**

Rt3 = Ra3 + ZE(imm3u); // ADDI333 Rt4 = Rt4 + ZE(imm5u); // ADDI45

Exceptions: None Privilege level: All

Note:



# **BEQS38 (Branch on Equal Implied R5)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 11   | 10    | 8     | 7   | 0   |
|----|-----|------|-------|-------|-----|-----|
| 1  | BEQ | QS38 | Rt3   |       | imr | n8s |
|    | 10  | 10   | (#Rt3 | != 5) |     |     |

Syntax: BEQS38 Rt3, imm8s

**32-bit Equivalent:** BEQ 3T5(Rt3), R5, SE(imm8s)

(next sequential PC = PC + 2)

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the contents of a register with the content of the implied R5.

**Description:** If the content of the implied register R5 is equal to the content of Rt3 (#Rt3 != 5), then branch to the target address of adding the current instruction address with the sign-extended (imm8s << 1) value. The branch range is ± 256 bytes.

#### **Operations:**

```
TAddr = PC + Sign_Extend(imm8s << 1);
If (R5 == Rt3) {
    PC = TAddr;
}
Exceptions: None</pre>
```



### **BEQZ38 (Branch on Equal Zero)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 11   | 10 | 8 | 7   | 0   |
|----|-----|------|----|---|-----|-----|
| 1  | BEÇ | QZ38 | Rt | 3 | imn | n8s |
|    | 10  | 00   |    |   |     |     |

Syntax: BEQZ38 Rt3, imm8s

**32-bit Equivalent:** BEQZ 3T5(Rt3), SE(imm8s) (next sequential PC = PC + 2)

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt3 is equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm8s  $\ll$  1) value. The branch range is  $\pm$  256 bytes.

#### **Operations:**

```
TAddr = PC + Sign_Extend(imm8s << 1);
If (Rt3 == 0) {
    PC = TAddr;
}
Exceptions: None</pre>
```



# **BEQZS8 (Branch on Equal Zero Implied R15)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14   | 8    | 7   | 0   |
|----|------|------|-----|-----|
| 1  | BEQ  | ZS8  | imr | n8s |
|    | 1101 | 1000 |     |     |

Syntax: BEQZS8 imm8s

**32-bit Equivalent:** BEQZ R15, SE(imm8s)

(next sequential PC = PC + 2)

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of R15 with zero.

**Description:** If the content of R15 is equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm8s  $\leq 1$ ) value. The branch range is  $\pm$  256 bytes.

#### **Operations:**

```
TAddr = PC + Sign_Extend(imm8s << 1);
If (R15 == 0) {
    PC = TAddr;
}
Exceptions: None</pre>
```



# **BNES38 (Branch on Not Equal Implied R5)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 11   | 10    | 8     | 7   | 0   |
|----|-----|------|-------|-------|-----|-----|
| 1  | BNE | ES38 | Rt3   |       | imn | n8s |
|    | 10  | 11   | (#Rt3 | != 5) |     |     |

Syntax: BNES38 Rt3, imm8s

**32-bit Equivalent:** BNE 3T5(Rt3), R5, SE(imm8s)

(next sequential PC = PC + 2)

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with the content of the implied R5.

**Description:** If the content of the implied register R5 is not equal to the content of Rt3 (#Rt3 != 5), then branch to the target address of adding the current instruction address with the sign-extended (imm8s << 1) value. The branch range is  $\pm 256$  bytes.

#### **Operations:**

```
TAddr = PC + Sign_Extend(imm8s << 1);
If (R5 != Rt3) {
    PC = TAddr;
}
Exceptions: None</pre>
```



# **BNEZ38 (Branch on Not Equal Zero)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 11   | 10 | 8  | 7   | 0   |
|----|-----|------|----|----|-----|-----|
| 1  | BNE | EZ38 | Rt | :3 | imr | n8s |
|    | 10  | 01   |    |    |     |     |

Syntax: BNEZ38 Rt3, imm8s

**32-bit Equivalent:** BNEZ 3T5(Rt3), SE(imm8s) (next sequential PC = PC + 2)

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of a register with zero.

**Description:** If the content of Rt3 is not equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm8s  $\leq 1$ ) value. The branch range is  $\pm$  256 bytes.

#### **Operations:**

```
TAddr = PC + Sign_Extend(imm8s << 1);
If (Rt3 != 0) {
    PC = TAddr;
}
Exceptions: None</pre>
```



### **BNEZS8 (Branch on Not Equal Zero Implied R15)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14   | 8    | 7   | 0   |
|----|------|------|-----|-----|
| 1  | BNE  | EZS8 | imr | n8s |
|    | 1101 | 001  |     |     |

Syntax: BNEZS8 imm8s

**32-bit Equivalent:** BNEZ R15, SE(imm8s)

(next sequential PC = PC + 2)

**Purpose:** It is used for conditional PC-relative branching based on the result of comparing the content of R15 with zero.

**Description:** If the content of R15 is not equal to zero, then branch to the target address of adding the current instruction address with the sign-extended (imm8s  $\leq 1$ ) value. The branch range is  $\pm$  256 bytes.

#### **Operations:**

```
TAddr = PC + Sign_Extend(imm8s << 1);
If (R15 != 0) {
    PC = TAddr;
}
Exceptions: None</pre>
```



### BREAK16 (Breakpoint)

**Type:** 16-Bit Baseline **Format:** 

 15
 14
 9
 8
 0

 1
 BREAK16
 SWID9

 110101

### Syntax: BREAK16 SWID9

**32-bit Equivalent:** BREAK ZE(SWID9)

Purpose: It is used to generate a Breakpoint exception.

#### **Description:**

BREAK16 instruction will unconditionally generate a Breakpoint exception and transfer control to the Breakpoint exception handler. The 9-bits SWID is used by software as a parameter to distinguish different breakpoint features and usages.

#### **Operations:**

Generate\_Exception(Breakpoint);

**Exceptions:** Breakpoint



# J8 (Jump Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14   | 8       | 7  | 0   |
|----|------|---------|----|-----|
| 1  | J    | 8       | im | m8s |
|    | 1010 | 1010101 |    |     |

Syntax: J8 imm8s

**32-bit Equivalent:** J SE(imm8s)

**Purpose:** It is used for unconditional PC-relative branching.

**Description:** Jump unconditionally to the target address of adding the current instruction address with the sign-extended (imm8s  $\ll$  1) value. The branch range is  $\pm$  256 bytes.

#### **Operations:**

```
TAddr = PC + Sign_Extend(imm8s << 1);</pre>
```

PC = TAddr;



# JR5 (Jump Register)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 5     | 4  | 0 |
|----|--------|-------|----|---|
| 1  | JR     | .5    | Rt | 5 |
|    | 101110 | 01000 |    |   |

Syntax: JR5 Rb5

32-bit Equivalent: JR Rb5

**Purpose:** It is used for unconditional branching to an address stored in a general register. **Description:** Jump unconditionally to the target address stored in the register Rb5.

#### **Operations:**

```
TAddr = Rb5;
```

PC = TAddr;

Exceptions: None Privilege level: All

Note:



### JRAL5 (Jump Register and Link)

**Type:** 16-Bit Baseline **Format:** 

 15
 14
 5
 4
 0

 1
 JRAL5
 Rb5

 1011101001

**Syntax:** JRAL5 Rb5 **32-bit Equivalent:** JRAL Rb5 (R30 = PC + 2)

**Purpose:** It is used for unconditional branching to a function call.

**Description:** Jump unconditionally to the target address stored in the register Rb5. And the next sequential address (PC + 2) of the current instruction is written to R30.

### **Operations:**

```
TAddr = Rb5;
R30 = PC + 2;
PC = TAddr;
```



# LBI333 (Load Byte Immediate Unsigned)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14 9   | 8 6 | 5 3 | 2 0   |
|----|--------|-----|-----|-------|
| 1  | LBI333 | Rt3 | Ra3 | imm3u |
|    | 010011 |     |     |       |

Syntax: LBI333 Rt3, [Ra3, imm3u]

**32-bit Equivalent:** LBI 3T5(Rt3), [3T5(Ra3), ZE(imm3u)]

**Purpose:** It is used to load a zero-extended 8-bit byte from memory into a general register. **Description:** This instruction loads a byte from the memory address specified by adding the content of Ra3 with the zero-extended imm3u value. The loaded byte is zero-extended to the width of the general register and then written into Rt3.

#### **Operations:**

```
VAddr = Ra3 + Zero_Extend(imm3u);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Bdata(7,0) = Load_Memory(PAddr, BYTE, Attributes);
    Rt3 = Zero_Extend(Bdata(7,0));
} else {
    Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



### LHI333 (Load Halfword Immediate Unsigned)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14 9   | 8 6 | 5 3 | 2 0   |
|----|--------|-----|-----|-------|
| 1  | LHI333 | Rt3 | Ra3 | imm3u |
|    | 010010 |     |     |       |

**Syntax:** LHI333 Rt3, [Ra3 + (imm3u << 1)]

```
32-bit Equivalent: LHI 3T5(Rt3), [3T5(Ra3) + ZE((imm3u << 1))]
```

(imm3u is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** It is used to load a zero-extended 16-bit halfword from memory into a general register.

**Description:** This instruction loads a halfword from the memory address specified by adding the content of Ra3 with the zero-extended (imm3u << 1) value. The loaded halfword is zero-extended to the width of the general register and then written into Rt3. Notice that imm3u is a halfword-aligned offset.

The memory address has to be halfword-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = Ra3 + Zero_Extend((imm3u << 1));
if (!Halfword_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Hdata(15,0) = Load_Memory(PAddr, HALFWORD, Attributes);
    Rt3 = Zero_Extend(Hdata(15,0));
} else {
    Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection,



Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



### LWI333 (Load Word Immediate)

| Type: 16-Bit Baseline |
|-----------------------|
| Format:               |
| I W/1222              |

|    |        | LW155 | 3 |    |   |     |     |     |
|----|--------|-------|---|----|---|-----|-----|-----|
| 15 | 14     | 9     | 8 | 6  | 5 | 3   | 2   | 0   |
| 1  | LWI333 |       | R | t3 | R | .a3 | imn | n3u |
|    | 010    | 000   |   |    |   |     |     |     |

#### LWI333.bi

| 15 | 14 9      | 8 6 | 5 3 | 2 0   |
|----|-----------|-----|-----|-------|
| 1  | LWI333.bi | Rt3 | Ra3 | imm3u |
|    | 010001    |     |     |       |

 Syntax:
 LWI333
 Rt3, [Ra3 + (imm3u << 2)]</td>

 LWI333.bi
 Rt3, [Ra3], (imm3u << 2)</td>

 32-bit Equivalent:
 LWI
 3T5(Rt3), [3T5(Ra3) + ZE(imm3u << 2)]</td>

 LWI.bi
 3T5(Rt3), [3T5(Ra3)], ZE(imm3u << 2)</td>

(imm3u is a word offset. In assembly programming, always write a byte offset.)

Purpose: It is used to load a 32-bit word from memory into a general register.

**Description:** This instruction loads a word from the memory into the general register Rt3. Two different forms are used to specify the memory address. The regular form uses Ra3 +  $ZE(imm3u \le 2)$  as its memory address while the .bi form uses Ra3. For the .bi form, the Ra3 register will be updated with the Ra3 +  $ZE(imm3u \le 2)$  value after the memory load operation.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
Addr = Ra3 + Zero_Extend((imm3u << 2));
If (.bi form) {
    Vaddr = Ra3;
} else {
    Vaddr = Addr;
}</pre>
```



```
if (!Word_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
   Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
   Rt3 = Wdata(31,0);
   If (.bi form) { Ra3 = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



### LWI37 (Load Word Immediate with Implied FP)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14 11 | 10 8 | 7     | 6    | 0 |
|----|-------|------|-------|------|---|
| 1  | XWI37 | Rt3  | LWI37 | imm7 | u |
|    | 0111  |      | 0     |      |   |

**Syntax:** LWI37 Rt3, [FP + (imm7u << 2)]

```
32-bit Equivalent: LWI 3T5(Rt3), [FP + ZE(imm7u << 2)]
```

(imm7u is a word offset. In assembly programming, always write a byte offset.) **Purpose:** It is used to load a 32-bit word from memory into a general register. **Description:** This instruction loads a word from the memory into the general register Rt3. The memory address is specified by adding the implied FP (i.e. R28) register with the zero-extended (imm7u << 2) value. Notice that the imm7u is a word-aligned offset. The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = FP (i.e. R28) + Zero_Extend(imm7u << 2);
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
    Rt3 = Wdata(31,0);
} else {
    Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. **Privilege level:** All

AndeStar\_ISA\_v1.3

**Detail Instruction Description** 



Note:



# LWI450 (Load Word Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 9   | 8 | 5  | 4   | 0 |
|----|--------|-----|---|----|-----|---|
| 1  | LWI450 |     | R | t4 | Ra5 |   |
|    | 011    | 010 |   |    |     |   |

**Syntax:** LWI450 Rt4, [Ra5]

**32-bit Equivalent:** LWI 4T5(Rt4), [Ra5 + 0]

**Purpose:** It is used to load a 32-bit word from memory into a general register.

**Description:** This instruction loads a word from the memory into the general register Rt4. The memory address is specified in Ra5.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = Ra5;
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
    Rt4 = Wdata(31,0);
} else {
    Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. Privilege level: All Note:



### **MOV55 (Move Register)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 10   | 9 | 5  | 4 | 0   |
|----|-----|------|---|----|---|-----|
| 1  | MO  | VI55 | R | t5 | ŀ | Ra5 |
|    | 000 | 000  |   |    |   |     |

Syntax: MOVI55 Rt5, Ra5

32-bit Equivalent: ADDI/ORI Rt5, Ra5, 0

**Purpose:** It is used to move contents between general-purpose registers.

Description: The content of Ra5 is moved into Rt5.

**Operations:** 

Rt5 = Ra5;

Exceptions: None



### MOVI55 (Move Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 10   | 9 | 5  | 4  | 0    |
|----|-----|------|---|----|----|------|
| 1  | MO  | VI55 | R | t5 | in | nm5s |
|    | 000 | 001  |   |    |    |      |

Syntax: MOVI55 Rt5, imm5s

**32-bit Equivalent:** MOVI Rt5, SE(imm5s)

**Purpose:** It is used to move a sign-extended immediate into a general-purpose register.

**Description:** The sign-extended 5-bit immediate "imm5s" is moved into Rt5.

**Operations:** 

Rt5 = Sign\_Extend(imm5s);



# NOP16 (No Operation)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 9    | 8  | 5   | 4 | 0    |
|----|-----|------|----|-----|---|------|
| 1  | SRL | JI45 | NO | P16 | N | OP16 |
|    | 001 | 001  | 00 | 000 | 0 | 0000 |

 Syntax:
 NOP16

 (SRLI45
 R0, 0)

 32-bit Equivalent:
 NOP

 (SRL
 R0, R0, 0)

**Purpose:** It is used to do no operation. It may be used to align program code for any specific purpose.

**Description:** This instruction is aliased to "SRLI45 R0, 0" instruction in hardware.

### **Operations:**

None



### **RET5 (Return from Register)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 5     | 4  | 0 |
|----|--------|-------|----|---|
| 1  | RE     | Т5    | Rb | 5 |
|    | 101110 | 01100 |    |   |

Syntax: RET5 Rb5

### 32-bit Equivalent: RET Rb5

**Purpose:** It is used for unconditional function call return to an address stored in a general register.

**Description:** Jump unconditionally to the target address stored in the register Rb5. Note that the architecture behavior of this instruction is the same as the JR5 instruction. But software will use this instruction instead of JR5 for function call return purpose. This facilitates software's need to distinguish the two different usages which is helpful in call stack backtracing applications. Distinguishing a function return jump from a regular jump will also help on implementation performance (e.g. return address prediction).

#### **Operations:**

TAddr = Rb5; PC = TAddr;



### **SBI333 (Store Byte Immediate)**

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 9   | 8 | 6  | 5 | 3  | 2   | 0   |
|----|--------|-----|---|----|---|----|-----|-----|
| 1  | SBI333 |     | R | t3 | R | a3 | imn | n3u |
|    | 0101   | 111 |   |    |   |    |     |     |

Syntax: SBI333 Rt3, [Ra3 + imm3u]

**32-bit Equivalent:** SBI 3T5(Rt3), [3T5(Ra3) + ZE(imm3u)]

**Purpose:** It is used to store an 8-bit byte from a general register into a memory location. **Description:** The least-significant 8-bit byte in the general register Rt3 is stored to the memory location whose address is specified by adding the content of Ra3 with the zero-extended imm3u value.

#### **Operations:**

```
VAddr = Ra3 + Zero_Extend(imm3u);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, BYTE, Attributes, Rt3(7,0));
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



### SEB33 (Sign Extend Byte)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14      | 9 | 8 | 6   | 5 | 3   | 2  | 0     |  |
|----|---------|---|---|-----|---|-----|----|-------|--|
| 1  | BFMI333 |   | R | Rt3 |   | Ra3 |    | SEB33 |  |
|    | 001011  |   |   |     |   |     | 01 | 0     |  |

Syntax: SEB33 Rt3, Ra3

**32-bit Equivalent:** SEB 3T5(Rt3), 3T5(Ra3)

**Purpose:** It is used to sign-extend the least-significant byte of Ra3 and the result is written to Rt3.

**Description:** The least-significant byte of Ra3 is sign-extended to the width of a general register. And the result is written to the register Rt3.

#### **Operations:**

```
Rt3 = Sign_Extend(Ra3(7,0));
```



### SEH33 (Sign Extend Halfword)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14      | 9 | 8 | 6   | 5 | 3   | 2  | 0     |  |
|----|---------|---|---|-----|---|-----|----|-------|--|
| 1  | BFMI333 |   | R | Rt3 |   | Ra3 |    | SEH33 |  |
|    | 001011  |   |   |     |   |     | 01 | 1     |  |

Syntax: SEH33 Rt3, Ra3

**32-bit Equivalent:** SEH 3T5(Rt3), 3T5(Ra3)

**Purpose:** It is used to sign-extend the least-significant halfword of Ra3 and the result is written to Rt3.

**Description:** The least-significant halfword of Ra3 is sign-extended to the width of a general register. And the result is written to the register Rt3.

#### **Operations:**

```
Rt3 = Sign_Extend(Ra3(15,0));
```



### SHI333 (Store Halfword Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14   | 9   | 8 | 6  | 5 | 3  | 2   | 0   |
|----|------|-----|---|----|---|----|-----|-----|
| 1  | SHE  | 333 | R | t3 | R | a3 | imn | n3u |
|    | 0101 | 10  |   |    |   |    |     |     |

**Syntax:** SHI333 Rt3, [Ra3 + (imm3u << 1)]

**32-bit Equivalent:** SHI 3T5(Rt3), [3T5(Ra3) + ZE(imm3u << 1)]

(imm3u is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** It is used to store a 16-bit halfword from a general register into a memory location.

**Description:** The least-significant 16-bit halfword in the general register Rt3 is stored to the memory location whose address is specified by adding the content of Ra3 with the zero-extended (imm $3u \ll 1$ ) value. Notice that imm3u is a halfword-aligned offset. The memory address has to be halfword-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = Ra3 + Zero_Extend(imm3u << 1);
if (!Halfword_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
    Store_Memory(PAddr, HALFWORD, Attributes, Rt3(15,0));
} else {
    Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. **Privilege level:** All

AndeStar\_ISA\_v1.3

**Detail Instruction Description** 



Note:



# SLLI333 (Shift Left Logical Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14      | 9   | 8 | 6   | 5 | 3   | 2 | 0   |
|----|---------|-----|---|-----|---|-----|---|-----|
| 1  | SLLI333 |     | R | Rt3 |   | Ra3 |   | m3u |
|    | 0010    | )10 |   |     |   |     |   |     |

Syntax: SLLI333 Rt3, Ra3, imm3u

**32-bit Equivalent:** SLLI 3T5(Rt3), 3T5(Ra3), ZE(imm3u)

**Purpose:** It is used to left-shift a register content to a fixed number of bits in the range of 0 to 7.

**Description:** The content of Ra3 is shifted left by a fixed number of bits between 0 and 7 specified by the "imm3u". And zero will be shifted in to fill the shifted-out bits. The shifted result is written to Rt3.

#### **Operations:**

Rt3 = Ra3 << imm3u; // or Rt3 = Ra3(31-imm3u,0).Dupliate(0, imm3u);



### SLT45 (Set on Less Than Unsigned)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 9 | 8 | 5  | 4   | 0 |
|----|--------|---|---|----|-----|---|
| 1  | SLT45  |   | R | a4 | Rb5 |   |
|    | 110001 |   |   |    |     |   |

Syntax: SLT45 Ra4, Rb5

**32-bit Equivalent:** SLT R15, 4T5(Ra4), Rb5

**Purpose:** It is used to set a result on R15 for an unsigned less-than comparison.

**Description:** This instruction compares the contents of Ra4 and Rb5 as unsigned integers. If Ra4 is less than Rb5, then R15 will be written with 1; otherwise, R15 will be written with 0.

#### **Operations:**

```
If (Ra4 (unsigned) < Rb5) {
    R15 = 1;
} else {
    R15 = 0;
}</pre>
```



# SLTI45 (Set on Less Than Unsigned Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 9 | 8 | 5  | 4     | 0 |
|----|--------|---|---|----|-------|---|
| 1  | SLTI45 |   | R | a4 | imm5u |   |
|    | 110011 |   |   |    |       |   |

Syntax: SLTI45 Ra4, imm5u

**32-bit Equivalent:** SLTI R15, 4T5(Ra4), ZE(imm5u)

**Purpose:** It is used to set a result on R15 for an unsigned less-than comparison.

**Description:** This instruction compares the contents of Ra4 and a zero-extended imm5u as unsigned integers. If Ra4 is less than the zero-extended imm5u, then R15 will be written with 1; otherwise, R15 will be written with 0.

#### **Operations:**

```
If (Ra4 (unsigned) < Zero_Extend(imm5u)) {
    R15 = 1;
} else {
    R15 = 0;
}
Exceptions: None</pre>
```



### SLTS45 (Set on Less Than Signed)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 9 | 8 | 5  | 4   | 0 |
|----|--------|---|---|----|-----|---|
| 1  | SLTS45 |   | R | a4 | Rb5 |   |
|    | 110000 |   |   |    |     |   |

Syntax: SLTS45 Ra4, Rb5

**32-bit Equivalent:** SLTS R15, 4T5(Ra4), Rb5

**Purpose:** It is used to set a result on R15 for a signed less-than comparison.

**Description:** This instruction compares the contents of Ra4 and Rb5 as signed integers. If Ra4 is less than Rb5, then R15 will be written with 1; otherwise, R15 will be written with 0.

#### **Operations:**

```
If (Ra4 (signed) < Rb5) {
    R15 = 1;
} else {
    R15 = 0;
}</pre>
```



# SLTSI45 (Set on Less Than Signed Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14      | 9   | 8 | 5  | 4 0   |  |
|----|---------|-----|---|----|-------|--|
| 1  | SLTSI45 |     | R | a4 | imm5u |  |
|    | 1100    | 010 |   |    |       |  |

Syntax: SLTSI45 Ra4, imm5u

**32-bit Equivalent:** SLTSI R15, 4T5(Ra4), ZE(imm5u)

**Purpose:** It is used to set a result on R15 for a signed less-than comparison.

**Description:** This instruction compares the contents of Ra4 and a zero-extended imm5u as signed integers. If Ra4 is less than the zero-extended imm5u, then R15 will be written with 1; otherwise, R15 will be written with 0.

#### **Operations:**

```
If (Ra4 (signed) < Zero_Extend(imm5u)) {
    R15 = 1;
} else {
    R15 = 0;
}
Exceptions: None</pre>
```



# SRAI45 (Shift Right Arithmetic Immediate)

Type: 16-Bit Baseline Format:

| 15 | 14     | 9 | 8 | 5   | 4 | 0    |
|----|--------|---|---|-----|---|------|
| 1  | SRAI45 |   | R | Rt4 |   | 1m5u |
|    | 001000 |   |   |     |   |      |

Syntax: SRAI45 Rt4, imm5u

**32-bit Equivalent:** SRAI 4T5(Rt4), 4T5(Rt4), imm5u

**Purpose:** It is used to right-shift a register content arithmetically (i.e. maintaining the sign of the original value) to a fixed number of bits in the range of 0 to 31.

**Description:** The content of Rt4 is shifted right by a fixed number of bits between 0 and 31 specified by the "imm5u". And the sign bit of Rt4, i.e. Rt4(31), will be duplicated to fill the shifted-out bits. The shifted result is written to the source register Rt4.

#### **Operations:**

```
Rt4 = Rt4 (sign)>> imm5u; // or
Rt4 = Duplicate(Rt4(31), imm5u).Rt4(31,imm5u);
```


## SRLI45 (Shift Right Logical Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14  | 9    | 8 | 5  | 4  | 0    |
|----|-----|------|---|----|----|------|
| 1  | SRL | .I45 | R | t4 | in | nm5u |
|    | 001 | 001  |   |    |    |      |

Syntax: SRLI45 Rt4, imm5u

**32-bit Equivalent:** SRLI 4T5(Rt4), 4T5(Rt4), imm5u

**Purpose:** It is used to right-shift a register content logically (i.e. filling in zeros at the shifted-out bits) to a fixed number of bits in the range of 0 to 31.

**Description:** The content of Rt4 is shifted right by a fixed number of bits between 0 and 31 specified by the "imm5u". And the shifted-out bits will be filled with zeros. The shifted result is written to the source register Rt4.

#### **Operations:**

Rt4 = Rt4 (0)>> imm5u; // or Rt4 = Duplicate(0, imm5u).Rt4(31,imm5u);



### SUB (Subtract Register)

Type: 16-Bit Baseline

### Format:

**SUB333** 

| 15 | 14  | 9      | 8 | 6   | 5 | 3   | 2 | 0   |
|----|-----|--------|---|-----|---|-----|---|-----|
| 1  | SUB | SUB333 |   | Rt3 |   | Ra3 |   | .b3 |
|    | 001 | 01     |   |     |   |     |   |     |

SUB45

| 15 | 14   | 9           | 8 | 5  | 4 | 0   |
|----|------|-------------|---|----|---|-----|
| 1  | SUE  | <b>3</b> 45 | R | t4 | ] | Rb5 |
|    | 0001 | 101         |   |    |   |     |

**Syntax:** SUB333 Rt3, Ra3, Rb3 SUB45 Rt4, Rb5

**32-bit Equivalent:** SUB
 3T5(Rt3), 3T5(Ra3), 3T5(Rb3)
 // SUB333

 SUB
 4T5(Rt4), 4T5(Rt4), Rb5
 // SUB45

Purpose: It is used to subtract the contents of two registers.

**Description:** For SUB333, the content of Rb3 is subtracted from Ra3. And the result is written to Rt3. For SUB45, the content of Rb5 is subtracted from Rt4. And the result is written to the source register Rt4.

#### **Operations:**

```
Rt3 = Ra3 - Rb3; // SUB333
Rt4 = Rt4 - Rb5; // SUB45
```

Exceptions: None

Privilege level: All

Note:



### **SUBI (Subtract Immediate)**

**Type:** 16-Bit Baseline **Format:** 

SUBI333

| 15 | 14      | 9   | 8 | 6   | 5 | 3   | 2 | 0   |
|----|---------|-----|---|-----|---|-----|---|-----|
| 1  | SUBI333 |     | R | Rt3 |   | Ra3 |   | m3u |
|    | 001     | 111 |   |     |   |     |   |     |

SUBI45

| 15 | 14  | 9   | 8 | 5  | 4  | 0    |
|----|-----|-----|---|----|----|------|
| 1  | SUB | I45 | R | t4 | in | nm5u |
|    | 000 | 111 |   |    |    |      |

Syntax: SUBI333 Rt3, Ra3, imm3u SUBI45 Rt4, imm5u

**32-bit Equivalent:** ADDI 3T5(Rt3), 3T5(Ra3), NEG(imm3u) // SUBI333 ADDI 4T5(Rt4), 4T5(Rt4), NEG(imm5u) // SUBI45

**Purpose:** It is used to subtract a zero-extended immediate from the content of a register. **Description:** For SUBI333, the zero-extended 3-bit immediate "imm3u" is subtracted from the content of Ra3. And the result is written to Rt3. For SUBI45, the zero-extended 5-bit immediate is subtracted from the content of Rt4. And the result is written to the source register Rt4.

**Operations:** 

Rt3 = Ra3 - ZE(imm3u); // SUBI333 Rt4 = Rt4 - ZE(imm5u); // SUBI45



### SWI333 (Store Word Immediate)

**Type:** 16-Bit Baseline **Format:** 

|    |     | SWI33 | 3 |    |   |    |     |     |
|----|-----|-------|---|----|---|----|-----|-----|
| 15 | 14  | 9     | 8 | 6  | 5 | 3  | 2   | 0   |
| 1  | SWI | 333   | R | t3 | R | a3 | imm | n3u |
|    | 010 | 100   |   |    |   |    |     |     |

#### SWI333.bi

| 15 | 14 9      | 8 6 | 5 3 | 2 0   |
|----|-----------|-----|-----|-------|
| 1  | SWI333.bi | Rt3 | Ra3 | imm3u |
|    | 010101    |     |     |       |

 Syntax:
 SWI333
 Rt3, [Ra3 + (imm3u << 2)]</td>

 SWI333.bi
 Rt3, [Ra3], (imm3u << 2)</td>

 32-bit Equivalent:
 SWI
 3T5(Rt3), [3T5(Ra3) + ZE(imm3u << 2)]</td>

SWI.bi 3T5(Rt3), [3T5(Ra3)], ZE(imm3u << 2)

(imm3u is a word offset. In assembly programming, always write a byte offset.)

**Purpose:** It is used to store a 32-bit word from a general register into memory.

**Description:** This instruction stores a word from the general register Rt3 into the memory. Two different forms are used to specify the memory address. The regular form uses Ra3 +  $ZE(imm3u \ll 2)$  as its memory address while the .bi form uses Ra3. For the .bi form, the Ra3 register will be updated with the Ra3 +  $ZE(imm3u \ll 2)$  value after the memory store operation.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
Addr = Ra3 + Zero_Extend((imm3u << 2));
If (.bi form) {
    Vaddr = Ra3;
} else {
    Vaddr = Addr;
}</pre>
```



```
if (!Word_Aligned(Vaddr)) {
   Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, WORD, Attributes, Rt3);
   If (.bi form) { Ra3 = Addr; }
} else {
   Generate_Exception(Excep_status);
}
```



### SWI37 (Store Word Immediate with Implied FP)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14 11 | 10 8 | 7     | 6 (   | ) |
|----|-------|------|-------|-------|---|
| 1  | XWI37 | Rt3  | SWI37 | imm7u |   |
|    | 0111  |      | 1     |       |   |

**Syntax:** SWI37 Rt3, [FP + (imm7u << 2)]

```
32-bit Equivalent: SWI 3T5(Rt3), [FP + ZE(imm7u << 2)]
```

(imm7u is a word offset. In assembly programming, always write a byte offset.) **Purpose:** It is used to store a 32-bit word from a general register into memory. **Description:** This instruction stores a word from the general register Rt3 into the memory. The memory address is specified by adding the implied FP (i.e. R28) register with the zero-extended (imm7u << 2) value. Notice that the imm7u is a word-aligned offset. The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = FP (i.e. R28) + Zero_Extend(imm7u << 2);
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
    Store_Memory(PAddr, WORD, Attributes, Rt3);
} else {
    Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. **Privilege level:** All

Note:





### SWI450 (Store Word Immediate)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14     | 9   | 8 | 5  | 4   | 0 |
|----|--------|-----|---|----|-----|---|
| 1  | SWI450 |     | R | t4 | Ra5 |   |
|    | 011    | 011 |   |    |     |   |

**Syntax:** SWI450 Rt4, [Ra5]

**32-bit Equivalent:** SWI 4T5(Rt4), [Ra5 + 0]

Purpose: It is used to store a 32-bit word from a general register into memory.

**Description:** This instruction stores a word from the general register Rt4 into the memory. The memory address is specified in Ra5.

The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = Ra5;
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
    Store_Memory(PAddr, WORD, Attributes, Rt4);
} else {
    Generate_Exception(Excep_status);
}
```



### X11B33 (Extract the Least 11 Bits)

**Type:** 16-Bit Extension **Format:** 

| 15 | 14      | 9      | 8 | 6   | 5 | 3   | 2  | 0      |  |
|----|---------|--------|---|-----|---|-----|----|--------|--|
| 1  | BFMI333 |        | R | Rt3 |   | Ra3 |    | X11B33 |  |
|    | 0010    | 001011 |   |     |   |     | 10 | )1     |  |

Syntax: X11B33 Rt3, Ra3

32-bit Equivalent: ANDI 3T5(Rt3), 3T5(Ra3), 0x7FF

**Purpose:** It is used to extract the least-significant 11 bits of Ra3 and the result is written to Rt3.

**Description:** The least-significant 11 bits of Ra3 is extracted. And the result is written to the register Rt3.

#### **Operations:**

Rt3 = Ra3 & 0x7FF;



### XLSB33 (Extract LSB)

**Type:** 16-Bit Extension **Format:** 

| 15 | 14      | 9   | 8 | 6   | 5 | 3   | 2  | 0      |  |
|----|---------|-----|---|-----|---|-----|----|--------|--|
| 1  | BFMI333 |     | R | Rt3 |   | Ra3 |    | XLSB33 |  |
|    | 0010    | )11 |   |     |   |     | 10 | 00     |  |

Syntax: XLSB33 Rt3, Ra3

**32-bit Equivalent:** ANDI 3T5(Rt3), 3T5(Ra3), 0x1

**Purpose:** It is used to extract the least-significant bit of Ra3 and the result is written to Rt3. **Description:** The least-significant bit of Ra3 is extracted. And the result is written to the register Rt3.

#### **Operations:**

Rt3 = Ra3 & 0x1;



## ZEB33 (Zero Extend Byte)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14      | 9      | 8 | 6   | 5 | 3   | 2  | 0     |  |
|----|---------|--------|---|-----|---|-----|----|-------|--|
| 1  | BFMI333 |        | R | Rt3 |   | Ra3 |    | ZEB33 |  |
|    | 0010    | 001011 |   |     |   |     | 00 | 00    |  |

Syntax: ZEB33 Rt3, Ra3

**32-bit Equivalent:** ANDI 3T5(Rt3), 3T5(Ra3), 0xFF

**Purpose:** It is used to zero-extend the least-significant byte of Ra3 and the result is written to Rt3.

**Description:** The least-significant byte of Ra3 is zero-extended to the width of a general register. And the result is written to the register Rt3.

#### **Operations:**

```
Rt3 = Zero_Extend(Ra3(7,0));
```



### ZEH33 (Zero Extend Halfword)

**Type:** 16-Bit Baseline **Format:** 

| 15 | 14   | 9    | 8 | 6   | 5  | 3  | 2   | 0   |
|----|------|------|---|-----|----|----|-----|-----|
| 1  | BFM  | [333 | R | .t3 | Ra | 13 | ZEF | ł33 |
|    | 0010 | )11  |   |     |    |    | 00  | 1   |

Syntax: ZEH33 Rt3, Ra3

**32-bit Equivalent:** ZEH 3T5(Rt3), 3T5(Ra3)

**Purpose:** It is used to zero-extend the least-significant halfword of Ra3 and the result is written to Rt3.

**Description:** The least-significant halfword of Ra3 is zero-extended to the width of a general register. And the result is written to the register Rt3.

#### **Operations:**

```
Rt3 = Zero_Extend(Ra3(15,0));
```



# 7.6 16-bit and 32-bit Baseline Version 2 instructions

These Baseline Version 2 instructions are added to improve code density.



### **ADDI10S (Add Immediate with Implied Stack Pointer)**

**Type:** 16-Bit Baseline Version 2 **Format:** 

|    | 1   | ADDI10S |   |        |   |
|----|-----|---------|---|--------|---|
| 15 | 14  | 10      | 9 |        | 0 |
| 1  | ADD | OI10S   |   | imm10s |   |
|    | 110 | )11     |   |        |   |

Syntax: ADDI10.sp imm10s

**32-bit Equivalent:** ADDI r31, r31, SE(imm10s)

**Purpose:** It is used to add a sign-extended immediate into the content of the stack pointer register (R31).

**Description:** The sign-extended 10-bit immediate "imm10s" is added to the content of R31 (stack pointer). And the result is written back to R31.

#### **Operations:**

R31 = R31 + SE(imm10s);



## LWI37SP (Load Word Immediate with Implied SP)

**Type:** 16-Bit Baseline Version 2 **Format:** 

**Syntax:** LWI37.sp Rt3, [+ (imm7u << 2)]

```
32-bit Equivalent: LWI 3T5(Rt3), [SP + ZE(imm7u << 2)]
```

(imm7u is a word offset. In assembly programming, always write a byte offset.) **Purpose:** It is used to load a 32-bit word from memory into a general register. **Description:** This instruction loads a word from the memory into the general register Rt3. The memory address is specified by adding the implied SP (i.e. R31) register with the zero-extended (imm7u << 2) value. Notice that the imm7u is a word-aligned offset. The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = SP (i.e. R31) + Zero_Extend(imm7u << 2);
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
    Rt3 = Wdata(31,0);
} else {
    Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check.

**Detail Instruction Description** 



Privilege level: All Note:



### SWI37SP (Store Word Immediate with Implied SP)

**Type:** 16-Bit Baseline Version 2 **Format:** 

 15
 14
 11
 10
 8
 7
 6
 0

 1
 XWI37SP
 Rt3
 SWI37SP
 imm7u

 1110
 1
 1
 1

**Syntax:** SWI37.sp Rt3, [+ (imm7u << 2)]

```
32-bit Equivalent: SWI 3T5(Rt3), [SP + ZE(imm7u << 2)]
```

(imm7u is a word offset. In assembly programming, always write a byte offset.) **Purpose:** It is used to store a 32-bit word from a general register into memory. **Description:** This instruction stores a word from the general register Rt3 into the memory. The memory address is specified by adding the implied SP (i.e. R31) register with the zero-extended (imm7u << 2) value. Notice that the imm7u is a word-aligned offset. The memory address has to be word-aligned. Otherwise, a Data Alignment Check exception will be generated.

#### **Operations:**

```
VAddr = SP (i.e. R31) + Zero_Extend(imm7u << 2);
if (!Word_Aligned(Vaddr)) {
    Generate_Exception(Data_alignment_check);
}
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
    Store_Memory(PAddr, WORD, Attributes, Rt3);
} else {
    Generate_Exception(Excep_status);
}
```

**Exceptions:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check. **Privilege level:** All

Note:





## ADDI.gp (GP-implied Add Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30   | 25 | 24 | 20 | 19   | 18 | 0      |  |
|----|------|----|----|----|------|----|--------|--|
| 0  | SBG  | θP | R  | Rt | 1    |    | imm19s |  |
|    | 0111 | 11 |    |    | ADDI |    |        |  |

Syntax: ADDI.gp Rt, imm19s

**Purpose:** Add the content of the implied GP (R29) register with a signed constant. **Description:** The content of Gp (R29) is added with the sign-extended imm19s. And the result is written to Rt. The imm19s value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

Rt = R29 + SE(imm19s);



### **DIVR (Unsigned Integer Divide to Registers)**

**Type:** 32-Bit Baseline Version 2 **Format:** 

| 31 | 30     | 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9 | 5 | 4  | 0   |
|----|--------|-----|----|----|----|----|----|----|---|---|----|-----|
| 0  | AL     | U_1 | F  | Rt | R  | la | R  | b  | R | S | DI | VR  |
|    | 100000 |     |    |    |    |    |    |    |   |   | 10 | 111 |

Syntax: DIVR Rt, Rs, Ra, Rb

**Purpose:** Divide the unsigned integer content of one register with the unsigned integer content of another register.

**Description:** Divide the 32-bit content of Ra with the 32-bit content of Rb. The 32-bit quotient result is written to Rt. And the 32-bit remainder result is written to Rs. The contents of Ra and Rb are treated as unsigned integers.

If the content of Rb is zero, an Arithmetic exception will be generated if the IDIVZE bit of the INT\_MASK register is 1, which enables exception generation for the "Divide-By-Zero" condition.

#### **Operations:**

```
If (Rb != 0) {
   quotient = Floor(CONCAT(1`b0,Ra) / CONCAT(1`b0,Rb));
   remainder = CONCAT(1`b0,Ra) mod CONCAT(1`b0,Rb);
   Rt = quotient;
   Rs = remainder;
} else if (INT_MASK.IDIVZE == 0) {
   Rt = 0;
   Rs = 0;
} else {
   Generate_Exception(Arithmetic);
}
```

**Exceptions:** Arithmetic **Privilege level:** All **Note:** 





## **DIVSR (Signed Integer Divide to Registers)**

**Type:** 32-Bit Baseline Optional **Format:** 

| 31 | 30     | 25    | 24 | 20 | 19 | 15 | 14 | 10 | 9 | 5  | 4   | 0           |
|----|--------|-------|----|----|----|----|----|----|---|----|-----|-------------|
| 0  | AL     | ALU_1 |    | Rt | Ra |    | R  | Rb |   | ls | DIV | <b>V</b> SR |
|    | 100000 |       |    |    |    |    |    |    |   |    | 10  | 110         |

Syntax: DIVSR Rt, Rs, Ra, Rb

**Purpose:** Divide the signed integer content of one register with the signed integer content of another register.

**Description:** Divide the 32-bit content of Ra with the 32-bit content of Rb. The 32-bit quotient result is written to Rt. And the 32-bit remainder result is written to Rs. The contents of Ra and Rb are treated as signed integers.

If the content of Rb is zero, an Arithmetic exception will be generated if the IDIVZE bit of the INT\_MASK register is 1, which enables exception generation for the "Divide-By-Zero" condition. If the quotient overflows, an Arithmetic exception will always be generated. The overflow condition is as follows:

Positive quotient > 0x7FFF FFFF (When Ra = 0x80000000 and Rb = 0xFFFFFFFF)

#### **Operations:**

```
If (Rb != 0) {
   quotient = Floor(Ra / Rb);
   if (IsPositive(quotient) && quotient > 0x7FFFFFF) {
      Generate_Exception(Arithmetic);
   }
   remainder = Ra mod Rb
   Rt = quotient;
   Rs = remainder;
} else if (INT_MASK.IDIVZE == 0) {
   Rt = 0;
   Rs = 0;
} else {
```



```
Generate_Exception(Arithmetic);
}
```

Exceptions: Arithmetic Privilege level: All Note:



## LBI.gp (GP-implied Load Byte Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30 25  | 24 20 | 19  | 18 0   |
|----|--------|-------|-----|--------|
| 0  | LBGP   | Rt    | 0   | imm19s |
|    | 010111 |       | LBI |        |

Syntax: LBI.gp Rt, [+ imm19s]

**Purpose:** To load a zero-extended 8-bit byte from memory into a general register. **Description:** This instruction loads a zero-extended byte from the memory into the general register Rt. The memory address is specified by the implied GP register (R29) plus a sign-extended imm19s value. The imm19s value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm19s);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Bdata(7,0) = Load_Memory(PAddr, BYTE, Attributes);
    Rt = Zero_Extend(Bdata(7,0));
} else {
    Generate_Exception(Excep_status);
}
```



# LBSI.gp (GP-implied Load Byte Signed Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30    | 25 | 24 | 20 | 19   | 18 | 0      |  |
|----|-------|----|----|----|------|----|--------|--|
| 0  | LBGI  | P  | R  | .t | 1    |    | imm19s |  |
|    | 01011 | 1  |    |    | LBSI |    |        |  |

**Syntax:** LBSI.gp Rt, [+ imm19s]

**Purpose:** To load a sign-extended 8-bit byte from memory into a general register. **Description:** This instruction loads a sign-extended byte from the memory into the general register Rt. The memory address is specified by the implied GP register (R29) plus a sign-extended imm19s value. The imm19s value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm19s);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Bdata(7,0) = Load_Memory(PAddr, BYTE, Attributes);
    Rt = Sign_Extend(Bdata(7,0));
} else {
    Generate_Exception(Excep_status);
}
```



### LBUP (Load Byte with User Privilege Translation)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 98 | 7 0      |
|----|--------|-------|-------|-------|----|----------|
| 0  | MEM    | Rt    | Ra    | Rb    | SV | LBUP     |
|    | 011100 |       |       |       |    | 00100000 |

**Syntax:** LBUP Rt,  $[Ra + (Rb \ll sv)]$ 

**Purpose:** To load a zero-extended 8-bit byte from memory into a general register with the user mode privilege address translation.

**Description:** This instruction loads a zero-extended byte from the memory address Ra + (Rb << sv) into the general register Rt with the user mode privilege address translation regardless of the current processor operation mode (i.e. PSW.POM) and the current data address translation state (i.e. PSW.DT).

#### **Operations:**

```
Vaddr = Ra + (Rb << sv);
(PAddr, Attributes) = Address_Translation(Vaddr, TRANSLATE);
Excep_status = Page_Exception(Attributes, USER_MODE, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Bdata(7,0) = Load_Memory(PAddr, Byte, Attributes);
    Rt = Zero_Extend(Bdata(7,0));
} else {
    Generate_Exception(Excep_status);
}
```



# LHI.gp (GP-implied Load Halfword Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30 25  | 24 20 | 19 18 | 17 0   |
|----|--------|-------|-------|--------|
| 0  | HWGP   | Rt    | 00    | imm18s |
|    | 011110 |       | LHI   |        |

**Syntax:** LHI.gp Rt, [+ (imm18s << 1)]

(imm18s is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** To load a zero-extended 16-bit halfword from memory into a general register. **Description:** This instruction loads a zero-extended halfword from the memory into the general register Rt. The memory address is specified by the implied GP register (R29) plus a sign-extended (imm18s << 1) value. The (imm18s << 1) value will cover a range of 512K byte region relative to the location pointed to by the GP register.

### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm18s << 1);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Hdata(15,0) = Load_Memory(PAddr, HALFWORD, Attributes);
    Rt = Zero_Extend(Hdata(15,0));
} else {
    Generate_Exception(Excep_status);
}
```



# LHSI.gp (GP-implied Load Signed Halfword Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30 2   | 25 | 24 | 20 | 19 | 18  | 17     | 0 |
|----|--------|----|----|----|----|-----|--------|---|
| 0  | HWGP   |    | Rt | t  | 0  | 1   | imm18s |   |
|    | 011110 |    |    |    | LH | ISI |        |   |

**Syntax:** LHSI.gp Rt, [+ (imm18s << 1)]

(imm18s is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** To load a sign-extended 16-bit halfword from memory into a general register. **Description:** This instruction loads a sign-extended halfword from the memory into the general register Rt. The memory address is specified by the implied GP register (R29) plus a sign-extended (imm18s << 1) value. The (imm18s << 1) value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm18s << 1);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Hdata(15,0) = Load_Memory(PAddr, HALFWORD, Attributes);
    Rt = Sign_Extend(Hdata(15,0));
} else {
    Generate_Exception(Excep_status);
}
```



### LMWA (Load Multiple Word with Alignment Check)

**Type:** 32-bit Baseline Version 2 **Format:** 

|    | LMWA |     |    |    |    |    |       |     |      |     |     |     |   |    |
|----|------|-----|----|----|----|----|-------|-----|------|-----|-----|-----|---|----|
| 31 | 30   | 25  | 24 | 20 | 19 | 15 | 14 10 | 9   | 6    | 5   | 4   | 3   | 2 | 10 |
| 0  | LSN  | ЛW  | R  | b  | R  | a  | Re    | Ena | ble4 | LMW | b:0 | i:0 | m | 01 |
|    | 011  | 101 |    |    |    |    |       |     |      | 0   | a:1 | d:1 |   |    |

**Syntax:** LMWA.  $\{b | a\} \{i | d\} \{m?\}$  Rb, [Ra], Re, Enable4

**Purpose:** Load multiple 32-bit words from sequential memory locations into multiple registers.

**Description:** load multiple 32-bit words from sequential memory addresses specified by the base address register Ra and the {b | a} {i | d} options into a continuous range or a subset of general-purpose registers specified by a registers list formed by Rb, Re, and the four-bit Enable4 field as follows.

<Registers List> = a range from [Rb, Re] and a list from <Enable4>

- {b | a} option specifies the way how the first address is generated. {b} use the contents of Ra as the first memory load address. {a} use either Ra+4 or Ra-4 for the {i | d} option respectively as the first memory load address.
- {i | d} option specifies the direction of the address change. {i} generates increasing addresses from Ra and {d} generates decreasing addresses from Ra.
- {m?} option, if it is specified, indicates that the base address register will be updated to the value computed in the following formula at the completion of this instruction.

TNReg = Total number of registers loaded Updated value = Ra + (4 \* TNReg) for {i} option Updated value = Ra - (4 \* TNReg) for {d} option

 [Rb, Re] specifies a range of registers which will be loaded by this instruction. Rb(4,0) specifies the first register number in the continuous register range and Re(4,0) specifies the last register number in this register range. In addition to the range of registers, <Enable4(3,0)> specifies the load of 4 individual registers from R28 to R31 (s9/fp, gp, lp, sp) which have special calling convention usage. The exact mapping of Enable4(3,0) bits and registers is as follows:



| Bits      | Enable4(3) | Enable4(2) | Enable4(1) | Enable4(0) |  |
|-----------|------------|------------|------------|------------|--|
|           | Format(9)  | Format(8)  | Format(7)  | Format(6)  |  |
| Registers | R28        | R29        | R30        | R31        |  |

- Several constraints are imposed for the <Registers List>:
  - If [Rb(4,0), Re(4,0)] specifies at least one register:
    - $Rb(4,0) \le Re(4,0) AND$
    - $0 \le Rb(4,0), Re(4,0) \le 28$
  - If [Rb(4,0), Re(4,0)] specifies no register at all:
    - Rb(4,0) == Re(4,0) = 0b11111 AND
    - Enable4(3,0) != 0b0000
  - If these constraints are not met, UNPREDICTABLE result will happen to the contents of all registers after this instruction.
- The registers are loaded in sequence from matching memory locations with one exception. That is, the lowest-numbered register is loaded from the lowest memory address while the highest-numbered register is loaded from the highest memory address with the following exception.
  - The matching memory locations of R28 (fp) and R31 (sp) are swapped. That is, the memory locations for R28-R31 are as follows.

| R28 | High memory location |
|-----|----------------------|
| R30 |                      |
| R29 |                      |
| R31 | Low memory location  |

Note that the load sequence of this instruction involving R28 and R31 is different from the load/store sequence of LMW/SMW.

- If the base address register update {m?} option is specified while the base address register Ra is also specified in the <Register Specification>, there are two source values for the final content of the base address register Ra. In this case, the final value of Ra is UNPREDICTABLE. And the rest of the loaded registers should have values as if the base address register update {m?} option is not specified.
- This instruction can only handle word-aligned memory address.



#### **Operation:**

```
TNReg = Count_Registers(register_list);
if ("bi") {
  B addr = Ra;
  E_addr = Ra + (TNReg * 4) - 4;
} elseif ("ai") {
  B addr = Ra + 4;
  E_addr = Ra + (TNReg * 4);
} elseif ("bd") {
  B addr = Ra - (TNReg * 4) + 4;
  E addr = Ra;
} else { // "ad"
  B_addr = Ra - (TNReg * 4);
  E addr = Ra - 4
}
VA = B_addr;
if (!word-aligned(VA)) {
   Generate Exception (Data Alignment Check);
}
for (i = 0 to 27, 31, 29, 30, 28) {
   if (register list[i] == 1) {
      (PA, Attributes) = Address Translation(VA, PSW.DT);
      Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
      If (Excep_status == NO_EXCEPTION) {
         Ri = Load Memory(PA, Word, Attributes);
         VA = VA + 4;
      } else {
         Generate Exception (Excep status);
      }
   }
}
if ("im") {
  Ra = Ra + (TNReg * 4);
} else { // "dm"
  Ra = Ra - (TNReg * 4);
}
```



**Exception:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check.

• If the base register update is not specified, the base register value is unchanged. This applies even if the instruction loaded its own base register and the memory access to load the base register occurred earlier than the exception event. For example, suppose the instruction is

LMW.bd R2, [R4], R4, 0b0000

And the implementation loads R4, then R3, and finally R2. If an exception occurs on any of the accesses, the value in the base register R4 of the instruction is unchanged.

- If the base register update is specified, the value left in the base register is unchanged.
- If the instruction loads only one general-purpose register, the value in that register is unchanged.
- If the instruction loads more than one general-purpose register, UNPREDICTABLE values are left in destination registers which are not the base register of the instruction.

Interruption: Whether this instruction is interruptible or not is

implementation-dependent.

### Privilege Level: all

### Note:

- (5) LMW and SMW instructions do not guarantee atomicity among individual memory access operations. And they do not guarantee single access to a memory location during the execution either. Any I/O access that has side-effects other than simple stable memory-like access behavior should not use these two instructions.
- (6) The memory access order among the words accessed by LMW/SMW is not defined here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:
  - For LMW/SMW.i : increasing memory addresses from base address.
  - For LMW/SMW.d: decreasing memory addresses from base address.
- (7) The memory access order within an un-aligned word accessed is not defined here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:
  - For LMW/SMW.i: the aligned low address of the word and then the aligned high address of the word. If an interruption occurs, the EVA register will contain the starting low address of the un-aligned word or .



- For LMW/SMW.d: the aligned high address of the word and then the aligned low address of the word. If an interruption occurs, the EVA register will contain "base un-aligned address + 4" of the first word or the starting low address of the remaining decreasing memory word.
- (8) Based on the more likely access order of (2) and (3), upon interruption, the EVA register for un-aligned LMW/SMW will more likely have the following value:
  - For LMW/SMW.i: the starting low addresses of the accessed words or "Ra + (TNReg \* 4)" where TNReg represents the total number of registers loaded or stored.
  - For LMW/SMW.d: the starting low addresses of the accessed words or "Ra + 4".



### LWI.gp (GP-implied Load Word Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30     | 25 | 24    | 20 | 19 | 17 | 16     | 0 |
|----|--------|----|-------|----|----|----|--------|---|
| 0  | HWGP   |    | GP Rt |    | 11 | 0  | imm17s |   |
|    | 011110 |    |       |    | LV | VI |        |   |

**Syntax:** LWI.gp Rt, [+ (imm17s << 2)]

(imm17s is a word offset. In assembly programming, always write a byte offset.) **Purpose:** To load a 32-bit word from memory into a general register. **Description:** This instruction loads a 32-bit word from the memory into the general register Rt. The memory address is specified by the implied GP register (R29) plus a sign-extended (imm17s << 2) value. The (imm17s << 2) value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm17s << 2);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, LOAD);
If (Excep_status == NO_EXCEPTION) {
    Wdata(31,0) = Load_Memory(PAddr, WORD, Attributes);
    Rt = Wdata(31,0);
} else {
    Generate_Exception(Excep_status);
}
```



### MADDR32 (Multiply and Add to 32-bit Register)

**Type:** 32-Bit Baseline Version 2 **Format:** 

| 31 | 30 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9  | 6    | 5   | 0      |  |
|----|--------|----|----|----|----|----|----|----|------|-----|--------|--|
| 0  | ALU_2  | I  | Rt |    | Ra |    | Rb |    | 0001 |     | MADD32 |  |
|    | 100001 |    |    |    |    |    |    | GI | PR   | 110 | 011    |  |

Syntax: MADDR32 Rt, Ra, Rb

**Purpose:** Multiply the contents of two 32-bit registers and add the lower 32-bit multiplication result with the 32-bit content of a destination register. The final result is written back to the destination register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The lower 32-bit multiplication result is added with the content of Rt. And the final result is written back to Rt. The contents of Ra and Rb can be either signed or unsigned integers.

#### **Operations:**

Mresult = Ra \* Rb; Rt = Rt + Mresult(31,0);



### MSUBR32 (Multiply and Subtract from 32-bit Register)

**Type:** 32-Bit Baseline Version 2 **Format:** 

| 31 | 30     | 25 | 24 | 20 | 19 | 15 | 14 | 10 | 9    | 6  | 5      | 0   |
|----|--------|----|----|----|----|----|----|----|------|----|--------|-----|
| 0  | ALU_2  |    | Rt |    | Ra |    | Rb |    | 0001 |    | MSUB32 |     |
|    | 100001 |    |    |    |    |    |    |    | GF   | 'R | 110    | 101 |

Syntax: MSUBR32 Rt, Ra, Rb

**Purpose:** Multiply the contents of two 32-bit registers and subtract the lower 32-bit multiplication result from the 32-bit content of a destination register. The final result is written back to the destination register.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The lower 32-bit multiplication result is subtracted from the content of Rt. And the final result is written back to Rt. The contents of Ra and Rb can be either signed or unsigned integers.

#### **Operations:**

Mresult = Ra \* Rb; Rt = Rt - Mresult(31,0);


## MULR64 (Multiply Word Unsigned to Registers)

**Type:** 32-Bit Baseline Version 2 **Format:** 

| 31 | 30 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9  | 6    | 5   | 0      |  |
|----|--------|----|----|----|----|----|----|----|------|-----|--------|--|
| 0  | ALU_2  |    | Rt |    | Ra |    | Rb |    | 0001 |     | MULT64 |  |
|    | 100001 |    |    |    |    |    |    | GI | PR   | 101 | 001    |  |

Syntax: MULR64 Rt, Ra, Rb

**Purpose:** Multiply the unsigned integer contents of two 32-bit registers and write the 64-bit result to an even/odd pair of 32-bit registers.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is written to an even/odd pair of registers containing Rt. Rt(4,1) index d determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. How the register pair contains the 64-bit result depends on the current data endian. When the data endian is *big*, the even register of the pair contains the high 32-bit of the result and the odd register of the pair contains the low 32-bit of the result. When the data endian is *little*, the odd register of the pair contains the high 32-bit of the result and the pair contains the low 32-bit of the result.

The contents of Ra and Rb are treated as unsigned integers.

#### **Operations:**

```
Mresult = CONCAT(1`b0,Ra) * CONCAT(1`b0,Rb);
If (PSW.BE == 1) {
    R[Rt(4,1).0(0)](31,0) = Mresult(63,32);
    R[Rt(4,1).1(0)](31,0) = Mresult(31,0);
} else {
    R[Rt(4,1).1(0)](31,0) = Mresult(63,32);
    R[Rt(4,1).0(0)](31,0) = Mresult(31,0);
}
```

Exceptions: None Privilege level: All Note:



## MULSR64 (Multiply Word Signed to Registers)

**Type:** 32-Bit Baseline Version 2 **Format:** 

| 31 | 30 25  | 24 | 20 | 19 | 15 | 14 | 10 | 9  | 6    | 5   | 0       |  |
|----|--------|----|----|----|----|----|----|----|------|-----|---------|--|
| 0  | ALU_2  | Rt | Rt |    | Ra |    | Rb |    | 0001 |     | MULTS64 |  |
|    | 100001 |    |    |    |    |    |    | GF | PR   | 101 | 000     |  |

Syntax: MULSR64 Rt, Ra, Rb

**Purpose:** Multiply the signed integer contents of two 32-bit registers and write the 64-bit result to an even/odd pair of 32-bit registers.

**Description:** Multiply the 32-bit content of Ra with the 32-bit content of Rb. The 64-bit multiplication result is written to an even/odd pair of registers containing Rt. Rt(4,1) index d determines the even/odd pair group of the two registers. Specifically, the register pair includes register 2d and 2d+1. How the register pair contains the 64-bit result depends on the current data endian. When the data endian is *big*, the even register of the pair contains the high 32-bit of the result and the odd register of the pair contains the low 32-bit of the result. When the data endian is *little*, the odd register of the pair contains the high 32-bit of the result and the odd register of the pair contains the high 32-bit of the result and the odd register of the pair contains the high 32-bit of the result and the odd register of the pair contains the high 32-bit of the result and the odd register of the pair contains the high 32-bit of the result.

The contents of Ra and Rb are treated as signed integers.

#### **Operations:**

```
Mresult = Ra * Rb;
If (PSW.BE == 1) {
    R[Rt(4,1).0(0)](31,0) = Mresult(63,32);
    R[Rt(4,1).1(0)](31,0) = Mresult(31,0);
} else {
    R[Rt(4,1).1(0)](31,0) = Mresult(63,32);
    R[Rt(4,1).0(0)](31,0) = Mresult(31,0);
}
```

Exceptions: None Privilege level: All Note:





## **SBI.gp (GP-implied Store Byte Immediate)**

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30 2   | 25 | 24 | 20 | 19  | 18 | 0      |
|----|--------|----|----|----|-----|----|--------|
| 0  | SBGP   |    | R  | t  | 0   |    | imm19s |
|    | 011111 |    |    |    | SBI |    |        |

Syntax: SBI.gp Rt, [+ imm19s]

**Purpose:** To store an 8-bit byte from a general register into a memory location. **Description:** The least-significant 8-bit byte in the general register Rt is stored to the memory location. The memory address is specified by the implied GP register (R29) plus a sign-extended imm19s value. The imm19s value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm19s);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, BYTE, Attributes, Rt(7,0));
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



## **SBUP (Store Byte with User Privilege Translation)**

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30 25  | 24 20 | 19 15 | 14 10 | 98 | 7 0      |  |
|----|--------|-------|-------|-------|----|----------|--|
| 0  | MEM    | Rt    | Ra    | Rb    | SV | SBUP     |  |
|    | 011100 |       |       |       |    | 00101000 |  |

**Syntax:** SB Rt, [Ra + (Rb << sv)]

**Purpose:** To store an 8-bit byte from a general register into memory with the user mode privilege address translation.

**Description:** The least-significant 8-bit byte in the general register Rt is stored to the memory  $Ra + (Rb \ll sv)$  with the user mode privilege address translation regardless of the current processor operation mode (i.e. PSW.POM) and the current data address translation state (i.e. PSW.DT).

#### **Operations:**

```
Vaddr = Ra + (Rb << sv);
(PAddr, Attributes) = Address_Translation(Vaddr, TRANSLATE);
Excep_status = Page_Exception(Attributes, USER_MODE, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, BYTE, Attributes, Rt(7,0));
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error Privilege level: All Note:



## SHI.gp (GP-implied Store Halfword Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30 25  | 24 20 | 19 18 | 17 0   |
|----|--------|-------|-------|--------|
| 0  | HWGP   | Rt    | 10    | imm18s |
|    | 011110 |       | SHI   |        |

**Syntax:** SHI.gp Rt, [+ (imm18s << 1)]

(imm18s is a halfword offset. In assembly programming, always write a byte offset.) **Purpose:** To store a 16-bit halfword from a general register into a memory location. **Description:** The least-significant 16-bit halfword in the general register Rt is stored to the memory location. The memory address is specified by the implied GP register (R29) plus a sign-extended (imm18s << 1) value. The (imm18s << 1) value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm18s << 1);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, HALFWORD, Attributes, Rt(15,0));
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



## SMWA (Store Multiple Word with Alignment Check)

**Type:** 32-bit Baseline Version 2 **Format:** 

|    |     |     |    |    |    |    | SIMWA |     |      |     |     |     |   |    |
|----|-----|-----|----|----|----|----|-------|-----|------|-----|-----|-----|---|----|
| 31 | 30  | 25  | 24 | 20 | 19 | 15 | 14 10 | 9   | 6    | 5   | 4   | 3   | 2 | 10 |
| 0  | LSN | МW  | R  | b  | R  | a  | Re    | Ena | ble4 | SMW | b:0 | i:0 | m | 01 |
|    | 011 | 101 |    |    |    |    |       |     |      | 1   | a:1 | d:1 |   |    |

CD CTTT

**Syntax:** SMWA.  $\{b | a\} \{i | d\} \{m?\}$  Rb, [Ra], Re, Enable4

**Purpose:** Store multiple 32-bit words from multiple registers into sequential memory locations.

- Description: Store multiple 32-bit words from a range or a subset of source general-purpose registers to sequential memory addresses specified by the base address register Ra and the {b | a} {i | d} options. The source registers are specified by a registers list formed by Rb, Re, and the four-bit Enable4 field as follows. <Registers List> = a range from {Rb, Re} and a list from <Enable4>
  - {b | a} option specifies the way how the first address is generated. {b} use the contents of Ra as the first memory store address. {a} use either Ra+4 or Ra-4 for the {i | d} option respectively as the first memory store address.
  - {i | d} option specifies the direction of the address change. {i} generates increasing addresses from Ra and {d} generates decreasing addresses from Ra.
  - {m?} option, if it is specified, indicates that the base address register will be updated to the value computed in the following formula at the completion of this instruction.

TNReg = Total number of registers stored Updated value = Ra + (4 \* TNReg) for {i} option Updated value = Ra - (4 \* TNReg) for {d} option

• [Rb, Re] specifies a range of registers whose contents will be stored by this instruction. Rb(4,0) specifies the first register number in the continuous register range and Re(4,0) specifies the last register number in this register range. In addition to the range of registers, <Enable4(3,0)> specifies the store of 4 individual registers from R28 to R31 (s9/fp, gp, lp, sp) which have special calling convention usage. The exact mapping of Enable4(3,0) bits and registers



is as follows:

| Bits      | Enable4(3) | Enable4(2) | Enable4(1) | Enable4(0) |
|-----------|------------|------------|------------|------------|
|           | Format(9)  | Format(8)  | Format(7)  | Format(6)  |
| Registers | R28        | R29        | R30        | R31        |

- Several constraints are imposed for the <Registers List>:
  - If [Rb, Re] specifies at least one register:

    - $0 \le Rb(4,0), Re(4,0) \le 28$
  - If [Rb, Re] specifies no register at all:
    - Rb(4,0) == Re(4,0) = 0b11111 AND
    - Enable4(3,0) != 0b0000
  - If these constraints are not met, UNPREDICTABLE result will happen to the contents of the memory range pointed to by the base register and the base register itself if the {m?} option is specified after this instruction.
- The register is stored in sequence to matching memory locations with one exception. That is, the lowest-numbered register is stored to the lowest memory address while the highest-numbered register is stored to the highest memory address with the following exception.
  - The matching memory locations of R28 (fp) and R31 (sp) are swapped. That is, the memory locations for R28-R31 are as follows.

| R28 | High memory location |
|-----|----------------------|
| R30 |                      |
| R29 |                      |
| R31 | Low memory location  |

Note that the load sequence of this instruction involving R28 and R31 is different from the load/store sequence of LMW/SMW.

- If the base address register Ra is specified in the <Registers Specification>, the value stored to the memory from the register Ra is the Ra value before this instruction is executed.
- This instruction can only handle word-aligned memory address.

#### **Operation:**



```
TNReg = Count Registers(register list);
if ("bi") {
  B addr = Ra;
  E addr = Ra + (TNReg * 4) - 4;
} elseif ("ai") {
  B addr = Ra + 4;
  E addr = Ra + (TNReg * 4);
} elseif ("bd") {
  B_addr = Ra - (TNReg * 4) + 4;
  E addr = Ra;
} else { // "ad"
  B_addr = Ra - (TNReg * 4);
  E_addr = Ra - 4
}
VA = B_addr;
if (!word-aligned(VA)) {
   Generate Exception (Data Alignment Check);
}
for (i = 0 to 27, 31, 29, 30, 28) {
   if (register_list[i] == 1) {
      (PA, Attributes) = Address Translation(VA, PSW.DT);
      Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
      If (Excep_status == NO_EXCEPTION) {
         Store_Memory(PA, Word, Attributes, Ri);
         VA = VA + 4;
      } else {
         Generate Exception (Excep status);
      }
   }
}
if ("im") {
   Ra = Ra + (TNReg * 4);
} else { // "dm"
  Ra = Ra - (TNReg * 4);
}
```



**Exception:** TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error, Data alignment check.

• The base register value is left unchanged on an exception event, no matter whether the base register update is specified or not.

**Interruption:** Whether this instruction is interruptible or not is implementation-dependent.

#### Privilege Level: all

#### Note:

- (5) LMW and SMW instructions do not guarantee atomicity among individual memory access operations. And they do not guarantee single access to a memory location during the execution either. Any I/O access that has side-effects other than simple stable memory-like access behavior should not use these two instructions.
- (6) The memory access order among the words accessed by LMW/SMW is not defined here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:
  - For LMW/SMW.i : increasing memory addresses from base address.
  - For LMW/SMW.d: decreasing memory addresses from base address.
- (7) The memory access order within an un-aligned word accessed is not defined here and should be implementation-dependent. However, the more likely access order implemented by an implementation is:
  - For LMW/SMW.i: the aligned low address of the word and then the aligned high address of the word. If an interruption occurs, the EVA register will contain the starting low address of the un-aligned word or .
  - For LMW/SMW.d: the aligned high address of the word and then the aligned low address of the word. If an interruption occurs, the EVA register will contain "base un-aligned address + 4" of the first word or the starting low address of the remaining decreasing memory word.
- (8) Based on the more likely access order of (2) and (3), upon interruption, the EVA register for un-aligned LMW/SMW will more likely have the following value:
  - For LMW/SMW.i: the starting low addresses of the accessed words or "Ra + (TNReg \* 4)" where TNReg represents the total number of registers loaded or stored.
  - For LMW/SMW.d: the starting low addresses of the accessed words or "Ra + 4".





## SWI.gp (GP-implied Store Word Immediate)

**Type:** 32-Bit Baseline version 2 **Format:** 

| 31 | 30   | 25 | 24 | 20 | 19 | 17 | 16     | 0 |
|----|------|----|----|----|----|----|--------|---|
| 0  | HWC  | ЗP | R  | t  | 11 | 1  | imm17s |   |
|    | 0111 | 10 |    |    | SV | VI |        |   |

**Syntax:** SWI.gp Rt, [+ (imm17s << 2)]

(imm17s is a word offset. In assembly programming, always write a byte offset.) **Purpose:** To store a 32-bit word from a general register into a memory location. **Description:** The 32-bit word in the general register Rt is stored to the memory location. The memory address is specified by the implied GP register (R29) plus a sign-extended (imm17s << 2) value. The (imm17s << 2) value will cover a range of 512K byte region relative to the location pointed to by the GP register.

#### **Operations:**

```
Vaddr = R29 + Sign_Extend(imm17s << 2);
(PAddr, Attributes) = Address_Translation(Vaddr, PSW.DT);
Excep_status = Page_Exception(Attributes, PSW.POM, STORE);
If (Excep_status == NO_EXCEPTION) {
   Store_Memory(PAddr, WORD, Attributes, Rt(31,0));
} else {
   Generate_Exception(Excep_status);
}
```

Exceptions: TLB fill, Non-leaf PTE not present, Leaf PTE not present, Read protection, Page modified, Access bit, TLB VLPT miss, Machine error. Privilege level: All Note:



# Chapter 8 Instruction Latency for AndesCore Implementations

This chapter lists the instruction latency information for AndesCore families and contains the following sections

- 8.1 N12 family implementation on page 325
- 8.2 N10 family implementation on page 330



## 8.1 N12 Implementation

## 8.1.1 Instruction Latency due to Resource Dependency

This section describes the AndesCore N12 instruction latency between a producer instruction and a corresponding consumer instruction. This information is useful for compiler optimization.

Terminology

- Producer: an instruction that produces a new register state.
- Consumer: an instruction that consumes the new register state produced by a producer.
- Latency: the minimum number of cycles between the completion of a producer and that of a consumer. Assuming a producer and a corresponding consumer cannot complete at the same time, the smallest possible latency is 1.
- Bubble: the minimum number of extra cycles that exceeds the smallest possible latency (i.e. 1) between the completion of a producer and that of a consumer. Thus it is equal to (latency – 1).

Producer/Consumer Instruction Group

Producer/Consumer Instruction Group

| Producer  | Instructions                               | Note                  |
|-----------|--------------------------------------------|-----------------------|
| /Consumer |                                            |                       |
| Group     |                                            |                       |
| ALU       | ADDI, SUBRI, ANDI, ORI, XORI, SLTI, SLTSI, | result in general     |
|           | MOVI, STEHI, ADD, SUB, AND, OR, NOR,       | register R; sources   |
|           | XOR, SLT, SLTS, SVA, SVS, SEB, SHE, ZEB,   | from general register |
|           | ZEH, WSBH, CMOVZ, CMOVN, SLLI, SRLI,       | R                     |
|           | SRAI, ROTRI, SLL, SRL, SRA, ROTR           |                       |
| MUL       | MUL                                        | result in general     |
|           |                                            | register R            |
| M2D       | MULTS64, MULT64, MULT32, MADDS64,          | result in accumulator |
|           | MADD64, MSUBS64, MSUB64, MADD32,           | register D            |



MSUB32

• Unique Producer Instruction Group

| Producer | Instructions                                        | Note                |
|----------|-----------------------------------------------------|---------------------|
| Group    |                                                     |                     |
| LD_D     | LWI[.bi], LHI[.bi], LHSI[.bi], LBI[.bi], LBSI[.bi], | Load instructions   |
|          | LW[.bi], LH[.bi], LHS[.bi], LB[.bi], LBS[.bi],      | which have the data |
|          | LWUP, LLW                                           | register as the     |
|          |                                                     | produced state      |
| LD_A     | LWI.bi, LHI.bi, LHSI.bi, LBI.bi, LBSI.bi, LW.bi,    | Load instructions   |
|          | LH.bi, LHS.bi, LB.bi, LBS.bi                        | which have the base |
|          |                                                     | address register as |
|          |                                                     | the produced state  |
| SCW      | SCW                                                 |                     |
| MISC_E2  | MFUSR, MFSR, JAL, JRAL, BGEZAL,                     |                     |
|          | BLTZAL                                              |                     |
| DIV      | DIV, DIVS                                           |                     |

• Unique Consumer Instruction Group

| Consumer | Instructions                                        | Note                |
|----------|-----------------------------------------------------|---------------------|
| Group    |                                                     |                     |
| ST_D     | SWI[.bi], SHI[.bi], SBI[.bi], SW[.bi], SH[.bi],     | Store instructions  |
|          | SB[.bi], SWUP, SCW                                  | which have the data |
|          |                                                     | register as the     |
|          |                                                     | consumed state      |
| MEM_A    | LWI[.bi], LHI[.bi], LHSI[.bi], LBI[.bi], LBSI[.bi], | LD/ST instructions  |
|          | LW[.bi], LH[.bi], LHS[.bi], LB[.bi], LBS[.bi],      | which have the base |
|          | LWUP, LLW, LMW, SWI[.bi], SHI[.bi], SBI[.bi],       | address register as |
|          | SW[.bi], SH[.bi], SB[.bi], SWUP, SCW, SMW,          | the consumed state  |
|          | DPREF, DPREFI                                       |                     |
| BR       | JR, RET, JRAL, BEQ, BNE, BEQZ, BNEZ,                |                     |
|          | BGEZ, BLTZ, BGTZ, BLEZ, BGEZAL,                     |                     |



|         | BLTZAL                              |  |
|---------|-------------------------------------|--|
| MFUSR   | MFUSR                               |  |
| MISC_E1 | TLBOP TRD, TLBOP TWR, TLBOP RWR,    |  |
|         | TLBOP RWLK, TLBOP UNLK, TLBOP PB,   |  |
|         | TLBOP INV, CCTL, ISYNC, MTSR, MTUSR |  |

In AndesCore N12, most of a producer and corresponding consumer have a latency of 1. Only cases need special attention, or deviate from this general rule of thumb will be described below.

| No                                                                  | No Producer Consumer                                                      |                                                  | Latency    |  |  |
|---------------------------------------------------------------------|---------------------------------------------------------------------------|--------------------------------------------------|------------|--|--|
|                                                                     | Dependency on general register Rx                                         |                                                  |            |  |  |
| 1                                                                   | LD_D                                                                      | ALU, ST_D, BR                                    | 2          |  |  |
| 2                                                                   |                                                                           | MUL, M2D, MEM_A, MISC_E1                         | 3          |  |  |
| 3                                                                   | MUL                                                                       | ALU, ST_D, BR                                    | 2          |  |  |
| 4                                                                   |                                                                           | MUL, M2D, MEM_A, MISC_E1                         | 3          |  |  |
| 5                                                                   | ALU, MISC_E2                                                              | MUL, M2D, MEM_A, MISC_E1                         | 2          |  |  |
| 5b*                                                                 | LD_A                                                                      | MUL, M2D, MEM_A, MISC_E1                         | 2          |  |  |
| 6                                                                   | SCW                                                                       | ALU, ST_D, BR                                    | 3          |  |  |
| 7                                                                   |                                                                           | MUL, M2D, MEM_A, MISC_E1                         | 2          |  |  |
| Dependency on accumulator register Dx or multiplication E1 resource |                                                                           |                                                  |            |  |  |
| 8                                                                   | M2D                                                                       | M2D, MFUSR                                       | 2          |  |  |
| 9                                                                   | DIV                                                                       | All Dx consumer                                  | Variable   |  |  |
|                                                                     |                                                                           |                                                  | (32 –      |  |  |
|                                                                     |                                                                           |                                                  | CLZ(ra)) + |  |  |
|                                                                     |                                                                           |                                                  | 3          |  |  |
|                                                                     | For the following LMW related producers, assumes LMW loading N registers: |                                                  |            |  |  |
|                                                                     | Aligned LMW                                                               |                                                  |            |  |  |
|                                                                     | Dependency on the hig                                                     | shest-numbered register in register list for LMW | V.i or     |  |  |
|                                                                     | Dependency on the lowest-numbered register in register list for LMW.d     |                                                  |            |  |  |
| 10                                                                  | LMW.i or LMW.d                                                            | ALU, ST_D, BR                                    | N+1        |  |  |
| 11                                                                  | base not in list,                                                         | MUL, M2D, MEM_A, MISC_E1                         | N+2        |  |  |
|                                                                     | LMW.i or LMW.d                                                            |                                                  |            |  |  |
|                                                                     | base is the                                                               |                                                  |            |  |  |



|    | dependency reg,                               |                                                   |            |  |
|----|-----------------------------------------------|---------------------------------------------------|------------|--|
|    | LMW.im or LWM.dm                              |                                                   |            |  |
|    |                                               |                                                   |            |  |
| Ι  | Dependency NOT on the                         | highest-numbered register in register list for LN | MW.i or    |  |
|    | Dependency NOT on the                         | e lowest-numbered register in register list for L | MW.d       |  |
| 12 | LMW.i or LMW.d                                | ALU, ST_D, BR                                     | Ν          |  |
| 13 | base not in list,                             | MUL, M2D, MEM_A, MISC_E1                          | N+1        |  |
|    | LMW.im                                        |                                                   |            |  |
|    | Fixed latency despite dependency relationship |                                                   |            |  |
| 14 | LMW.i base in list,                           | ALL                                               | N+3        |  |
|    | but not                                       |                                                   |            |  |
|    | highest-numbered                              |                                                   |            |  |
|    | reg,                                          |                                                   |            |  |
|    | LMW.d base in list,                           |                                                   |            |  |
|    | but not                                       |                                                   |            |  |
|    | lowest-numbered reg                           |                                                   |            |  |
|    | Un-Aligned LMW                                |                                                   |            |  |
| 15 | Rule 9-13                                     | Rule 9-13                                         | (Rule 9-13 |  |
|    |                                               |                                                   | latency)+1 |  |

\* Rule 5b exists when the configuration flag "NDS\_POSTWRITE\_E1\_BYPASS" is turned off. Turning on configuration flag "NDS\_POSTWRITE\_E1\_BYPASS" will make all LD\_A consumers to have a latency of 1.

The following instructions have a fixed latency without considering any resource dependency relationships.

| Instruction                     | Latency                       |
|---------------------------------|-------------------------------|
| RET (correct target prediction) | 3                             |
| DSB                             | 5                             |
| ISB, IRET                       | 10                            |
| DIV, DIVS                       | Variable (3 – 34)             |
|                                 | [i.e. 3+floor(log2(abs(Rb)))] |
| TLBOP Invalidate VA             | 10                            |



| TLBOP Invalidate All            | 98      |
|---------------------------------|---------|
| DPREF/DPREFI (miss dcache)      | 4       |
| Instruction                     | Latency |
| RET (correct target prediction) | 3       |
| DSB                             | 3       |
| ISB, IRET                       | 10      |
| TLBOP Invalidate VA             | 10      |
| TLBOP Invalidate All            | 98      |
| DPREF/DPREFI (miss dcache)      | 4       |

## 8.1.2 Cycle Penalty due to N12 Pipeline Control

## **Mishaps Recovery**

This section describes the AndesCore N12 pipeline execution cycle penalties caused by recovering certain unlucky events.

| Event Type                                                     | Penalty  |
|----------------------------------------------------------------|----------|
|                                                                | (Bubble) |
| un-aligned 32-bit instruction fetch after pipeline start/flush | 1        |
| branch mis-prediction                                          | 5/6      |
| uTLB miss/MTLB hit on small page PTE                           | 4        |
| uTLB miss/MTLB hit on large page PTE                           | 6        |
| uTLB miss/MTLB miss/HPTWK prefetch buffer hit                  | 7        |
| (no large page in use)                                         |          |
| uTLB miss/MTLB miss/HPTWK prefetch buffer hit                  | 9        |
| (with large page in use)                                       |          |



# 8.2 N10 Implementation

## 8.2.1 Dependency-related Instruction Latency

This section describes the AndesCore N10 instruction latency between a producer instruction and a corresponding consumer instruction. This information is useful for compiler optimization.

Terminology

- Producer: an instruction that produces a new register state.
- Consumer: an instruction that consumes the new register state produced by a producer.
- Latency: the minimum number of cycles between the completion of a producer and that of a consumer. Assuming a producer and a corresponding consumer cannot complete at the same time, the smallest possible latency is 1.
- Bubble: the minimum number of extra cycles that exceeds the smallest possible latency (i.e. 1) between the completion of a producer and that of a consumer. Thus it is equal to (latency – 1).

Producer/Consumer Instruction Groups

• Producer/Consumer Instruction Group

| Producer  | Instructions                               | Note                  |
|-----------|--------------------------------------------|-----------------------|
| /Consumer |                                            |                       |
| Group     |                                            |                       |
| ALU       | ADDI, SUBRI, ANDI, ORI, XORI, SLTI, SLTSI, | result in general     |
|           | MOVI, STEHI, ADD, SUB, AND, OR, NOR,       | register R; sources   |
|           | XOR, SLT, SLTS, SVA, SVS, SEB, SHE, ZEB,   | from general register |
|           | ZEH, WSBH, CMOVZ, CMOVN, SLLI, SRLI,       | R                     |
|           | SRAI, ROTRI, SLL, SRL, SRA, ROTR           |                       |
| MUL       | MUL                                        | result in general     |
|           |                                            | register R            |
| M2D       | MULTS64, MULT64, MULT32, MADDS64,          | result in accumulator |



| MADD64, MSUBS64, MSUB64, MADD32, | register D |
|----------------------------------|------------|
| MSUB32                           |            |

#### • Unique Producer Instruction Group

| Producer Group | Instructions                                    | Note                   |  |
|----------------|-------------------------------------------------|------------------------|--|
| LD_D           | LWI[.bi], LHI[.bi], LHSI[.bi], LBI[.bi],        | Load instructions      |  |
|                | LBSI[.bi], LW[.bi], LH[.bi], LHS[.bi], LB[.bi], | which have the data    |  |
|                | LBS[.bi], LWUP, LLW                             | register as the        |  |
|                |                                                 | produced state         |  |
| MFUSR_GR       | MFUSR GR $\leftarrow$ D                         | Result in general      |  |
|                |                                                 | register R             |  |
| SCW            | SCW                                             | Generate               |  |
|                |                                                 | success/fail status in |  |
|                |                                                 | general register R     |  |
| DIV            | DIV, DIVS                                       |                        |  |
| LMW_iHdL_nup   | LMW.i with dependency on the                    |                        |  |
|                | highest-numbered register in register list,     |                        |  |
|                | LMW.d with dependency on the                    |                        |  |
|                | lowest-numbered register in register list, and  |                        |  |
|                | both have no base register update.              |                        |  |

• Unique Consumer Instruction Group

| Consumer  | Instructions                                | Note                |  |
|-----------|---------------------------------------------|---------------------|--|
| Group     |                                             |                     |  |
| ST_D_!bi  | SWI, SHI, SBI, SW, SH, SB, SWUP, SCW        | Store instructions  |  |
|           |                                             | which have the data |  |
|           |                                             | register as the     |  |
|           |                                             | consumed state      |  |
| ST_D_bi   | SWI.bi, SHI.bi, SBI.bi, SW.bi, SH.bi, SB.bi | Store instructions  |  |
|           |                                             | which have the data |  |
|           |                                             | register as the     |  |
|           |                                             | consumed state      |  |
| MEM_A_!bi | LWI, LHI, LHSI, LBI, LBSI, LW, LH, LHS,     | LD/ST instructions  |  |



|             | LB, LBS, LWUP, LLW, LMW, SWI, SHI, SBI,      | (non-bi form) which |  |
|-------------|----------------------------------------------|---------------------|--|
|             | SW, SH, SB, SWUP, SCW, SMW, DPREF,           | have the base       |  |
|             | DPREFI                                       | address registers   |  |
|             |                                              | (R1 or R2) as the   |  |
|             |                                              | consumed state      |  |
| MEM_A_bi_R1 | LWI.bi, LHI.bi, LHSI.bi, LBI.bi, LBSI.bi,    | LD/ST instructions  |  |
|             | LW.bi, LH.bi, LHS.bi, LB.bi, LBS.bi, SWI.bi, | (bi form) which     |  |
|             | SHI.bi, SBI.bi, SW.bi, SH.bi, SB.bi          | have the base       |  |
|             |                                              | address register R1 |  |
|             |                                              | as the consumed     |  |
|             |                                              | state               |  |
| MEM_A_bi_R2 | LWI.bi, LHI.bi, LHSI.bi, LBI.bi, LBSI.bi,    | LD/ST instructions  |  |
|             | LW.bi, LH.bi, LHS.bi, LB.bi, LBS.bi, SWI.bi, | (bi form) which     |  |
|             | SHI.bi, SBI.bi, SW.bi, SH.bi, SB.bi          | have the base       |  |
|             |                                              | address register R2 |  |
|             |                                              | as the consumed     |  |
|             |                                              | state               |  |
| BR          | JR, RET, JRAL, BEQ, BNE, BEQZ, BNEZ,         |                     |  |
|             | BGEZ, BLTZ, BGTZ, BLEZ, BGEZAL,              |                     |  |
|             | BLTZAL                                       |                     |  |
| MISC        | TLBOP TRD, TLBOP TWR, TLBOP RWR,             | Consumes general    |  |
|             | TLBOP RWLK, TLBOP UNLK, TLBOP PB,            | register R          |  |
|             | TLBOP INV, CCTL, ISYNC, MTSR, MTUSR          |                     |  |

In AndesCore N10, most of a producer and corresponding consumer have a latency of 1. Only cases need special attention, or deviate from this general rule of thumb will be described below.

| No | Producer   | Consumer           | Latency |      |
|----|------------|--------------------|---------|------|
|    |            |                    | 2R1W    | 3R2W |
| 1  |            | MEM_A_!bi,         | 2       | 2    |
|    | LD_D, MUL, | MEM_A_bi_R1,       |         |      |
|    | MFUSR_GR,  | ALU, MUL, BR, MISC |         |      |



| 2 | LMW_iHdL_nup* | MEM_A_bi_R2        | 1 | 2 |
|---|---------------|--------------------|---|---|
| 3 |               | ST_D_!bi           | 1 | 1 |
| 4 |               | ST_D_bi            | 2 | 1 |
| 5 |               | MEM_A_!bi, 3       |   | 3 |
|   |               | MEM_A_bi_R1,       |   |   |
|   | SCW           | ALU, MUL, BR, MISC |   |   |
| 6 |               | MEM_A_bi_R2 2 3    |   | 3 |
| 7 |               | ST_D_!bi           | 2 | 2 |
| 8 |               | ST_D_bi            | 3 | 2 |

\* Note that this dependency latency does not include the fixed self-stalling latency described in the next section for the LMW instructions.

#### 8.2.2 Self-stall-related Instruction Latency

The following instructions have a fixed latency caused by self-stalling without considering any resource dependency relationships for all register file configurations.

| Instruction / Instruction Category | Latency                       |  |
|------------------------------------|-------------------------------|--|
| RET (correct target prediction)    | 1                             |  |
| DSB                                | 3                             |  |
| ISB, IRET                          | 5                             |  |
| MUL (Slow config)                  | 18                            |  |
| M2D (Slow config)                  | 20                            |  |
| DIV, DIVS                          | Variable (4 – 35)             |  |
|                                    | [i.e. 4+floor(log2(abs(Rb)))] |  |
| LSMW1_A                            | Ν                             |  |
| LSMW2_A                            | N+1                           |  |
| LSMW1_U                            | N+1                           |  |
| LSMW2_U                            | N+2                           |  |
| TLBOP Invalidate VA                | 3                             |  |
| TLBOP Invalidate All               | 65                            |  |



Note:

- (1) All LMW/SMW instructions here are loading N registers.
- (2) *LSMW1\_A* denotes a LMW/SMW instruction *with no* base register update, and accessing word-aligned addresses.
- (3) *LSMW2\_A* denotes a LMW/SMW instruction *with* base register update, and accessing word-aligned addresses.
- (4) *LSMW1\_U* denotes a LMW/SMW instruction *with no* base register update, and accessing word-unaligned addresses.
- (5) *LSMW2\_U* denotes a LMW/SMW instruction *with* base register update, and accessing word-unaligned addresses.
- (6) M2D indicates instructions in the instruction group described in the previous section.
- (7) MUL and M2D of the "fast configuration" has a latency of 1.

The following instructions have a fixed latency caused by self-stalling without considering any resource dependency relationships for the 2R1W register file configuration.

| Instruction      | Latency |
|------------------|---------|
| Load.bi          | 2       |
| Store(.bi) [R+R] | 2       |

### 8.2.3 Cycle Penalty due to N10 Pipeline Control

#### **Mishap Recover**

This section describes the AndesCore N10 pipeline execution cycle penalties caused by recovering certain unlucky events.

| Event Type                                                     | Penalty  |
|----------------------------------------------------------------|----------|
|                                                                | (Bubble) |
| un-aligned 32-bit instruction fetch after pipeline start/flush | 1        |



| branch mis-prediction                         | 2 |
|-----------------------------------------------|---|
| uTLB miss/MTLB hit on small page PTE          | 4 |
| uTLB miss/MTLB hit on large page PTE          | 6 |
| uTLB miss/MTLB miss/HPTWK prefetch buffer hit | 7 |
| (no large page in use)                        |   |
| uTLB miss/MTLB miss/HPTWK prefetch buffer hit | 9 |
| (with large page in use)                      |   |



## 8.2.4 Cycle Penalty due to Resource Contention

This section describes the cycle penalties, caused by resource contention, of instructions after a data prefetch instruction if the data prefetch instruction missed cache. The following instructions will incur additional cycle penalties if they follow the data prefetch instruction too closely.

| Instruction                                              |
|----------------------------------------------------------|
| (cause resource contention with a previous missed DPREF) |
| LD/ST instructions                                       |
| CCTL (D-type)                                            |
| ISYNC                                                    |
| MSYNC                                                    |

The number of cycle penalties depends on the current state of LSU which represents different levels of LSU resource contention.

| LSU FSM state | Penalty (Bubble) |  |
|---------------|------------------|--|
| IDLE          | 0                |  |
| FILL          | 2                |  |
| DRAIN/WB      | M+1-N            |  |
| Others        | 1                |  |

Note:

- (1) M means the number of the words in a cache line.
- (2) *N* means the number of the instructions between the data prefetch instruction and the instruction in the above instruction table.
- (3) The cycle penalty of M+I-N is the worst case scenario assuming that the N instructions in (2) do not cause pipeline stall.



# Chapter 9 AndesCore N12 implementation

This chapter describes CCTL and STANDBY instruction implementated in AndesCore N12 family and contains the following sections

- 9.1 CCTL instruction on page 337
- 9.2 STANDBY instruction on page 338

# 9.1 CCTL Instruction

All CCTL subtype operations implemented by AndesCore N12 implementation are shown in the following table with a LIGHT GREEN background. And four defined optional operations are not implemented (shown in the table as a CLEAR background).

| SubType |     |                | bit 4-3        |              |               |  |
|---------|-----|----------------|----------------|--------------|---------------|--|
| bit 2-0 |     | 0              | 1              | 2            | 3             |  |
|         |     | 00             | 01             | 10           | 11            |  |
|         |     | L1D_IX         | L1D_VA         | L1I_IX       | L1I_VA        |  |
| 0       | 000 | L1D_IX_INVAL   | L1D_VA_INVAL   | L1I_IX_INVAL | L1I_VA_INVAL  |  |
| 1       | 001 | L1D_IX_WB      | L1D_VA_WB      | -            | -             |  |
| 2       | 010 | L1D_IX_WBINVAL | L1D_VA_WBINVAL | -            | -             |  |
| 3       | 011 | L1D_IX_RTAG    | L1D_VA_FILLCK  | L1I_IX_RTAG  | L1I_VA_FILLCK |  |
| 4       | 100 | L1D_IX_RWD     | L1D_VA_ULCK    | L1I_IX_RWD   | L1I_VA_ULCK   |  |
| 5       | 101 | L1D_IX_WTAG    | -              | L1I_IX_WTAG  | -             |  |
| 6       | 110 | L1D_IX_WWD     | -              | L1I_IX_WWD   | -             |  |
| 7       | 111 | L1D_INVALALL   | -              | -            | -             |  |

Table 61N12 Implementation of CCTL instruction

# 9.2 STANDBY Instruction

In AndesCore N12 implementation, *software interrupt* in IVIC mode will not wakeup a core which has entered the STANDBY mode.



# Chapter 10 AndesCore N1213 Hardcore Implementation Restriction

This chapter describes N1213 Hardcore (N1213\_43U1H) implementation restriction and contains the following sections

- 10.1 Instruction Restriction on page 340
- 10.2 ISYNC Instruction Note on page 340



## **10.1 Instruction Restriction**

The following instructions are not implemented by N1213 hardcore (CPU\_VER == 0x0C010003):, Part number N1213\_43U1H

| DIV/DIVS                           |  |
|------------------------------------|--|
| STANDBY wait_done                  |  |
| MFUSR Rt, PC                       |  |
| MFUSR Rt, USR, Group 1 and Group 2 |  |
| MTUSR Rt, USR, Group 1 and Group2  |  |

## **10.2 ISYNC Instruction Note**

The correct instruction sequence for writing or updating any code data that will be executed afterwards for AndesCore N1213 hardcore (N1213\_43U1H) is as follows:

```
UPD LOOP:
   // preparing new code data in Ra
   . . . . . .
   // preparing new code address in Rb, Rc
   . . . . . .
   // writing new code data
   store Ra,[Rb,Rc]
   // looping control
   . . . . . .
   bne Rb, Re, UPD LOOP
WB LOOP:
   // preparing new code address in Rd
   isync Rd (or cctl Rd, L1D VA WB)
   // looping control
   bne Rd, Re, WB LOOP
   msync
   isb
ICACHE_INV_LOOP:
```



// preparing new code address in Rf
cctl Rf, L1I\_VA\_INVAL
// looping control
bne Rf,Re,ICACHE\_INV\_LOOP
isb
// execution of new code data can be started from here
.....

Please see the ISYNC instruction note for other implementations.



#### **Proprietary Notice**

Words and logos marked with <sup>™</sup> are registered trademarks or trademarks owned by Andes Technology Corporation, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

Copyright© 2007-2009 Andes Technology Corporation. All rights reserved.

AndeStar<sup>™</sup> ISA Manual contains certain confidential information of Andes Technology Corporation. Use of this copyright notice is precautionary and does not imply publication or disclosure. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of Andes Technology Corporation.

#### **Confidentiality status**

This document is Preliminary. This document has no restriction on distribution.

#### Feedback on this Manual

- If you have any problems with this Manual, Please contact Andes Technology Corporation by email at <a href="mailto:support@andestech.com">support@andestech.com</a>
- or on the Internet at <u>www.andestech.com</u> for support giving:
- the document title
- the document number
- the page number(s) to which your comments apply
- a concise explanation of the problem

General suggestions for improvements are also welcome.