MicroJava-701-by-Baecker-Bungert-Gladisch-Titze-1998

MicroJava-701
Philipp Baecker · Johannes Bungert · Andreas Gladisch · Christian Titze
Introduction
• The first microprocessor that executes Java bytecodes
directly in hardware
• Some results suggest that MicroJava 701 will be twice as
fast as a 266 MHz Pentium II system on Java code
• MicroJava 701 looks to be a dynamite bargain for
customers determined to build Java-execution machines.
• What kind of machines might those be?
• The hypothetical Java-based network computer has been
slow to appear, perhaps because Java applications are not
thick on the ground.
• Without plentiful Java apps, Java systems are superfluous;
without Java systems, the apps may not come.
• The 701 looks better the more bytecode the system has to
run.
• For an all bytecode-system, the 701 is probably faster and
cheaper than anything else.
• MicroJava 701 makes sense for a small fraction of the
market (one that does not yet exist): systems that rely mainly
on Java code and do not already have a microprocessor in them.
• Java hardware, software, education, and advertising are
Sun’s featured products.
• Sun is more interested in Java itself than in Java chips
specifically.
• So, Java chips are a complement to, not a replacement for,
software-only Java environments.
Features
• picoJava-II Performance Java architecture
• Operating frequency of 133 to 200 MHz
• Maximum power consumption of 4 W
• 0.25 micron CMOS technology
• 64 × 32-bit stack cache
• 16 Kbyte direct-mapped instruction cache
• 16 Kbyte, two-way set-associative data cache
• 32-bit integrated floating-point unit
• Support for big- and little-endian data byte ordering
• Interface to PCI bus
• Integrated memory controller
• Programmable I/O
• Ten external interrupts
• Power management
• Local bus for low-cost peripheral expansion, connection to
8-bit, 16-bit, or 32-bit slave devices (e.g., a boot PROM)
• Interrupt controller and multiple timers (programmable
interrupt priorities)
• 2.5 V for the CPU core and 3.3 V for I/O
Memory Map
• The microJava-701 CPU permits the following memory
regions to be placed anywhere within the CPU’s 1 GB
address space:
– DRAM (both EDO and SDRAM)—four banks
– Local Bus—four banks
– PCI Memory/IO—three banks
• The fixed regions in the memory map are as follows:
– Registers
– Boot code selected by FLASH_CS#—only the starting
address is fixed (the Region’s size can be programmed)
Memory Map
Cacheability          Starting Address (Hex)  Region Size   Region
Noncacheable (256MB)  3001.0000               256MB-64KB    Available
                      3000.8000               32KB          SDRAM MRS
                      3000.2400               23KB          Reserved
                      3000.2000               1KB           PCI Bus Registers for PCI Bus Configuration
                      3000.1C00               1KB           PCI Bus Registers for PCI Bus Feature Control
                      3000.1800               1KB           Timer, Clock Control, and System Information Registers
                      3000.1400               1KB           Reserved
                      3000.1000               1KB           Reserved
                      3000.0C00               1KB           Interrupt Controller Registers
                      3000.0800               1KB           Reserved
                      3000.0400               1KB           Memory Controller Registers
                      3000.0000               1KB           PCI Bus Registers for PCI Host Mode Configuration
Cacheable (768MB)     0200.0000               736MB (min)   Available
                      0                       32MB (max)    Boot FLASH_CS#
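The fixed part of the map can be read as a simple address decode. The following is a minimal Java sketch, assuming only the starting addresses listed in the memory map; the class and method names are hypothetical and do not correspond to any Sun API.

```java
// Hypothetical helper for illustration only: classifies a physical address
// using the fixed starting addresses from the memory map.
class MemoryMapSketch {
    static final long REGISTER_BASE = 0x3000_0000L; // start of noncacheable space
    static final long SPACE_END     = 0x4000_0000L; // 1 GB address space

    // Addresses below 0x3000.0000 fall in the 768 MB cacheable region.
    static boolean isCacheable(long addr) {
        checkRange(addr);
        return addr < REGISTER_BASE;
    }

    // Returns the name of the 1 KB register block containing addr,
    // or null if addr lies outside 0x3000.0000 to 0x3000.23FF.
    static String registerBlock(long addr) {
        checkRange(addr);
        long offset = addr - REGISTER_BASE;
        if (offset < 0 || offset >= 0x2400) {
            return null;
        }
        switch ((int) (offset >> 10)) {      // one block per 1 KB
            case 0:  return "PCI host mode configuration";
            case 1:  return "Memory controller";
            case 3:  return "Interrupt controller";
            case 6:  return "Timer, clock control, system information";
            case 7:  return "PCI bus feature control";
            case 8:  return "PCI bus configuration";
            default: return "Reserved";
        }
    }

    private static void checkRange(long addr) {
        if (addr < 0 || addr >= SPACE_END) {
            throw new IllegalArgumentException("outside the 1 GB address space");
        }
    }
}
```

For example, `MemoryMapSketch.registerBlock(0x3000_0C04L)` falls in the interrupt controller block, and any address below 0x3000.0000 is reported as cacheable.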
microJava-701 Block Diagram
Netcomputer Block Diagram
Integer Unit (IU)
• Java integer instructions
– defined in the Java Virtual Machine Specification
– extended picoJava-II specific instructions
• 64-word (32-bit) stack cache
• Executes prefetched instructions using a six-stage pipeline
• Supports instructions such as shift, integer multiply, integer
divide and stack manipulation.
• Little-endian and big-endian data representation.
• Up to four instructions can be folded together and executed
in parallel.
Floating Point Unit (FPU)
• The FPU executes all single-precision and double-precision
floating-point instructions as defined in the Java
Virtual Machine Specification.
• Has its own
– microcode sequencer
– Floating point adder
– Floating-point multiplier/divider.
• float and double represent single-precision 32-bit and
double-precision 64-bit format IEEE 754 values as
specified in the IEEE Standard for Binary Floating-Point
Arithmetic
• float and double
– Positive and negative sign-magnitude numbers
– Positive and negative zeroes
– Positive and negative infinities
– Special Not-a-Number (NaN) value
• Finite nonzero values of type float: s · m · 2^e, where
– s is +1 or –1,
– m is a positive integer less than 2^24,
– e is an integer between –149 and 104, inclusive.
• Smallest positive nonzero value: 1.40239846e–45F
• Largest positive nonzero value: 3.40282347e+38F
• Finite nonzero values of type double: s · m · 2^e, where
– s is +1 or –1,
– m is a positive integer less than 2^53,
– e is an integer between –1074 and 971, inclusive.
• Smallest positive nonzero value: 4.94065645841246544e–324
• Largest positive nonzero value: 1.79769313486231570e+308
• Floating-point values are ordered
• NaN is unordered
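The s · m · 2^e form can be checked against the stated limits. The sketch below assumes the IEEE 754 bounds m < 2^24 with –149 ≤ e ≤ 104 for float, and m < 2^53 with –1074 ≤ e ≤ 971 for double; the class and method names are hypothetical.

```java
// Illustrative check of the float/double limits and of NaN ordering.
class FloatLimitsSketch {
    // Smallest float is 1 * 2^-149; largest is (2^24 - 1) * 2^104.
    static boolean floatLimitsMatch() {
        float smallest = Math.scalb(1.0f, -149);
        float largest = Math.scalb((float) ((1 << 24) - 1), 104);
        return smallest == Float.MIN_VALUE && largest == Float.MAX_VALUE;
    }

    // Smallest double is 1 * 2^-1074; largest is (2^53 - 1) * 2^971.
    static boolean doubleLimitsMatch() {
        double smallest = Math.scalb(1.0, -1074);
        double largest = Math.scalb((double) ((1L << 53) - 1), 971);
        return smallest == Double.MIN_VALUE && largest == Double.MAX_VALUE;
    }

    // NaN is unordered: every comparison with it, including ==, is false.
    static boolean nanIsUnordered() {
        float nan = Float.NaN;
        return !(nan < 1.0f) && !(nan > 1.0f) && !(nan == nan);
    }
}
```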
Cache
• Instruction Cache
– 16 KByte in size.
– Direct-mapped cache organized as 1024 lines × 16 bytes.
– Instruction cache line fill is done four 32-bit words at a
time.
• Data Cache
– 16 KByte in size, two-way set-associative.
– Each set is 512 lines × 16 bytes.
– Data cache line fill is done four 32-bit words at a time.
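The cache geometry above implies how an address could be split into byte offset, line index, and tag. This is a sketch under the assumption of conventional low-order-bit indexing; the actual picoJava-II indexing scheme is not stated in the slides, and all names here are hypothetical.

```java
// Hypothetical address decomposition for 16-byte cache lines.
class CacheIndexSketch {
    static final int LINE_BYTES = 16;            // 4 offset bits
    static final int ICACHE_LINES = 1024;        // 10 index bits, direct-mapped
    static final int DCACHE_LINES_PER_SET = 512; // 9 index bits per set (two sets)

    static int byteOffset(int addr) {
        return addr & (LINE_BYTES - 1);
    }

    static int icacheLine(int addr) {
        return (addr >>> 4) & (ICACHE_LINES - 1);
    }

    static int icacheTag(int addr) {
        return addr >>> 14;                      // bits above offset + index
    }

    static int dcacheLine(int addr) {
        return (addr >>> 4) & (DCACHE_LINES_PER_SET - 1);
    }

    static int dcacheTag(int addr) {
        return addr >>> 13;
    }
}
```

Under this assumption, addresses 0x0010 and 0x4010 map to the same instruction cache line with different tags, so in a direct-mapped cache they would evict each other.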
DRAM Memory Interface
• Complete EDO DRAM and SDRAM controller generates
all signals necessary to support from 1 MByte to 256
MBytes of EDO DRAM or SDRAM.
• EDO DRAM at speeds of 70ns, 60ns, and 50ns.
• SDRAM at frequencies of either 1/2, 1/3, or 1/4 the CPU
clock rate (e.g., at 100 MHz, 66 MHz, or 50 MHz for a 200
MHz CPU).
• DRAM devices must be of the same technology and speed
grade.
• 32-bit and 64-bit DRAM devices
• DRAM system is organized as four banks, varying from 4
MBytes to 64 MBytes in size.
Flash Memory Interface
• Local bus interface suitable for attaching
– Flash memory boot PROM
– Super I/O controller
– Other slave I/O devices with timings similar to
Flash memory.
• Five banks provided for local bus connections
• One bank dedicated for Flash memory that has a
fixed starting address 0 to be used for the boot
program.
• 64KByte to 1GByte of Flash memory.
• Data bus width of 8-bit, 16-bit or 32-bit
Interrupt Controller
• 15 interrupt levels and one nonmaskable interrupt (NMI)
• NMI, six external interrupts, EXT_INTR[5:0], and four
low level interrupts, LL_INTR#[3:0] made available for
general purpose use.
• Four sources of internally generated interrupts:
– tick timer,
– general purpose timer,
– watchdog timer
– PCI error.
• Two software interrupts.
• An interrupt source can be mapped to any of 15 levels.
• Level-triggered interrupts are triggered by a low logic level.
• Edge-triggered interrupts are triggered on rising edges.
• An edge-triggered interrupt is cleared by writing a one to
its bit position in the Pending Register (PEND_INT).
• After an edge-triggered interrupt has been processed, its
pending bit must be cleared before another trigger event
can be sensed.
• The trigger type of the external interrupts, EXT_INTR[5:0], is
programmable, while the low-level interrupts,
LL_INTR#[3:0], are always triggered by a low level and
are normally used for PCI bus interrupts.
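The write-one-to-clear behaviour of the Pending Register can be modelled in software. The following is a simplified simulation, not driver code: the bit assignments and the class itself are hypothetical, and only the clearing rule is taken from the slides.

```java
// Simplified software model of a write-one-to-clear pending register.
class PendingRegisterSketch {
    private int pending; // models PEND_INT; bit positions are hypothetical

    // Models a rising edge on one source. Returns false when the bit is
    // still pending, i.e. the new trigger event is not sensed.
    boolean raiseEdge(int bit) {
        int mask = 1 << bit;
        if ((pending & mask) != 0) {
            return false;
        }
        pending |= mask;
        return true;
    }

    int read() {
        return pending;
    }

    // Writing a one clears that bit; writing a zero leaves it unchanged.
    void write(int value) {
        pending &= ~value;
    }
}
```

A handler would therefore service the source first and then write a one to the matching bit, since a plain write of zero has no effect in this model.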
DRAM Bank Aliasing
BER: Bank Enable Register
DADR: DRAM Access Decode Register
DADAR: DRAM Access Decode Alias Register
Memory Aliasing Example
Missing Handbook?
• The Programmer’s Reference Manual for the picoJava-II
processor core will not be available before 1999
• But: microJava-701 is a hardware implementation of the JVM
(Java Virtual Machine) plus extensions (absolute
addressing!)
• The JVM instruction set uses opcodes and mnemonics that are
also used by the silicon chip
Instruction Set
• Instructions identified by one-byte opcode
• More complex operations have to be emulated
• Example of the inner loop of the Virtual Machine:
do {
    fetch an opcode;
    if (operands) fetch operands;
    execute the action for the opcode;
} while (there is more to do);
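The loop above can be made concrete with a very small software interpreter. This is a sketch covering only five real JVM opcodes (bipush, swap, iadd, isub, ireturn); the class name and the `run` method are hypothetical, and it says nothing about how the chip itself is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Tiny illustrative interpreter for a handful of JVM opcodes.
class TinyInterpreterSketch {
    static final int BIPUSH = 0x10, SWAP = 0x5f, IADD = 0x60,
                     ISUB = 0x64, IRETURN = 0xac;

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xff;        // fetch an opcode
            switch (opcode) {
                case BIPUSH:
                    stack.push((int) code[pc++]);  // fetch the operand byte
                    break;
                case SWAP: {
                    int word1 = stack.pop();
                    int word2 = stack.pop();
                    stack.push(word1);
                    stack.push(word2);
                    break;
                }
                case IADD: {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case ISUB: {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a - b);
                    break;
                }
                case IRETURN:
                    return stack.pop();            // nothing more to do
                default:
                    throw new IllegalArgumentException("unsupported opcode " + opcode);
            }
        }
    }
}
```

For instance, the byte sequence bipush 7, bipush 3, isub, ireturn computes 7 - 3, while inserting swap before isub reverses the operands.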
Load and Store Instructions
• Load and store instructions transfer values between the
Virtual Machine’s local variables and operand stack:
iload, iload_<n>, lload, lload_<n>
fload, fload_<n>, dload, dload_<n>
aload, aload_<n>
Arithmetic Instructions
• Two types:
– integer value processing
– floating point value processing
• No direct support for arithmetic on byte, short, and char
types; such values are handled as int
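The missing byte, short, and char arithmetic is visible at the Java source level. In this small sketch (class and method names are hypothetical), the addition is performed on int values and the result is narrowed back explicitly.

```java
// Byte operands are promoted to int before the addition.
class NarrowTypeArithmeticSketch {
    static byte addBytes(byte a, byte b) {
        int sum = a + b;     // integer addition (iadd) on promoted operands
        return (byte) sum;   // explicit narrowing back to byte
    }
}
```

Because the sum is computed as an int, `addBytes((byte) 100, (byte) 100)` first yields 200 and only wraps when narrowed to byte.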
Arithmetic Instruction Set
• Add: iadd, ladd, fadd, dadd.
• Subtract: isub, lsub, fsub, dsub.
• Multiply: imul, lmul, fmul, dmul.
• Divide: idiv, ldiv, fdiv, ddiv.
• Remainder: irem, lrem, frem, drem.
• Negate: ineg, lneg, fneg, dneg.
• Shift: ishl, ishr, iushr, lshl, lshr, lushr.
• Bitwise OR: ior, lor.
• Bitwise AND: iand, land.
• Bitwise exclusive OR: ixor, lxor.
• Local variable increment: iinc.
Type Conversion Instructions
• Supported conversions:
– int to long, float, or double
– long to float or double
– float to double
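These conversions correspond to implicit widening in Java source. The sketch below (hypothetical class and method names) shows each direction and the one case where widening can still lose precision.

```java
// Implicit widening conversions as written in Java source.
class WideningSketch {
    static long intToLong(int i)          { return i; } // exact
    static float intToFloat(int i)        { return i; } // may round large values
    static double longToDouble(long l)    { return l; } // may round large values
    static double floatToDouble(float f)  { return f; } // exact
}
```

An int such as 16777217 (2^24 + 1) has more significant bits than a float mantissa holds, so converting it to float rounds to 16777216.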
Instruction Set Example
• Instruction: swap
• Operation: Swap top two operand stack words
• Forms: swap = 95 (0x5f)
• Stack: …, word2, word1 ⇒ …, word1, word2
• Description: The top two words on the operand stack are
swapped
Performance and Speed
• 85% of Java Bytecode processed in hardware
• Frequently occurring Sequences replaced
• Innovative Hardware Stack
• Accelerated Stack Management
Innovative Hardware Stack
• Top 64 entries on the stack contained within picoJava on-chip stack cache
• Java programs invoke methods
• Streamlining method invocation substantially improves the
performance of Java code
• Overlap between the methods allows direct parameter
passing without copying
Accelerated Stack Management
• Access usually limited to the top portion of the stack
• Random, single-cycle access to the stack: FOLDING
– moving data to the top of the stack and consuming that
data are “folded” into one operation.
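A typical candidate for folding is a simple assignment between local variables. In the sketch below, the comment lists the bytecodes a Java compiler emits for the statement in a static method; whether and how the hardware folds this exact sequence is an assumption based on the description above, and the class name is hypothetical.

```java
// Source statement whose bytecode is a load-load-operate-store sequence.
class FoldingSketch {
    static int addLocals(int a, int b) {
        // Compiles to: iload_0, iload_1, iadd, istore_2
        // A plain stack machine executes these as four separate steps;
        // folding would treat the group as one register-style operation.
        int c = a + b;
        return c;
    }
}
```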
Processor Speed