Why you should try to make an emulator


Why you should
make an emulator…
REVISION 2015
Aims
•Make you think about “emulation” in a broader way
•Convince you that writing an emulator is fun
(and you’ll learn useful skills)
•Highlight a few tricky areas
•Give a lightning introduction to FPGAs
About me
•First computer 1983 – Dragon 32
• Learning how to adapt type-ins for other systems to the Dragon
•Gradually co-opted my Dad’s Amstrad CPC 464 between 1985 and 1986
• Spectrum loader type-in -> learning how to adapt Spectrum save routine to CPC
•RM Nimbus PC with “IBM emulator” at school from 1987
•Bought an Amiga around 1989
•University in 1994, discovered UNIX and “source compatibility”
•Working life – half “business things” and half “games development”
•Experience on many varied types of machines and CPUs
•Oh, and all views and opinions are my own! 
Classic arcade games
Nostalgia?
Definition
em·u·late
(ĕm′yə-lāt′)
From Latin aemulātiō ("strive") = aemulor (“I rival, emulate”) + -ātiō (“-ation”).
1. To strive to equal or excel; imitate with effort to equal or surpass.
2. To compete with successfully; approach or attain equality with.
3. Computers
◦ to imitate the function of (another system), by using a software system, often including a microprogram
or another computer that enables it to do the same work and run the same programs, and achieve the
same results.
◦ to replace (software) with hardware to perform the same task.
Related terms
Retro gaming
Compatible
◦ (of software) capable of being run on another computer without change.
◦ (of hardware) capable of being connected to another device without the use of special equipment
or software.
Legacy
◦ of or relating to old or outdated computer hardware, software, or data that, while still functional, does
not work well with up-to-date systems.
Reverse engineering
What’s the legacy of these systems?
Why should we emulate?
•Compatibility with the previous generation allows easy transition
• Ready made market and existing customer momentum
• Doesn’t have to be an all-or-nothing transition
• People can continue to use their existing software, but can do more with it
• More memory, faster execution, etc.
• New features will encourage more sales
Why should we emulate?
•Hardware used to be considered hard and expensive
• Billions of dollars to set up a chip fab
• Slow process, mistakes are costly to rectify
• Despite the cost of development, older hardware becomes obsolete rapidly
•Software by comparison was traditionally considered “easy”
• Barrier to entry for software low – just a compiler or assembler
• Result: lots of software, and people want to use older, unmaintained software
•Is this true today?
• Modern software complexity in MLOC with hundreds to thousands of developers on a project
• Modern CPUs have ~7bn transistors, but many repeating structures
• How many CPU designers in the world compared to number of software developers?
Why should we emulate?
•Systems that don’t get emulated are expensive to maintain
• Ultimately all of this software becomes unusable or impractical due to scarcity of hardware
•Reasons not to be emulated
• Why buy your product if another does everything yours does and more?
• Regional lock-in, copy protection, etc…
Why me? Why should I
make an emulator?
Understanding the machine
•Great way of getting to know a new machine
•Will be able to fully appreciate the instruction set of the CPU – often applicable to other CPUs
•Better appreciation for compiler generated code and how to optimise code
•State machines might well be the simplest solution
•Really understand the entire machine
Much to learn you still have…
Following a design
•Building a project to someone else’s design
•In some cases, trying to figure out what their design was
•Learning to understand the problem from their perspective and why they chose their solutions
•You’ll probably learn a new approach that helps you think about other problems in a new way
• If you’re a software guy, a hardware perspective will definitely give you new insights!
•A combination of detective work and research
Test-driven development
•Emulators are great candidates for test-driven development
•Great to log everything at first
• Disassemble each instruction as it’s executed along with relevant registers
• Great for debugging and seeing what changes between runs
• Can compare against known results, e.g. NESTEST and 6502 core
• For speed, you’ll probably want to be able to turn this off selectively
•Unit tests for subsystems, they can probably all be used independently and incrementally
• Try to test everything you’ve implemented!
• You can throw away parts as you realise a better way, so it can grow with your knowledge
•You probably already have programs you can use to test your progress so you can be confident when
you’ve hit your goal
•Write test programs and compare against real hardware whenever possible
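The per-instruction logging described above could look something like this. It is only a sketch: `CpuState`, `format_trace`, `trace` and `trace_enabled` are hypothetical names, and the line layout is made up for illustration rather than copied from any existing log format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical register file for a 6502-style CPU.
struct CpuState {
    uint16_t pc;
    uint8_t a, x, y, p, sp;
};

// Global switch so tracing can be turned off for speed.
bool trace_enabled = true;

// Format one trace line: PC, opcode byte, mnemonic and registers.
std::string format_trace(const CpuState &s, uint8_t opcode, const char *mnemonic) {
    assert(mnemonic != nullptr);
    char buf[80];
    std::snprintf(buf, sizeof buf,
                  "%04X  %02X  %-4s A:%02X X:%02X Y:%02X P:%02X SP:%02X",
                  unsigned(s.pc), unsigned(opcode), mnemonic, unsigned(s.a),
                  unsigned(s.x), unsigned(s.y), unsigned(s.p), unsigned(s.sp));
    return buf;
}

// Call once per executed instruction; does nothing when tracing is off.
void trace(const CpuState &s, uint8_t opcode, const char *mnemonic) {
    if (trace_enabled) std::puts(format_trace(s, opcode, mnemonic).c_str());
}
```

Because every line has a fixed layout, two runs (or your log and a known-good log) can be compared with an ordinary text diff to find the first instruction where they diverge.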
Optimisation
•For a software emulator, it’s always worth starting with un-optimised code
• Quickly get the system up and running, don’t be afraid to throw away code!
• Especially with your unit tests, you can optimise later with confidence that it still works
• Might be best not to implement things until they’re needed… or until you have a test case ready
• The target system is probably deterministic from a known set of conditions
•Different approaches
• Parsing bit patterns in CPU instructions – slowest but less code and closer to hardware implementation
• Switch block with a case for each instruction – long and possibly unwieldy
• C/C++ macros can simplify this
• Function per instruction – as specialised as required
• Heat maps
• Dynamic code generation
• Memory map markup – read/written flags, PC when modified, data breakpoints, etc.
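As a rough illustration of the "function per instruction" approach, here is a minimal sketch of a 256-entry dispatch table. The `Cpu` struct, handler names and `step` function are hypothetical; the three opcodes are real 6502 encodings (`$EA` NOP, `$A9` LDA immediate, `$E8` INX), but status flags and cycle counts are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified CPU state (no flags, no cycle counting).
struct Cpu {
    uint8_t a = 0, x = 0;
    uint16_t pc = 0;
    std::array<uint8_t, 0x10000> mem{};
};

using Handler = void (*)(Cpu &);

void op_nop(Cpu &) {}
void op_lda_imm(Cpu &c) { c.a = c.mem[c.pc++]; }  // load the operand byte into A
void op_inx(Cpu &c) { ++c.x; }
void op_unimplemented(Cpu &) {}  // a real emulator would log or stop here

// Build the table once: every opcode gets a handler, known ones get real code.
std::array<Handler, 256> make_table() {
    std::array<Handler, 256> t;
    t.fill(op_unimplemented);
    t[0xEA] = op_nop;
    t[0xA9] = op_lda_imm;
    t[0xE8] = op_inx;
    return t;
}

const std::array<Handler, 256> dispatch = make_table();

// Fetch one opcode and call its handler.
void step(Cpu &c) {
    uint8_t opcode = c.mem[c.pc++];
    assert(dispatch[opcode] != nullptr);
    dispatch[opcode](c);
}
```

Compared with one large switch block, each handler can be tested on its own, and the table can later be filled by generated or specialised code without touching `step`.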
How to start
making an emulator
Components of a typical system
CPU
RAM
ROM
Video output, possibly GPU
Audio
Storage
Other IO
Glue logic – 74 series chips, ULAs, PLDs, FPGAs, ASICs.
Need to understand your system
Google it!
Need to understand your system
Datasheets are full of… data!
Decapping
So, now you understand
what you’re emulating…
Decide on your goal
•As fast as possible?
• Maybe you just want a serial based CP/M system…
• Or only care about certain pieces of software or hardware
•Exact CPU timings?
• Each chip will usually be synchronised in hardware
• Games and demos usually will require more accuracy
• Most things driven by CPU, but need to be able to respond to interrupts in a timely manner…
• If the target system is slow enough, you can interleave CPU cycles and other hardware, which requires
state in each system… just like real hardware!
•Increased capabilities and optional add-ons?
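Interleaving CPU cycles with the other hardware can be sketched as a loop that ticks every component once per master clock cycle, each keeping its own state. Everything here is an assumption made for illustration: the `VideoTimer` and `Cpu` types, the fixed 4-cycle instruction length and the 4-cycle interrupt entry do not describe any real machine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical video timer: requests an interrupt every `period` cycles.
struct VideoTimer {
    uint32_t period = 100;
    uint32_t counter = 0;
    bool irq = false;
    void tick() {
        assert(period != 0);
        if (++counter == period) { counter = 0; irq = true; }
    }
};

// Hypothetical CPU: every instruction and every interrupt entry takes 4 cycles.
struct Cpu {
    uint32_t cycles_left = 0;   // cycles remaining in the current operation
    uint32_t instructions = 0;  // instructions started so far
    uint32_t interrupts = 0;    // interrupts accepted so far
    void tick(VideoTimer &v) {
        if (cycles_left == 0) {
            // Instruction boundary: interrupts are only accepted here.
            if (v.irq) { v.irq = false; ++interrupts; }
            else       { ++instructions; }
            cycles_left = 4;
        }
        --cycles_left;
    }
};

// Advance the whole machine by `cycles` master clock cycles.
void run(Cpu &cpu, VideoTimer &video, uint32_t cycles) {
    for (uint32_t i = 0; i < cycles; ++i) {
        cpu.tick(video);
        video.tick();
    }
}
```

The point of the sketch is that the CPU carries `cycles_left` between calls, so it can be paused mid-instruction while the video hardware advances, which is the extra per-component state the bullet above refers to.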
Start with the CPU
•Do you need to consider instruction pipeline?
• IF – ID – EX – MEM – WB is typical, but some pipelines are deeper
• (INSTRUCTION FETCH – INSTRUCTION DECODE – EXECUTE – MEMORY ACCESS – REGISTER WRITEBACK)
• Some architectures have latency before registers are updated
• Delay cycles / stall
• Old results returned
• Branch delay slots
• Sparc, MIPS
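One common way to model a branch delay slot in software is to track both the current and the next program counter, so a jump only changes what runs after the following instruction. The sketch below uses a made-up three-instruction toy machine (`Op`, `Instr`, `Cpu` and `step` are hypothetical names, and the encoding is not real MIPS or SPARC).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { AddImm, Jump, Halt };

// Toy instruction: `value` is an immediate for AddImm or a target index for Jump.
struct Instr { Op op; int reg; int32_t value; };

struct Cpu {
    int32_t regs[4] = {};
    uint32_t pc = 0;       // instruction executed by the next step()
    uint32_t next_pc = 1;  // instruction executed after that
    bool halted = false;
};

void step(Cpu &c, const std::vector<Instr> &prog) {
    assert(c.pc < prog.size());
    const Instr in = prog[c.pc];
    // By default execution falls through to the instruction after next_pc.
    uint32_t after_next = c.next_pc + 1;
    switch (in.op) {
    case Op::AddImm: c.regs[in.reg] += in.value; break;
    // The jump does not touch next_pc, so the delay slot still executes.
    case Op::Jump:   after_next = static_cast<uint32_t>(in.value); break;
    case Op::Halt:   c.halted = true; break;
    }
    c.pc = c.next_pc;
    c.next_pc = after_next;
}
```

With a program of `Jump 3`, `AddImm r0,1`, `AddImm r0,100`, `AddImm r0,10`, `Halt`, the instruction in the delay slot (index 1) runs before the jump takes effect and index 2 is skipped.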
Memory access
Most systems have memory maps split into distinct regions, i.e. they use higher address bits for
region decode, e.g. 4 x 16KB banks on Amstrad
Many systems (especially RISC and 6502) use memory mapped IO, e.g. C64: $D000-$DFFF
Consider masking off these region bits and using them as lookups into a table of read/write
functions, e.g. page size of 4KB on many 32-bit systems.
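// example with 256-byte pages on a 6502:
#include <cstdint>
class MemoryHandler {
public:
    virtual ~MemoryHandler() = default;
    virtual uint8_t read_byte(uint16_t addr) = 0;
    virtual void write_byte(uint16_t addr, uint8_t data) = 0;
};
// one handler per 256-byte page, indexed by the high byte of the address
MemoryHandler *memory_handlers[0x100];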
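To show how such a table might be used, here is a minimal sketch that maps RAM over the whole address space and then overlays an I/O handler on pages `$D0`-`$DF`, mirroring the C64 example. `Ram`, `DummyIo` and `Bus` are hypothetical helpers, and `DummyIo` does not model any real chip.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

class MemoryHandler {
public:
    virtual ~MemoryHandler() = default;
    virtual uint8_t read_byte(uint16_t addr) = 0;
    virtual void write_byte(uint16_t addr, uint8_t data) = 0;
};

// Plain RAM covering the full 64KB address space.
class Ram : public MemoryHandler {
    std::array<uint8_t, 0x10000> bytes{};
public:
    uint8_t read_byte(uint16_t addr) override { return bytes[addr]; }
    void write_byte(uint16_t addr, uint8_t data) override { bytes[addr] = data; }
};

// Hypothetical I/O device: records the last write, always reads back 0xFF.
class DummyIo : public MemoryHandler {
public:
    uint8_t last_write = 0;
    uint8_t read_byte(uint16_t) override { return 0xFF; }
    void write_byte(uint16_t, uint8_t data) override { last_write = data; }
};

struct Bus {
    MemoryHandler *pages[0x100] = {};  // one handler per 256-byte page

    // Point an inclusive range of pages at one handler.
    void map(uint8_t first_page, uint8_t last_page, MemoryHandler *h) {
        for (unsigned p = first_page; p <= last_page; ++p) pages[p] = h;
    }
    // The high byte of the address selects the handler.
    uint8_t read(uint16_t addr) {
        assert(pages[addr >> 8] != nullptr);
        return pages[addr >> 8]->read_byte(addr);
    }
    void write(uint16_t addr, uint8_t data) {
        assert(pages[addr >> 8] != nullptr);
        pages[addr >> 8]->write_byte(addr, data);
    }
};
```

Bank switching then becomes a matter of calling `map` again when the emulated program writes to the relevant control register, rather than adding conditionals to every memory access.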
Example 6502 memory maps
NES
C64
Video output
•Very simple fixed layout, e.g. Spectrum
• Frame at a time?
• Rendering in parallel to and synchronised to CPU instruction – e.g. loading borders
•Racing the beam?
• Essential to have accurate cycle counting
•Programmable hardware, e.g. 6845 in CPC, IBM, BBC – can even change width of a line!
•Tiled video memory, e.g. NES, C64, IBM text modes
•Memory contention – very common between CPU and video hardware
• Usually resolved with a CPU stall or video snow
Video output
What frame rate are you running at?
◦ Is that the same as the emulated system?
◦ Change speed to match -> audio will change speed
◦ 59.94Hz of most NTSC systems cf. 60Hz of most monitors – speed-up probably wouldn’t be noticed
◦ 50Hz PAL systems – only 5 of 6 frames rendered (juddery) or blend frames
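The "5 of 6 frames" figure falls out of simple integer arithmetic. The sketch below assumes the emulated and host clocks start together and that the emulator simply shows the most recent complete frame; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Index of the emulated frame that should be on screen during a given host
// refresh, assuming both clocks start together at refresh 0.
uint64_t emulated_frame_for(uint64_t refresh, uint32_t emu_hz, uint32_t host_hz) {
    assert(host_hz != 0);
    return refresh * emu_hz / host_hz;
}

// True when this refresh shows the same emulated frame as the previous one.
bool repeats_previous(uint64_t refresh, uint32_t emu_hz, uint32_t host_hz) {
    return refresh > 0 &&
           emulated_frame_for(refresh, emu_hz, host_hz) ==
               emulated_frame_for(refresh - 1, emu_hz, host_hz);
}
```

For a 50Hz machine on a 60Hz display, host refreshes 0 to 5 show emulated frames 0, 0, 1, 2, 3, 4: five distinct frames in six refreshes, with one repeat that causes the judder.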
Sound
Can be very simple (on/off for Spectrum) or complicated (many channels or MIDI)
Especially for the simple beeper case, you’ll definitely need to cycle count accurately as incorrect
timing can drastically change the sound
Less important on a chip like the AY-3-8912
◦ Timing for tone generation is done on the chip itself
◦ CPU usually will change registers infrequently, e.g. 50Hz or 100Hz
◦ Things like hardware envelope reset are timing dependent
You may need a LUT on the output to correct the volume to emulate any filter circuitry
Not all channels are the same volume, e.g. CPC – 1000R on L&R, 2200R on C, split between L&R
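For the beeper case, one possible approach is to record the CPU cycle at which each speaker toggle happens and convert those timestamps into audio samples afterwards. This is a sketch under simplifying assumptions: `render_beeper` is a hypothetical function, it point-samples the level with no filtering, and the output is a raw 0/1 level rather than a finished waveform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert beeper toggle times (in CPU cycles, ascending) into 1-bit samples.
std::vector<uint8_t> render_beeper(const std::vector<uint64_t> &toggle_cycles,
                                   uint64_t cpu_hz, uint32_t sample_hz,
                                   std::size_t sample_count) {
    assert(sample_hz != 0);
    std::vector<uint8_t> out(sample_count);
    std::size_t next = 0;  // next toggle not yet applied
    uint8_t level = 0;     // speaker starts low
    for (std::size_t i = 0; i < sample_count; ++i) {
        // CPU cycle at which sample i is taken.
        uint64_t cycle = i * cpu_hz / sample_hz;
        // Apply every toggle that happened at or before this cycle.
        while (next < toggle_cycles.size() && toggle_cycles[next] <= cycle) {
            level ^= 1;
            ++next;
        }
        out[i] = level;
    }
    return out;
}
```

With toy rates of 8 cycles and 4 samples per second, toggles at cycles 3 and 7 give `0,0,1,1,0`, while moving the first toggle to cycle 5 gives `0,0,0,1,0`. A two-cycle counting error is enough to change the output, which is why accurate cycle counts matter here.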
Input – Amstrad CPC
Storage
Are there any established formats for your platform or similar?
◦ e.g. tzx for Spectrum also used on Amstrad CPC, d64 for C64 disk etc.
◦ Disk image format not shared, even though the hardware is basically the same
◦ Could you extend your emulator to allow it to discover the new system?
◦ E.g. selecting disk image from inside the emulator, flashing a “ROM” from a disk file, etc…
Snapshots / image of running system
◦ Could be done at a vsync boundary, but state of each component needs to be stored especially timing
data
◦ Very useful for debugging, especially if snapshots are created automatically at regular intervals – can
replay and single step through a problem rather than trying to diagnose it post mortem
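Automatic snapshots at regular intervals can be kept in a small fixed-size history so that a problem can be replayed from shortly before it occurred. The following is a minimal sketch: `MachineState` is a hypothetical stand-in for the full state of every component, and `SnapshotHistory` is not taken from any existing emulator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical machine state; a real one would hold every component's
// registers, memory and timing counters.
struct MachineState {
    uint64_t cycle;  // master clock cycle at which the snapshot was taken
    uint16_t pc;
};

class SnapshotHistory {
    std::deque<MachineState> history;
    std::size_t capacity;
public:
    explicit SnapshotHistory(std::size_t cap) : capacity(cap) { assert(cap > 0); }

    // Store a snapshot, discarding the oldest one when the history is full.
    void record(const MachineState &s) {
        if (history.size() == capacity) history.pop_front();
        history.push_back(s);
    }

    // Copy out the newest snapshot taken at or before `cycle`.
    // Returns false if no such snapshot is still held.
    bool restore_before(uint64_t cycle, MachineState &out) const {
        for (auto it = history.rbegin(); it != history.rend(); ++it) {
            if (it->cycle <= cycle) { out = *it; return true; }
        }
        return false;
    }
};
```

Calling `record` at each vsync and `restore_before` with the cycle at which a bug was noticed gives a starting point for single stepping forward into the problem.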
Glue logic
Probably toughest part documentation-wise.
Hopefully all the programmer-accessible parts are well documented
Internals probably not documented
Strange behaviour, e.g. interrupt handling rules on Amstrad
◦ May be simpler than
Probably best to google for “how to program” documents
Remember, there’s probably a good reason for every odd looking decision!
Proprietary systems probably have all this information under NDA or never disclosed to
developers at all. Patents often document systems very well, however.
And when it all comes together…
Doing it in hardware
Why hardware
Hardware is “cooler”
Can make a “drop-in” replacement, e.g. output to a real TV
Nice to have a dedicated emulator system, feels more authentic
Easier to think in the cycle-counting mindset – it’s a lot closer to the original hardware this way
You get to learn something new! 
FPGA
macroblock
Forget everything you know about software…
well almost everything!
FPGA is nothing like a CPU!
Lots of parallel circuits, if you want sequential
operation you need to make a state machine.
These macroblocks are essentially 4-bit input, 1-bit output lookup tables, but can also be used as
RAM or shift registers.
Synthesizer takes care of most of the details.
Don’t think in terms of bytes, words, ints, etc…
data is as wide as you need.
FPGA elements
•BRAM blocks – for registers, delay lines, small caches, complicated lookup tables, boot ROMs
• Very configurable, e.g. 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
• Can be wired even wider using multiple blocks
• Dual ported
• FAST
•Modularise everything
• May be able to reuse elements in other designs, e.g. PS/2 keyboard, flash ROM, etc.
• FPGAs are parallel, so you can use a module multiple times
• Easier to replace
Clocks
Clocks are especially problematic – you want very few clock domains, and ideally convert all
sources to the master clock as soon as possible.
Limited global clock resources…
Clock dividers and phase problems
Don’t want too fast a clock or the design won’t synthesize correctly
You may be able to ignore some of these errors if you know the clock is divided from a faster
source.
FPGAs
Often you’ll want to step back from the problem and try to work out how the chip was originally
implemented…
If you see any comparisons apart from equality, it’s probably wrong!
Preferable to reset a counter to a known value and increment / decrement until all bits 0 or 1 or
a carry occurs
You can use an LFSR instead of a counter to optimise gate counts, but it’s harder to determine the
initial values
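The difficulty with LFSR initial values can be worked around by computing them in software beforehand. This sketch models a 4-bit LFSR with taps for the polynomial x^4 + x^3 + 1 (period 15) and finds the seed that reaches a chosen terminal value after a given number of steps. The function names are hypothetical, and a real design would pick the width and taps to suit the count it needs.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kPeriod = 15;  // a maximal 4-bit LFSR visits 15 non-zero states

// One step of a 4-bit LFSR: feedback is bit 3 XOR bit 2, shifted in at bit 0.
uint8_t lfsr_step(uint8_t s) {
    uint8_t bit = static_cast<uint8_t>(((s >> 3) ^ (s >> 2)) & 1);
    return static_cast<uint8_t>(((s << 1) | bit) & 0x0F);
}

// Seed that reaches `terminal` after exactly n steps (n counted modulo 15).
// Stepping forward (15 - n) times from the terminal state lands on the state
// that lies n steps before it, because the sequence repeats every 15 steps.
uint8_t lfsr_seed(unsigned n, uint8_t terminal) {
    assert(terminal != 0 && terminal <= 0x0F);
    uint8_t s = terminal;
    for (unsigned i = 0; i < kPeriod - (n % kPeriod); ++i) s = lfsr_step(s);
    return s;
}
```

In hardware the "counter" then only needs a load of the precomputed seed and a single equality compare against the terminal value, which fits the advice above about avoiding other kinds of comparison.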
CPC FPGA board
5V power
512KB RAM, 512KB ROM
PS/2 keyboard
Audio jack
SCART
USB (serial and programming)
SD card slot
2 x joystick
Expansion pins – lots of expandability
Resources
•http://cpcfpga.com/
•The ZX Spectrum ULA: How to design a microcomputer – Chris Smith, ISBN 978-0-9565071-0-5
•Rapid Prototyping of Digital Systems: A Tutorial Approach – James Hamblen, ISBN: 0792386043
•http://www.visual6502.org/
•http://www.righto.com/ - Ken Shirriff’s blog
•http://z80.info/
•http://wiki.nesdev.com/
•http://www.worldofspectrum.org/
Example VHDL
int a,b;            // inputs
int sum,product;    // outputs
void run_one_cycle(void)
{
    sum = a+b;
    product = a*b;
}
library ieee;
use ieee.std_logic_1164.all;
entity example is port (
    clock: in std_logic;
    a: in integer range 0 to 65535;
    b: in integer range 0 to 65535;
    sum: out integer range 0 to 131070;
    -- 65535*65535; note this exceeds the 32-bit integer range of many tools
    product: out integer range 0 to 4294836225);
end example;
architecture rtl of example is
begin
    process(clock)
    begin
        if rising_edge(clock) then
            sum <= a+b;
            product <= a*b;
        end if;
    end process;
end rtl;
Example using bit vectors
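int a,b;            // inputs
int sum,product;    // outputs
void run_one_cycle(void)
{
    sum = a+b;
    product = a*b;
}
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- provides arithmetic on the unsigned type
entity example is port (
    clock: in std_logic;
    a: in std_logic_vector(15 downto 0);
    b: in std_logic_vector(15 downto 0);
    sum: out std_logic_vector(16 downto 0);
    product: out std_logic_vector(31 downto 0));
end example;
architecture rtl of example is
begin
    process(clock)
    begin
        if rising_edge(clock) then
            -- extend both operands to 17 bits so the carry is kept
            sum <= std_logic_vector(('0' & unsigned(a)) + ('0' & unsigned(b)));
            product <= std_logic_vector(unsigned(a) * unsigned(b));
        end if;
    end process;
end rtl;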