Centurium
Aug 17, 2009

ManxomeBromide posted:

That just leaves one question, really: why does this Delay mode even exist in the first place?

The basic idea here, I think, was that the original designers didn't really expect you to change the graphics every scanline. You kind of have to if you're doing anything fancy, but they expected you to keep sprite graphics constant for pairs of scanlines. An individual pixel on the 2600 is nearly twice as wide as it is tall, so doubling up pixel rows gives you something that looks a little more square. The idea was that you could set the Delay mode and your sprite would move down one scanline without you having to alter the graphics update code to check every scanline. Hence the documented name of the register: VDEL (Vertical DELay).

But they decided to implement it as two secret extra graphics registers, and a few years after release one of Atari's developers noticed that he could exploit that within a single scanline. As a result, this control register was used in drat near everything made for the system, but it was used for this 48-pixel sprite trick and not for its intended purpose. By 1986, when Solaris was published, you could be forgiven for thinking this was the intended purpose of the mode all along.

Holy crap. It's hard to express just what a beautiful bit of computer engineering this is. Let's start with the chip:

In 1977 cost was a big deal for microprocessors. Basically, the smaller your chip, the more dies you can fit on one wafer coming out of your manufacturing process. Since cost is pretty much fixed per wafer, smaller chips are cheaper more or less in proportion to how much smaller they are. The 6502 comes in a 40-pin package, and by cutting down to a 28-pin package for the 6507 they could offer it at a lower price. But to do that they dropped three address pins and ended up with an eighth of the addressable memory space. If you looked at that trade now, the savings versus the lost capability would seem insane. The really, truly crazy thing is that it was justifiable in 1977, because ROM chips were so drat expensive that a home-market game with more than 4 kB of ROM would have been laughably overpriced. By 1980 process improvements had made ROM chips a whole lot cheaper.
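To put rough numbers on that cost argument, here's a back-of-the-envelope calculation in Python. The wafer cost, wafer area, and yield figures below are invented purely for illustration; they are not real 1977 prices or die sizes. The point is just that cost per chip tracks die area because the wafer cost is roughly fixed.

```python
# Back-of-the-envelope die economics: a processed wafer costs roughly the
# same no matter what's on it, so shrinking the die means more good chips
# per wafer and a proportionally lower cost per chip.
# All numbers here are made up for illustration.

WAFER_COST = 100.0      # hypothetical dollars per processed wafer
WAFER_AREA = 7000.0     # hypothetical usable mm^2 per wafer

def cost_per_chip(die_area_mm2, yield_fraction=0.8):
    dies_per_wafer = WAFER_AREA // die_area_mm2
    good_dies = dies_per_wafer * yield_fraction
    return WAFER_COST / good_dies

print(f"20 mm^2 die: ${cost_per_chip(20.0):.2f} per chip")
print(f"10 mm^2 die: ${cost_per_chip(10.0):.2f} per chip")  # half the area, about half the cost
```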

That means that for most of the 2600's commercial career it used a chip built around design tradeoffs that were already functionally obsolete. Why the 1982 Atari 5200 (which did use a full 6502) failed is a very interesting story, and a big part of that story is the 2600 itself.

Now, the delay mode is really remarkable from an intended-function vs. actual-use standpoint. Registers take up a lot of space and therefore money. The SNES's processor, two generations later, only had four usable registers vs. the Atari's three. What the next generation did to address this problem (although it also added dedicated sprite hardware) was something called Direct Memory Access (DMA), which lets the device responsible for making the picture read the RAM directly from a few specific addresses. The CPU sends a signal to the peripheral, the peripheral reads the RAM, and you're good to go. But that requires close integration between the display device and the CPU, and the NES or 5200 had 2 kB of RAM vs. the 2600's 128 bytes.
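For a concrete picture of what that looks like, here's a toy sketch of DMA in Python, loosely modeled on the NES's sprite DMA (write one page number to a register and the hardware copies 256 bytes by itself). It's purely illustrative; the names and sizes are mine, not any real chip's register map.

```python
# Toy sketch of DMA: the CPU kicks it off with one register write and the
# display device reads RAM by itself.  Loosely modeled on the NES's sprite
# DMA; sizes and names here are illustrative, not a real register map.

RAM = bytearray(2048)              # 2 KB of system RAM, NES-style
sprite_table = bytearray(256)      # memory owned by the display device

def dma_trigger(page):
    """CPU writes a page number; the device copies that 256-byte page
    out of RAM on its own, with no per-byte CPU work."""
    start = page * 256
    sprite_table[:] = RAM[start:start + 256]

RAM[0x0200:0x0204] = bytes([0x40, 0x01, 0x00, 0x80])   # CPU stages sprite data...
dma_trigger(0x02)                  # ...then fires one write instead of a copy loop
print(list(sprite_table[:4]))      # [64, 1, 0, 128]
```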

A simpler solution to synchronization problems with peripherals is to implement a buffer. Buffers are just parking spaces that let you load information before the associated device can act on it. So you get something like:

[CPU]----A---->[ ][ ][ ][Device(not ready)]
[CPU]----B---->[ ][ ][A][Device(not ready)]
[CPU]----C---->[ ][B][A][Device(not ready)]
[CPU]__________[C][B][A][Device(not ready)]
[CPU]__________[ ][C][B][Device(ready)]
[CPU]__________[ ][ ][C][Device(ready)]
[CPU]__________[ ][ ][ ][Device(ready)]

The VDEL trick essentially does this for the space of one sprite. It's not really a buffer though because the point of a buffer is to allow asynchronous operation between two devices.
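If it helps to see the mechanism spelled out, here's a toy Python model of the VDEL latch. The register names (VDELP0, GRP0, GRP1) come from the standard TIA documentation, but the class itself is just a conceptual sketch of the latching rule: no timing, no asynchrony, exactly one byte of "parking space" per player.

```python
# Toy model of the TIA's vertical delay latches for the two player sprites.
# With VDELP0 set, a write to GRP0 only updates a hidden "new" copy; the
# displayed "old" copy gets refreshed when GRP1 is written (and vice versa).
# Conceptual sketch only: no timing, no real hardware interface.

class ToyTIA:
    def __init__(self):
        self.grp0_new = self.grp0_old = 0
        self.grp1_new = self.grp1_old = 0
        self.vdelp0 = self.vdelp1 = False

    def write_grp0(self, value):
        self.grp0_new = value
        self.grp1_old = self.grp1_new   # writing GRP0 commits the pending GRP1

    def write_grp1(self, value):
        self.grp1_new = value
        self.grp0_old = self.grp0_new   # writing GRP1 commits the pending GRP0

    def displayed_grp0(self):
        return self.grp0_old if self.vdelp0 else self.grp0_new

tia = ToyTIA()
tia.vdelp0 = True
tia.write_grp0(0xAA)                 # queued, not yet visible
print(hex(tia.displayed_grp0()))     # 0x0  -- old copy still on screen
tia.write_grp1(0x55)                 # this write commits the queued GRP0 value
print(hex(tia.displayed_grp0()))     # 0xaa
```

The 48-pixel trick leans on exactly that cross-commit: you can queue up the next byte for one player while the other one is still being displayed.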

What Atari actually did in 1982 with the 5200 is a mix of both strategies (not that weird) that used a separate ASIC that's almost its own microprocessor (kinda odd). I point this out because the update-the-display-register-at-just-the-right-time trick (called "racing the beam" in late-70s cool speak) kept the price of the system a lot lower than it might have been. That was a big deal in making the 2600 a success. But the 2600 was built around price assumptions from the end of an era and is very, very limited compared to what was being sold just five years later. There's a lot of things Atari did to gently caress up the 5200 for themselves, but the fact that 2600 games kept on coming that looked good and fresh (if not cutting edge) really killed the market.

So how awesome is the engineering in this game (and others like it)? So awesome that it ate the next generation during an era of some of the most rapid advances in computer hardware, and it's probably one of the main causes of the video game crash of 1983.



Centurium
Aug 17, 2009
^^^^^
A big part of that problem was that Atari failed to launch the 5200 with what we know and groan about today as tech demo launch titles. Combine that with a lack of internal discipline to move Atari's own designers onto 5200 games, and suddenly Atari had a console on shelves that they hadn't given anyone a compelling reason to develop for. Ouch. And then the market crashed.

ManxomeBromide posted:

Tech Post 3: The Sweep of History

I'm going to break from my plan on this one because a major topic in the thread discussion so far has been on the incredible technical advancements through the 1980s and the disparities within it. So I went and thought of all the really iconic 8-bit technologies and their immediate competitors and looked up when they were all introduced.

The result is a timeline that's honestly pretty mindboggling, even if you lived through it. I didn't live through all of it, but enough that I'm seeing technologies show up way the Hell before they were supposed to. This timeline is a bit America-centric but a number of foreign systems were important enough that news of them reached our shores.

1975: MOS Technologies develops the 6502 chip, itself a clone of an older Motorola chip and changed just enough to get Motorola's lawyers off their backs. It is a full-featured CPU implemented in under 10,000 transistors. This is an astonishing feat, even then.
As a modern computer engineer, it's hard to overstate both how beautiful and difficult this is. Chances are, if you're in the VLSI field today you're either trying to make chips consume less power or trying to tie together CPU cores and memory in a way that lets you get slightly better marginal gain from slapping down more transistors. No one touches the efficiency of the logic like this. What they did was like building an aqueduct out of a hundred stones.


ManxomeBromide posted:


1978: Intel releases the 8086, a 16-bit CPU.

1979: Motorola releases the 68000 chip, a system of 40,000 transistors that is also the first 32-bit CPU that is seriously used in the consumer space. It won't be for a while, but let's just note that we're still in the 1970s and the 32-bit era has already kind of begun. Intel, meanwhile, releases a version of the 8086 with fewer data pins called the 8088.

1981: IBM decides to enter the personal computer market to compete with Apple, Commodore, and Atari. They have agreements with Intel for chip supply, so they go with the 8088.

1985: The IBM PC series of computers enters the 32-bit era with the release of the 386. This is the point where machine code written in 1985 to execute on machines of the time can in principle run unmodified on modern processors. A British company receives a prototype for a new kind of processor based on some research out of UC Berkeley. 1985 may, in all honesty, be the single most eventful year in computing.

1987: In the UK, the prototype from 1985 bears fruit and the Acorn Archimedes is released. This is the first consumer system powered by a true RISC processor—the Acorn RISC Machine, or ARM for short. It's a big deal, and it stays one.


1994: Meanwhile, in Japan, Sony releases the Playstation console, a MIPS-based system. The 32-bit era of console has unequivocally begun. MIPS is actually the architecture whose design inspired the Acorn Archimedes back in '87. Speaking of which, this is also the year that the ARM line reaches version 7 of its design. ARMv7 is kind of the 80386 of the ARM architecture; any 32-bit ARM code in the modern era is basically ARMv7 with some extra tuning or special features. ARMv7 is also at least as important as the 386 for the purposes of this history; code built to this spec powers the GBA, the DS, the 3DS, and basically every mobile iOS or Android device out there. Actually, it's probably more relevant in the present day: we're only just now seeing the 64-bit variants coming to sweep the 32-bit systems away, while 32-bit x86 code has been the legacy mode for quite some time in the Intel world.

So what's the deal with these instruction sets? Why should you care if they're reduced or not?

Well, let's talk about a complex instruction set like x86. It uses commands that let you do, in one instruction, the things you regularly want to happen, like add register A to register B and store the result in memory. When we say instruction, we mean the mnemonic you see in the debug window that shows lines of code. That's (more or less) human readable and lets you know what's going on, like ADD $1 $2 $3, meaning add register one to register two and store the result in register three (this is not a real language). But to the machine, each mnemonic corresponds one-for-one to a pattern of logic-high/logic-low signals on its inputs that selects which portions of the CPU to use and what to do with the result. Real x86 instructions can be quite complex, but man, why would you want to write out every single stage of every single operation?
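To make that "one-for-one" part concrete, here's a toy assembler in Python. The four-bit fields and opcode numbers are completely made up (this is not x86, MIPS, or 6502); the point is just that the friendly mnemonic and the pattern of highs and lows are the same instruction.

```python
# Toy "assembler": each mnemonic maps one-for-one to a bit pattern that tells
# the hardware which unit to fire and where the operands come from.
# The encoding is invented for illustration; it is not x86, MIPS, or 6502.

OPCODES = {"ADD": 0b0001, "SUB": 0b0010, "LOAD": 0b0100, "STORE": 0b1000}

def assemble(mnemonic, rd, rs, rt):
    """Pack 'ADD $1 $2 $3'-style text into a 16-bit word:
    [4-bit opcode][4-bit dest][4-bit src1][4-bit src2]."""
    return (OPCODES[mnemonic] << 12) | (rd << 8) | (rs << 4) | rt

word = assemble("ADD", 3, 1, 2)   # add register 1 to register 2, result in register 3
print(f"{word:016b}")             # 0001001100010010 -- the highs and lows the CPU actually sees
```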

Well, some smart folks started thinking about this. See, the things a microprocessor does don't all happen in one cycle. Without getting into too much detail, the computer has discrete stages where input signals pass through transistors and become results. People in a bunch of places figured out that you could squeeze out extra speed by loading the first part of the next instruction before the last part of the previous one had finished. But there's a limit to how much you can gain that way, because the hardware is built to minimize the time complicated instructions take, and so you have 'lock out' periods where you absolutely, positively can't load anything else until the current instruction passes a certain stage.
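Here's a toy timing model of that in Python. The three stages, the stall counts, and the one-issue-per-cycle rule are a classroom cartoon rather than any real CPU, but they show why a few complicated instructions in the stream eat most of the pipelining win.

```python
# Toy pipeline timing: each instruction needs FETCH, DECODE, EXECUTE stages.
# Without pipelining you pay 3 cycles per instruction; with pipelining you
# overlap them -- unless a "complicated" instruction locks the pipe.
# Numbers and stage names are a classroom simplification, not a real CPU.

def unpipelined_cycles(n_instructions, stages=3):
    return n_instructions * stages

def pipelined_cycles(instructions, stages=3):
    """Each simple instruction enters one cycle after the previous one;
    a complex instruction adds stall cycles during which nothing new enters."""
    cycles = stages                       # time for the first one to drain through
    for extra_stalls in instructions[1:]:
        cycles += 1 + extra_stalls        # 1 cycle per issue, plus any lock-out
    return cycles

simple_program  = [0] * 10                # ten one-cycle-issue instructions
complex_program = [0, 3, 0, 3, 0, 3, 0, 3, 0, 3]  # every other one locks out 3 cycles

print(unpipelined_cycles(10))             # 30
print(pipelined_cycles(simple_program))   # 12 -- close to one instruction per cycle
print(pipelined_cycles(complex_program))  # 27 -- stalls eat most of the gain
```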

Then those Californians and Subjects of Her Majesty got to thinking: what if you just removed all the instructions that fouled up your cramming? That cramming is called pipelining, and thus we get MIPS, the Microprocessor without Interlocked Pipeline Stages. No division instruction: too hard. Just subtract a bunch instead. Each instruction does one thing. Add A to B, or add A to 125, in one cycle. Want to store that in RAM? Write an instruction for it. What you get is a processor that can come pretty close to taking in and putting out one instruction every clock cycle. So even if you're a bit more verbose in your assembly, the processor just burns through those lines of code.
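The "just subtract a bunch" line isn't a joke, either. Here's the crudest possible software divide built only from the kind of simple operations a RISC gives you; real runtime libraries use a smarter shift-and-subtract loop, but the principle (divide is just many simple ops) is the same.

```python
# "No divide instruction? Subtract a bunch instead."  A crude software
# division built only from compare, subtract, and add-immediate, the kind
# of one-cycle operations a load/store RISC hands you.

def divide(dividend, divisor):
    quotient = 0
    while dividend >= divisor:   # compare: one simple instruction
        dividend -= divisor      # subtract: one simple instruction
        quotient += 1            # add immediate: one simple instruction
    return quotient, dividend    # quotient and remainder

print(divide(100, 7))   # (14, 2)
```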

Interestingly, both are still competing designs (well, ARM and Intel; MIPS is a real thing, but I don't think there's much of note using it any more), but for completely different reasons than in this period. Reduced Instruction Set Computers made a lot of sense if you were taking the Atari approach and trying to cram critical operations into a tiny amount of time, like when you have to race the cathode-ray beam. Complex Instruction Set Computers can be better (especially in this period) at a really broad range of tasks, particularly if you regularly need to do a bunch of things that are high complexity (to the processor). The Playstation hit a sweet spot where RISC was really good at video games.

Now, we have so drat many transistors cycling so quickly that no one really gives a drat (comparatively). Instead, RISC processors are attractive because the same hardware choices that let them run quickly also make it easier to squeeze power draw out of them. That's why ARM is so dominant in phones and light-duty computers (whatever your marketing department wants to call the thing you use to browse the internet and that's it). CISC chips are likewise cutting power consumption quickly, but they keep prioritizing doing more per instruction over RISC-style simplicity.

How does this relate to our plucky 2600?
Think about what ManxomeBromide was showing us: having to wait so many clock cycles to read or write memory, or to get a write out to the chip that drives the TV. All the while that cathode-ray beam moves at its predestined rate, deaf to your excuses when your write is tardy. Gosh, wouldn't it be better if you could read and write memory faster? That's the fundamental pain that led people to work towards RISC architectures.
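To put numbers on that pain: an NTSC scanline is 228 color clocks and the 2600's CPU runs at a third of that rate, so you get 76 CPU cycles per line for everything. The little kernel fragment below is illustrative (it is not Solaris's actual code), but the per-instruction timings are the standard 6502 figures for zero-page addressing, which is where the TIA registers live.

```python
# Rough cycle budget for "racing the beam" on an NTSC 2600.
# 228 color clocks per scanline, CPU clock = color clock / 3  ->  76 CPU cycles.
# The kernel fragment is made up; the per-instruction timings are standard 6502.

CYCLES_PER_SCANLINE = 228 // 3     # 76

kernel = [
    ("lda (ptr0),y", 5),   # fetch this line's graphics byte for player 0
    ("sta GRP0",     3),   # write it to the TIA (zero-page store)
    ("lda (ptr1),y", 5),   # fetch player 1's byte
    ("sta GRP1",     3),
    ("dey",          2),   # next scanline
    ("bne loop",     3),   # taken branch back to the top
]

used = sum(cycles for _, cycles in kernel)
print(f"{used} of {CYCLES_PER_SCANLINE} cycles spent; {CYCLES_PER_SCANLINE - used} left")
```

Fifty-five cycles sounds roomy until you remember that positioning, colors, the playfield, and any actual game logic all have to fit in there too.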

And now you know. And knowing is half the battle.


Centurium
Aug 17, 2009

TooMuchAbstraction posted:

As I understand it, basically all CPUs are RISC "under the hood" these days due to the performance benefits it conveys. Intel CISC chips actually do a runtime conversion of each CISC operation into a set of RISC operations -- like having a runtime interpreter for your code, implemented at the chip level! For example, the CISC command "load parameters from these addresses X and Y into registers A and B, calculate their sum, and store that at address Z" could be translated into a sequence of four RISC commands:
  • Load X -> A
  • Load Y -> B
  • A + B -> C
  • Write C -> Z

Sort of. That's an adaptation to a much later model of computing, described as superscalar. I'm not sure how to describe that without getting both very technical and very off topic. The very short version is that modern CISC machines break instructions down into micro-operations and then execute several of them at the same time on different components of the microprocessor.
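If you want the flavor of it anyway, here's a cartoon of the idea in Python. The micro-ops are the same four steps from the quoted example; the two execution units and the one-cycle latencies are assumptions made purely for illustration, not a description of any real chip.

```python
# Toy sketch of "break it into micro-ops, then run several at once".
# Everything here (the micro-ops, the two execution units, one-cycle latency)
# is a cartoon of a superscalar core, not any real chip.

# One CISC-style instruction, already cracked into RISC-like micro-ops:
micro_ops = [
    ("load",  "A", "X"),    # A <- mem[X]
    ("load",  "B", "Y"),    # B <- mem[Y]   (independent of the first load)
    ("add",   "C", "A+B"),  # needs both loads
    ("store", "Z", "C"),    # needs the add
]

# Which earlier micro-ops each one depends on (by index):
deps = {0: [], 1: [], 2: [0, 1], 3: [2]}

done, cycle = set(), 0
while len(done) < len(micro_ops):
    ready = [i for i in range(len(micro_ops))
             if i not in done and all(d in done for d in deps[i])]
    issued = ready[:2]                       # two execution units -> two per cycle
    done.update(issued)
    cycle += 1
    print(f"cycle {cycle}: issued {[micro_ops[i][0] for i in issued]}")
# Four micro-ops finish in three cycles instead of four.
```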

Centurium
Aug 17, 2009

ManxomeBromide posted:

From the reading I was doing on the chips to make that timeline, it also seems that doing it without that split was common at the time too ("microcode"), but that one of the reasons the 6502 was so efficient was that it didn't use that technique. The 68k did. The 6502 was also said to be "pipelined" but I can't map what they must have meant by that to any notion I learned in my Computer Organization and Design class. The 6502 instruction set is actually pretty RISCy on its own.
For what it's worth, I'd have to agree. From hanging out with professors who were actually there and involved in chip design at the time, it seems like there's always been a big difference between design principles (which are ideas) and actual designs (which are machines that do work). With the caveat that I'm not really experienced with VLSI, my understanding is that in the process of actually making a chip you start with high-level ideas and go through a series of decisions about tradeoffs that are informed by the design principles the team chooses. That's obviously going to lead to actual metal with more or less resemblance to the design principle, depending on both the choices made and how individual engineers implemented things. And then there's the debug process.

That's a long-winded way of saying that what they tell a PhD candidate and what actually gets committed to silicon are naturally going to be hard to reconcile. And those same professors definitely think the NES is an example of RISC coding. I think most people draw the RISC/CISC line by asking whether there are ALU instructions that take immediate and memory operands directly.

ManxomeBromide posted:

The x86, meanwhile, has instructions like rep movsw which:
  • Copies 2 bytes from memory location [DS:SI] to [ES:DI].
  • increments SI and DI by 2 each.
  • Performs the first two steps a number of times equal to the value in the CX register.
That benefits from microcode in a way that the most complicated instruction on the 6502: "Set a process flag, push the flags onto the stack, then jump to the subroutine at the memory location listed at the top two bytes of RAM" does not.
I hope I'm not dragging things off topic here, but this is a really interesting difference in how the x86 and the 6502 interact with the user. Yeah, having a 'do a function call' instruction is a time saver, but it demands that the programmer keep control of things in a way that x86's instruction doesn't. Some of that is hardware optimization of important tasks on the VLSI side, but on the programming side it tells the programmer not to worry about maximum efficiency in that task and to let the microcode handle it. That's a very different philosophy of programmer/hardware interaction than the one that produced Solaris.

That's what I mean when I talk about what a work of art the programming in Solaris is. The hardware left a huge amount in the hands of the programmer, and the programmer made advances that are unthinkable now. Like, ponder the difference between early and late PS2 games. Big graphics difference, sure. But compared to the difference between Pong and Solaris? It's a testament to the power of human ingenuity.

ManxomeBromide posted:

The next "real" update should be relatively soon. I got about 20 minutes of captures and converting it into a coherent narrative and introduction of mechanics has been slow going.
Thank you for doing this. Not only is this an awesome in-depth LP of an Atari 2600 game, it's been a useful tool for me to show people when they ask what a computer engineer is, or when I get funny looks for suggesting there's value in understanding computing in machine terms and not just Python or Rails patterns.
