Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Locked thread
Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
A little experiment:


http://johnearnest.github.io/Octo/index.html?gist=0c7895a9a51f68ae1700

I'm not sure I'll have enough memory to do anything interesting with it, though. 16x16 tiles chew through my budget pretty fast.

Adbot
ADBOT LOVES YOU

Lord Windy
Mar 26, 2010
The Chip-8, could it load data from files or was it simply hand loaded into RAM?

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
On the COSMAC VIP programs were loaded into ram by painful hand-entry or from a cassette tape. If you're asking whether a game could dynamically load extra resources at runtime, it would have been possible by using RCA 1802 machinecode, but Chip8 itself has no facilities for this. When a Chip8 interpreter was written for the HP-48 graphing calculator the "superchip" instructions were introduced, and among them were some instructions that let you alter the HP-48's status flags, which served various purposes. There were 64 of these boolean flags, so you could persist 8 Chip8 registers worth of data. Right now Octo emulates those instructions and persists those 64 bits of data between runs but if anyone wanted I suppose I could stash them longer-term in browser cookies.

Lord Windy
Mar 26, 2010
Well, reading through the documentation there appears to be a command called 0x0nnn which is used by the system to jump to another section in the machine. If I'm not wrong, that is a 20bit address space. There are 8 reserved 0x0nnn instructions, 2 from the original Chip-8 and 6 from Super-48 but any decent compiler can get around that.

I've got a major assignment I've got to do for Uni. I don't really know how to do this, but this seems like an interesting project and I want to help. Do you think we can talk?

EDIT:

And already wrong, it's a 12 bit address. I'm an idiot and forgot how hexadecimal works. But still, the RCA-1802 has a 16bit address bus so I think that is workable along with the 64 flags.

Lord Windy fucked around with this message at 07:52 on Jul 31, 2014

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
Problem being that virtually no Chip8 interpreters, Octo included, actually emulate the RCA 1802, and none that I am aware of go the additional step of emulating the COSMAC VIP peripherals to any meaningful extent. Even if that was done, the VIP didn't have any sort of random-access persistent memory- the cassette interface required a human to hit play, rewind and record. I wouldn't recommend overloading the behavior of the SuperChip flag variables, especially given that I haven't even managed to find example programs which make use of them.

If you wanted to provide Chip8 with your own set of extended instructions (like SuperChip) to expand addressable RAM somehow or add new IO devices, the following gaps exist in instruction ranges:

  • 0x0NNN is reserved for machine code subroutine jumps. Several instructions exist in this range, but there are small usable gaps. Retaining full legacy compatibility will require careful examination of the original Chip8 disassembly to ensure you're branching into memory that couldn't possibly have stored meaningful code. Superchip uses a few of these for toggling high-res mode.
  • 0x5XYN where N is not 0 is unused. 0 in that position maps to "skip if vX == vY", so this might be a good place for new branch instructions.
  • 0x8XYN where N is 8, 9, A, B, C or D. The other instructions in this family are arithmetic ops.
  • 0x9XYN where N is not 0 is unused. Same deal as 0x5XYN, except when N is 0 it's "skip if vX != vY".
  • 0xEXNN. Most of these are available. 9E and A1 are used for the keypad-test conditional skips.
  • 0xFXNN. Again, most of these are available. Used slots include BCD decoding, the hex character set and interacting with timers. SuperChip uses a swath of these for scrolling instructions.

If it sounds like fun, feel free to fork Octo and add your own instructions.

Personally, I'm more interested in exploring the space of interesting programs that fit inside the existing resource limits. As I mentioned earlier when I discussed compatibility modes my research has found that the Chip8 universe is already distressingly balkanized and littered with mutually incompatible interpretations of behavior. The maintainer of Chip8.com has fashioned his own set of extensions to the Chip8 called MegaChip, but there are scant documents, programs or tools for it, and few signs of life.

Internet Janitor fucked around with this message at 23:53 on Jul 31, 2014

PandaCookies
Mar 19, 2009

Delicious endangered confection!
This thread is the best. I am going to do a CHIP8 project soon.

sighnoceros
Mar 11, 2007
:qq: GOONS ARE MEAN :qq:
So IJ convinced me to give CHIP-8 a shot after the SA Game Dev Challenge so here I am. Last night I went through his Octo docs and went through his Intermediate tutorial and it was pretty helpful.

I have had no previous low-level language or memory management experience, pretty much all of my coding experience is in stuff like VB script or C#.

But today I made a little space shooty thing! http://johnearnest.github.io/Octo/index.html?gist=d0c627c9e8183b2f13e8

I initially had just 1 enemy and was storing their info in the basic... registers?... just as an exercise to get it working. Then I had to do a bunch of conversion to get it working with multiple enemies, but now it should theoretically support however many rows I can fit into the enemydata table.

Now I have to do the same treatment for the player bullets so you can shoot more than once and I will have a nice little basic game. I am also aware I am doing stuff really inefficiently so I need to go back and clean up the enemy update logic so I don't have to load and save each enemy's data multiple times per game loop, among other things.

But yeah, a pretty neat little experiment so far!

sighnoceros fucked around with this message at 06:16 on Aug 4, 2014

sighnoceros
Mar 11, 2007
:qq: GOONS ARE MEAN :qq:
I've finished SPACE DEFENSE!



w/s to move up and down, e to fire. You now have to defend against multiple enemies and can have multiple bullets on screen at once.

It was a pretty fun experiment for only taking a couple days, especially having no experience with this type of programming in the past.

Try it out here: http://johnearnest.github.io/Octo/index.html?gist=6a4a180bb8b94c005410

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

sighnoceros posted:

I've finished SPACE DEFENSE!


w/s to move up and down, e to fire. You now have to defend against multiple enemies and can have multiple bullets on screen at once.

It was a pretty fun experiment for only taking a couple days, especially having no experience with this type of programming in the past.

Try it out here: http://johnearnest.github.io/Octo/index.html?gist=6a4a180bb8b94c005410

Pretty good, but could you maybe lengthen the pause after you lose? I barely have enough time to see my score, sometimes I miss it.

JossiRossi
Jul 28, 2008

A little EQ, a touch of reverb, slap on some compression and there. That'll get your dickbutt jiggling.
So I played with Chip8 a bit tonight and with the help of InternetJanitor made this tour de force: http://johnearnest.github.io/Octo/index.html?gist=558d6ea0973105aef180.

Press any key.

Now the code itself looks real nice now but that is because IJ reworked the whole thing in chat as he was walking me through things.

Scaevolus
Apr 16, 2007

I ported Mr Worm to SuperChip8.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
That's really cool, Scaevolus- I particularly like the trick you use with worm-dirtab to load offsets and then have i left at just the right place for the necessary sprite data. Very playable, even at 15 cycles/frame. When I was testing at one point the high score value seemed to display wrong but I can't reproduce it and and I don't see anything obviously wrong in the code.

The Laplace Demon
Jul 23, 2009

"Oh dear! Oh dear! Heisenberg is a douche!"

Internet Janitor posted:

That's really cool, Scaevolus- I particularly like the trick you use with worm-dirtab to load offsets and then have i left at just the right place for the necessary sprite data. Very playable, even at 15 cycles/frame. When I was testing at one point the high score value seemed to display wrong but I can't reproduce it and and I don't see anything obviously wrong in the code.

I'm reproducing the high score thing. Seems to me whenever the score is over 10 but below 100, it glitches out. Moving i := hex v1 below the : nohundred fixes it.

Scaevolus
Apr 16, 2007

Good catch! Updated with that fix.

The hardest part of writing this was the Queue implementation. Initially I was trying to do some insane dynamically growing circular queue structure and couldn't work out the kinks of increasing its size, then I realized I could just have a large fixed-size (256 byte) queue and rely on wrapping behavior.

Debugging Octo programs is pretty miserable. Simple debug printing with basic interpolation would help a lot. A command like:

:log "got $v0 out of queue. [$queuehead, $queuetail]"

Would log a message to an output buffer on the side of the screen as you play the game.

Maybe with a few conversion commands

code:
interpolation: $name[:conversion specifier]
conversion specifiers:
x -- print in hex
b -- print in binary
s -- print as signed decimal
d -- (default) print as unsigned decimal
\d+ -- for memory locations, print multiple bytes
so you could write things like :log "random bit permutation $perm:b" or :log "stack: size:$stacksize $stack:8x"

edit: did a bit more refactoring and cleanup

Scaevolus fucked around with this message at 00:24 on Aug 11, 2014

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
Miserable, eh? That simply will not do.

For starters I've equipped Octo with a register monitor and breakpoints. Pressing the 'i' key will interrupt a running program, 'o' will single-step a paused program and 'i' again will resume execution. You get a register dump like this which uses the compiler's symbol tables to display register aliases, guessed labels and an inferred stack trace:



You can also insert breakpoints in your code with :breakpoint <name> which will automatically pause execution when they are encountered- the name is displayed at the top of the register monitor so you can easily identify which breakpoint you stopped on. Breakpoints do not inject any instructions into the actual program so an exported ROM shouldn't interfere with any other Chip8 interpreter.

I will continue to think about debugging tools and I may add more later such as Scaevolus' suggestion for a debug log or possibly watch points, but hopefully the features I've added already will prove helpful.

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:
That looks pretty awesome IJ. I have 1/2 of a game in the works, but I haven't been able to spend time on it in awhile. Hopefully I can try the new debugging tools out soon.

duck monster
Dec 15, 2004

Internet Janitor posted:

This afternoon I put the finishing touches on a Chip8 adventure game.

http://johnearnest.github.io/Octo/index.html?gist=a1d72c7bbf65520fd20d
Traverse a sprawling 16-screen overworld, solve puzzles, move crates around and, with a little luck, watch an incredible ending sequence! Move with ASWD, and in platformer levels use E to pick up/drop crates and Q to reset the level.


Do you have what it takes to be a Cave Explorer?

If there's interest I could talk about some of the techniques I used to write this game.

Sometimes I wish we could just wipe out 20 years of computer development so we could be at the prime of 8bit garage development and hook up with Rod hubbard to make disco as gently caress soundtracks and do awesome loving cartrige games.

Because this poo poo owns :swoon:

edit: An old technique in rom games to deal with memory shortages was to have a mechanism to swap blocks from a secondary rom bank. So you could have 4k ram fed by a 16k rom swapping in 1k banks on demand. Even as far as the Z80 64k peak, I remember my old Amstrad 6128 had a second processor that could swap in 16k banks of ram on demand (If you used CP/M it appeared to the OS contiguous using ~mAgIc~ , which puzzled the gently caress out of me since the Z80 had no addressing modes that could do that, but I think CP/M threw an abstraction layer over things to make it work somehow.

duck monster fucked around with this message at 10:49 on Aug 13, 2014

sighnoceros
Mar 11, 2007
:qq: GOONS ARE MEAN :qq:

HappyHippo posted:

Pretty good, but could you maybe lengthen the pause after you lose? I barely have enough time to see my score, sometimes I miss it.

I can, and did! http://johnearnest.github.io/Octo/index.html?gist=d9007f1d214891c5e748

sighnoceros
Mar 11, 2007
:qq: GOONS ARE MEAN :qq:
WHAT, my post from earlier disappeared. Curse you, forums!

Sagacity
May 2, 2003
Hopefully my epitaph will be funnier than my custom title.

duck monster posted:

Sometimes I wish we could just wipe out 30 years of computer development so we could be at the prime of 8bit garage development and hook up with Rob hubbard to make disco as gently caress soundtracks and do awesome loving cartrige games.
Fixed :)

Btw, this is not particularly related, but if you enjoy this era there's a nice documentary that was Kickstarted last year that's (hopefully) coming out soon.

Scaevolus
Apr 16, 2007

Internet Janitor asked me to do a petit-postmortem for Mr Worm.

My initial plan was to port one of the many TI-83 or TI-89 games to Chip8. Sighnoceros' shooter suggested Phoenix.

I did some experiments with smooth movement using xor-masked sprites. The trick to moving a sprite smoothly is to xor the sprite in the source and destination positions together. This is pretty easy to do in Paint.net -- make two layers, set the blend mode to 'XOR', draw the sprite in different colors on both of them, and move the layers around.

Here's the Phoenix ship on top, with the xor-mask for vertical movement below it:


The movement was pretty smooth, but porting the whole game was less exciting. Searching for another game that could use the smooth movement, I remembered Mr Worm:

https://www.youtube.com/watch?v=ONzSyj1HeFk

Snake, with 12 possible directions (the "o"s in the diagram, "x" is the center)
code:
 ooo
o   o
o x o
o   o
 ooo
Milestone 1: get the sprites and basic drawing/movement code done. This snake just wanders randomly until it hits itself.

Endless version


Milestone 2: get the tail working. The following logic is very wrong, so the game turns into a short lightcycle game with a bad AI where you always lose. The plan is a circular queue of past snake directions, so to erase the tail there's a 'second snake' that just duplicates the draw calls that the head made so many steps ago. In this version, the circular queue is of variable size, and there's a significant amount of complexity managing the concepts of queuelen/queuecap/queuehead/queuetail.

This took a while to debug:



Better:


Working


Milestone 3: a snake that grows when it eats things.

This is where things got hairy. The circular queue code worked, but it was written to only use 'queuecap' bytes -- the current maximum length of the tail.

The queue might be like this (where 6 is the last head position, and 0 is the tail that's following):
code:
4 5 6 0 1 2 3
Growing it, the queue needs to become something like:
code:
4 5 6 _ 0 1 2 3
(or)
0 1 2 3 4 5 6 _
This is annoying-- offsets needs to be juggled and memory copied, and Chip8 makes it harder. After banging my head against the wall for a few hours, I realized that the entire problem was irrelevant: there's no reason to constrain the queue to a minimal size in RAM. Once I realized this, I made the circular queue have a maximum size of 256 bytes. This eliminated the wraparound checks for queuehead/queuetail (since they'll overflow and wrap around naturally), and made growing the queue trivial.

Milestone 4: title, score.

Compared to the nightmare of attempting to get a dynamic queue working with no debugger, implementing the last few features to polish off the game was easy.

Scaevolus fucked around with this message at 06:41 on Aug 16, 2014

Scaevolus
Apr 16, 2007

I got my Octo disassembler working on the 157 Chip8 binaries I found.

Here's BLINKY, a Pacman clone with very strange controls (A,S,3,E).

duck monster
Dec 15, 2004


shutup shutup shutup im still 30 lalala i cant hear you.

:suicide:

(I turn 40 next week and I'm totally having a loving crisis over the fact. wheeeeeeeeeee)

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
Scaevolus: The "endless" version of your first milestone is mesmerizing. I wonder if you could use something similar to generate random cave systems or something.

While I was on vacation last week I spent some time tinkering with my own disassembler written in Javascript. Debugging it has not been a particularly fun exercise, but my labors have paid off:







This replaces the previous functionality of the "Load Binary" button. The static analysis I perform uses a technique I'm calling "value smearing". I walk the program graph propagating every value that every register could assume and simulating the postconditions of every instruction until I reach a fixed point. Some postconditions are simulated using a "loose-bound"- for example, when I encounter a vx := key instruction I assume that any key could have been pressed. As you might imagine, this is fairly expensive and only feasible due to the limited memory and small registers of the Chip8. As I say in the disclaimer, it's still slow and flaky, but for many programs it manages to come up with something sensible. Consider it a work in progress and don't be surprised if it screws up.

One interesting thing I've discovered so far is that many programs (such as Brix.ch8) seem to pad their sprite data with an extra 0x00 that is never accessed. I'm not sure what to make of this- perhaps some chip8 implementations drew n+1 rows for a sprite of size n, or maybe some early chip8 assemblers attempted to keep data in 2-word aligned chunks?

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
In other news, while prowling for new Chip8 roms online I found this.

Using my disassembler I analyzed the program and gradually teased it apart. As it happens, the emulator has some bugs which are masked by complementary bugs in the game. I managed to get it working in Octo by modifying two bytes and along the way I identified a few other possible issues. It was a bit frustrating to reverse-engineer but along the way I made some improvements to my disassembler. Here's the cleaned-up Octo source:



http://johnearnest.github.io/Octo/index.html?gist=35e180cece89ed4d918f

Pressing E "flaps". I shouldn't have to explain anything else.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
This morning I put together a CLI frontend for Octo using Node.js:
code:
ij@red$ ./octo
usage: octo [--decompile] [--roundtrip] [--qshift] [--qloadstore] <source> [<destination>]

ij@red$ cat simple.8o
: main
	va := 1
	vb := 2

ij@red$ ./octo simple.8o simple.ch8
ij@red$ hexdump simple.ch8
0000000 6a 01 6b 02                                    
0000004
This should make testing easier and additionally make it more convenient to use Octo in the development of other emulators.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
I've been thinking about this for a while and I've come to a decision- I've deprecated :proto.

Originally, Octo required an explicit prototype declaration before the use of any unstructured forward reference. The idea behind this was that programs could be read in order and understood without jumping around. In many situations you can adhere to this convention without creating any overhead, but occasionally a forward reference is the most efficient way to program something. By making the "reading order" approach the path of least resistance and making forward references less convenient I thought it would guide users toward a good style.

Having used the language extensively and worked with a number of beginners one-on-one I think this approach is ultimately just confusing and clunky for the vast majority of the people who encounter it. Very few existing languages have restrictions like this. Consider it a failed experiment.

Existing programs which use :proto will still work, but the statement no longer has any effect. You can now freely reference any label from anywhere in the program without making declarations first just like an ordinary assembler. I've updated all the documentation and examples.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
I have been slowly working to improve the quality of Octo's disassembler and sifting through my corpus of binaries. A number of very old Chip8 programs contain RCA 1802 machine code embedded in them, so to better understand these programs I have added a crude prettyprinter for this instruction encoding. Here's an excerpt from disassembled output of a "clock program" to demonstrate:

code:
...
: sub-0
	v8 := 7
	i := label-9
	sprite v7 v8 7
	v7 += 1
	return

: machine-1
	0xF8 0xFA      # 0x2D8 : LDI  0xFA
	0xAF           # 0x2DA : PLO  F
	0x2F           # 0x2DB : DEC  F
	0x8F           # 0x2DC : GLO  F
	0x3A 0xDB      # 0x2DD : BNZ  0xDB
	0xD4           # 0x2DF : SEP  4

: label-9
	0xFC
	0xFC
	0xFC
...
Better than staring at raw hex bytes, at least! This chunk of code appears to be a timing loop. It loads the value 0xFA into a register loops, decrementing this register until it is zero. The SEP instruction performs a context switch back into the Chip8 program.

Octo is not presently capable of executing a program like this, but the compiler and decompiler can now be used for studying them. If anyone else is interested in reverse-engineering I invite you to poke around with some of the known "hybrid" binaries and see what you can find. The memory and register maps provided in the COSMAC VIP Manual and this primer on RCA 1802 assembly language should be useful.

Internet Janitor fucked around with this message at 19:21 on Aug 27, 2014

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."

http://johnearnest.github.io/Octo/index.html?gist=6e934bfd418bd2efeb16

I guess I'm not feeling very creative today.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
Time for another effortpost, even if I am just talking to myself.

Sorting is one of the most fundamental classes of bread-and-butter algorithms in Computer Science. Implementing a sorting algorithm on the Chip8 has some interesting challenges. Registers are plentiful, Memory access is cumbersome and recursion is impractical. The Big-O complexity of algorithms tells us how well they should perform for some arbitrary size N, but once we tie ourselves to a specific ISA and an upper limit on practical Ns the story can change. Let's look at a few approaches.

Performance comparisons will be based on a common, arbitrarily chosen shuffled array- in practice different inputs will produce different results but one trial gives us a coarse idea of how algorithms stack up:

code:
:const SIZE   16
:const SIZE-1 15
: data  14 5 15 6 1 3 10 7 0 9 11 4 2 13 8 12
The O(n^2) sorting algorithms are often very compact and don't have much overhead. I went with a selection sort for starters, because the linear scans it involves lend themselves well to Chip8 memory operations:

code:
:alias  here       v1
:alias  rest       v2
:alias  min-index  v3
:alias  min-value  v4
:alias  here-value v5

: selection-sort
	here := 0
	loop
		min-index := here
		i := data
		i += here
		load v0
		min-value  := v0
		here-value := v0

		rest := here
		rest += 1
		i := data
		i += rest
		loop
			load v0
			if v0 >= min-value then jump no-better
				min-index := rest
				min-value := v0
			: no-better

			rest += 1
			if rest != SIZE then
		again

		if min-index == here then jump no-swap
			v0 := here-value
			i := data
			i += min-index
			save v0

			v0 := min-value
			i := data
			i += here
			save v0
		: no-swap

		here += 1
		if here != SIZE-1 then
	again
;
I use a few tricks here. I access the pivot value for each pass up-front and carry it in registers. The loop which finds the minimum value takes advantage of how load increments i, removing the need to re-initialize it and add an offset on each loop iteration. The memory swap at the end of each iteration is fairly bulky, even though I've carefully arranged so that both values will already be in registers.

The algorithm weighs in at 70 bytes and takes roughly 1260 cycles to sort our 16 element array. It'll be hard to write a general-purpose sorting routine that consumes less memory, but we can improve the speed.

The next natural thought is to employ one of the O(n * lg(n)) algorithms. Quicksort seems to be right out because it is naturally implemented recursively and Chip8 doesn't have an argument stack. We could build one, but the overhead doesn't sound good. A better choice is Heapsort:

code:
:const LIMIT   7 # (SIZE/2)+1

:alias left-val  v0
:alias right-val v1
:alias start     v2
:alias root      v3
:alias end       v4
:alias best      v5
:alias left      v6
:alias best-val  v7
:alias root-val  v8

: heap-sort
	start := LIMIT
	end   := SIZE
	loop
		root := start
		sift-down
		start += -1
		if start != -1 then
	again

	start := SIZE-1
	loop
		# swap data[0] with data[start]:
		i := data
		load v0
		vf := v0

		i := data
		i += start
		load v0
		i := data
		save v0

		i := data
		i += start
		v0 := vf
		save v0

		start += -1
		root := 0
		end := start
		sift-down

		if start != 0 then
	again
;

: assign-best
	best     := left
	best-val := left-val
	jump found-best

: sift-down
	i := data
	i += root
	load v0
	root-val := v0

	loop
		left <<= root

		if left > end then return

		i := data
		i += left
		load v1

		best := left
		best += 1
		best-val := right-val

		if left-val > right-val then jump assign-best
		if left == end then jump assign-best
		: found-best

		if root-val >= best-val then return

		i := data
		i += root
		v0 := best-val
		save v0

		i := data
		i += best
		v0 := root-val
		save v0

		root := best
	again
This one was a little tricky to get debugged. Unfortunately, it's worse on both dimensions we're considering. The code takes up 130 bytes and takes 1941 cycles to sort the same 16 element array.

If we scale up the array to 64 elements the insertion sort takes about 17613 cycles while the heapsort is only about 11825. Algorithmic complexities work out as expected, but either approach is impractically slow.

There's a different way to approach this problem. If we have a small, fixed N we could aggressively unroll the steps in a normal sorting algorithm, producing a sequence of code snippets which look something like this:

code:
if v0 <= v1 then jump l-0
	vf := v0
	v0 := v1
	v1 := vf
: l-0
By using a load vf instruction it would be possible to pull the full 16 elements of an array into registers at once, and then this lattice of comparisons and swaps could take care of the rest. That won't quite work, though- comparing the magnitude of two values requires a subtraction, which will destroy the contents of vf as well as one other temporary register. As a result we'd only be able to sort 14 elements in this manner. The next lowest power of two is 8.

What is the best sequence of comparisons and swaps? The obvious approach is to unroll the steps of a Bubble Sort- for size N=8 that would mean 28 swaps. This is a loose bound, though- this site describes optimal sorting networks for N <= 16. Here's one for N=8 that only requires 19 swaps:



This diagram is read left-to-right. Horizontal lines are values in a given position of an array, and vertical lines are a test-and-swap between two values. Vertical lines which are in the same column are swaps that can be carried out simultaneously or in any order and otherwise the test-and-swaps must be carried out left to right. Translating this into Octo code as in the example above, we get a subroutine called sort-8 which is 268 bytes long and takes somewhere around 100 cycles to do its thing. This is fast enough that it could actually be employed in a Chip8 game if called sparingly!

For a fair comparison with the previous two algorithms, we can sort 16 elements by splitting the source array into two halves, using the sorting network on each and then performing a linear merge pass:

code:
: heap1 0 0 0 0 0 0 0 0
: heap2 0 0 0 0 0 0 0 0

:alias val1   v8
:alias val2   v9
:alias dest   va
:alias index1 vb
:alias index2 vc

: fused-sort
	i := data
	load v7
	sort-8
	val1 := v0
	i := heap1
	save v7

	i := data
	load v7
	load v7 # cheaper than adding an offset of 8
	sort-8
	val2 := v0
	i := heap2
	save v7

	index1 := 1
	index2 := 1
	dest   := 0

: merge
	if val1 > val2 then jump merge-2
	append-1
	if index1 != 9 then jump merge
	loop
		append-2
		if dest == 16 then return
	again

: merge-2
	append-2
	if index2 != 9 then jump merge
	loop
		append-1
		if dest == 16 then return
	again

: append-1
	v0 := val1
	i := data
	i += dest
	save v0
	dest += 1

	i := heap1
	i += index1
	load v0
	val1 := v0
	index1 += 1
;

: append-2
	v0 := val2
	i := data
	i += dest
	save v0
	dest += 1

	i := heap2
	i += index2
	load v0
	val2 := v0
	index2 += 1
;
This could be done in-place but I used separate buffers for the sake of conceptual simplicty. The results are impressive: 418 bytes of code and a finished sort in about 479 cycles, completely blowing our previous attempts out of the water at the cost of some precious ram. If this is the best balance we can strike, I hope nobody needs to write a game that requires a routine like this, though- if we use what I believe is a historically accurate clock speed for Chip8 our fastest algorithm takes a little over a second to run.

Anyone think you can make a smaller or more efficient sorting routine? See any optimizations I missed?

DeathBySpoon
Dec 17, 2007

I got myself a paper clip!
I love reading this thread even if I don't have time to contribute. Maybe I'll try to hammer together a Chip8 game next time I join a jam. Just wanted to pop in and say I enjoy your updates IJ, keep on talking to yourself.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
Today I cleaned up a few rough utility programs I've used for tinkering with Octo and put them on github. Perhaps they will prove useful to others.

Smoothie is a utility for building flicker-free XOR-masked sprite sheets. Given a desired animation sequence it can build a sequence of looping deltas:

->
You can also use Smoothie as an easy bulk-importer if you disable the xor-masking feature.

TextPack is a utility for creating fonts and encoding strings. I showed a cruder, earlier version of this a while back:



ImagePack is a utility for preparing large images for Octo. You can specify a variety of ways to slice and dice data so that you can inexpensively achieve various wipe and pan effects and speeds:


All of these are commandline Java programs.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."

http://johnearnest.github.io/Octo/index.html?gist=328ed416559b8cba1d0c

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

DeathBySpoon posted:

I love reading this thread even if I don't have time to contribute. Maybe I'll try to hammer together a Chip8 game next time I join a jam. Just wanted to pop in and say I enjoy your updates IJ, keep on talking to yourself.
Likewise, IJ, this is my favourite CoC thread of the moment.

Every now and then I hack a bit on homebrew Atari VCS, so it's awesome to see what you're doing with another machine from the era.

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
DeathBySpoon, Dijkstracula: Thank you. It means a lot to me to hear this sentiment about my work.

Contrasting Chip8 with the VCS is interesting. Chip8 has severe constraints, but writing programs for it (especially given fairly modern tools) feels pretty high level to me. I don't have to manage memory banks or the relative lengths of branch instructions. Outside benchmarks as in my sorting algorithm showdown I never have to count cycles or really care about the addresses where my code is being assembled and laid out. There's only one "addressing mode". Coding for the VCS seems extremely daunting in comparison- complex addressing modes and memory layouts are key to getting things done quickly and even the simplest program has to concern itself with cycle-counts to "race the beam". I am a bit envious of the color and sound the VCS is capable of, though. (I may yet succumb to the temptation to make a few novel additions to the Chip8 instruction set.)

As for development news the Octo web UI is in the process of getting a bit of a facelift. (If I break anything please bring it to my attention.) While there hasn't been any particular need for it up to this point there is a :org operative in the language now for assembling code and data to fixed addresses which I intend to use for some dumb tricks in the future.

The disassembler is doing a much better job with some problem-child binaries thanks to bugfixes and improved heuristics, but it's still rather slow. One intriguing addition to my menagerie is Pinball, an RCA 1802 hybrid program. It isn't completely disassembled but you can see a number of chunks of machinecode Octo was able to identify and parse. I suspect that the "bad opcodes" immediately following native calls are arguments for the subroutines which are consumed by advancing the Chip8 program counter. In the future I might be able to do some crude static analysis of the machine code to better tease apart the remaining unknown territories of binaries like this.

In the course of my reverse-engineering adventures I have also been bemused to discover that some of the "test programs" for SuperChip features floating around the internet are dead wrong (about things that don't even relate to SuperChip instructions!) If you want to have some fun, see how many mistakes you can find in this Scroll Test program. Octo has flagged unreachable code and data as "unused", but with a bit of head-scratching it's easy to work out what the programmer meant to do. Sometimes when I take things apart I find very clever code but just as often I seem to discover bizarrely inefficient or convoluted code that seems to serve no purpose. The best guess I have is that these weird parts come from ad-hoc patching of binaries after their original assembly, but we may never know for sure.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

yeah, likewise, I'm pretty envious of your being able to write surprisingly-high level code for the Chip8, though chasing the beam is half the fun of the VCS :)

Certainly, the most infuriating part of the VCS is the variable number of cycles certain reads take depending on whether the source address is on the zero page or not. I've been bitten by not paying close enough attention to where my ROM's high water address mark is and have everything go to poo poo if I have to read past 00FFh :v:

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
I'm speaking with only second-hand VCS programming experience here, but how much do you think it would help to have an assembler that could check timing constraints for you? Imagine being able to have a block that looks like

code:
scanline {
	// 6502 code goes here
}
The assembler could know how long a scanline is (in cycles), so if you're under it'll introduce the necessary padding and if you exceed that limit you get a compile-time error. Making a feature like this compose with arbitrary subroutine calls and loops might get tricky and require some static analysis or annotations but I imagine it could remove an immense amount of tedium.

In other news, Octo now offers an "examples" menu to make it easier for beginners to see what it's capable of without investing too much effort digging around:

This list is populated from the examples directory in the github repository, so if anyone has a program they want added feel free to submit a pull request.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

Internet Janitor posted:

I'm speaking with only second-hand VCS programming experience here, but how much do you think it would help to have an assembler that could check timing constraints for you?
I've actually written a basic (incomplete and likely buggy) one :) The issues with it are the ones you've described - subroutines and even branching means a particular instruction can be executed at multiple points in the scanline (which is actually superuseful but points to the need for better visualization for the programmer than tacking on a "; cycle count: %d ..." comment on each line :)

I should probably clean it up and throw it up on my public repo sometime.

Jato
Dec 21, 2009


Your Octo emulator/IDE is awesome IJ, and I love the things everybody is doing with it. I'm working on a Chip8 emulator in python to learn a bit about emulation. Once I get this finished I think I'll play around with Octo and try making a game or two.

The current state of my semi-functional Chip8 emulator:

Adbot
ADBOT LOVES YOU

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."
AtrociousToaster: That looks like a pretty good start! I can say from experience that using Octo to write little testing programs makes tracking down emulation bugs a LOT easier than trying to work them out using existing roms alone. If you install Node.js and check out the Octo repository there's a CLI frontend available for the compiler and decompiler. Keep us updated!

  • Locked thread