|
Lately, I've been interested in learning how to work with an assembly language. Maybe it's because of my interest in the history of computing and computer games, or just because I want to learn how things work at a lower level. I have an Arduino and feel like it would be fun to work with that, but I'm also intrigued by doing something with the C64 or NES. I have a computer science minor, and though my later education and career have been in writing and the humanities, I've retained (and used) a small amount of programming knowledge over the years. I learned Java and C++ as an undergrad, and like to think I'm still marginally adequate at writing or reading those languages. However, there seems to be a pretty big jump between understanding Java and understanding assembly. I can, for example, read about two's complement notation or arithmetic shifts and understand them in the abstract, but when I see them in source code I'm puzzled as to why they're being used or where they're going. Instruction set documentation is a bit overwhelming, too. With that in mind, I'm hoping to get advice on how I might go about learning assembly. It seems reasonable to learn with an architecture applicable to my interests, so either the 6502/6510 or the (mega)AVR. My other priority is to be able to compile and run code easily and frequently, due to what I think of as assembly's "very broad, very granular" syntax. If this belongs in another thread, I'll post there too. Thanks!
|
# ? Aug 22, 2014 22:48 |
|
|
|
I've read around that C++ compilers are so good that they'll usually beat handwritten assembly code. I've written a bit of assembly, mostly in my computer architecture class. I like C++, but assembly was just super annoying to work with. That said, if your inclination to learn assembly is purely academic, I recommend a computer architecture book as a starting point. You can also look at the assembly generated by C++ programs to get an idea of how to implement certain things.
|
# ? Aug 22, 2014 23:00 |
|
This may be a good place to start: http://flatassembler.net/ Isn't assembly still useful for things like injecting values into software/memory manipulation? I remember wanting to learn it when I was into game hacks.
|
# ? Aug 22, 2014 23:05 |
|
Knyteguy posted:This may be a good place to start: http://flatassembler.net/ Oh true. Though, I wonder if there exist tools that are a little more high level to accomplish that. I'd wager there's a good chance there are.
|
# ? Aug 22, 2014 23:19 |
|
The best way to learn assembly is to write assembly. Write a program in C, then get your compiler to spit out assembly from the C code, and figure out how to assemble that. Then just start experimenting! I think your idea of starting with a simple architecture like the AVR or 6502 is a really good one, because like others have said you'll need to know the ins and outs of the architecture very well. If you start with a really complicated machine you'll probably get overwhelmed quickly, and it will be difficult to remember why you're doing it in the first place. If, however, you choose to learn something like PIC16 assembly, you'll easily be able to write ASM that's much more efficient than the compiler's.
|
# ? Aug 23, 2014 01:12 |
|
Back when I learned assembly it was on a Motorola 68k, which is I guess fine if you want to program an Amiga or something but otherwise not very relevant to the modern world. Assembly is very much like higher-level languages when you're writing straightforward procedural code; it's just operating at a different scale. Registers feel like RAM, RAM feels like disk, etc. I'm on board with you starting on something simple, but as with all languages, if you're going to bother learning assembly it should be in order to achieve an actual programming goal.
|
# ? Aug 23, 2014 11:58 |
|
I did a little assembly in school and even a little macro expansion, but only wrote the most trivial poo poo myself. How much of assembly writing is just calling macros?
|
# ? Aug 24, 2014 05:31 |
|
Macros in assembly programming are kindof like functions in higher-level programming: they can be a nice way to avoid repeating common code, but it's not like making everything a function solves all your problems for you. I would actually caution you away from starting with a fake/simplified/toy instruction set. First, the major instruction sets are actually not massively more complicated in how they do simple tasks; at worst, you will see instruction operands that involve accessing memory, rather than having that split into a separate instruction. More importantly, though, working with a real instruction set gets you closer to the thing that's actually valuable about learning assembly programming, which is understanding how real hardware works. I think the best approach is probably to write small C functions, compile them to (unoptimized) assembly, and try to understand how the result actually implements the function you wrote. Start with a function that doesn't have parameters, return values, or calls; then add those things once you understand the basics (and once you've found the Wikipedia article about calling conventions for your platform). rjmccall fucked around with this message at 07:32 on Aug 24, 2014 |
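The write-a-small-C-function exercise above can be sketched concretely like this (the file and function names are made up for illustration; the exact assembly you get depends on your compiler and platform):

```c
/* first.c: a minimal starting point for the compile-and-read exercise.
 * Emit unoptimized assembly with, e.g.:
 *     gcc -S -O0 first.c        (writes first.s)
 * or disassemble the object file instead:
 *     cc -c -O0 first.c && objdump -d first.o
 */

int answer(void) {
    return 6 * 7;   /* no parameters, no locals, no calls */
}

/* Next steps: add a parameter, then a local variable, then a call to
 * another function, and re-read the assembly output each time. */
```

Once the parameterless version makes sense, adding one feature at a time keeps each new chunk of generated assembly small enough to reason about.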
# ? Aug 24, 2014 07:29 |
|
Mystery Machine posted:I've read around that C++ compilers are so good that they'll usually beat handwritten assembly code. I've written a bit of assembly, most specifically in my computer architecture class. I like C++, but assembly was just super annoying to work with. With that being said, if your inclinations to learn assembly are purely academic, I recommend a computer architecture book as a starting point. You can also look at the assembly generated by C++ programs to get an idea on how to implement certain things. OP, I would start with disassembling some really simple C programs on x86 with objdump -D -x (if you can understand x86 you can understand pretty much anything, and the Intel docs for the ISA are very, very good). this is the Holy Bible of x86. chapter 5 will tell you what each instruction means, chapter 6 contains deep within its bowels an explanation of why many of the instructions you're seeing are there in the first place--understanding the calling convention is extremely important to making sense of how functions fit together. also, get really, really comfy with gdb--stepi and display/i $pc are your new best friends. (note: I do a _lot_ of work with assembly.)
|
# ? Aug 24, 2014 07:53 |
|
Since you just want to learn about assembly language programming and you don't have a specific use in mind for it, take a look at the LC-3 (http://en.wikipedia.org/wiki/LC-3). It is designed for teaching students about computer architecture, including not only assembly language but also function calling conventions and handling devices with memory-mapped I/O. Since the LC-3 runs in a simulator that supports step-through debugging, you'll have a much easier time troubleshooting issues than you would trying to debug x86 stuff. I know there's a book that was written to work with it (http://highered.mheducation.com/sites/0072467509/index.html). However, there seem to be a good number of online resources with which you can work.
|
# ? Aug 24, 2014 18:06 |
|
How do you write assembly for the likes of a JVM or CLI? Is the GC a function call or exposed in opcodes?
|
# ? Aug 25, 2014 00:49 |
|
gently caress them posted:How do you write assembly for the likes of a JVM or CLI? Is the GC a function call or exposed in opcodes?
|
# ? Aug 25, 2014 00:59 |
|
The GC just happens. The design of JVM/CLR opcodes gives the VM enough type information to support a fully type-accurate GC. It's hard to answer the question with any more fidelity than that without writing a massive effort post.
|
# ? Aug 25, 2014 01:02 |
|
gently caress them posted:How do you write assembly for the likes of a JVM or CLI? Is the GC a function call or exposed in opcodes? IIRC, the JVM spec doesn't even specify how the garbage collection has to happen. It just says that if you're implementing the JVM, it has to do garbage collection. So if you're asking how the garbage collector of the JVM works, you first have to answer, "which implementation of the JVM?"
|
# ? Aug 25, 2014 01:16 |
|
I'm *guessing* that his question actually was whether memory allocation is handled through a separate, special opcode or by just calling a built-in function like "new," which I don't know the answer to. This is assuming local variables/stack frames are handled as, well, a stack instead of garbage-collected heap memory.
|
# ? Aug 26, 2014 15:09 |
|
Thanks, everyone, for the advice. Rest assured I'm taking it into consideration. I've been playing around with Internet Janitor's wonderful CHIP-8 Interpreter, which is helping to familiarize me with bit operations, registers, and constraints of memory and instruction set, all of which is new to me. I would take Professor Science and Barnyard Protein's advice, but I don't know whether I'll end up doing much in x86; I think I'd use a higher level language for anything I might develop on a modern PC. With that said, it sounds like a good idea to start with something I wrote in C. I'm afraid I don't know how to do this on my PC for another given architecture. I imagine I would write the C code, compile for AVR, and disassemble. Do I have that right, and would the same go for any given architecture?
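The write-compile-disassemble loop described above can be sketched as follows (a hedged example assuming the avr-gcc cross toolchain is installed; the MCU name is an assumption, and the same pattern applies to other architectures via their own gcc/objdump builds):

```c
/* add8.c: cross-compile and inspect for AVR with, e.g.:
 *     avr-gcc -mmcu=atmega328p -O0 -S add8.c    (emit AVR assembly)
 * or:
 *     avr-gcc -mmcu=atmega328p -O0 -c add8.c
 *     avr-objdump -d add8.o                     (disassemble the object)
 * atmega328p is just an example part; substitute whichever MCU you
 * actually target. */

unsigned char add8(unsigned char a, unsigned char b) {
    return (unsigned char)(a + b);   /* 8-bit math, a natural fit for AVR */
}
```

The nice thing about starting from C like this is that you already know what the function is supposed to do, so reading the disassembly becomes a matching exercise rather than a decoding one.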
|
# ? Aug 27, 2014 17:28 |
|
rjmccall posted:Macros in assembly programming are kindof like functions in higher-level programming: they can be a nice way to avoid repeating common code, but it's not like making everything a function solves all your problems for you. The way I see it, the point of writing assembly for "toy" architectures is to learn how to learn an architecture, and to get accustomed to thinking in assembly without the architecture itself being a boat anchor. Also, for fun. If the OP's primary driving purpose (rather than just being a purpose) were understanding a particular architecture, he presumably would have said so.
|
# ? Aug 27, 2014 18:10 |
|
Relevant to my interests: I'm getting a couple of books on PDP-11 assembly to teach myself how to assemble for that architecture. I've been running SIMH unixes for a couple of years, and I've got the bug to recreate a couple of the games whose source never made it to any distribution. The PDP-11 is a good architecture to try because its processor is simple and it has many handy addressing modes. Later PDP-11s had floating-point registers and a "commercial instruction set" mainly used for special things that FORTRAN and COBOL needed. And the byproduct is that it will help me understand the old Unix code better. Of course everything's in octal, but that's not too difficult a wrinkle.
|
# ? Sep 22, 2014 14:43 |
|
rjmccall posted:Macros in assembly programming are kindof like functions in higher-level programming: they can be a nice way to avoid repeating common code, but it's not like making everything a function solves all your problems for you. Kinda. Subroutines are more like functions. Most of the macros I've been looking at lately, Linux kernel bootsector header stuff, are put in there to avoid the subroutine/function call overhead: putting the current registers away, pulling the arguments off of the stack, or whatever. The macros I've been looking at seem to turn things that look like function calls into instruction sequences expanded inline at the call site, with no call or return.
ewe2 posted:Relevant to my interests I'm getting a couple of books on PDP-11 assembly to teach myself how to assemble for that architecture. I've been running SIMH unixes for a couple of years and I've got the bug to recreate a couple of the games whose source never made it to any distribution. These links should be relevant to your interests (edit: on second thought, you probably already know of them): http://pdos.csail.mit.edu/6.828/2011/xv6.html
Cyberpunkey Monkey fucked around with this message at 05:49 on Sep 26, 2014 |
# ? Sep 26, 2014 05:43 |
|
osirisisdead posted:Kinda. Subroutines are more like functions. Most of the macros in higher level programming that I've been looking at lately, linux kernel bootsector header stuff, are put in there to avoid the subroutine/function call overhead. Like, putting the current registers away, pulling the arguments off of the stack or whatever. I was making a point about the role of macros in assembly programming, not saying that macros are actually technically similar to functions in the end result. Much like functions in higher-level languages, macros in assembly are useful when there's a common operation you need to perform. Like a function, using a macro has the advantage of allowing you to implement this operation in one place, which is especially helpful when (like a lot of the assembly programs I've written and seen) the instruction sequence needs to be slightly different in different build configurations. Of course, unlike a function, there are no universal (per platform) conventions to help you separate concerns, and a well-written assembly program needs to carefully document what registers a particular macro is allowed to clobber and what exactly it expects of and does to the stack; you can think of this as basically being a custom "calling convention" for every macro. And since these are macros, often these conventions are parameterized so that you can tell each invocation of the macro to use a different scratch register based on the needs of the "caller". Macros in C are a different story. The legitimate reasons to use macros in C are when you can't do it any other way, like making min work for arbitrary argument types without C++-like function overloading, or making a function like assert that doesn't always evaluate its argument, or doing something that requires creating a new local variable in the "caller". Call overhead, however, is not a legitimate reason, for three reasons. 
The first is that call overhead is really not that significant anymore; modern processors are very good at executing simple call/return sequences. The second is that obviously profitable things like min will definitely be inlined if you give the compiler a chance. The third is that every respectable compiler provides some way to force a function to be inlined regardless of whether the compiler thinks it'll be profitable.
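To make the min example above concrete, here is a minimal sketch of the distinction: a type-generic macro versus a plain function that the compiler can inline. (The names are illustrative, not from any particular codebase.)

```c
/* Textual substitution: works for int, double, pointers... but each
 * argument appears twice in the expansion, so MIN(x++, y) evaluates
 * x++ twice. This is the double-evaluation hazard functions avoid. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* A plain function: fixed argument type, single evaluation, and any
 * modern compiler will inline a call this trivial when optimizing. */
static inline int min_int(int a, int b) {
    return a < b ? a : b;
}
```

Compiling both at -O2 and comparing the generated assembly is a nice way to see the inlined function call disappear entirely, which is the point being made above about call overhead.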
|
# ? Sep 26, 2014 06:58 |
|
Not saying that your argument isn't totally tractable, but my intent was more to open up a discussion about subroutines versus inlining: to have the other people reading the thread compile a min function in C versus a ternary-operator line, maybe gain some insight into the nuts and bolts of C function calls as implemented, and to share a neat low-level thingy I recently saw. Learning to type assembly into an editor and then run it through fasm to get an ELF binary is not the important part of learning assembly. The goal was not to have a pedantic argument that was way outside the scope of a "How does a Beginner learn to Work in Assembly" thread. To reiterate, the goal was not to attack your ego. Thanks for the insight into how macros resolve type issues. That's a cool datum. Cyberpunkey Monkey fucked around with this message at 15:16 on Sep 26, 2014 |
# ? Sep 26, 2014 15:10 |
|
Cool. I was just talking about how macros are used in assembly programming because someone asked about that. I certainly didn't mean to derail you.
|
# ? Sep 26, 2014 19:12 |
|
Resurrecting this thread because I had a similar question recently and was pointed at some excellent resources. Open Security Training has a two-part introductory assembly course that is quite good. Actually, most of the stuff on Open Security Training has been pretty good so far, but I've only gone through a few things.
|
# ? Oct 27, 2014 23:48 |
|
This might be a bit too ground-level for the OP but for anyone where assembly puts you in full-retard mode, "Programming from the Ground Up" is an amazing book that will finally make assembly click for you. It's published under GNU so I think I can link it here: http://gnu.mirrors.pair.com/savannah/savannah//pgubook/ProgrammingGroundUp-0-8.pdf
|
# ? Oct 28, 2014 07:54 |
|
I had a lot of trouble getting started with assembly on Macs; if there is interest, I can go through the first few tutorials from the 'Programming from the Ground Up' book for 64-bit Macs.
|
# ? Nov 12, 2014 10:49 |
|
Coursera has a Hardware/Software Interface course that might be good background material for anyone who wants to dive into lower level programming. It's not running right now, but you might still be able to access the videos.
|
# ? Nov 16, 2014 07:38 |
|
Hennessy and Patterson is the standard textbook for any undergraduate comp org course. It also serves as a pretty good introduction to (MIPS) assembly programming. The concepts are more valuable than learning any particular instruction set.
|
# ? Nov 16, 2014 10:42 |
|
There is, of course, Nand 2 Tetris, which widens the scope a bit. It starts you off with a hardware simulator, and you build a computer iteratively: from gates, to an ALU, to machine code, to assembly, to a VM, to a Java-like language, and then, finally, to Tetris. It won't make you an assembly ninja, but it will give you good computer fundamentals, which I assume is the point of the exercise. justbread posted:Hennessy and Patterson is the standard textbook for any undergraduate comp org course. It also serves as a pretty good introduction to (MIPS) assembly programming. The concepts are more valuable than learning any particular instruction set. MIPS is pretty easy, being a RISC architecture, and it's used in the N64 and PS2. (Though I will warn you that even after learning MIPS, you have to deal with some platform-specific stuff when it comes to graphics and audio calls and such on those platforms.) Linear Zoetrope fucked around with this message at 08:55 on Nov 18, 2014 |
# ? Nov 18, 2014 08:48 |
Well, I just turned in my take-home final for my SPARC assembly class. I think the part that pisses me off the most is that for whatever reason, I could never seem to get logged in to my school's Sparc server so that I could actually try to debug and poo poo. And of course you can't emulate Sparc architecture, so it's not like I could just throw it in a VM. Also annoying, but Google failed me on trying to figure out how to refer to the floating point registers in binary. I just guessed and went with %f0 = 00000. Normally wouldn't be a problem, but everything in the final was floating point, and we had to convert two subroutines' instructions into hex. Mainly just came in here to gripe now that I'm done spending 5 hours on this final and can relax a bit.
|
|
# ? Dec 6, 2014 03:10 |
|
taiyoko posted:Well, I just turned in my take-home final for my SPARC assembly class. I think the part that pisses me off the most is that for whatever reason, I could never seem to get logged in to my school's Sparc server so that I could actually try to debug and poo poo. And of course you can't emulate Sparc architecture, so it's not like I could just throw it in a VM. QEMU will do SPARC just fine, though?
|
# ? Dec 6, 2014 04:35 |
Sinestro posted:QEMU will do SPARC just fine, though? It will? Well, poo poo, google failed me. Too late now, though, final is already submitted.
|
|
# ? Dec 6, 2014 04:43 |
|
I am just now finishing the assembly generation step for the project in my compiler design class. Once I got on a roll, the assembly became a lot less intimidating, although it's made significantly easier by the fact that we're treating MIPS as a stack machine and using at most 2 temp registers at any time. I think if we had to make optimal use of the available registers I'd be lost almost immediately, although the lectures make it sound like that's actually pretty advanced stuff and not really in the scope of a basic compilers class. More something for the real compiler writers to worry about.
|
# ? Dec 6, 2014 08:08 |
|
LeftistMuslimObama posted:I am just now finishing the assembly generation step for the project in my compiler design class. Once I got on a roll, the assembly became a lot less intimidating, although it's made significantly easier by the fact that we're treating MIPS as a stack machine and using at most 2 temp registers at any time. I think if we had to make optimal use of the available registers I'd be lost almost immediately, although the lectures make it sound like that's actually pretty advanced stuff and not really in the scope of a basic compilers class. More something for the real compiler writers to worry about. There are reasons why GCC is still getting active releases. "Optimization", whether that's inlining, register use, or whatever else, is difficult. E: Look up Live Variable Analysis for a small but still introductory taste of the sort of stuff memory and register optimization can entail. It's the sort of thing you may be expected to do in a grad-level compilers course. (It's also why decompilers will often have fewer variables than the compiled code, assuming the compiled code was stripped of debug symbols. Variables with entirely disjoint live ranges often can't be distinguished after optimization.) The basic idea is that if you have a function in which two variables are never live at the same time, the compiler is free to fold them into a single register or stack slot.
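A small illustrative function of the kind of disjoint live ranges being described (my own sketch, not the course's example): `a` is dead before `b` is first written, so the register allocator may put both in the same register, after which a decompiler would likely see only one variable.

```c
int disjoint(int n) {
    int a = n * 2;      /* live range of a: starts here...          */
    int sum = a + 1;    /* ...and ends here, at its last use        */
    int b = n - 3;      /* live range of b begins after a is dead,  */
    return sum + b;     /* so a and b can share one register        */
}
```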
Linear Zoetrope fucked around with this message at 08:32 on Dec 6, 2014 |
# ? Dec 6, 2014 08:18 |
|
Jsor posted:Variables with entirely disjoint live ranges often can't be distinguished after optimization). What makes this happen? Does optimization tend to produce code such that not every code path wipes out a variable where it could be used, according to a conservative analysis?
|
# ? Dec 6, 2014 23:14 |
|
sarehu posted:What makes this happen? Does optimization tend to produce code such that not every code path wipes out a variable where it could be used, according to a conservative analysis? The compiler can order the allocation of local variables however it wants - it isn't forced to allocate at the point where the variable is declared, or even at all (a trivial example being not allocating an unused local) - it only needs to allocate a local the first time it is used.
|
# ? Dec 7, 2014 01:48 |
|
Where's a good place to read up on bit-based arithmetic? Specifically I don't get the whole two's-complement thing. I saw some mentions that MIPS includes an addi op but not a subi op because you can turn addi into a subi using a two's complement operation.
|
# ? Dec 7, 2014 08:50 |
|
Bruegels Fuckbooks posted:The compiler can order the allocation of local variables however it wants - it isn't forced to allocate at the point where the variable is declared, or even at all (a trivial example being not allocating an unused local) - it only needs to allocate a local the first time it is used. In reality, outside of some special cases, compilers generally allocate all of the local variables at once: during entry to the function (the "prologue"), a stack frame is created that's large enough to fit everything they need to put there. Shrink-wrapping the stack to only fit the storage actually in use introduces a lot of extra bookkeeping: in the compiler's internals, in stack-layout metadata (e.g. for debuggers), and dynamically in stack-pointer adjustments. It's not generally considered worthwhile. Of course, if a stack allocation is actually dynamically-sized, as a C99 variable length array is, then you don't really have a choice about it — you can't just do a single dynamically-sized stack adjustment in the prologue because, in general, you can't compute the required bounds until you get to the right part of the function. All that said, it is a common optimization to re-use parts of the stack frame depending on what's actually simultaneously in scope. A very simple version of this would be to lay out the stack frame as if you really were dynamically pushing and popping variables individually; that pass would assign offsets to all the variables, and you'd just make a stack frame of whatever the high-water mark was. A more sophisticated version would be tied into the register allocation and use actual value liveness information. LeftistMuslimObama posted:Where's a good place to read up on bit-based arithmetic? Specifically I don't get the whole two's-complement thing. I saw some mentions that MIPS includes an addi op but not a subi op because you can turn addi into a subi using a two's complement operation. Right. 
subi $t, $s, 4 is obviously the same as addi $t, $s, -4, so what you're really asking is how to figure out the bit pattern for -4 from the bit pattern for 4. Unsigned addition in binary works the same way that unsigned addition in decimal does: you add the digits from right to left, possibly carrying a 1 from the previous step each time. Two's complement is just an encoding of negative numbers that lets you implement signed addition (in fact, most kinds of signed arithmetic) using exactly the same algorithm as unsigned, so that the hardware doesn't have to care which one it's doing. Machines with one's complement or sign-magnitude representations actually had to have separate add-signed and add-unsigned instructions if they wanted to implement both efficiently. Mathematically, -x is the value that, when added to x, yields 0. In other words, if x is 01101101, then -x is the bit-pattern that, when added to x, yields 00000000. (We'll assume we're doing 8-bit math here.) If you flip all the bits in x (i.e. if you compute ~x), you get a bit-pattern that, when added to x, ends up with all the bits set: in every digit, you end up adding 0+1 or 1+0 without a carry. So that x + ~x == 11111111, regardless of what x was. If you add 1 to that, then mathematically you would get 100000000; but we're doing 8-bit math, so that top bit disappears as overflow, and you get 00000000. So (x + ~x) + 1 == 0, regardless of what x was. Because addition is associative, that means that x + (~x + 1) == 0. So -x is always ~x + 1.
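The ~x + 1 identity at the end is easy to sanity-check in C, where unsigned arithmetic wraps modulo 2^N just like the 8-bit example (a quick sketch; the function name is made up):

```c
#include <stdint.h>

/* Two's complement negation by hand: flip every bit, add one, and let
 * the result wrap modulo 256. For any x, twos_negate(x) + x == 0 in
 * 8-bit arithmetic. */
uint8_t twos_negate(uint8_t x) {
    return (uint8_t)(~x + 1);
}
```

For example, 4 is 00000100; flipping gives 11111011, and adding one gives 11111100, which is 252, the 8-bit encoding of -4.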
|
# ? Dec 7, 2014 10:06 |
|
After taking six hours to track down what was ultimately a slight issue with calculating the size of the locals space for function entry, I just want to say that stepping through assembly is crazy frustrating. It's hard to get used to setting breakpoints in 15000 instruction programs and tracking problematic values by watching stack addresses and registers instead of just throwing print statements everywhere.
|
# ? Dec 9, 2014 04:18 |
|
|
|
The most fun, & consequently the most educational stuff I ever did with assembly was optimisation, and I was introduced to the secret god of Pentium optimisation Agner Fog. I am very happy to see that he is still going strong some 15 years after I first read his stuff in anger, which gives you an idea of how old & venerable he is now. Everything on his site is worth reading.
|
# ? Dec 9, 2014 21:56 |