StumblyWumbly
Sep 12, 2007

Batmanticore!
Thanks for the answers, this all makes a ton of sense.

Foxfire_ posted:

(I don't know what you mean by "unstable memory environment")

It's not a huge thing for me, but I've heard that C++ is not appreciated in high-radiation environments, including space and some medical/aerospace applications. I've heard a few rationales, one being that a memory error that changes the location of a function pointer is the worst type of disaster, and C++ uses function pointers in hidden ways, like virtual functions. I had assumed that templates would also do some kind of dynamic typing, because that's how the code makes it look, but generating extra functions makes more sense.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

StumblyWumbly posted:

Thanks for the answers, this all makes a ton of sense.

It's not a huge thing for me, but I've heard that C++ is not appreciated in high-radiation environments, including space and some medical/aerospace applications. I've heard a few rationales, one being that a memory error that changes the location of a function pointer is the worst type of disaster, and C++ uses function pointers in hidden ways, like virtual functions. I had assumed that templates would also do some kind of dynamic typing, because that's how the code makes it look, but generating extra functions makes more sense.
I would think in those environments you'd use ECC memory, in which case things are just gonna raise a "your memory hosed up" crash-and-reboot signal no matter what was changed.

Also I would think changing the location of a function pointer would most often just crash anyway, which would be a pretty small disaster compared to, e.g., a bitflip changing the target temperature from 50 degrees to 50+2^12 degrees or something like that. Data corruption that you can't possibly know for sure isn't the real data is almost always gonna be worse than behavior corruption.
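To make that concrete, here's a tiny sketch of the scenario (the 16-bit set-point is hypothetical): a single upset in bit 12 silently turns 50 into 50 + 2^12, with no crash or fault to flag that anything went wrong.

```cpp
#include <cstdint>

// Flip one bit of a stored value, as a single-event upset would.
// Illustrative only: a hypothetical 16-bit temperature set-point.
uint16_t flip_bit(uint16_t value, int bit) {
    return static_cast<uint16_t>(value ^ (uint16_t{1} << bit));
}
```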

Or: those things sound like post-hoc rationales from someone who didn't want to learn C++.

ExcessBLarg!
Sep 1, 2001
Although C++ is a very sophisticated language, it scales down fairly well too, as long as you know which features to avoid. In the worst case it scales down to C.

Personally I think if you're going to go to the effort to not link against libstdc++ you've probably dumped enough of the language that you might as well just write straight C, but that bias may ultimately be due to my greater professional familiarity with C.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


What about custom allocators in the embedded world?

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
Another C++-on-bare-metal-RTOS person here. It’s really nice. There’s more and more of the STL you can actually use in modern standards. For instance, std::array doesn’t allocate and works with ranges and std::span. std::variant is great for message-passing architectures. from_chars and to_chars provide non-allocating parse/emit for the kinds of things you actually parse/emit on micros, right in the standard. There’s a ton of stuff you get for free.
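For example, a sketch of the non-allocating parse path (function name is made up): std::from_chars (C++17) parses straight out of a buffer, with no heap, no locales, and no exceptions.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Parse an integer reading out of a message buffer without allocating.
// Returns true on success; `out` is untouched on failure.
bool parse_reading(std::string_view msg, int& out) {
    auto result = std::from_chars(msg.data(), msg.data() + msg.size(), out);
    return result.ec == std::errc{};
}
```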

Even with STL containers that normally allocate, the thing you need to avoid in memory-constrained real-time environments is a shared heap with possibly-unbounded allocations - the shared part is where you run into trouble with non-deterministic allocation times busting real-time deadlines, because you might need a heap scan to compact or find appropriate holes, and the unbounded part is obviously bad. But because those are the specific problems, if you’re comfortable writing and using custom allocators, you can use node-based containers like deque, list, or map with a per-container object-pool allocator of bounded size backed by a static buffer just fine.
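A minimal sketch of that idea, with all names mine: a bounded allocator over a static buffer, usable with node-based containers. This is illustrative only - it never recycles freed nodes and isn't thread-safe; a real pool allocator would keep a free list.

```cpp
#include <array>
#include <cstddef>
#include <list>
#include <new>

template <class T, std::size_t Capacity>
struct StaticPoolAllocator {
    using value_type = T;

    // One static pool per (T, Capacity) instantiation, sized at link time.
    alignas(std::max_align_t) static inline
        std::array<std::byte, Capacity * sizeof(T)> buffer{};
    static inline std::size_t used = 0;

    StaticPoolAllocator() = default;
    template <class U>
    StaticPoolAllocator(const StaticPoolAllocator<U, Capacity>&) noexcept {}

    // std::list rebinds the allocator to its internal node type; with a
    // non-type template parameter we must spell the rebind out ourselves.
    template <class U>
    struct rebind { using other = StaticPoolAllocator<U, Capacity>; };

    T* allocate(std::size_t n) {
        if (used + n > Capacity) throw std::bad_alloc{};  // bounded, no heap scan
        T* p = reinterpret_cast<T*>(buffer.data()) + used;
        used += n;
        return p;
    }
    void deallocate(T*, std::size_t) noexcept { /* no recycling in this sketch */ }
};

template <class T, class U, std::size_t N>
bool operator==(const StaticPoolAllocator<T, N>&, const StaticPoolAllocator<U, N>&) { return true; }
template <class T, class U, std::size_t N>
bool operator!=(const StaticPoolAllocator<T, N>&, const StaticPoolAllocator<U, N>&) { return false; }
```

Then something like `std::list<int, StaticPoolAllocator<int, 64>>` gets you a list whose nodes all live in one statically sized buffer.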

There’s some problematic stuff still - formally it’s a really bad idea to turn off exceptions since they provide the only language safe signaling mechanism in places like constructors, and you probably aren’t actually doing as good a job of making everything noexcept as you think, I know I’m not - but it’s a really nice environment particularly if you have the sort of brain worms that like to write c++ that asymptotically approaches rust.

StumblyWumbly
Sep 12, 2007

Batmanticore!
Rust is a great language, and I think getting into it made me a better programmer. But I had to drop it for C++ when I realized it was really not going to be used in a job. Folks I talk to agree that it would be great for someone else to move their giant code base to Rust, but nobody wants to be that someone else.

Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


I think C is fine for embedded but C++ can be really nice. It depends - if you're writing tight, inline-assembly loops on constrained micros without rtos or writing linux kernel drivers then you don't really have a choice, but if neither of those things is on the cards C++ is much nicer to work with.

I think the industry as a whole is slowly adopting C++ more and more, especially as chips become more powerful and/or cheaper with time.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
I’m not sure you can call virtual functions a hidden use of function pointers, really.

ExcessBLarg!
Sep 1, 2001
What's the difference between a SEU that changes a vtable entry, versus one that changes a PLT entry, versus one that changes a jsr instruction for code paged into RAM?

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

ExcessBLarg! posted:

What's the difference between a SEU that changes a vtable entry, versus one that changes a PLT entry, versus one that changes a jsr instruction for code paged into RAM?

To be fair, most of the micros we're talking about here are XIP nonvolatile code storage, so that last one isn't happening. The first two are fair though.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
It’ll depend a lot on the actual device architecture. ROM should be much more resilient against random memory failures than RAM. I don’t know if people worrying about this stuff also have to worry about the CPU’s internal state being corrupted the same as RAM, except that presumably they can protect the PC from corruption or else none of the rest of this makes much of a difference. Anyway, if the CPU can directly address ROM, e.g. to run code straight from it without copying it into RAM first, then that dramatically shrinks the threat profile of a random bit-flip.

Now, you could have a compiler put a v-table into ROM; there’s nothing about v-tables that requires them to be writable or to share the same address space as RAM (if that’s a concern). But polymorphic objects in RAM will still have pointers to the v-table that can be corrupted, potentially causing wild calls, and of course there’s data that has to be loaded into a register, where it may or may not be corruptible. So if you’re trying to guarantee limited misbehavior under RAM corruption, you would definitely want to avoid virtual functions, just like you’d avoid any other indirect call.

I do think anyone looking at virtual functions ought to know that they typically involve an indirect call. There are a million ABI and optimization details you can ignore, but that much anyone should recognize.
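Roughly what that indirect call looks like, written out by hand. Illustrative only: real ABIs (e.g. the Itanium C++ ABI) differ in layout details, but the shape - load vptr, load slot, call through it - is the same, and the vptr living in RAM is exactly the thing a bit flip could corrupt.

```cpp
// A hand-rolled stand-in for what the compiler generates for virtual calls.
struct VTable { int (*value)(const void* self); };

struct Base {
    const VTable* vptr;  // the hidden pointer the compiler adds per object
};

static int derived_value(const void*) { return 42; }
static const VTable derived_vtable{ &derived_value };

struct Derived : Base {
    Derived() { vptr = &derived_vtable; }
};

// Morally equivalent to `obj->value()` on a polymorphic object:
// two loads and an indirect call.
int call_value(const Base* obj) {
    return obj->vptr->value(obj);
}
```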

qsvui
Aug 23, 2003
some crazy thing

StumblyWumbly posted:

It's not a huge thing for me, but I've heard that C++ is not appreciated in high-radiation environments, including space and some medical/aerospace applications. I've heard a few rationales, one being that a memory error that changes the location of a function pointer is the worst type of disaster, and C++ uses function pointers in hidden ways, like virtual functions.

I've worked on devices sent into space that were all running on C++. Is there any data backing up this rationale? I'd really like to read it because right now, this just sounds like hand waving.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
I mean it very well may be? Like, anybody here who's worked in embedded for a while: how many people have you met that still use PICs, or only use assembler, or don't believe in undefined behavior, or don't believe in unit tests? Handwaving is endemic.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
And it's sort of hard to blame people that much for it. It's an environment where every random 3 to 6 month project usually comes with a different programming environment, completely different set of requirements and goals and compute available, different sets of hundred/thousand page pdfs to understand, different levels of abstraction, testability, amount of feedback, math and science requirements, support and documentation, and a lot of it really not actually aimed at being helpful; and in a lot of those environments once code's released it's never touched again. At a certain level it's hard to blame people for falling back on stuff they know, and know works, and don't see a point in changing.

Foxfire_
Nov 8, 2010

Radiation hardening is not something that can be done with choice of software. For example, general-purpose integrated circuits have unintended parasitic transistors on them (any sandwich of differently doped semiconductor forms a transistor) that, if ever turned on by an energetic particle, will short between power and ground until the chip melts or power is removed.
For radiation-hard ones, you build the wafers differently, with extra non-semiconductor layers isolating things.

more falafel please
Feb 26, 2005

forums poster

rjmccall posted:

I do think anyone looking at virtual functions ought to know that they typically involve an indirect call. There are a million ABI and optimization details you can ignore, but that much anyone should recognize.

Have you conducted any interviews lately? At least 50% of non-senior candidates for a C++ job (in gamedev, so that might be a caveat) have never thought about how virtual functions work.

To be fair, a good portion of them invent vtables or something vaguely like them on the fly if you ask them how they might implement something like that, but yeah, tons of people don't know that at all.

frogge
Apr 7, 2006


Decades ago I took a class on C++ and mostly goofed around with it, making choose your own adventure style text programs with some random numbers generated to give variety to outcomes. I have totally forgotten how to code and have no idea where to start to learn how to do that sort of thing again. I'm not looking to go pro, just dabble as a hobby and at best create a program to automate some rng tables for D&D as a novelty that I can add homebrew stuff in here and there as I come up with it and maybe have it save that stuff between sessions since my note-taking is atrocious.

Should I bother with C++ or try learning something newer since I've basically forgotten it all? How would I get started with it?

more falafel please
Feb 26, 2005

forums poster

frogge posted:

Decades ago I took a class on C++ and mostly goofed around with it, making choose your own adventure style text programs with some random numbers generated to give variety to outcomes. I have totally forgotten how to code and have no idea where to start to learn how to do that sort of thing again. I'm not looking to go pro, just dabble as a hobby and at best create a program to automate some rng tables for D&D as a novelty that I can add homebrew stuff in here and there as I come up with it and maybe have it save that stuff between sessions since my note-taking is atrocious.

Should I bother with C++ or try learning something newer since I've basically forgotten it all? How would I get started with it?

Pick up Python if you want to make a computer do stuff, imo. Pick up C++ if you're more interested in how a computer does stuff.

Or, hell, pick up JavaScript and a frontend framework, which would probably make it easier to make a nice little web interface for your thing.

cheetah7071
Oct 20, 2010

honk honk
College Slice
I would not learn C++ unless you have a specific reason to prefer C++. If you don't know that you have that reason, then you almost certainly don't.

Python is also my recommendation for any sort of simple toy program. It's one of the easiest languages to learn and the amount of time it takes to code up a program is pretty low so you can have something working much sooner.

more falafel please
Feb 26, 2005

forums poster

Other than generally friendly syntax, etc, the reason python is great is that there's just Already A Thing For That. Need to parse some json? Just pull in the json parsing thing that literally millions of people use every day. Need to scrape live MLB data? Just write some http queries and then parse the JSO... oh wait actually just pull in the MLB module that Just Does It For You.

I've never written a Real Adult Program in python but it's absolutely what I turn to when I want A Thing to do A Thing.

frogge
Apr 7, 2006


Awesome, thanks for the feedback. I'll find one of those "python for complete dumbasses" books and get to it.

more falafel please
Feb 26, 2005

forums poster

frogge posted:

Awesome, thanks for the feedback. I'll find one of those "python for complete dumbasses" books and get to it.

I've heard pretty good things about https://openbookproject.net/thinkcs/python/english3e/ and you already have a copy of it, so that's a bonus.

StumblyWumbly
Sep 12, 2007

Batmanticore!

qsvui posted:

I've worked on devices sent into space that were all running on C++. Is there any data backing up this rationale? I'd really like to read it because right now, this just sounds like hand waving.

I'm looking for real data on it too, and not finding anything. I think it's just ancient tradition.

Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


frogge posted:

Should I bother with C++ or try learning something newer since I've basically forgotten it all? How would I get started with it?

I agree that a simple language might be a better starting point, but it's worth noting that C++ hasn't really been superseded yet, certainly not by Python or JS.

Well there's Rust but that's not simpler to learn by any means.

more falafel please
Feb 26, 2005

forums poster

Private Speech posted:

I agree that a simple language might be a better starting point, but it's worth noting that C++ hasn't really been superseded yet, certainly not by Python or JS.

Well there's Rust but that's not simpler to learn by any means.

I'm a C++ apologist, but if OP is saying they've forgotten all the C++ they knew and want to make a fun thing to automate some logic, python or JS are definitely the way to go.

C++ is still the best language for doing the things you'd want to do in C++, and Rust is the only challenger worth mentioning.

ExcessBLarg!
Sep 1, 2001

Private Speech posted:

I agree that a simple language might be a better starting point, but it's worth noting that C++ hasn't really been superseded yet, certainly not by Python or JS.
I'd say that C++ has been superseded by both Java and C# for many of the kinds of applications that were originally written in C++.

I mean, sure, managed runtimes and garbage collection aren't appropriate for all applications and there's certainly specialized domains for which nothing has replaced C++, but that makes it a bit more of a specialist language than probably was intended in 1998.

cheetah7071
Oct 20, 2010

honk honk
College Slice
I have a bit of a weird question: what's the best way to architect work that is conceptually tied to an object's end-of-life, without actually putting the work into a destructor (which seems like a bad idea)?

If the details matter:

I have an object whose purpose is to wait for data from a known number of files which probably aren't coming sequentially--it knows what files the data it cares about lives in, but doesn't (and can't) have any guarantees about how long the wait between its files will be. When all the data is gathered, it needs to calculate a few summary statistics and then tidy up by clearing its memory, because there's no more use for the data once the summary statistics have been calculated. I currently have this architected by having functions which "open" and "close" the object, and when the number of closes equals the expected number of files, it does its cleanup, which includes doing the actual work of data processing.

The architecture I'd like to have is something like a wrapper class that, upon opening a file, opens each of the relevant objects (the ones that expect data from that file), and then, at the end of its lifetime (when I'm done reading from the file), closes them. Closing the objects is conceptually an end-of-life tidying-up operation which might normally go into a destructor, but what's giving me pause here is that part of the tidying up is doing actual work. Doing actual work in a destructor goes against the advice of the language specification, against the advice of stack exchange, and leads to unreadable code where it's not even obvious that I'm doing the work at all.

The best I have is something like:

code:
WrapperObject w;
try {
	w = WrapperObject(filename); // this opens all the relevant data objects
	// read data from the file here, possibly with an IO error, with
	// WrapperObject providing the framework for assigning data to objects
	w.closeAndDoWork(); // manual version of the destructor I described above
} catch (...) { // I wouldn't actually use ...; I'm just not sure offhand which exceptions to expect
	w.closeAndDoWork();
	throw; // rethrow so the error isn't silently swallowed
}
This also feels kind of messy, because of C++'s lack of finally blocks, and because cleanup operations aren't supposed to be explicit functions--that's the point of RAII.

I have no doubt that both the destructor version and the try-catch version would work but they both feel wrong to the point where I'm half-convinced there must be a third option I'm not thinking of.

cheetah7071 fucked around with this message at 02:28 on May 15, 2022

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Do you actually still want to calculate these statistics if an exception is being thrown? If that exception is going to bubble up and abort this processing, then it seems like you don't; that calculation would just be a waste of time. Even if you did, you have no guarantees about how much of the underlying data this particular summary encompasses, so it's not actually very useful.

If you reason that you don't actually need the statistics in this case, then it gives a very natural split of putting the statistics calculation in an explicit finish() method, and the memory cleanup in the destructor.
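A sketch of that split, with hypothetical names: the summary runs only on the explicit success path, while the destructor is left to do nothing but release memory, so an unwinding exception skips the wasted work.

```cpp
#include <vector>

class CellSummary {
public:
    void add(double z) { points_.push_back(z); }

    // Success path only: compute the statistic, then drop the data.
    double finish() {
        double sum = 0.0;
        for (double z : points_) sum += z;
        double mean = points_.empty() ? 0.0 : sum / points_.size();
        std::vector<double>().swap(points_);  // release memory now
        return mean;
    }

    // The implicit destructor just frees points_; no statistics work here.

private:
    std::vector<double> points_;
};
```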

cheetah7071
Oct 20, 2010

honk honk
College Slice
you might be right, actually. The only exception I can think of where the data is still in a usable state would be if the header of the file is ill-formed and claims that more data is in the file than really exists, which would cause me to attempt to read past the end of the file. But I can probably just write that off as both extremely unlikely, and an error that the user would want to know about and fix--that their files suck

in which case it still feels a bit awkward to have an explicit cleanup function because for the vast majority of cases RAII means you never have to explicitly clean anything up. But a lot less awkward than it would in a full try-catch block.

e: I was typing while you were editing. It would require a bit of re-architecting my data object to delay the cleanup until my wrapper object destructs, but that's probably a small price to pay for readable code

ExcessBLarg!
Sep 1, 2001

cheetah7071 posted:

I have a bit of a weird question: what's the best way to architect work that is conceptually tied to an object's end-of-life, without actually putting the work into a destructor (which seems like a bad idea)?
I assume the data processing within each file is a long-running operation but independent for each file. In which case, are you parallelizing the work across multiple threads?

If so, I'd have the wait-for-data object spawn its own thread that waits on a condition variable for when the per-file data is received, and when it wakes it checks in a loop whether all the data has been received. Your bust-out-of-the-loop conditions are (i) all data has been received, and (ii) not all the data is received but there are no workers left. For (i) you compute the summary statistics and move on; for (ii) there was an error and you should abort. Either way, the whole lifecycle of that object is essentially contained to a single supervisor function.
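That supervisor loop might look something like this (all names are made up): workers call file_received() or worker_done(), and the supervisor blocks on a condition variable until either all files have arrived or no workers remain.

```cpp
#include <condition_variable>
#include <mutex>

struct Supervisor {
    std::mutex m;
    std::condition_variable cv;
    int files_expected;
    int files_received = 0;
    int workers_alive;

    Supervisor(int files, int workers)
        : files_expected(files), workers_alive(workers) {}

    void file_received() {
        { std::lock_guard<std::mutex> lk(m); ++files_received; }
        cv.notify_one();
    }
    void worker_done() {
        { std::lock_guard<std::mutex> lk(m); --workers_alive; }
        cv.notify_one();
    }

    // Returns true if all data arrived (go compute statistics),
    // false if the workers died first (abort).
    bool wait_for_completion() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] {
            return files_received == files_expected || workers_alive == 0;
        });
        return files_received == files_expected;
    }
};
```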

ExcessBLarg! fucked around with this message at 12:51 on May 15, 2022

cheetah7071
Oct 20, 2010

honk honk
College Slice
That might be a good idea but I have potentially hundreds of thousands or millions of wait for data objects and I think that might be a few too many threads lol

e: the purpose of this architecture is that the vast majority of wait-for-data objects have all their data contained in a single file so I can just open them, fill them, process them, and empty them all in a single use. But a significant minority have their data spread over 2+ files, and their memory usage can pile up, especially in a large dataset. So, whenever they're hibernating, if too much memory is being used and I need to cool down for a bit, they'll dump their data to the hard drive and then pull it back off when they're re-opened. It kind of sucks to have more hard drive hits, but at least the data is de-compressed, already sorted, and just the part I'm interested in instead of the entire file, so it's better than solutions which involve touching files multiple times.

cheetah7071 fucked around with this message at 16:09 on May 15, 2022

ExcessBLarg!
Sep 1, 2001
I know you're trying to simplify the architecture description in order to ask your specific questions, but honestly I don't know if we can provide great answers without understanding the problem as a whole.

So you have data files and summary processes that run after each file is complete. How many summary processes per file? Is it variable? For the ones that require data spread across multiple files, is it just two? Are they sequential? There's a lot of things here that aren't clear.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
My vague understanding of the problem (from other threads) is that it's spatially-organized data, where each file on disk contains info about one region in 2d space. The points that need data from multiple files are the ones very near the edge of a region, I assume there's some need to check properties of things just over the border in the other region.

That would imply that there are many things being processed per file, the vast majority can be resolved solely with data from that file, the remainder will mostly require just two files but in some cases it could be more. The files containing the other required data will not necessarily be the next files read.

cheetah7071
Oct 20, 2010

honk honk
College Slice
Sure I guess I'll just dump the whole problem here and see if you have any advice, though I have an algorithm I'm happy with at this point so I wasn't intending to ask about that. But there's always a chance I'm doing something really dumb somewhere

I have a bunch of files containing x/y/z locations. I know from the header of each file what the minimum and maximum x and y values are for points in that file. In the normal case, the data will be sorted so that these bounds do not overlap between files, but there's no guarantee of that, and I've seen some wild layouts in my time. I may end up just telling the user their data sucks if their layout is too wild--there already exist tools to convert it into the non-overlapping tiling. The number of points will reach into the hundreds of billions in a large dataset. About 2TB compressed.

The desired output is a new grid laid over the same area, with a much finer resolution, where the value in each gridcell is stuff like "the mean z value of all points whose x/y fall within this cell". Basic statistics like that. The resolution of these grids is such that they easily fit in memory on an average machine, but might still be up into the tens of millions of values in a large dataset. In the normal case, where the input files are non-overlapping, the majority of gridcells (~90% of them or so) will only need data from one file. Along the edge between files, they'll need two, and the rare gridcells at the intersection will need four.

The current algorithm is to, in a first pass over the headers of the files, determine how many files each gridcell might get points from, and determine an adjacency list of the files (considering two files adjacent if at least one gridcell is looking for points from both of them). Then I traverse the file list in a way which prioritizes reading files whose adjacencies have already been read, parallelizing the task of decompressing the input files and sorting their points into gridcells. Whenever a gridcell has received data from the number of files that it expects from that first pass, it produces its summary statistics and then cleans up its memory.

In a prototype version of this program, where my file order wasn't intelligent, these "stray points" (as I call the points that have been read and exist in memory but are waiting on another file before they can be cleaned up) amounted to >60GB on a large dataset. Traversing the file list intelligently should cut that down by a lot, but I want this program to have a chance of running even on a kind of mediocre machine, so I'm worried about the memory used by stray points.

The current plan is to have each gridcell be a data object that knows: how many files it needs points from, how many threads are currently using it, and how many times it's ever been used. If the number of times it's been used equals the number of files it needs points from, and no thread is currently using it, it should clean itself up, because it's completely done. (This was the part that was doing work that's conceptually end-of-life for the object.) If it's not currently being used by any thread, but it's still expecting more data in the future, and my memory usage is currently worrying, it should offload its data to the hard drive and read it back in the next time it's needed.

I'm also currently planning to test a pre-processing step where I count how many points each gridcell can expect to get. This would involve two full passes through the files, but will make my allocations a lot better, because I can pre-allocate all the memory for each relevant gridcell whenever I open a file, instead of managing it with push_back. It's not clear to me whether better allocations will be worth the second pass through the files, and I'll just have to test it.
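The gridcell bookkeeping described above could be sketched roughly like this (names hypothetical, and single-threaded for brevity - the real version would need atomics or a lock for the counters):

```cpp
#include <vector>

class GridCell {
public:
    explicit GridCell(int files_expected) : files_expected_(files_expected) {}

    void open()  { ++threads_using_; }
    void add(double z) { points_.push_back(z); }

    // Returns true when the cell has seen all its files and released memory.
    bool close() {
        --threads_using_;
        ++files_seen_;
        if (files_seen_ == files_expected_ && threads_using_ == 0) {
            // compute summary statistics from points_ here, then release
            std::vector<double>().swap(points_);
            return true;
        }
        return false;
    }

private:
    int files_expected_;
    int files_seen_ = 0;
    int threads_using_ = 0;
    std::vector<double> points_;
};
```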

nelson
Apr 12, 2009
College Slice
For the person asking what this is for: Sounds like some kind of map processing software. It gets pretty complex quickly when you realize 🌎 is 3D but 🗺 are 2D.

For the person trying to implement it: Is there any way to pre-process the files and add useful metadata (one-time cost) so your program can be more efficient when using the files (recurring benefit)?

nelson fucked around with this message at 17:26 on May 15, 2022

cheetah7071
Oct 20, 2010

honk honk
College Slice

nelson posted:

Sounds like some kind of map processing software. It gets pretty complex quickly when you realize 🌎 is 3D but 🗺 are 2D.

Fortunately all of that is handled by dependencies for me lol, if I had to handle map projections myself I'd probably just give up

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Hmm, is it possible to partially compute the summary statistics from one file, and then only keep that partial result instead of the full set of points around?

e.g. if instead of keeping all the points you just kept "here's the number of points in that grid cell so far, here's the current average" you can still compute the final average once the rest of the data comes in, without so much of the memory usage.

cheetah7071
Oct 20, 2010

honk honk
College Slice

Jabor posted:

Hmm, is it possible to partially compute the summary statistics from one file, and then only keep that partial result instead of the full set of points around?

e.g. if instead of keeping all the points you just kept "here's the number of points in that grid cell so far, here's the current average" you can still compute the final average once the rest of the data comes in, without so much of the memory usage.

I need quantiles as one of the statistics, and I don't think there's any way to combine quantiles on partial data to get the quantile of the full data

I do use that strategy in another section of the program--I need to produce a grid with a much much finer resolution, which will never fit in memory for a whole dataset, but where the summary statistic is the maximum. In that case it's pretty easy to produce one output file per input file and then combine them at the end.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If approximate (but generally pretty drat close, and you can tune how much error is acceptable) quantiles are sufficient, there are definitely ways to do that. I'm partial to KLL sketches because that's what's implemented in the systems I use, but I'm sure there are other options too.

cheetah7071
Oct 20, 2010

honk honk
College Slice
I didn't even think about that because I just assumed it was impossible. I have extremely loose error bounds because my data is collected by instruments with ~10-20cm precision. Trying to get precision higher than the input data is pointless to begin with. I think those were my only statistics that didn't have exact formulas for deriving them from partitioned data--mean is easy and a quick google shows that standard deviation has a more complicated but still not-that-bad formula. I'll need to double check that it's okay to plan on never ever including summary statistics that can't be partitioned but that might be the strat.

e: this approach also has the advantage that it's trivial to restart a run that failed because windows decided to update or something
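For the mean/stddev side, the merge step mentioned above can be sketched as combining per-file partials of (count, mean, sum of squared deviations), as in Chan et al.'s parallel variance algorithm. Quantiles have no such exact merge, which is why a sketch like KLL is needed for those.

```cpp
// Per-file partial summary: n points, their mean, and M2 (the sum of
// squared deviations from the mean). stddev = sqrt(m2 / n) at the end.
struct Partial {
    long long n = 0;
    double mean = 0.0;
    double m2 = 0.0;
};

// Exact combination of two partials covering disjoint sets of points.
Partial merge(const Partial& a, const Partial& b) {
    Partial out;
    out.n = a.n + b.n;
    if (out.n == 0) return out;
    double delta = b.mean - a.mean;
    out.mean = a.mean + delta * (double(b.n) / out.n);
    out.m2 = a.m2 + b.m2 + delta * delta * (double(a.n) * b.n / out.n);
    return out;
}
```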
