Tosk
Feb 22, 2013

I am sorry. I have no vices for you to exploit.

Thanks, I should have posted the full code instead of a snippet but I was accounting for that. It ended up being an issue with another part of my code.

Much appreciated, hopefully my questions will improve as I do :P


BattleMaster
Aug 14, 2000

Apologies if this is more of a Linux question. I've heard that using fcntl() to set stdin to non-blocking can affect other processes in weird ways. Is that true? I'd test it myself but I don't know what I'd be looking for in terms of weird behavior.

If so, if you dup() stdin and set the new file descriptor to non-blocking, does that affect stdin in the same way? The reason I ask is that while the fds are said to be interchangeable, they don't share flags and I don't know if that covers nonblocking.

edit: it seems like the recommended way is to not even try and just use select, poll, epoll, etc. but for something extremely simple I was wondering if I could get away with just a nonblocking fd.

BattleMaster fucked around with this message at 22:46 on Apr 25, 2023

b0lt
Apr 29, 2005

BattleMaster posted:

Apologies if this is more of a Linux question. I've heard that using fcntl() to set stdin to non-blocking can affect other processes in weird ways. Is that true? I'd test it myself but I don't know what I'd be looking for in terms of weird behavior.

If you make stdin nonblocking and a child process inherits it from you, they're going to be very confused if they're expecting to block when they read from it.

quote:

If so, if you dup() stdin and set the new file descriptor to non-blocking, does that affect stdin in the same way? The reason I ask is that while the fds are said to be interchangeable, they don't share flags and I don't know if that covers nonblocking.

File descriptors don't share fd flags (F_GETFD/F_SETFD), but the underlying open file description they point to shares file status flags (F_GETFL/F_SETFL). The only fd flag that exists on Linux is FD_CLOEXEC; everything else (O_NONBLOCK, O_APPEND, etc.) is a file status flag.
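A quick way to see this for yourself (a minimal sketch, error checking omitted): set O_NONBLOCK through a dup'd fd and it shows up on the original, because both fds point at the same open file description.

code:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* dup() returns a new fd sharing the same open file description */
	int copy = dup(STDIN_FILENO);

	/* set O_NONBLOCK through the copy... */
	int flags = fcntl(copy, F_GETFL);
	fcntl(copy, F_SETFL, flags | O_NONBLOCK);

	/* ...and it is visible through the original, because file status
	   flags live on the open file description, not on the fd */
	printf("stdin nonblocking: %s\n",
	       (fcntl(STDIN_FILENO, F_GETFL) & O_NONBLOCK) ? "yes" : "no");

	return 0;
}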

quote:

edit: it seems like the recommended way is to not even try and just use select, poll, epoll, etc. but for something extremely simple I was wondering if I could get away with just a nonblocking fd.

If nothing's inheriting stdin from you, do whatever you want. (But how exactly are you expecting to use a nonblocking fd without using select/poll/epoll? Just eating up an entire core and spinning when you don't have any input available to read?)

Computer viking
May 30, 2011
Now with less breakage.

Sleep between retries :shrug:

BattleMaster
Aug 14, 2000

I was thinking about something where I need to check stdin periodically before immediately going back to other stuff, but in that case select() with a zeroed timeout (not a NULL one, which blocks indefinitely) is probably superior anyway.
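Something like this, roughly (a minimal sketch; the zeroed timeval is what makes select() return immediately instead of blocking):

code:
#include <sys/select.h>
#include <unistd.h>

/* returns 1 if stdin has data ready, 0 if not; never blocks */
int stdin_ready(void)
{
	fd_set readfds;
	struct timeval timeout = { 0, 0 };	/* zero timeout: poll and return */

	FD_ZERO(&readfds);
	FD_SET(STDIN_FILENO, &readfds);

	return select(STDIN_FILENO + 1, &readfds, NULL, NULL, &timeout) > 0;
}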

Wipfmetz
Oct 12, 2007

If one or more Wipfs are sitting in a mine cart, you can see them peeking out over the edge of the cart.
Planning to work with some std::atomic<> here, especially after C++20 introduced std::atomic<>::wait() and std::atomic<>::notify_one()/notify_all().

Has anybody worked with that and has some idea or experience w.r.t. thread cancellation? It's noexcept, so it's not like it will throw upon the destruction of its std::atomic...

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Thread cancellation is an extremely bad idea that mostly doesn't work.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
And condition variables are a great example of a situation where thread cancellation is just kind of unfixable.

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

Wipfmetz posted:

Planning to work with some std::atomic<> here, especially after C++20 introduced std::atomic<>::wait() and std::atomic<>::notify_one()/notify_all().

Has anybody worked with that and has some idea or experience w.r.t. thread cancellation? It's noexcept, so it's not like it will throw upon the destruction of its std::atomic...

c++20 introduced "cooperative thread cancellation" which is supposed to somehow work better than that crazy pthread thread cancelling bullshit but i haven't actually tested it yet because i know i'll spend a week playing with it and will probably decide i hate it in the end. it might work and be good though. thread cancellation in general though is a bad idea, i agree.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
The stop_token stuff is basically just some helpers for manually implementing task cancellation when you need something more complicated than a bool you check occasionally. I haven't had a chance to actually use it yet but it looks sane.

Wipfmetz
Oct 12, 2007

If one or more Wipfs are sitting in a mine cart, you can see them peeking out over the edge of the cart.
So i guess i'd better use a common condition variable to hand over work items to worker threads and a good old-fashioned bool to indicate "no more work incoming, just return from your thread's mainloop, thank you".

Beef
Jul 26, 2004
The simpler, the better. Prefer coordinating with running threads through shared memory datastructures.
A shared stop variable for telling workers to exit works well; it's monotonic. Just make sure, with an atomic counter, that all threads have finished before resetting it. A barrier, basically.
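Sketched with C11 atomics, roughly (hypothetical names; std::atomic works the same way):

code:
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool stop;	/* set once to tell workers to exit (monotonic) */
atomic_int  running;	/* number of workers that haven't exited yet */

void worker_mainloop(void)
{
	while (!atomic_load(&stop))
	{
		/* do work */
	}

	atomic_fetch_sub(&running, 1);	/* this worker is done */
}

void stop_and_reset(int nworkers)
{
	atomic_store(&stop, true);

	/* the barrier: wait until every worker has observed the flag */
	while (atomic_load(&running) > 0)
		;	/* or sleep/yield between checks */

	atomic_store(&running, nworkers);
	atomic_store(&stop, false);	/* only now is it safe to reset */
}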

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Wipfmetz posted:

So i guess i'd better use a common condition variable to hand over work items to worker threads and a good old-fashioned bool to indicate "no more work incoming, just return from your thread's mainloop, thank you".
If you're using a synced queue to deliver work items and you don't need to interrupt mid-work-item then having an "end" signifier in the queue object, or an "end" work item, makes more sense than having a distinct bool.
(And you shouldn't need any extra objects to wait for the threads to all finish, because there's join for that, assuming when you reach that point you're waiting for all the worker threads or specific worker threads to complete.)
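A rough pthreads sketch of that end-sentinel idea, with hypothetical names (a C++ version with std::mutex/std::condition_variable would look the same). Push one NULL per worker, then join them:

code:
#include <pthread.h>
#include <stdlib.h>

struct node { void* item; struct node* next; };

struct work_queue
{
	struct node* head;
	struct node* tail;
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
};

/* push NULL as the "end" work item to stop one worker */
void queue_push(struct work_queue* q, void* item)
{
	struct node* n = malloc(sizeof(*n));
	n->item = item;
	n->next = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->tail) q->tail->next = n; else q->head = n;
	q->tail = n;
	pthread_cond_signal(&q->nonempty);
	pthread_mutex_unlock(&q->lock);
}

void* queue_pop(struct work_queue* q)
{
	pthread_mutex_lock(&q->lock);
	while (!q->head)
		pthread_cond_wait(&q->nonempty, &q->lock);

	struct node* n = q->head;
	q->head = n->next;
	if (!q->head) q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	void* item = n->item;
	free(n);
	return item;
}

/* worker mainloop: returns when it pops the NULL sentinel */
void* worker(void* arg)
{
	struct work_queue* q = arg;
	void* item;

	while ((item = queue_pop(q)) != NULL)
	{
		/* handle the work item */
	}

	return NULL;	/* main thread joins after pushing one NULL per worker */
}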

LLSix
Jan 20, 2010

The real power behind countless overlords

Anyone have a profiler for C that runs on Windows that they like? The software we write at work has been running slower and slower all year. I'd like to find out where the problem code is and go streamline it.

My backup plan is to set up a Linux VM and use gprof. I'd really prefer to find a Windows solution, though. If I have to use a Linux VM I'll be the only one able to run it.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

LLSix posted:

Anyone have a profiler for C that runs on Windows that they like? The software we write at work has been running slower and slower all year. I'd like to find out where the problem code is and go streamline it

It’s been a few years, but last I did this on Windows it was with Intel VTune and it worked OK.

I’ve also heard good things about https://github.com/VerySleepy/verysleepy but haven’t used it myself.

OddObserver
Apr 3, 2009
If you do set up Linux, don't use gprof; it's pretty much useless for anything that's not completely blatant. linux-perf is a far more sophisticated tool. Also I guess oprofile, though I think that's semi-deprecated in favor of linux-perf?

I haven't used it myself[1], but I think MS provides ETW (Event Tracing for Windows) for free, which is supposedly super-sophisticated, though a lot of its functionality is whole-system. There are some instructions on https://randomascii.wordpress.com/2015/09/01/xperf-basics-recording-a-trace-the-ultimate-easy-way/ (the blog of a Chrome person who mostly seems to use it to find Windows bugs).

[1]... though I may be suppressing an unsuccessful attempt.

Beef
Jul 26, 2004

Subjunctive posted:

It’s been a few years, but last I did this on Windows it was with Intel VTune and it worked OK.

VTune improved a ton in the last few years and is now completely free as part of the oneAPI basekit.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Beef posted:

VTune improved a ton in the last few years and is now completely free as part of the oneAPI basekit.

Yeah I was just noticing that I didn’t really recognize any of the screenshots on the site. Good that they’re still investing in it, hope they aren’t doing anything shady with AMD processors like that library they had that checked CPUID and chose to suck for non-Intel.

BattleMaster
Aug 14, 2000

I found something that is making me tear my hair out using gcc and glibc 2.31 - if that even matters, maybe I'm doing it wrong.

For a program that deals with a lot of TCP/IP socket file descriptors, I am using search.h's tree functions to map file descriptors to associated data.

So I malloc the structures and put them on the tree (non-essentials and error checking, which I do, omitted):

code:
struct socket_event* event = malloc(sizeof(struct socket_event));

event->fd = fd;

struct socket_event** entry = tsearch(event, &loop->events, compare_event_fd);
Then later, when I want to remove it, I look for an entry with a matching FD and remove it from the tree:

code:
struct socket_event event =
{
	.fd = fd
};
	
struct socket_event** found = tfind(&event, &loop->events, compare_event_fd);

void* ret = tdelete(*found, &loop->events, compare_event_fd);

//free(*found);
Works great, works fantastic, works exactly as described. With tons of print statements, including in the search matching function, it works perfectly.

However, the problem is that I malloced the event and I should free it at some point otherwise this thing would leak memory for every client. If I uncomment the free statement the whole thing goes haywire. The tree gets jacked up and things in it are no longer reliably found, and sometimes the whole thing just blows up entirely with free complaining about me double-freeing something.

Here's it running into a double free:

code:
Removing fd 5
comparing 5 and 11
comparing 5 and 7
comparing 5 and 5
Found fd 5
comparing 5 and 11
comparing 5 and 7
comparing 5 and 5
free(): double free detected in tcache 2
And here's it running into weird poo poo in the tree that shouldn't be there that interferes with the search:

code:
Removing fd 6
comparing 6 and -1307026832
comparing 6 and 11
comparing 6 and 9
Not found
So far I think I've ruled out the logic of the rest of my program. The tree root pointer starts off NULL and never becomes empty because the first thing I always put on it is the main server socket. It doesn't try to add or remove the same fd from the list twice in a row or anything like that either. I've used the same logic with simpler storage schemes like simple arrays just fine. I thought maybe it's because these functions use the least-significant bit of the pointer as a flag but it functions identically even if I mask off that bit in the pointers before freeing them.

I just don't understand why freeing stuff I malloced myself goes back and mangles the tree!

OddObserver
Apr 3, 2009
Try with valgrind (or asan/msan)?

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
tfind returns a pointer to the internal tree node (the first element of which is the pointer that you passed in). After you call tdelete, that internal node is no longer valid, so you can't then go accessing its first element and expect to get your original data pointer out.

LLSix
Jan 20, 2010

The real power behind countless overlords

What does tdelete actually do? If I was getting those kinds of errors with a linked list, I'd think the delete function was missing a found->prev->next = found->next. I.e., it sounds like however the items are being stored still has a pointer to the memory being "deleted" and is accessing it after the free call.


That you're sometimes having it find something already deleted, like in your double free example, is what makes me most suspicious that delete isn't working right.

BattleMaster posted:

Removing fd 5
comparing 5 and 11
comparing 5 and 7
comparing 5 and 5
Found fd 5
comparing 5 and 11
comparing 5 and 7
comparing 5 and 5
free(): double free detected in tcache 2

BattleMaster
Aug 14, 2000

Jabor posted:

tfind returns a pointer to the internal tree node (the first element of which is the pointer that you passed in). After you call tdelete, that internal node is no longer valid, so you can't then go accessing its first element and expect to get your original data pointer out.

gently caress, this was it. Thank you so much! (and thanks to everyone else, I even had valgrind installed but didn't think to use it before, and it was helping me edge slowly closer to this)

I need to dereference the pointer to my pointer and save it before tdelete invalidates it. This works:

code:
/* stack-allocated key holding just the fd to search for */
struct socket_event event =
{
	.fd = fd
};

/* tfind returns a pointer to the internal tree node */
struct socket_event** found = tfind(&event, &loop->events, compare_event_fd);

/* save the payload pointer before tdelete invalidates the node */
struct socket_event* ptr = *found;

void* ret = tdelete(*found, &loop->events, compare_event_fd);

free(ptr);
What's shameful is I totally understood how it was supposed to work but I guess I somehow never internalized that the thing it gave me might no longer exist after tdelete even though the thing it pointed to still exists.

LLSix posted:

That you're sometimes having it find something already deleted, like in your double free example, is what makes me most suspicious that delete isn't working right.

This quoted part is actually working properly even if the output looks weird - it reports the FDs before it checks them:

code:
static int compare_event_fd(const void* pa, const void* pb)
{
	printf("comparing %i and %i\n", ((struct socket_event*)pa)->fd, ((struct socket_event*)pb)->fd);
	
	if (((struct socket_event*)pa)->fd < ((struct socket_event*)pb)->fd)
	{
		return -1;
	}
	else if (((struct socket_event*)pa)->fd > ((struct socket_event*)pb)->fd)
	{
		return 1;
	}
	else
	{
		return 0;
	}
}

BattleMaster fucked around with this message at 03:52 on May 28, 2023

Presto
Nov 22, 2002

Keep calm and Harry on.
You are closing all these fds too at some point, right?

BattleMaster
Aug 14, 2000

Presto posted:

You are closing all these fds too at some point, right?

Oh yeah, that was the easy part. I wrote an echo server that has backends that use select, poll, epoll, and io_uring (with and without liburing) and loving tsearch ended up being the hardest thing for me to figure out apparently.

Here's a strace of one connection using the epoll backend:

code:
epoll_wait(4, [{EPOLLIN, {u32=721557840, u64=94395512789328}}], 32, -1) = 1
accept4(3, {sa_family=AF_INET, sin_port=htons(48824), sin_addr=inet_addr("192.168.1.2")}, [16], SOCK_NONBLOCK) = 5
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=721557904, u64=94395512789392}}) = 0
accept4(3, 0x7ffc17388400, [16], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(4, [{EPOLLIN, {u32=721557904, u64=94395512789392}}], 32, -1) = 1
read(5, "Hello\n", 256)                 = 6
write(5, "Hello\n", 6)                  = 6
read(5, 0x7ffc17388310, 256)            = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(4, [{EPOLLIN, {u32=721557904, u64=94395512789392}}], 32, -1) = 1
read(5, "", 256)                        = 0
epoll_ctl(4, EPOLL_CTL_DEL, 5, NULL)    = 0
close(5)                                = 0
My biggest finding writing this stuff was that it helps performance when accepting tons of connections if you set the server socket to nonblocking and, whenever it shows as readable, do accept() in a loop until it returns EAGAIN/EWOULDBLOCK. If you accept only one connection before going back to service client sockets it really bottlenecks.
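That loop looks roughly like this (a sketch; epoll shown for registering the new fd, real error handling omitted):

code:
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/socket.h>

/* drain the accept queue; assumes sockfd was opened nonblocking */
static void accept_all(int sockfd, int epollfd)
{
	for (;;)
	{
		int fd = accept4(sockfd, NULL, NULL, SOCK_NONBLOCK);

		if (fd < 0)
			break;	/* EAGAIN/EWOULDBLOCK: queue drained, back to the event loop */

		struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
		epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev);
	}
}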

I'm sure that people who do this for a living know all this but it's interesting to me as an amateur who bangs their head against it until it works

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

BattleMaster posted:

Oh yeah, that was the easy part. I wrote an echo server that has backends that use select, poll, epoll, and io_uring (with and without liburing) and loving tsearch ended up being the hardest thing for me to figure out apparently.


If this isn’t just to mess with sockets, you don’t need to relate these things yourself. Epoll allows you to attach an epoll_data to your fds that it will return back to you from epoll_wait: https://man7.org/linux/man-pages/man2/epoll_ctl.2.html

Pretty sure io_uring also supports this; don’t think poll does

Sweeper fucked around with this message at 12:14 on May 28, 2023

BattleMaster
Aug 14, 2000

Yeah, in my first epoll version, which is also the multiplexing API I learned first, I used its userdata to store information on what the event was and what user id it belonged to. The user id was also the array index of where I stored the sockets and other information for each user.

So I had an enum like

code:
enum event_type
{
	EVENT_SERVER_SOCKET,
	EVENT_FIRST_SOCKET,
	EVENT_LAST_SOCKET = EVENT_FIRST_SOCKET + MAX_CLIENTS - 1,
	NUM_EVENTS
};
and my event loop looks like

code:
for (int i = 0; i < n; i++)
{
	switch (epoll_events[i].data.u64)
	{
	case EVENT_SERVER_SOCKET:
		<accept connections>
		break;

	case EVENT_FIRST_SOCKET ... EVENT_LAST_SOCKET:
		int clientid = epoll_events[i].data.u64 - EVENT_FIRST_SOCKET;
		<and then do stuff with the socket which can be accessed by clients[clientid].socket>
		break;
	}
}
Select was simple enough that it was very easy to implement, but not as slick. Essentially I had to iterate through all my active users and ask "is this fd readable? is this fd readable? how about this one?"

Essentially:

code:
if (FD_ISSET(sockfd, &fdreadset_active))
{
	<accept connections>
}

for (int i = 0; i <= maxClientIndex; i++)
{
	if (client[i].active)
	{
		if (FD_ISSET(client[i].socket, &fdreadset_active))
		{
			<handle client>
		}
	}
}
Poll was the one that was a little annoying because it would tell me what fds were readable but had no easy way to relate them to the specific users. This is where I could have used tsearch with the fds as the key, but I wanted to keep the data structures and code as similar as possible between the different multiplexers so they'd be more comparable in a benchmark.

In the end, I made a poll list big enough for the listening socket and the max number of users. The first entry would always be the listening socket. Whenever users disconnected I repacked it using memmove to make sure it never had any holes (not necessary except for performance, maybe - the fds could just be set to negative numbers to get poll to skip them).

I would also keep a mapping between poll_list index and what user it belonged to:

code:
#define POLL_LIST_SIZE (1 + MAX_CLIENTS)

struct pollfd poll_list[POLL_LIST_SIZE];
int mapping[POLL_LIST_SIZE];

poll_list[0].fd = sockfd;
poll_list[0].events = POLLIN;

...
	
if (poll_list[0].revents & POLLIN)
{
	<accept connections>
}

for (int i = 1; i <= maxPollIndex; i++)
{
	if (poll_list[i].revents & POLLIN)
	{
		int id = mapping[i];
		<handle client>
	}
}
In the end, while poll's api was clearly an upgrade over select's, poll was a little more annoying to work with, at least with the goal of adapting my existing program to use it. If I wrote something using poll from scratch I'd definitely go with tsearch or another way of mapping the fd directly to the other data associated with that fd. And just use the fd to identify that user.

Now, the reason I'm currently using tsearch along with epoll is that I'm writing a more generic event loop that lets you add and remove fds and have them be polled without having to hardcode them, as with that enum in my previous epoll setup.

So I have functions that let you add and remove fds, so it needs a way to map those fds to the rest of its internal data, which I'm using tsearch for. The tree is where it keeps all the data associated with the fd, instead of a fixed-size structure like before.

The actual event loop uses the userdata of epoll to contain a pointer directly to the data structure for that fd, which contains the callback function and other information.

So my event handling looks like

code:
for (int i = 0; i < n; i++)
{
	struct event* event = epoll_events[i].data.ptr;
			
	switch (event->type)
	{
	case EVENT_NONE:
		// for fds which are awaiting deletion from the poll list but may still be in the queue			
		break;
				
	case EVENT_FD:
		event->callback(event->fd, event->userdata);
		break;
	}
}
I rather like epoll compared to the other options, even the more experimental stuff I've tried with io_uring. io_uring is great for queuing up batches of I/O events but I don't really like it so far for actually just polling FDs. (io_uring has support for interacting with epoll, but it's now deprecated, no joke, because in a mailing list thread between io_uring's creator and Linus Torvalds, Linus said offhandedly that he hated epoll. So my fantasy of coming up with some way to mix epoll and io_uring in a way that is faster than either was dead before it started.)

Like all the criticism of epoll seems to be from people doing stupid boneheaded bullshit with it like sharing it between processes or closing FDs without removing them from epoll's watch list and expecting epoll to know they're no longer relevant. But in my experimentation it seems pretty slick and the inclusion of the userdata that lets you identify each event however you want (by the FD, by a pointer, or by an arbitrary integer) was a really good choice.

tl;dr I/O multiplexing is a land of contrasts

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

BattleMaster posted:

Like all the criticism of epoll seems to be from people doing stupid boneheaded bullshit with it like sharing it between processes or closing FDs without removing them from epoll's watch list and expecting epoll to know they're no longer relevant. But in my experimentation it seems pretty slick and the inclusion of the userdata that lets you identify each event however you want (by the FD, by a pointer, or by an arbitrary integer) was a really good choice.

tl;dr I/O multiplexing is a land of contrasts

epoll has some non-trivial edge cases around the handling of level triggered, edge triggered, etc. It's basically only safe to use from multiple threads in one configuration (ET + one-shot?) or something. I forget the details exactly, but it definitely has warts

Generally I’ve found all of these APIs pretty slow so we end up bypassing the kernel anyway and I don’t have to call poll, I just spin on an ef_vi handle :v:

BattleMaster
Aug 14, 2000

ef_vi is definitely a bit more than I think I can handle. I'm okay with janitoring system calls but I don't think I'm ready for doing the lower-layer protocols myself

They added the EPOLLEXCLUSIVE flag for having multiple epoll FDs monitor the same FD, like a listening socket. If you open the socket and then make threads or fork off new processes, and they all make separate epoll FDs that monitor it with the EPOLLEXCLUSIVE flag ORed in, "one or more" of them will wake up. If you don't use this flag, most or all of them will likely wake up. However, the worst thing that happens is a thundering herd; you don't run into any weird bugs or edge cases with epoll.

Some people have run into problems making one epoll FD and then sharing it across threads or processes which seems like an outright bad idea. There's so little cost to making an epoll FD that I don't see why you wouldn't just make one per thread or process. Just utter madness. You try sharing one select fd_set or one poll event array between multiple threads and see how they like it. Same with closing FDs but not informing epoll that you no longer want to monitor them. You try that with poll or select and they won't be happy either. epoll is better than those (at least for many fds) but it's not magic.

In my experimentation, the actual best way to do this is to do all your forking first, then have each process open its own listening socket with the SO_REUSEADDR and SO_REUSEPORT sockopts set before binding the address. Each process can monitor that socket however it wants, using whatever multiplexing scheme or even just doing blocking I/O on it (hundreds of processes with their own socket doing blocking I/O isn't even that bad in my experimentation, but it doesn't scale as well as any of the multiplexing schemes, even select). The key thing is that if each process or thread has its own socket, the kernel only sends a given incoming connection to one of them. No thundering herd problem, no weird fighting between epoll instances or anything like that. And the kernel is pretty good about load balancing.
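The per-process listener setup is just a couple of sockopts before bind (a sketch, error checking omitted):

code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* each forked worker makes its own listener on the same port; the
   kernel load-balances incoming connections between them */
static int make_listener(unsigned short port)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	int one = 1;
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));	/* before bind() */

	struct sockaddr_in addr;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	bind(fd, (struct sockaddr*)&addr, sizeof(addr));
	listen(fd, SOMAXCONN);

	return fd;
}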

The disadvantage is that if you do this on a port above 1023, theoretically a rogue process could listen on the same port and steal a portion of your connections. But maybe you have worse problems if that's happening.

BattleMaster fucked around with this message at 21:24 on May 28, 2023

Dylan16807
May 12, 2010

BattleMaster posted:

I rather like epoll compared to the other options, even the more experimental stuff I've tried with io_uring. io_uring is great for queuing up batches of I/O events but I don't really like it so far for actually just polling FDs. (io_uring has support for interacting with epoll, but it's now deprecated, no joke, because in a mailing list thread between io_uring's creator and Linus Torvalds, Linus said offhandedly that he hated epoll. So my fantasy of coming up with some way to mix epoll and io_uring in a way that is faster than either was dead before it started.)


https://lore.kernel.org/io-uring/20230501185240.352642-1-info@bnoordhuis.nl/T/#u

If you mean this, I think you're safe.

I don't know what the dislike for it was based on.

BattleMaster
Aug 14, 2000

That's good news. Getting rid of it because of Torvalds' whim was pretty lame. I honestly don't know why he has such a bug up his rear end about it, but the original conversation I referred to is here.

Torvalds busts into a thread about io_uring for no real reason other than to say that "epoll is the worst possible implementation of a horribly bad idea, and one of the things I would really want people to kill off" and that he hopes io_uring helps kill it off. So Axboe says (paraphrased) "well we can get rid of epoll support in io_uring to help kill off epoll." Like maybe Torvalds doesn't like the way it's implemented in the kernel or something, but I don't really see how the system calls could be any different and I don't see how polling FDs for readiness is a bad idea anyway. Maybe he doesn't like having the kernel manage the watchlist?

io_uring is kind of worse at polling FDs than epoll, both in terms of API (io_uring polls are one-shot only so they need to be resubmitted every time they fire; there are fewer options, like no ability to do edge triggering; and the poll results get mixed into the completion queue along with every other operation completion, so you never get a list of just which FDs are ready) and in benchmarks, where things that replaced epoll with io_uring are often a little slower.

From my outsider's view the epoll API is pretty solid and it does and works the way I expected it to. It's simpler and easier to effectively use than select and poll, although with the disadvantage that you need to make system calls to add/modify/remove FDs in the polling list.

io_uring's support for epoll actually seemed like it could work well alongside it. The fact that you can queue up a large number of epoll_ctls and submit them with just one syscall (not to mention all the accepts, reads, writes, and closes you will be doing) helps mitigate that disadvantage.

You can actually do a server with io_uring without doing polling at all, by queuing up blocking operations (accept when you don't know there's an incoming connection, read when you don't know there's waiting data, etc.) which io_uring will handle asynchronously, and handling them when they eventually show up in the completion queue. But that doesn't seem necessarily very good, and it may be faster or less resource intensive to just poll the fds and queue up the operations when you know they'll be handled quickly, instead of having them block in the io_uring shadow realm.

BattleMaster fucked around with this message at 21:17 on May 30, 2023

LLSix
Jan 20, 2010

The real power behind countless overlords

Subjunctive posted:

It’s been a few years, but last I did this on Windows it was with Intel VTune and it worked OK.

I’ve also heard good things about https://github.com/VerySleepy/verysleepy but haven’t used it myself.

Thank you. VerySleepy is exactly what I was looking for. Best of all, it's already on the list of pre-approved software.

Surprisingly, it seems like almost all the slowness we've been seeing is due to I/O, mostly due to a deeply stupid core architecture decision and partly because we issue a couple hundred alarms on startup with our development data. I guess maybe we should fix the developer data so it stops issuing some of those alarms.

BattleMaster
Aug 14, 2000

Screaming into the void again, I'm doing some more stuff with io_uring. It has an opcode (IORING_OP_LINK_TIMEOUT, prepared with io_uring_prep_link_timeout in liburing) that attaches a timer to the entry being submitted and cancels it if the timer ticks over before it completes.

It works fine, but the return value is useless and also not properly documented. The docs say it returns 0 on success, but there's no scenario where it returns 0. If no actual error happens, it returns -ETIME if the timeout occurred or -ECANCELED if the timeout was canceled because the attached entry completed first. Also, -ETIME means a cancellation was ATTEMPTED, not that the cancellation succeeded. So the only way to find out if the cancellation went through is to find the entry for the thing you wanted to cancel in the completion queue and check if its return value was -ECANCELED. So the completion queue is crapped up with a useless entry with a useless return value.
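For reference, the setup looks roughly like this with liburing (a sketch; the ring, fd, and buffer are assumed to be set up elsewhere, error checking omitted):

code:
#include <liburing.h>

/* queue a read that gets cancelled if it takes longer than 5 seconds */
static void read_with_timeout(struct io_uring* ring, int fd, char* buf, unsigned len)
{
	struct io_uring_sqe* sqe = io_uring_get_sqe(ring);
	io_uring_prep_read(sqe, fd, buf, len, 0);
	sqe->flags |= IOSQE_IO_LINK;	/* link the next SQE to this one */
	sqe->user_data = 1;		/* tag for telling the completions apart */

	struct __kernel_timespec ts = { .tv_sec = 5, .tv_nsec = 0 };
	struct io_uring_sqe* tsqe = io_uring_get_sqe(ring);
	io_uring_prep_link_timeout(tsqe, &ts, 0);
	tsqe->user_data = 2;

	io_uring_submit(ring);

	/* two CQEs come back: the read's res is -ECANCELED if the timer
	   fired first, and the timeout's res is -ETIME (it fired) or
	   -ECANCELED (the read finished before it could) */
}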

Also it would have been really nice if the completion queue entries contained the associated opcode to provide context for what the userdata means, especially if I use a pointer for the userdata that could point to different things depending on what the opcode was. (I could have the userdata point to a structure that has an enum and a void pointer in it but ehhh)

edit: Also would have been cool if they implemented an opcode for recvfrom so I could use it for UDP without having to use recvmsg which has all kinds of features I don't want

BattleMaster fucked around with this message at 02:31 on Jun 2, 2023

Foxfire_
Nov 8, 2010

In code like
C++ code:
template <typename A, typename B>
class Foo
{
public:
    class ThingToFormat { /* Stuff */ };
    // Stuff
};
what's an appropriate incantation for writing a fmt::formatter specialization for printing ThingToFormat?

If Foo wasn't a template, you'd do
C++ code:
template<>
struct fmt::formatter<Foo::ThingToFormat> { /* Stuff */ };
but that pattern will fail with undeducible template parameters.

I'm bad at SFINAE and can't figure out the right way to write it

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
C++ code:
template <typename A, typename B>
struct fmt::formatter<Foo<A, B>::ThingToFormat> { /* Stuff */ };

Foxfire_
Nov 8, 2010

Foo<A,B>::ThingToFormat is a nondeduced context, the straightforward version of that pattern isn't valid

See https://godbolt.org/z/E6EGd5MYG

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Ah right, because the template parameters are before the ::. I think you just have to unnest the types; it can't be done with nested types. Something like:

C++ code:
#include <fmt/core.h>

template <typename A, typename B>
class ThingToFormatImpl {};

template <typename A, typename B>
class Foo
{
public:
    using ThingToFormat = ThingToFormatImpl<A, B>;
    friend class ThingToFormatImpl<A, B>; // if needed
};

template <typename A, typename B>
struct fmt::formatter<ThingToFormatImpl<A, B>> { /* Stuff */ };

Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


BattleMaster posted:

Screaming into the void again, I'm doing some more stuff with io_uring. It has an opcode (IORING_OP_LINK_TIMEOUT, prepared with io_uring_prep_link_timeout in liburing) that attaches a timer to the entry being submitted and cancels it if the timer ticks over before it completes.

On the one hand I could joke about Meta and unfinished/unspecified behaviour, but on the other that looks really nice and I would have been extremely happy to have it some years back when I was last doing kernel stuff.

BattleMaster
Aug 14, 2000

Private Speech posted:

On the one hand I could joke about Meta and unfinished/unspecified behaviour, but on the other that looks really nice and I would have been extremely happy to have it some years back when I was last doing kernel stuff.

I have gripes with io_uring but it's really pretty awesome. It's an actually useful way to do asynchronous I/O and not only that but you can submit hundreds of I/O calls all at once with one syscall or even set it up to read the queues automatically and do a whole server with no system calls.


commando in tophat
Sep 5, 2019
I have this problem where, when I run my program on different computers (x64), some results aren't the same for the same inputs when using "sin" (and probably other trigonometric functions, and maybe more). By "not the same" I mean I need them to be identical, not just close enough. Is there something easy that can be done about this? I've noticed that in Visual Studio, under code generation, I have the runtime library set to "Multi-threaded DLL (/MD)". This apparently means that it will use whatever is available on the user's computer, and it can give slightly different results?

1. will this help?
2. if this helps and one of the libraries I use is closed source, does that mean I'm hosed if they only have their library built with /MD?

I wanted to ask before I spend my day rebuilding all the stuff. I have some recollection that I've changed between /MD and /MT before, but no idea why that was.
