MrMoo
Sep 14, 2000

froobly posted:

For the record, I use NULL, and "if (!ptr)". "if (ptr == NULL)" seems verbose to me, and also invites the possibility of the disastrous "if (ptr = NULL)" typo. Even so, the real probability of that kind of thing happening is pretty small.

To prevent the typo, swap the comparison around so the constant is on the left; the following will then produce a compiler error:
code:
if (NULL = ptr)
I seem to recall a compiler option that gives warnings or errors about explicit comparisons to NULL; I was surprised anybody cared that much.

MrMoo
Sep 14, 2000

Does anyone have a reference for struct bitfield alignment on Win32? With the standard BSD IP header structure the variables ip_tos and ip_len must be set as bitfields for #pragma pack(1) alignment to work.

code:
struct pgm_ip
{
#if G_BYTE_ORDER == G_LITTLE_ENDIAN
	unsigned int	ip_hl:4;		/* header length */
	unsigned int	ip_v:4;			/* version */
#elif G_BYTE_ORDER == G_BIG_ENDIAN
	unsigned int	ip_v:4;			/* version */
	unsigned int	ip_hl:4;		/* header length */
#else
#	error unknown ENDIAN type
#endif
	unsigned int	ip_tos:8;		/* type of service */
	unsigned int	ip_len:16;		/* total length */
	guint16		ip_id;			/* identification */
	guint16		ip_off;			/* fragment offset field */
	guint8		ip_ttl;			/* time to live */
	guint8		ip_p;			/* protocol */
	guint16		ip_sum;			/* checksum */
	struct in_addr	ip_src, ip_dst;		/* source and dest address */
};
Many developers cheat and use unsigned char for the bitfields, but it's non-standard and raises warnings under GCC.
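
One way to sidestep the bitfield layout question for the first octet is to skip bitfields there and pack the two nibbles by hand. This is a different technique from the struct above, not something the original code does, and the helper names are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helpers: the first octet of an IPv4 header carries the
 * version in the high nibble and the header length (in 32-bit words)
 * in the low nibble, independent of host byte order. */
static inline uint8_t ip_vhl_pack(unsigned version, unsigned hdr_words)
{
	return (uint8_t)(((version & 0x0fu) << 4) | (hdr_words & 0x0fu));
}

static inline unsigned ip_vhl_version(uint8_t vhl) { return vhl >> 4; }
static inline unsigned ip_vhl_hdr_len(uint8_t vhl) { return vhl & 0x0fu; }
```

With a plain uint8_t member there is no endian #if and no dependence on how a given compiler allocates bitfields.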

MrMoo
Sep 14, 2000

Sweeper posted:

Hello again! I am trying to do some performance tests of my code so I went snooping around my My Book to get some example times.

It says that on a 250MHz MIPS R10000 int = int + int takes 12 ns (page 185 for those of us with the book).

So I thought I would test my timing using an int+int. Unfortunately I get 166 ns. So I was wondering if there was a way to reduce overhead or maybe I am doing something wrong.

Here is the relevant code for testing times:
code:
/*+*/
clock_gettime(CLOCK_REALTIME, &start);
i3 = i1 + i2;
clock_gettime(CLOCK_REALTIME, &finish);
	
cout << "For +: " << finish.tv_nsec - start.tv_nsec << endl;
Am I way off on this?

Running on what OS? clock_gettime() is usually too coarse for a single instruction, and instruction reordering can make the start and finish calls execute at completely different points from where you laid them out.

Try adding memory barriers, or evaluating the time over a larger set of tests and averaging the result. Try and find a higher resolution clock source like the core's instruction count.
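
A rough sketch of the "larger set of tests and average" idea, assuming a POSIX system with clock_gettime(). The function name average_add_ns is made up, and volatile is used here only to stop the compiler deleting the loop, so the figure includes load/store and loop overhead:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* expose clock_gettime() under strict -std modes on glibc */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: average cost in nanoseconds of one int addition,
 * measured over `iterations` repetitions. Treat the result as an upper
 * bound, since loop and memory traffic are counted too. */
static double average_add_ns(long iterations)
{
	struct timespec start, finish;
	volatile int i1 = 1, i2 = 2, i3 = 0;	/* volatile keeps the adds in the loop */

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long n = 0; n < iterations; ++n)
		i3 = i1 + i2;
	clock_gettime(CLOCK_MONOTONIC, &finish);
	(void)i3;

	/* Include tv_sec: tv_nsec on its own wraps every second. */
	int64_t elapsed_ns = (int64_t)(finish.tv_sec - start.tv_sec) * 1000000000LL
			   + (finish.tv_nsec - start.tv_nsec);
	return (double)elapsed_ns / (double)iterations;
}
```

Calling average_add_ns(1000000) spreads the cost of the two clock reads over a million additions instead of charging it all to one.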

MrMoo
Sep 14, 2000

ultra-inquisitor posted:

How does JIT compilation work, in practice? Once you have an array of machine code, how do you load it into memory and tell the OS to use it?

Call it as a function, functions are just pointers after all.
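
A minimal sketch of what that looks like, assuming a POSIX system with mmap()/mprotect() and an x86-64 CPU. The jit_answer name and the six hard-coded bytes are made up for the demo; on Windows the rough equivalent would go through VirtualAlloc():

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#endif
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical demo: copies a tiny x86-64 routine into fresh memory, marks
 * it executable and calls it. Returns 42 on success, -1 if the platform is
 * not x86-64 or refuses the mapping. */
static int jit_answer(void)
{
#if defined(__x86_64__)
	/* mov eax, 42 ; ret */
	static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

	void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	memcpy(mem, code, sizeof code);

	/* Flip the page from writable to executable before calling into it. */
	if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
		munmap(mem, sizeof code);
		return -1;
	}

	/* Converting void* to a function pointer is a POSIX-ism, not ISO C. */
	int (*fn)(void) = (int (*)(void))mem;
	int result = fn();

	munmap(mem, sizeof code);
	return result;
#else
	return -1;
#endif
}
```

The OS is not really "told" anything beyond the page permissions; once the memory is executable, the call is an ordinary indirect call.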

MrMoo
Sep 14, 2000

code:
typedef struct Node Node;

struct Node
{
   int count;
   struct Node *next;
};
You cannot use the typedef within the declaration, is there any logic behind this limitation?
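
For what it's worth, the restriction bites when the typedef name only appears at the end of the struct definition. With the forward typedef placed first, as in the snippet above, the short name should be usable inside the body too. A minimal sketch (node_total is a made-up helper):

```c
#include <stddef.h>

typedef struct Node Node;	/* forward typedef of the still-incomplete struct */

struct Node
{
	int count;
	Node *next;		/* typedef name works here because it was declared above */
};

/* Hypothetical helper: sums count over a singly linked list. */
static int node_total(const Node *head)
{
	int total = 0;
	for (const Node *n = head; n != NULL; n = n->next)
		total += n->count;
	return total;
}
```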

MrMoo fucked around with this message at 08:44 on Dec 3, 2009

MrMoo
Sep 14, 2000

RussianManiac posted:

I have a quick gprof question. I profiled my application and it shows 98% usage in one function, but it doesn't really show what calls within that function are taking up most time. How can I focus gprof just on that function and also tell it some recursion level of call stack deepness to analyze?

Run with oprofile first, but generally your compiler might be reorganizing the code order, so it might be difficult to pinpoint the hot spot without breaking the function up into smaller units.

MrMoo
Sep 14, 2000

RussianManiac posted:

Is it ok to post my code here for you guys to analyze why its slower than single threaded version?

Thread synchronization is very expensive, try to avoid it at all costs if you want the highest performance.

quote:

Since I have to lock reports when I read or write to them, I dont do mutex_lock in inner loop, but mutex_trylock.

Consider read-write locks if you have a clear distinction between threads.
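
A small sketch of the read-write lock idea using POSIX pthread_rwlock_t. The report_* names are hypothetical and stand in for whatever the real report structure holds:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* expose pthread_rwlock_t under strict -std modes on glibc */
#endif
#include <pthread.h>

/* Hypothetical shared report guarded by a read-write lock. */
static pthread_rwlock_t report_lock = PTHREAD_RWLOCK_INITIALIZER;
static int report_value;

/* Readers take the shared lock, so several can be inside at once. */
static int report_read(void)
{
	pthread_rwlock_rdlock(&report_lock);
	int value = report_value;
	pthread_rwlock_unlock(&report_lock);
	return value;
}

/* Writers take the exclusive lock and hold it only for the update. */
static void report_write(int value)
{
	pthread_rwlock_wrlock(&report_lock);
	report_value = value;
	pthread_rwlock_unlock(&report_lock);
}
```

This only pays off when reads clearly outnumber writes; with mostly writers it behaves like a plain mutex with extra bookkeeping.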

MrMoo
Sep 14, 2000

Ouch, this code is nasty, it's not clear what is going on and the threads and locking are all over the place.

code:
pthread_mutex_lock( &main_thread_lock );
{
	//os<<"unlocking helper threads"<<endl;
	FOREACH( arg, correlate_args )
	{
		pthread_mutex_unlock( (*arg)->lock );
	}
	//os<<"Going to sleep until helper threads are done... "<<endl;
}
pthread_mutex_lock( &main_thread_lock );
pthread_mutex_unlock( &main_thread_lock );
Weeeee,

MrMoo fucked around with this message at 04:40 on Dec 12, 2009

MrMoo
Sep 14, 2000

The entire algorithm seems to be running sequentially, so it's slower because it's basically the single-threaded version with extra locking and synchronization added.

code:
// evaluate each report
for( int i = arg->a; i <= arg->b; ++i )
{
	radar_report_t ** report = &(*arg->reports)[i];

	pthread_mutex_lock( (*report)->lock );

	FOREACH( track, *arg->tracks )
	{
		pthread_mutex_lock( (*track)->lock );
...
Please use more functions and meaningful names, and split the code up, a lot. Far more of the code should be high-level logic than low-level locking and terse checks.

MrMoo
Sep 14, 2000

volatile should pretty much only be used when talking to hardware; there's pretty much no excuse for it in modern application programming.

code:
pthread_mutex_lock( &threads_ready_lock );
{
    ++threads_ready;
}
pthread_mutex_unlock( &threads_ready_lock );
This can be replaced with an atomic inc.
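
A sketch of the atomic version using C11 <stdatomic.h>, which postdates this thread; on GCC at the time the __sync_fetch_and_add builtin did the same job. mark_thread_ready is a made-up name:

```c
#include <stdatomic.h>

/* Counter standing in for the threads_ready variable above. */
static atomic_int threads_ready;

/* Atomically increments the counter and returns the new value. */
static int mark_thread_ready(void)
{
	/* atomic_fetch_add returns the previous value, hence the +1. */
	return atomic_fetch_add(&threads_ready, 1) + 1;
}
```

No mutex is needed for the increment itself; anything that must happen together with it still needs its own synchronization.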

MrMoo
Sep 14, 2000

Avenging Dentist posted:

Libraries:
Boost

Please add GLib to this list, it's very nice and reasonably cross platform when compiling C, more importantly it's tested, regularly updated, and has a significant install base. Nothing more depressing than wading through buggy self-rolled container and threading APIs.

MrMoo
Sep 14, 2000

Avenging Dentist posted:

I was going to say "yeah but it's GNOME", but apparently I already linked to the GTK, so...

The nice thing about GLib is that it has nothing to do with the GObject and GType behemoths that make developers cringe about Gnome and Gtk.

MrMoo
Sep 14, 2000

First check argv[0] to see if the name specified exists relative to the program's starting current directory. If it does, you can realpath() it to find the absolute path. If it doesn't exist, you need to wander through the PATH environment variable to find it, for example using g_find_program_in_path().
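
The two steps can be sketched in plain POSIX C without GLib. find_self is a hypothetical helper, and the PATH walk is a simplified stand-in for what g_find_program_in_path() does, not its actual implementation:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for realpath() and strdup() under strict -std modes */
#endif
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical helper: resolves argv[0] to an absolute path in `out`
 * (at least PATH_MAX bytes). Returns 0 on success, -1 if nothing matched. */
static int find_self(const char *argv0, char *out)
{
	/* Step 1: argv[0] exists relative to the starting directory. */
	if (access(argv0, F_OK) == 0 && realpath(argv0, out) != NULL)
		return 0;

	/* Step 2: a bare command name, so walk the PATH entries. */
	const char *path = getenv("PATH");
	if (path == NULL || strchr(argv0, '/') != NULL)
		return -1;

	char *copy = strdup(path);
	if (copy == NULL)
		return -1;

	int rc = -1;
	for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
		char candidate[PATH_MAX];
		int len = snprintf(candidate, sizeof candidate, "%s/%s", dir, argv0);
		if (len < 0 || (size_t)len >= sizeof candidate)
			continue;	/* entry too long to be a usable path */
		if (access(candidate, X_OK) == 0 && realpath(candidate, out) != NULL) {
			rc = 0;
			break;
		}
	}
	free(copy);
	return rc;
}
```

This has to run before the program changes directory, since step 1 depends on the starting current directory.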

For Win32, check here for discussion on realpath() implementation.

MrMoo fucked around with this message at 06:34 on Jan 13, 2010

MrMoo
Sep 14, 2000

You might find _ftime() to be more friendly on Win32/Win64.

Timing in general is surprisingly poor on all Intel/AMD platforms. Non-monotonic clocks, different clocks on each core, the schizophrenia of Hyper-Threading, and of course the lack of high resolution sleep functions. Anything under 2ms and you should really consider a delay loop as context switches and kernel interactions are expensive.

Timing on Windows is generally 16ms resolution, Linux <4ms. Performance counters use the core frequency, use process affinity to keep using the same counter.

MrMoo
Sep 14, 2000

Avenging Dentist posted:

No one uses GetTickCount for high-precision timers.

A lot of people use TSC for high precision timing: finance, messaging, games, etc :confused:

MrMoo
Sep 14, 2000

Avenging Dentist posted:

GetTickCount doesn't use the TSC (at least, it doesn't just return it directly).

You brought up GetTickCount, presumably to argue about which Windows timer API calls have a low resolution.

gettimeofday and Windows equivalents are still relatively expensive operations, compared with RDTSC due to attempts to make a stable system-wide timer.

Anyway, found a nice article detailing the state on Win32,

http://www.geisswerks.com/ryan/FAQS/timing.html

MrMoo
Sep 14, 2000

FYI: Just testing HPET on Linux: on an ICH6 family mobo (~14MHz HPET) the HPET takes about 500ns to read via mmap(), which avoids system call overhead.

TSC posted:

check-point-01: 21411145145 (0us)
check-point-02: 21411145145 (0us)
check-point-03: 21411145145 (0us)
check-point-04: 21411145145 (0us)
check-point-05: 21411145145 (0us)
check-point-06: 21411145145 (0us)
check-point-07: 21411145145 (0us)
check-point-08: 21411145145 (0us)
check-point-09: 21411145145 (0us)
check-point-10: 21411145145 (0us)

HPET posted:

check-point-01: 161057269 (1us)
check-point-02: 161057269 (1us)
check-point-03: 161057270 (2us)
check-point-04: 161057270 (2us)
check-point-05: 161057271 (3us)
check-point-06: 161057271 (3us)
check-point-07: 161057272 (4us)
check-point-08: 161057273 (5us)
check-point-09: 161057273 (5us)
check-point-10: 161057274 (6us)

(edit) And of course Ubuntu being bun they leave the /dev/hpet read-only to root, yet another installation capability to set.

MrMoo fucked around with this message at 10:16 on Jan 19, 2010

MrMoo
Sep 14, 2000

How about basic event management, I thought select() on Linux was bad with 1-2ms resolution on timeout, but I cannot get Win32 to wait less than 1000ms with WaitForMultipleObjects(). Is it really that bad?

MrMoo
Sep 14, 2000

MrMoo posted:

Anyway, found a nice article detailing the state on Win32,

http://www.geisswerks.com/ryan/FAQS/timing.html

Microsoft's article on different timer hardware and Windows support:

http://www.microsoft.com/whdc/system/sysinternals/mm-timer.mspx

HPET replaces the ACPI timer in QueryPerformanceCounter() for Vista+.

MrMoo
Sep 14, 2000

functional posted:

Looking for 64-bit portable C versions of srand, rand, and RAND_MAX.

See what GLib does underneath its API:

http://library.gnome.org/devel/glib/stable/glib-Random-Numbers.html
http://www.koders.com/c/fid54B0475BEE58D1CECC96F282C9F9A6D91E8640A3.aspx

MrMoo
Sep 14, 2000

lol 10 bux posted:

Is it possible for read/write to a regular file to ever successfully return less than the bytes requested, barring possibly being interrupted by a signal, if I know the size of the file?

It could be truncated by another process, maybe NFS errors could cut the read short too.
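
Either way, the usual defensive pattern is to loop until the requested count arrives or EOF is reached. A minimal sketch for POSIX read(), with read_full as a made-up name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read() until `count` bytes arrive,
 * end-of-file is hit, or a real error occurs. Returns the number of bytes
 * read (possibly short at EOF), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
	size_t done = 0;

	while (done < count) {
		ssize_t n = read(fd, (char *)buf + done, count - done);
		if (n > 0)
			done += (size_t)n;
		else if (n == 0)
			break;			/* EOF, e.g. the file was truncated */
		else if (errno != EINTR)
			return -1;		/* genuine error */
		/* EINTR: interrupted before any data arrived, so retry */
	}
	return (ssize_t)done;
}
```

A short return from read_full() then means the file really ended early, which is the truncation case above.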

MrMoo
Sep 14, 2000

It would be Purify, commercial platforms means commercial tools :(

MrMoo
Sep 14, 2000

I personally prefer Scons, although CMake looks pretty when running. I can only guess Google threw it out because it doesn't scale well with a large number of dependencies or that they encounter some cases where stuff needs to be built from scratch and they have a parallel build farm such that compilation speed doesn't matter.

Old versions of CMake are utter poo poo and need way too many escape sequences, Make and autoconf are just loving awful. OpenOffice.org is a shining example of why Make is poo poo, yay 20 minutes to run a dependency analysis for no changes.

Scons and CMake can be annoying when they don't support your build tools, for instance Intel C Compiler Suite or Sun Studio are broken on Linux. Try something fruity like MinGW-w64 or wine-gcc and they're a colossal turd.

Example crap with CMake is trying to make a PIC enabled static library, example crap with Scons is MinGW anything, example crap with Make is language dependencies.

MrMoo fucked around with this message at 07:15 on Mar 6, 2010

MrMoo
Sep 14, 2000

w00tz0r posted:

Does anyone know if Microsoft modified struct in_addr in berkeley sockets for their windows 7 implementation?

Headers don't change with Microsoft OS releases, only with MSVC releases; the only thing that changes is the set of available functions, which will raise errors when you start the application or when you try to load them, e.g. WSARecvMsg().

quote:

I'm trying to join a multicast session, but all the examples I see are using the ip_mreq structure, where in_addr contains a sin_addr as a member.

On Windows the alternative is to use group_req and MCAST_JOIN_GROUP as they are IP family agnostic but support is limited to Windows Server 2008. I'm sure it used to list Windows 7 client support too, MSDN might have hosed up with the new design,

http://msdn.microsoft.com/en-us/library/bb427440%28VS.85%29.aspx

MrMoo
Sep 14, 2000

Zakalwe posted:

More on why STL can suck for games here http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html

Summary: EA ended up writing their own STL-like collection of data-structures and algorithms for in-house use.

And Trolltech followed suit for Qt for pretty much the same reasons.

MrMoo
Sep 14, 2000

floWenoL posted:

No they didn't? IIRC, Trolltech rolled their own because they for some reason just liked Java iterators more than C++ ones.

They have a list of reasons, but memory was the main one I saw,

Trolltech posted:

Whereas STL's containers are optimized for raw speed, Qt's container classes have been carefully designed to provide convenience, minimal memory usage, and minimal code expansion. For example, Qt's containers are implicitly shared, meaning that they can be passed around as values without forcing a deep copy to occur. Also, sizeof(QXxx) == sizeof(void *) for any Qt container, and no memory allocation takes place for empty containers.

http://doc.trolltech.com/4.0/qt4-tulip.html

MrMoo
Sep 14, 2000

Sweeper posted:

Where do you guys declare your variables? I always declare them at the top of the block they are used in, but someone was telling me that proper style dictates that you declare them where you first use them?

C89 style is at the head of the block, C99 and common sense style is to declare them close to where you use them. Any good compiler will automatically move the storage to the correct place and declaring later gives better hints to the compiler about the scope and lifetime.

MrMoo
Sep 14, 2000

Avenging Dentist posted:

Huh? I'm pretty sure most compilers will only create activation records at the function level, not at the block level (and certainly not on a per-declaration level), in order to minimize manipulation of the stack frame.

If we're talking basic C data types the compiler can keep a variable in a register and never have to move the stack. The location of declaration can help determine whether a variable should be memory or register based.

MrMoo
Sep 14, 2000

quote:

A 5 character null terminated string in C takes exactly 6 bytes of memory.

quote:

A 5 character string in C takes exactly 5 bytes of memory.

Developing with messaging systems you unfortunately learn that some people really do use non-null terminated strings.
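
When a fixed-width field like that has to be handed to ordinary C string code, a bounded format is one way to do it. A small sketch, with field_to_cstring as a made-up helper:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies a fixed-width, possibly unterminated field
 * into `out` as a proper C string. `out_size` must leave room for the NUL. */
static void field_to_cstring(const char *field, size_t field_len,
			     char *out, size_t out_size)
{
	/* "%.*s" stops at field_len characters or at an earlier NUL. */
	snprintf(out, out_size, "%.*s", (int)field_len, field);
}
```

A 5-byte wire field therefore needs a 6-byte destination, which is exactly the difference between the two quotes above.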

MrMoo
Sep 14, 2000

pseudorandom name posted:

Also, you'd almost certainly be better off using the ImageMagick library directly instead of executing the programs yourself.

I saw Alfresco does this from its Java framework; people make some weird design decisions. I'm guessing said designers or developers never have to use or integrate their output; everyone else knows the fewer external dependencies the better. Java has its own graphics API, and I'm sure some equivalent for ImageMagick has already been created.

MrMoo
Sep 14, 2000

In a single threaded application how can MSVC 2008 and 2010 show different values for LHS & RHS of a simple assignment?

MrMoo
Sep 14, 2000

I have no idea how to debug it, the function is in a static C library, the application is a trivial C++ one filer. Performing the same assignment inside the end of the pgm_getaddrinfo() call returns the expected value (2).

The code works fine under everything else, Linux, FreeBSD, OSX, Solaris, even Wine, and Windows with MinGW and MinGW-w64 builds.

http://code.google.com/p/openpgm/

:negative:

It's not a stray #pragma pack, it's not differences of ADDRESS_FAMILY, I can try forcing NT version in the build as for MinGW I used 0x501.

MrMoo fucked around with this message at 04:57 on Aug 6, 2010

MrMoo
Sep 14, 2000

Changing all the code from C89 to C++2003 didn't help either apart from time wasting.

Moving all the static library code into the application project did resolve the issue. Something fruity with MSVC static libraries?

MrMoo
Sep 14, 2000

Magic from the WIN32 definition apparently. MSVC adds it to console applications but not static library projects. Winsock2 stuff compiles without it but apparently is broken.

What a fantastic platform.

:negative:

MrMoo
Sep 14, 2000

litghost posted:

There are probably ready made solutions you could use too.

Picking one at random: http://code.google.com/p/pthread-lib/

You could also use GLib which has thread pool support: http://library.gnome.org/devel/glib/unstable/glib-Thread-Pools.html

MrMoo
Sep 14, 2000

litghost posted:

Worth noting that pthread-lib hasn't had any work done on it since 2008. Of course it could have zero bugs! Oh and has little documents, and no examples.

I saw a March 2010 update and thought it was still going, but that was a performance improvement suggestion.

MrMoo
Sep 14, 2000

slovach posted:

What is with MSVC and SSE intrinsics? MS recommends their usage over inline asm, but the stuff it seems to generate is beyond earthly logic at times.

Ok, _mm_set stuff... why would it honestly make 4 movss instructions over one movaps and a constant? Occasionally it seems to come up with some shuffling black magic out of nowhere...

Most likely alignment issues; optimal speed for SSE moves requires 16-byte alignment.


Painless posted:

Oh fuckballs why is that horrible variable-length stack-allocated array GCC extension enabled by default?

You mean C99 variable-length arrays? Why would you consider them horrible? You save a function call over using alloca().

MrMoo fucked around with this message at 16:22 on Sep 6, 2010

MrMoo
Sep 14, 2000

Sun ONE Studio and GCC support the most useful bits with no problems. It's like complaining nothing supports C++ 2003 completely, nobody cares as the full spec always has stupid parts in it.

MrMoo
Sep 14, 2000

You want me to mention xlc & acc, how about Open Watcom, Digital Mars, or even Open64?

:woop:

MrMoo
Sep 14, 2000

C99 to C++, what is the best way to convert strict aliasing safe casting?

Common code
code:
typedef struct gsi_t {
        uint8_t b[6];
} gsi_t;

typedef struct tsi_t {
        gsi_t gsi;
        uint16_t s;
} tsi_t;
C99 code that performs simple comparison, taking pointers
code:
bool
tsi_equal (
	const void* restrict p1,
	const void* restrict p2
        )
{
	const union {
		tsi_t		tsi;
		uint32_t	l[2];
	} *restrict u1 = p1, *restrict u2 = p2;

	return (u1->l[0] == u2->l[0] && u1->l[1] == u2->l[1]);
}
C++ code to perform less than comparison, taking references
code:
bool tsi_less (const tsi_t &ltsi, const tsi_t &rtsi)
{
#if 0
        uint32_t lu[2], ru[2];
        memcpy (lu, &ltsi, sizeof(lu));
        memcpy (ru, &rtsi, sizeof(ru));
#else
        /* not const: the members are assigned below */
        union {
                tsi_t ltsi_;
                uint32_t lu[2];
        };
        union {
                tsi_t rtsi_;
                uint32_t ru[2];
        };
        ltsi_ = ltsi;
        rtsi_ = rtsi;
#endif
        return (lu[0] < ru[0]) || (lu[0] == ru[0] && lu[1] < ru[1]);
}
memcpy or unions, or is there a method using reinterpret_cast? This is deliberately avoiding lexicographical_compare.
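
For comparison, here is the memcpy variant written out as self-contained C (pointers instead of references, since the sketch is C rather than C++). It assumes tsi_t has no padding, which the static assertion checks; the types are redeclared only so the block stands alone:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[6]; } gsi_t;
typedef struct { gsi_t gsi; uint16_t s; } tsi_t;

/* The two-word view below only works if there is no padding. */
_Static_assert(sizeof(tsi_t) == 2 * sizeof(uint32_t), "tsi_t must be 8 bytes");

/* Orders two TSIs by their raw 32-bit words. Copying into locals means a
 * tsi_t is never read through a uint32_t lvalue, so aliasing rules do not
 * come into it. The order is host-endian, not lexicographic. */
static bool tsi_less(const tsi_t *ltsi, const tsi_t *rtsi)
{
	uint32_t lu[2], ru[2];

	memcpy(lu, ltsi, sizeof lu);
	memcpy(ru, rtsi, sizeof ru);
	return (lu[0] < ru[0]) || (lu[0] == ru[0] && lu[1] < ru[1]);
}
```

The same body should carry over to the C++ reference version unchanged apart from taking the addresses of the arguments.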

MrMoo fucked around with this message at 04:37 on Sep 16, 2010
