++include'once compile performance question #37321 24 Apr 24 02:04 PM
Stephen Funkhouser (OP), Member. Joined: Nov 2006. Posts: 2,223
About 15 years ago we were using ++include'once, but the nesting limit kept biting us. To remove that as an issue we stopped all use of ++include'once, and now use ++IFNDEF guards inside the includes instead. I think this is causing unnecessary slowness in the compiler, because it must scan every file in full regardless of whether it has already been included.
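For reference, the guard pattern we use now looks roughly like this (the file and symbol names here are purely illustrative):
Code
! inside in:invhdr.def (illustrative include file)
++IFNDEF INVHDR_INCLUDED
DEFINE INVHDR_INCLUDED = 1     ! mark the file as already seen
    ! ... the actual DEFINEs, MAPs, DEFSTRUCTs, etc. ...
++ENDIF

as opposed to just writing this at each point of use:
Code
++include'once in:invhdr.def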

I think the ++include'once nesting limit no longer exists; is this correct? If so, I'll be interested to see how much faster we can compile with ++include'once.

Does anyone have any experience with the difference?


Stephen Funkhouser
Diversified Data Solutions
Re: ++include'once compile performance question [Re: Stephen Funkhouser] #37322 24 Apr 24 03:31 PM
Jack McGregor, Member. Joined: Jun 2001. Posts: 11,794
Interesting question. Actually, a pair of questions. On the first one, there's no known limit on the use of ++include'once. But since you're compiling with the /igoo (Include Global Only Once) switch, you're effectively already using ++include'once everywhere, so I don't think removing the ++IFNDEF statements bracketing each include will make any difference.
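(In other words, a compile line something like the one below, where the program name and switches are shown only as an illustration, already gives you the include-once behavior across the board.)
Code
compil prcctdsp /igoo /px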

As an aside, I've settled on the pattern of always using ++include'once (in fact, it's an APN shortcut triggered by "inc"), but I still add the ++IFNDEF directives inside most include files anyway -- partly out of habit, but also on the theory that it provides additional flexibility. For example, in cases where I may want to test multiple versions of a function, I can use the /C:symbol=value compiler switch to explicitly DEFINE the symbol associated with the relevant include file, preventing it from being processed even once.
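For example, with an include file guarded by a symbol as in the sketch above (again, the names are purely illustrative), a compile line along these lines pre-defines the symbol so the file's body never gets processed, leaving me free to substitute a test version of the function from somewhere else:
Code
compil prcctdsp /C:INVHDR_INCLUDED=1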

But setting all of that aside to address the second question, about how much overhead there is in scanning a file during the first compiler pass (necessary to process the conditional directives): it's definitely "something", but probably not "a lot". I'm guessing that on average the first pass is about 3-4 times faster than the second pass, since it only needs to pay attention to ++ directives, DEFINEs, function/procedure declarations and calls, and now perhaps DEFSTRUCTs, and it doesn't actually have to 'compile' any of it. It's mainly just building the matrix of called functions so that we can shake out those that aren't called and know in advance the named parameters for those that are. But perhaps I should add some timing statistics to the compiler output to actually measure that.

While on the subject of minor refinements/optimizations, I noticed that you tend to use LOOKUP as a way to conditionally include a file depending on whether it exists, e.g.
Code
++IF (LOOKUP("in:invprchist.sdf") # 0)
   ++INCLUDE in:invprchist.sdf
++ENDIF

There's nothing wrong with that, and it doesn't add more than a few extra picoseconds of overhead to parse the statement, but purely from a syntax-simplification perspective my preference would be:
Code
++INCLUDE'IF'EXISTS in:invprchist.sdf

or
Code
++INCLUDE'ONCE'IF'EXISTS in:invprchist.sdf

It just seems a little cleaner.

Re: ++include'once compile performance question [Re: Stephen Funkhouser] #37324 24 Apr 24 04:35 PM
Stephen Funkhouser (OP), Member. Joined: Nov 2006. Posts: 2,223
I forgot about the /IGOO switch. I was hoping for an easy performance win. Compilation of the PRCCTDSP program recently went from 8-9 seconds to 20+. We thought it was related to moving to 64-bit A-Shell, but it is still that slow back on 6.5, so it must have been a change we made.

A problem for another day, I suppose. Thanks for the thorough response.


Stephen Funkhouser
Diversified Data Solutions
Re: ++include'once compile performance question [Re: Stephen Funkhouser] #37325 24 Apr 24 09:11 PM
Jack McGregor, Member. Joined: Jun 2001. Posts: 11,794
For what it's worth, along with the fix for the ++IFDEF <structname> issue (discussed here), I've added the following stats to the end of the LSX file in the hope of making it easier to identify compiler performance issues, or at least quantify them. (This is from your prccstdsp program compiled under Windows.)
Code
Performance Statistics *:
======================================================
Phase 0 (/p, /px prescan):        3.2812 secs
Shake out unused routines (/px):  0.0312 secs (3614 kept, 7717 dropped)
Phase 1 (main compilation):       2.8906 secs
Phase 2 (emit RUN code):          0.1406 secs
Total elapsed time:               16.4653 secs

Labels/Functions: 99984 searches, 312.5000 ms
Symbol Definitions: 1150955 searches, 4218.7500 ms
Variable Definitions: 261632 searches, 937.5000 ms
Structure Definitions: 15041 searches, 0.0000 ms

* Total elapsed time should be accurate. All other time values are based on
  the Windows kernel + user CPU clock counters, theoretically measuring
  CPU rather than elapsed time, but may not be precise enough to measure
  individual searches accurately.

As the asterisked note indicates, measuring the time used in routines that take almost no time per call but get called tens of thousands of times (potentially adding up to real time) is unfortunately not very reliable when the time for a single call isn't significantly longer than the clock resolution. The Windows CPU clocks use 100-nanosecond units, but it's not clear whether the actual resolution is that fine, or whether, for example, each search for a structure name in the structure definition index takes significantly less than that, thus making it look like the 15041 searches took no time at all. Also, the total CPU time for the 4 itemized search routines is very close to the total CPU time for the entire compilation, which is ridiculous, because there's a lot of other code in the compilation that wasn't being separately itemized. So that strongly implies that whether or not a routine gets properly measured is a bit like Russian roulette: either it gets off for free because the clock didn't change, or it gets hit with a big charge, most of which should have gone against prior routines.

Another detail which stands out is that the initial pre-scan (Phase 0) actually took longer than the main compilation (Phase 1), which surprised me. Partly that may be because the pre-scan had to look at 3 times as many routines as the main scan. And partly it may be because the pre-scan has to build the various index trees, offloading some of the work from the main scan (which then mostly does a lot of searches of those trees). Or maybe there is some inefficiency there which needs to be squeezed out. But here we can't blame the clock resolution, because each phase is measured once at the start and once at the end, and the phases are plenty long enough to make the clock resolution a non-issue.

Another surprising result is that the total elapsed time is so much more than the CPU time. That suggests that the OS devoted almost 2/3 of its time to other background processes. (Not entirely out of the question, especially with Windows, but still a bit surprising.)

Anyway, I'll probably post the update later tonight or tomorrow, but now that you have your workaround, and considering how apparently rare it is for the original problem to actually be triggered, it's not clear there's any urgency.

