Today (Sat, 22 Aug) marks the start of the fasting month for Muslims. So to those celebrating*, Happy Fasting, and may the Holy Month of Ramadhan shower blessings. As they say in Indonesian, Selamat menunaikan ibadah puasa. Mohon ma’af lahir dan bathin. The second part of the sentence is a request for forgiveness, in body and in spirit, which I wholeheartedly extend for any mistakes I may have made. Generality is a core programming tenet I like, so I extend this request to everyone, not just my Muslim friends celebrating :P.
* I don’t, but I usually show up after the fast at sunset, when the yummy food is prepared.
I’ve been messing with Groovy – truly – the name aside – it is groovy, and I am impressed.
At some point I’ll say more about it, or dump out various code snippets I’ve been experimenting with to learn it, but for now I wanted to ensure I have this snippet recorded for reference – it is on closures and their context in Groovy. It took me a while to get it. It is always interesting how closures are defined within the context (no pun intended) of the JVM, where functions are not first-class types.
/*
 * For a closure: basically local vars are the ones passed in {x,y,z -> blah}, everything
 * else is picked up and searched for from the execution context. Internally generated as
 * innerclasses with a supplied context consisting of 'this', owner, and delegate (the
 * internal Groovy closure base type is: groovy.lang.Closure -- see its Javadoc, this is
 * what closure blocks get translated to).
 *
 * Method/property resolution order for unqualified methods within a closure is in the
 * following order (and defined by the following context vars available to the closure):
 *
 * 1. this -- execution context - object to which the closure is first bound, e.g. script
 *    or class where defined, NOT the closure instance itself!
 *
 * 2. owner -- usually equal to this, except when a closure (itself implemented as an
 *    innerclass) then creates an inner closure: that inner closure's 'owner' is the outer
 *    closure instance. So that's one way to get access to a closure itself - create a
 *    subclosure and mess with its 'owner'!
 *
 * 3. delegate -- usually set to owner, but can be modified (others are fixed), for
 *    metaprogramming purposes
 *
 * For better illustration check the following snippet
 */
enclosingInstance = this

outerclosure = {
    // by default, 'this' is always execution context (object enclosing the closure), never this
    // closure instance (which is implemented as groovy.lang.Closure (.doCall))
    assert this == enclosingInstance

    // And so is owner
    assert owner == this

    // this.owner throws a MissingPropertyException, since the outer script has no 'owner' property,
    // and 'this' is the outer script, not the closure instance!

    // But if the closure (outerclosure) creates another closure like so...
    innerclosure = { return this; }

    // The inner closure's owner is the outer closure instance!
    assert innerclosure.owner == outerclosure

    // However the inner closure's 'this' (returned) is still the initial enclosing instance, not the
    // outer closure.
    // Not quite sure if this is intuitive, hmm.
    assert innerclosure() == enclosingInstance
}

// run it!
outerclosure()

// check out its superclass -- groovy.lang.Closure
println "superclass type: "+outerclosure.getClass().superclass

// done.
Type that out into some file, say Foo.groovy, and run it (yes, you don’t need a static main() method in an enclosing public class with the same filename). The example is focussed on context; it does not provide samples of parameterised closures, or even parameterless ones (technically the closure above has one default parameter called ‘it’, which is null unless otherwise set in the call to the closure :P).
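For completeness, here is a minimal sketch of the pieces the snippet above leaves out: explicit parameters, the implicit ‘it’ parameter, and a reassigned delegate. The Greeter class and all variable names are made up purely for illustration.

```groovy
// Explicit parameters are declared before the arrow.
def add = { x, y -> x + y }
assert add(2, 3) == 5

// With no declared parameters, a closure gets one implicit parameter, 'it',
// which is null when the closure is called without arguments.
def echo = { it }
assert echo() == null
assert echo('hi') == 'hi'

// An explicit empty parameter list ({ -> ... }) really takes no arguments.
def noArgs = { -> 'nothing passed' }
assert noArgs() == 'nothing passed'

// Greeter is a hypothetical class, used only to have something to delegate to.
class Greeter {
    String greet(String name) { "Hello, $name" }
}

// 'greet' is not defined in this script, so with the default resolution
// order the call falls through from the owner to the delegate.
def callsGreet = { greet('world') }
callsGreet.delegate = new Greeter()
assert callsGreet() == 'Hello, world'
```

Swapping the delegate like this is the usual basis for builder-style code: the closure body stays the same while the object answering its unqualified calls changes.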
Warning: Colourful language ahead. (My colours are way duller than most though, so your mileage may vary.)
Every now and then I have episodes of deep reflection on languages and semantics, and not just programming languages either. A common phrase for one speaking junk or bullshit is to ‘speak out of one’s @$$/arse/{insert other posterior synonym} (henceforth aliased to the less-accurate-but-will-do term $POSTERIOR in the interest of the DRY principle)’, or ‘did you just pull that out of your $POSTERIOR’, and so on. In my ever so humble view, these phrases and their variations should be used rather carefully and I am not simply looking at it from the viewpoint of manners and aesthetics either. Let’s consider a few comparisons:
The excretory organs, including parts involved in the aforementioned $POSTERIOR, expel toxins and unused junk out of the body, ensuring normal functioning of the digestive system, and in fact the body as a whole, all things considered – you are what you eat and all that. In many cases, when one speaks out of one’s $POSTERIOR, it is often a trait that is repeated, because one is still evolving, as we all are, or perhaps has chosen not to evolve – also a choice made by many. Neither good nor bad; it just is, no judgement (no, really). The point is that more often than not, this wannabe-$POSTERIOR produce is not expelled for good; rather, its source is often more like a bottomless pit (no pun intended.. well maybe just a little).
An astute reader (like yours truly, who just thought of this, teehee) will also bring up that even in the case of the true $POSTERIOR, it can be a bottomless pit – for one keeps eating and recycling, more so if the intake is … excessive – but the crucial invariant here is that output is always less than or equal to input (in fact equal is quite unlikely I think?) for true $POSTERIOR. Contrast this with the case of speech or ideas ejected from wannabe-$POSTERIOR: even without additional intake (i.e. no new incoming less-than-valuable ideas to process), the junky output is sometimes reduced, often remains constant, but usually increases. On the rare occasion, it is eliminated. Quite a different invariant, yes?
Hence these phrases make use of flawed comparisons, i.e. wannabe-$POSTERIOR <> true-$POSTERIOR, they are not even all that similar.
These phrases in fact do a disservice to the true $POSTERIOR. They give $POSTERIOR a bad name. The $POSTERIOR works in all earnestness supporting life. It is a Divine gift (have you considered life without it?). The bullshit output via the wannabe-$POSTERIOR, on the other hand, quite simply, does not necessarily do the same.
I shall however submit that the outputs (wannabe-output vs true-output) share many more traits, and are worthy of comparison. But let us not discredit true $POSTERIOR unnecessarily.
Please consider the ideas put forward in this post the next time you decide to use phrases involving $POSTERIOR.
Thank you. I wish you, and your $POSTERIOR, fragrances of heavenly descent.
PS. I have also tagged this as software_dev, for I think they kinda explain invariants and DRY rather nicely.
It is not often that I write about performance – okay more like never, and more so about PHP performance! But this is a must share methinks. On a personal project, I have been forced to use PHP for various reasons. Being rooted very much in the Java space, I went on a hunt for reusable stacks. I finally settled on:
- Kohana for the MVC framework.
- Doctrine for the ORM layer, since Kohana’s ORM annoyed me no end. Having worked with Hibernate in Java land, I was offended that Kohana called their implementation ORM – apologies folks, I am very biased here; the MVC bit, module system, cascading file layout, and hook system of Kohana are otherwise neat.
In the interest of ease of hosting and the existence of a vast knowledgebase from PHP’s standpoint, I settled on MySQL for the database. And of course, I use Linux, Debian in particular.
Anyway, Kohana tries to be extremely shared-hosting friendly, so it does not require a long-running FastCGI process, for example (not sure if there’s a recommended FCGI set-up). Everything is loaded dynamically as needed, with no preloading of files in memory. Basically, on every request it loads up a bunch of PHP files and any startup hooks, routes the request to your controller, and your controller in turn loads up one or more classes to get its job done – DB queries, etc., you name it. For those of you in Java space, think of it as loading your servlet context at each request (yes, you heard me right) – somewhat of an exaggeration of course, but with enough stuff to load, it may very well be a suitable analogy, and in some ways worse – the source file is interpreted each time, with no bytecodes cached by default (yet!). Even with this overhead, it is quite fast; I am very impressed.
But then I hooked in Doctrine using the integration module kindly provided here (had to upgrade the Doctrine version internally, but the Kohana hook points did not require changes, cool stuff). Now, Doctrine is a full-featured ORM, so it does have its overheads. It also features DQL – the Doctrine Query Language (inspired by HQL) – which means that every DQL statement is first parsed into the target SQL before execution. This caused each request to become a memory hog: from around 1-3 MiB per request prior to Doctrine usage (as output by the Kohana renderer, which in turn uses PHP’s builtin memory_get_usage function), it was now in the 8-12 MiB range per request – consistently!
The Doctrine documentation (v1.1 used here) has a rather decent section on performance, with 3 main infrastructure-y (as opposed to application code) recommendations:
- Use a bytecode cache
- Use the Doctrine query cache with an appropriate driver
- To minimise I/O due to multiple file inclusions, compile the Doctrine framework with a provided compile() function to get one large merged file encapsulating most (all?) of the Doctrine framework
Taking the above one at a time:
There are a few bytecode caches around, but the one that caught my virtual fancy was the Alternative PHP Cache (APC) – mainly because it seemed to be the easiest to install (on Debian, aptitude install php-apc) and the most well-integrated. I feel that bytecode caching should be a default feature included with the runtime, and the buzz on the net seems to indicate that APC is more or less marked for inclusion by default into future PHP installs (but I could very well have misinterpreted the buzz :P). APC uses shared memory (shm) segments to cache PHP bytecodes, which, while requiring some dedicated memory (duh), also makes it blazing fast. It does not, at least by default – have yet to explore – cache on disk (contrast this to Python .pyc files). I have not tuned any APC parameters. In my default install (which checks for source file changes to determine re-caching), it has literally brought the memory usage back down to 1-5 MiB per request, with an average of 2.5 MiB. The higher end of the scale seems to occur when I load object graphs rather than associative arrays (another Doctrine recommendation: prefer array hydration over object hydration if you do not need the business logic on the objects).
As my friend Tom would say, I am a happy camper! The app feels snappy!
So yeah – PHP bytecode caches can make a HUGE difference! And the default settings for APC, such as checking source file timestamps, are perfect for development; I’ll have to check if tuning this setting makes any significant difference. A couple of downsides I can think of to the shm approach (again, I haven’t checked the tuning params to see whether APC does disks, etc.):
- If you disable the source file checking, you’re potentially going to have to restart the process that allocated the shared memory if using a persistent process. I’ve only used FastCGI, so I’m not so sure how it affects mod_php with Apache
- Also not sure how using dedicated shm for a process goes with shared hosting providers!
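To make the timestamp-checking trade-off concrete, here is a conceptual sketch in Groovy. It is not how APC is implemented; the class name, the checkTimestamps switch, and the upper-casing "compiler" are all hypothetical stand-ins. It only shows the idea: reuse the in-memory result while the source file's modification time is unchanged, and note what happens once that check is switched off.

```groovy
// Conceptual sketch only: NOT APC's implementation. All names are hypothetical.
class TimestampCheckedCache {
    private final Map<String, Map> entries = [:]
    private final Closure compile        // stand-in for the expensive compile step
    boolean checkTimestamps = true       // hypothetical switch for the mtime check
    int compileCount = 0                 // counts cache misses, for illustration

    TimestampCheckedCache(Closure compile) { this.compile = compile }

    def load(File source) {
        Map cached = entries[source.path]
        long modified = source.lastModified()
        boolean stale = cached != null && checkTimestamps && cached.modified != modified
        if (cached == null || stale) {
            compileCount++
            cached = [modified: modified, compiled: compile.call(source.text)]
            entries[source.path] = cached
        }
        return cached.compiled
    }
}

def source = File.createTempFile('demo', '.php')
source.deleteOnExit()
source.text = '<?php echo 1;'
source.setLastModified(1000000000000L)

// "Compiling" here is just upper-casing the source text.
def cache = new TimestampCheckedCache({ String code -> code.toUpperCase() })
assert cache.load(source) == '<?PHP ECHO 1;'
assert cache.load(source) == '<?PHP ECHO 1;'
assert cache.compileCount == 1           // the second load came from memory

source.text = '<?php echo 2;'
source.setLastModified(1000000005000L)   // make the edit visible via the timestamp
assert cache.load(source) == '<?PHP ECHO 2;'
assert cache.compileCount == 2

cache.checkTimestamps = false
source.text = '<?php echo 3;'
source.setLastModified(1000000010000L)
assert cache.load(source) == '<?PHP ECHO 2;'   // stale until the cache is reset
```

The last assertion is the first downside in the list above in miniature: with the check off, an edited file keeps serving its old compiled form until whatever holds the cache is restarted or cleared.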
Prior to using the bytecode cache, I had enabled the Doctrine query cache using the SQLite driver – so an SQLite database (direct file-access-based DB, no dedicated server process) is used as a cache – and this actually increased my memory usage and response times :P. Dang. Essentially, what the query cache does is prevent the re-parsing from DQL to SQL each time: it uses the DQL as a key into the cache (or so I think that’s how it should work!) – effectively the query cache is a cache of prepared statements. However, Doctrine also comes with an APC driver for the query cache (another reason to use APC!), so once I had APC enabled, I replaced the query cache SQLite driver with the APC driver. Not bad: it saves a further 0.3 to 0.5 MiB per request!
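The "DQL string as cache key" idea can be sketched in a few lines of Groovy. This is a guess at the general shape, in the same spirit as the guess above, and not Doctrine's actual code: ParsedQueryCache and the toy "parser" (a plain string replace) are hypothetical.

```groovy
// Conceptual sketch only: NOT Doctrine's implementation. All names are hypothetical.
class ParsedQueryCache {
    private final Map<String, String> sqlByDql = [:]
    private final Closure<String> parse  // stand-in for the DQL-to-SQL parser
    int parseCount = 0                   // counts real parses, for illustration

    ParsedQueryCache(Closure<String> parse) { this.parse = parse }

    String sqlFor(String dql) {
        // The DQL text itself is the cache key; parse only on a miss.
        if (!sqlByDql.containsKey(dql)) {
            parseCount++
            sqlByDql[dql] = parse.call(dql)
        }
        return sqlByDql[dql]
    }
}

// Toy "parser": real DQL parsing is far more involved than a string replace.
def queryCache = new ParsedQueryCache({ String dql -> dql.replace('User', 'users') })

def dql = 'SELECT u.name FROM User u WHERE u.id = ?'
assert queryCache.sqlFor(dql) == 'SELECT u.name FROM users u WHERE u.id = ?'
queryCache.sqlFor(dql)
queryCache.sqlFor(dql)
assert queryCache.parseCount == 1        // parsed once, reused twice
```

A cache like this only pays off when the same query text recurs, which is why the sketch uses a `?` placeholder rather than baking the id into the string. It also hints at why the backing store matters: if each lookup costs a file open, as with SQLite, the lookup can end up costing more than the parse it saves.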
Unfortunately, the compiled (merged, to be more precise) Doctrine PHP file did not help me at all: it increased memory usage to about 7-8 MiB after bytecode caching, and before that it would’ve easily spiked to the 20 MiB range! Another thing I noticed was that I had to run compile() several times to create Doctrine.compiled.php (which was later included in lieu of just Doctrine.php), since it kept running out of memory. I had to increase the memory limit for a script from about 30 MiB to around 100 MiB for compile() to successfully complete and produce the merged file. Considering the number of files to merge, and that it probably did this naively by loading all or most of them in memory and writing them out as a whole (a guess here), that is not surprising, I suppose. The file produced is too large, I think – in effect we killed lazy loading of classes by forcing a big read. And yes, I made sure I wasn’t reading BOTH Doctrine.php and Doctrine.compiled.php (it would not work anyway, we get class redeclaration errors!). Hmm, I wonder if the recommendation was made for a set-up where the Doctrine init stuff was maintained in memory across requests.
To summarise:
- The APC PHP bytecode cache kicks the proverbial @$$ so hard it hurts. Install it!
- Doctrine’s query cache is neat, but so far only with APC – using SQLite may in fact be detrimental (it is, after all, yet another file I/O operation, opening a connection, etc., as opposed to an in-memory APC call).
- The merged Doctrine PHP file actually made things worse!
Phew, hope that was useful!
Update Sat 2009-08-01:
Brain dump: I was thinking some more about the compiled Doctrine file… logically it should be faster, because APC would put it into memory and then simply check one PHP file for timestamp updates to decide whether the cache should be invalidated for it. So why is it slower, and why does it use more memory? (Some extra memory usage seems justified, just not that much more.) I haven’t confirmed this, but I wonder if the bytecodes could not fit into the cache… and therefore had to be read from the file each time… and since there’s no lazy loading (the whole file is interpreted at once – the whole Doctrine framework!), it’s hungrier? Hmm, will confirm when I am inclined to.
My friend Tom – who does not have a web presence unfortunately (hint hint, Tom) – sent this video link. It is one of the funniest things I’ve seen in a while, top-grade geek humour :-). Note the last sentence in the video. Classic!
Disclaimer: In case this gives the wrong impression – I am a fan of Agile methods, and the emphasis on getting quality, well-tested code up and running is a breath of fresh air; it takes a certain amount of discipline and culture though – the latter being more crucial in my view, and not always available. Humour like this really brings that much-spoken-about Real World to light :P.
Just wrote and uploaded FormGen, a quick and dirty HTML form generator, in the code section. Haven’t uploaded sources yet though. Check it out!