XPath regular expression matching in PHP 5.3

Recently I needed to do some text pattern matching in an XML XPath query, and XPath's built-in substring matching capabilities were not good enough.

While XPath 2.0 defines regular expression matching capabilities, it is still not widely implemented and in most available tools there is no easy way to do complex pattern matching on XML nodes.

Or is there?

In his blog, Thomas Weinert recently gave an intro to using DOM and its XPath capabilities in PHP. One of the cool features of DOM's XPath, available starting from PHP 5.3.0 (have you upgraded yet?), is that the DOM extension supports registering pretty much any PHP function with the XPath engine and using it inside XPath queries.

Here is a quick example showing usage of PHP's own preg_match() in an XPath query, to find all the external links in Wikipedia's PHP article:

PHP:
  // Suppress XML parsing errors (this is needed to parse Wikipedia's XHTML)
  libxml_use_internal_errors(true);

  // Load the PHP Wikipedia article
  $domDoc = new DOMDocument();
  $domDoc->load('http://en.wikipedia.org/wiki/PHP');

  // Create XPath object and register the XHTML namespace
  $xPath = new DOMXPath($domDoc);
  $xPath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');

  // Register the PHP namespace if you want to call PHP functions
  $xPath->registerNamespace('php', 'http://php.net/xpath');

  // Register preg_match to be available in XPath queries
  //
  // You can also pass an array to register multiple functions, or call
  // registerPhpFunctions() with no parameters to register all PHP functions
  $xPath->registerPhpFunctions('preg_match');

  // Find all external links in the article
  // (the regex is embedded in a single-quoted XPath string literal below,
  // so it must not contain single quotes itself)
  $regex = '@^http://[^/]+(?<!wikipedia\.org)/@';
  $links = $xPath->query("//html:a[php:functionString('preg_match', '$regex', @href) > 0]");

  // Print out matched entries
  echo "Found " . (int) $links->length . " external links\n\n";
  foreach ($links as $linkDom) { /* @var $linkDom DOMElement */
      $link = simplexml_import_dom($linkDom);
      $desc = (string) $link;
      $href = (string) $link['href'];

      echo " - ";
      if ($desc && $desc != $href) {
          echo "$desc: ";
      }
      echo "$href\n";
  }

Note the use of php:functionString() as an XPath function, calling preg_match(). functionString() passes XML nodes such as @href into the function as strings. This is different from php:function() which, as far as I have seen, passes parameters without casting them to a string first (however, I am not sure what exactly they are passed as... maybe someone who knows can elaborate?).

Pretty useful, huh?

Utopia in the header file

This is from the top of sqlite3.h, the header file for the SQLite3 library. Most source files would have a copyright notice here referring people to their license, but since SQLite is public domain, the author decided to put this instead:

** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.

I have to admit I find this inspiring. For me, it is a strong reminder that dealing with legal limitations (on software and any other form of "intellectual property") is at best no more than a necessary evil. That goes for free software licensing as well.

Experimenting with Glista on OS X

I haven't blogged in a while, probably because I was too busy. I've been working, started to take some university classes (Philosophy & Computer Science), and... I'm doing most of my work on Mac OS X now. Don't worry, I'm still a Linux guy - but mostly for work purposes (and out of curiosity) I decided to ask Zend for a Macbook when my Thinkpad was starting to die.

Unfortunately the negative side effect of this is that I had to put Glista on hold - since I didn't have a Gtk+ based desktop anymore there wasn't much point in actively working on it.

However, in the last couple of days (following some patches that came in from ananasik, to whom I immediately gave commit access) my fingers started itching, and I decided to play with porting Glista to OS X - and found this project.

After some hours of tinkering, crashing, building, rebuilding and breaking things again, I now have a somewhat working (albeit ugly, and not so OS X friendly) Glista.app application bundle running on my own 32 bit OS X 10.6:

Glista running on native OS X for the first time!

Glista in the Dock!

If you're really up for it, you can get a Disk Image here.

You can also build it from source by checking out http://glista.googlecode.com/svn/branches/osx-support and doing the following:

  1. Make sure you have the necessary build environment (Xcode is usually a good start!)
  2. Install all the gtk-osx tools and libraries including ige-mac-bundler and gtk-quartz
  3. cd into the source directory and run (in a jhbuild shell after installing gtk-osx) ./configure --prefix=$PREFIX
  4. Note that some things do not work on OS X yet (or will never work) like libunique integration, gtk-spell, libnotify integration etc. - that's normal for now
  5. Run 'make', don't (!!) run 'make install' (well you can, but there's no need, you'll just pollute your system)
  6. cd into dist/mac/ and run 'make dist-mac'. If everything is ok this should create Glista.app in that directory.
  7. Move that .app into /Applications (or anywhere else) and enjoy!

So far, it looks like it's going to be a long time before Glista will work smoothly on Mac - and most of it is because Gtk+ is not really that portable, and making it use OS-native widgets and rendering seems to be quite a challenge. I also don't feel I know enough about the internals of Gtk+, Quartz or OS X in general in order to help with that effort - but who knows, maybe I'll be able to help somehow?

BTW I'm not sure if that binary will work on anything but OS X 10.6 on Intel 32 bit. If you try, let me know!

Imagick: Maintain (fake) transparency when saving as JPEG

I haven't blogged in a while (have been busy you know), so I've decided to share this small piece of knowledge I've obtained by experimenting. I wrote a small test app (it's for a feature of the next version of Zend Server - maybe I'll share it one day when the API is stable), which does some image manipulation with the ImageMagick extension.

For those of you who don't know, ImageMagick allows one to perform pretty cool stuff on images - besides the usual drawing, conversion, rotation, rescaling etc., it also exposes some API to easily perform neat effects, like drop shadow, round corners and my newest favorite (apparently only available in the very latest builds of the extension) - the Polaroid effect.

In his blog, Mikko Koppanen, the author of the ImageMagick PHP extension, shows how to create drop shadows (as well as other neat things - you should check out his blog!). In his examples, however, Mikko always saves as PNG, whereas I dare say most web users would prefer saving as JPEG.

The problem with many of those effects is that they leave parts of the image transparent. When saving the picture as JPEG (as I do, since saving as PNG produces files that are too big), these transparent areas appear as black.

So after some experimenting, I've found out that the way to work around this is to composite another opaque layer as your background layer, filled with your background color of choice (white in my case). You will of course lose the ability to place the picture on other background colors and still have a nice "transparency" look - but as long as you stick to the background color you've set, it will look great.

Here is a code sample producing the same thumbnail + drop shadow as in Mikko's example, but saving it with white matte color as JPEG:

PHP:
  <?php

  $bgColor = '#ffffff'; // End result will have a white background

  /* This was taken from Mikko's example */
  $im = new Imagick( 'strawberry.png' );
  $im->thumbnailImage( 200, null );
  $im->roundCorners( 5, 5 );

  $shadow = $im->clone();
  $shadow->setImageBackgroundColor( new ImagickPixel( 'black' ) );
  $shadow->shadowImage( 80, 3, 5, 5 );
  $shadow->compositeImage( $im, Imagick::COMPOSITE_OVER, 0, 0 );

  /* My addition: clone the entire image again to create the background layer */
  $bg = $shadow->clone();

  /* I'm using colorFloodFillImage with high tolerance to paint it all white - maybe there are 'cleaner' ways to do it though */
  $bg->colorFloodFillImage($bgColor, 100, '#777777', 0, 0);
  $bg->compositeImage($shadow, Imagick::COMPOSITE_OVER, 0, 0);
  $bg->setImageFormat('jpeg');
  $bg->flattenImages();

  /* Display the image */
  header( "Content-Type: image/jpeg" );
  echo $bg;

While this adds another step, and the image will only look good on white backgrounds, you can now save it as a JPEG file with good compression and acceptable file size.

How much is listening to your customers worth?

I normally don't write about work. The reason is that I feel that the slight chance that someone might feel I'm being biased towards a product that comes from the company I work for and dismiss my thoughts as "guerilla marketing" is not worth it.

However, I'm going to make an exception - and that's because I prefer selling Zend here rather than doing it on Lukas Smith's blog :)

Lukas raises the question of which commercial PHP distribution should be used as an alternative to RHEL's outdated packages. My answer to that would be, surprisingly - use Zend Server! (well, ...once it's out of beta, of course).

Let's put the features and SLA you get from Zend Server aside for a moment.

The real reason I think you should use Zend Server is because the Zend Server product manager (hey, that's me!) reads your blog. I'm serious about this.

I'm not sure I can quantify this, but I think that a vendor that listens so closely to what potential users (and the community) have to say is worth quite a lot in the long run. And yes, Zend has not been perfect in listening to the community - but I can honestly and whole-heartedly say that we are trying harder. The recent feedback on Zend Server gives me the feeling that we are doing ok too.

NetworkManager: Auto-HTTP login to a Wifi network

One of the cafés in my area where I frequently drink / work requires you to pass through an annoying web page forcing you to agree to some terms before allowing you to access the Internet through their Wifi network. It's free - but they still annoy you with this silly HTTP gateway. This is actually a frequent thing in Israel - most cafés offer free Wifi access, but some will require you to log in nevertheless.

So today I figured out how to get NetworkManager to automatically work around this HTTP gateway for me whenever I connect to Arcaffe's Wifi network. Since it's super cool, and since I bet lots of people are annoyed by this sort of Wifi gateway, here's how to do it:

Apparently, NetworkManager allows you to create special post-connect or post-disconnect scripts that are executed when a network interface is brought up or down. Here is what I did:

I created the following script and saved it at /etc/NetworkManager/dispatcher.d/100httpgateway.sh:

CODE:
  #!/bin/sh

  IFACE="eth1"

  if [ "x$1" = "x$IFACE" ] && [ "$2" = "up" ]; then
      # Figure out the wifi SSID
      SSID=$(/sbin/iwconfig "$IFACE" | grep ESSID | cut -d: -f2 | sed 's/^\s*"\(.*\)"\s*$/\1/')

      case "$SSID" in
          "012-ArCaffe")
              URL='http://captive.012.net.il/user/refresh/home?confirmed=true&submitButton=+OK+&CPURL=http%3A%2F%2Fwww.arcaffe.co.il%2F&t=fsm3j5oe'
              DATA='x=9&y=5&agree=on&username=arcaffe&password=arcaffe012'
              COOKIE='Cookie: JSESSIONID=uc54121j305s; cookies=true'
              REFERER='Referer: http://captive.012.net.il/home?confirmed=true&submitButton=+OK+&CPURL=http%3A%2F%2Fwww.arcaffe.co.il%2F&t=fsm3j5oe'

              curl -d "$DATA" -H "$COOKIE" -H "$REFERER" "$URL" > /tmp/arcaffe.last 2> /tmp/arcaffe.last.err
              ;;
      esac
  fi

Don't forget to make the file executable - I did it by running chmod +x /etc/NetworkManager/dispatcher.d/100httpgateway.sh.

Some things you should note:

  • "012-ArCaffe" is the ESSID of the network I'm logging in to. This of course works for ArCaffe in Israel, but you should replace it with your network's ESSID.
  • Replace the value of IFACE with the name of your wireless interface.
  • $1, the first parameter passed by NetworkManager to the script, is the network interface that was just connected or disconnected.
  • $2, the second parameter, is "up" or "down" - the status of the interface.
  • The code I have inside the case block is where the magic happens. In this case, I send an HTTP POST with the correct parameters, Cookie and Referer headers and URL. This causes ArCaffe's gateway to log me in.
  • I use curl - but I could have also used wget or any other tool to do the job.
  • The -d flag sends the POST data, the -H flags set a header.
  • I figured out exactly what request to send using LiveHttpHeaders - but you can also use tcpflow or any other packet sniffing or HTTP sniffing tool.
  • You can add more options to the 'case' statement for more networks that need that sort of treatment. With a little shell-fu that should be no problem.
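
To make the last point a bit more concrete, here is a rough sketch of how the 'case' statement could grow to cover several networks. Only "012-ArCaffe" comes from my script above - the "Example-Cafe" SSIDs, the handler names and both helper functions are made up for illustration, and I'm assuming iwconfig prints the ESSID in double quotes as it does on my machine:

```shell
#!/bin/sh
# Sketch only: SSIDs other than "012-ArCaffe" and all function names
# below are hypothetical.

# Print the ESSID found in iwconfig-style output read from stdin.
extract_essid() {
    sed -n 's/.*ESSID:"\(.*\)".*/\1/p'
}

# Map an SSID to the name of a login handler. Prints nothing when the
# network needs no special treatment.
handler_for_ssid() {
    case "$1" in
        "012-ArCaffe")
            echo "login_arcaffe" ;;
        "Example-Cafe"|"Example-Cafe-5G")   # hypothetical networks
            echo "login_example_cafe" ;;
        *)
            : ;;                            # unknown network: do nothing
    esac
}

# Inside the dispatcher script this could be wired up roughly like so
# (left as a comment because it needs iwconfig and real handlers):
#   SSID=$(/sbin/iwconfig "$IFACE" | extract_essid)
#   HANDLER=$(handler_for_ssid "$SSID")
#   [ -n "$HANDLER" ] && "$HANDLER"
```

Each handler would then hold the curl call for its own gateway, the same way the ArCaffe block does.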

That's it! Man I love Linux today :D

Debugging CLI PHP with Zend Server and PDT on Linux and Mac

I'm working on a small PHP application, and a big part of it is a set of CLI scripts which will be executed in the background. Some of these scripts are quite complex, and I got to a point where I need to use a debugger in order to figure out what's going on.

I started hacking around with my locally-installed Zend Server CE and Zend Studio. I always knew how to manually start CLI debug sessions with Zend Studio (well, I knew, but forgot ;-) ), but then I figured, why not write a small shell script to automate the process, and learn a little about the Zend Debugger protocol on the way?

Here is what I did:

First, create the following shell script. I placed it at /usr/local/zend/bin/php-dbg (alongside the other Zend Server executables, which if you use Mac OS X will be at /Applications/ZendServer/bin):

CODE:
  #!/bin/sh

  # Wrapper script for debugging PHP CLI scripts with Zend Studio
  # Tested with Zend Server 4.0.0 Beta and Zend Studio for Eclipse 6.1.1
  # Shahar Evron [shahar.e at zend], 2009-02-20

  # Defaults
  DFLT_PORT="10137"
  DFLT_HOST="127.0.0.1"
  DFLT_PARAMS="debug_fastfile=1&use_tunneling=0"

  # Load Zend Server environment variables
  . /etc/zce.rc

  # Did the user specify the debug host / port?
  if test "x$DEBUG_HOST" != "x"; then
    if test "x$DEBUG_PORT" != "x"; then
      QUERY_STRING="&debug_port=$DEBUG_PORT"
    else
      QUERY_STRING="&debug_port=$DFLT_PORT"
    fi

    QUERY_STRING="$QUERY_STRING&debug_host=$DEBUG_HOST&$DFLT_PARAMS"

  # If no host/port were specified, try to auto-detect
  else
    QUERY_STRING=`wget http://localhost:20080/ -O - 2> /dev/null`
    if test $? -ne 0; then
      # Fall back to defaults
      echo "Unable to auto-detect Zend Studio settings, using defaults" >&2
      QUERY_STRING="&debug_port=$DFLT_PORT&debug_host=$DFLT_HOST&$DFLT_PARAMS"
    fi
  fi

  DBG_SESS_ID=`date +%s`
  QUERY_STRING="start_debug=1&debug_stop=1$QUERY_STRING&debug_session_id=$DBG_SESS_ID"

  # Quote "$@" so script arguments containing spaces are passed through intact
  QUERY_STRING="$QUERY_STRING" "$ZCE_PREFIX/bin/php" -c "$ZCE_PREFIX/etc/php.ini" "$@"

Going over this code might teach you some surprising things about how Zend Debugger and Zend Studio talk to each other ;) I'm not going to go into the details now, but if you have questions feel free to ask.
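
If you just want to see what ends up in the QUERY_STRING environment variable, the assembly logic from the script can be pulled out into a tiny function. This is only a sketch that mirrors the script above - build_debug_query is a name I made up, not something that ships with Zend Server, and it skips the auto-detection step:

```shell
#!/bin/sh
# Sketch: assemble the query string the same way the wrapper script does.
# build_debug_query is a hypothetical helper name.

build_debug_query() {
    host=${1:-127.0.0.1}            # machine running PDT / Zend Studio
    port=${2:-10137}                # port the IDE listens on
    session_id=${3:-$(date +%s)}    # the script uses the current timestamp

    printf 'start_debug=1&debug_stop=1&debug_port=%s&debug_host=%s' "$port" "$host"
    printf '&debug_fastfile=1&use_tunneling=0&debug_session_id=%s\n' "$session_id"
}

build_debug_query 10.1.2.3 10137 1235000000
# prints: start_debug=1&debug_stop=1&debug_port=10137&debug_host=10.1.2.3&debug_fastfile=1&use_tunneling=0&debug_session_id=1235000000
```

In other words, the CLI debug session is started with the same kind of parameters you would otherwise see in a URL.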

Next, make this script executable - just run 'chmod +x <path-to-script>' - and you're good to go.

Here is how to use it:

  • If you have PDT or Zend Studio running locally (on the same machine as the server), just run:

    # /usr/local/zend/bin/php-dbg <script you want to debug>

    That would just work in most cases - if it works you can stop reading now ;-)
  • If you are running the script on a server, but your PDT / Zend Studio is on a different machine (in the same LAN - no NAT or firewall!) you can simply specify the IP address or host name of the machine that runs PDT / Zend Studio as the DEBUG_HOST environment variable. For example:

    # DEBUG_HOST=10.1.2.3 /usr/local/zend/bin/php-dbg <script you want to debug>
  • If you are running the script on a remote machine (as above) and your Zend Studio listens on a port other than 10137, you can also pass the DEBUG_PORT environment variable to override the default port.
  • Also, don't forget to make sure that the machine that runs your Zend Studio is in the list of allowed debugging clients. You can check it at the Zend Server GUI on Server Setup -> Debugger.
  • If you are running the script on a remote host and there's a firewall / NAT between you and the server (e.g. you are in an office LAN, trying to debug a script on a remote production machine which is not in your subnet) you'll probably need to use SSH remote port forwarding to forward connections to your PDT / Zend Studio. I won't get into how to do it right here - unless you insist.
  • If you only want to type 'php-dbg' instead of the full path, you can place the file somewhere in your $PATH (e.g. in /usr/local/bin) or, even better, add /usr/local/zend/bin (or /Applications/ZendServer/bin) to your $PATH - you can do that by adding the following line to ~/.bashrc:

    PATH=$PATH:/usr/local/zend/bin
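
For the firewall / NAT case mentioned in the list above, here is a minimal sketch of the SSH side. It only builds and prints the ssh command so you can look at it before running anything; the user, host name and the tunnel_command helper are placeholders, and I'm assuming Studio listens on the default port 10137 on your own machine:

```shell
#!/bin/sh
# Sketch: build the ssh command for a reverse tunnel back to the IDE.
# The user, host name and helper name are placeholders.

tunnel_command() {
    remote=$1                 # e.g. user@server.example.com
    port=${2:-10137}          # port your PDT / Zend Studio listens on
    # -R makes the *remote* sshd listen on $port and forward every
    # connection to 127.0.0.1:$port on the machine that runs ssh.
    # -N means "no remote command", just keep the forwarding open.
    echo "ssh -N -R $port:127.0.0.1:$port $remote"
}

tunnel_command user@server.example.com
# prints: ssh -N -R 10137:127.0.0.1:10137 user@server.example.com
```

Run the printed command from the machine that runs Studio and leave it open. Then, on the server, something like DEBUG_HOST=127.0.0.1 /usr/local/zend/bin/php-dbg <script you want to debug> should reach your IDE through the tunnel (you will probably need 127.0.0.1 in the list of allowed debugging clients for that).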

Upon running the script, a debug session should simply pop up in your PDT / Studio and you'll be able to debug. How cool is that?

BTW: This has been tested with Zend Server 4.0.0 beta1 and Zend Studio 6.1.1. It should work with other versions of Studio as well. In fact, it can also work without Zend Server as long as you have Zend Debugger installed - but why ruin a perfectly good plug?

If you improve the script or find bugs, let me know! Also, if you know how to get the same thing going with Xdebug, let me know and I'll add it to the script.

Finally, it’s out: Zend Server

I normally try not to write about work related stuff... but this is a special occasion.

Zend Server is finally out for public beta. \o/

I have been working so hard on this for the last year, it kind of feels like I've just crapped an Elephpant ;)

Seriously now, I really like this product. I think it has great potential. I know a bunch of very good people who worked very hard on it, and deserve every bit of gratitude. We went over some rough times at Zend and we still were able to release this wonderful product! I'm so proud... :)

Priceless: “The Issuer Certificate Is Unknown”

Firefox: "mossad.gov.il uses an invalid security certificate"

Another example of the oh-so-frightening invalid HTTPS certificate warning in Firefox 3.0. I just found this one to be a bit ironic :)

BTW The Mossad website is mostly for recruiting purposes, they don't really let you search their archives on-line or anything... too bad, that could have been interesting :P

(and one more thing: yes, it's "The Mossad" and not just "Mossad" as it's frequently mis-translated in foreign media. "Mossad" literally means "Institute" or maybe in a less literal translation, "Agency". There are many institutes and agencies, but there is only one "The Institute")

Subversion: Finding the “base” revision of a branch

I use Subversion a lot - but today I learned something new:

You can easily find the "base" revision of a branch or a tag (i.e. the revision in which the branch or tag was created) by issuing the following command:

CODE:
  svn log -v --stop-on-copy \
      http://glista.googlecode.com/svn/branches/feature-reminders

The last revision you see in the log (in this case from one of my own Glista project's branches) is the revision in which the svn copy command was issued, i.e. the revision in which the branch was made.

This can then be used when merging the same branch back into trunk.
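
If you'd rather not read the revision off the screen, here is a small sketch that picks it out of the log output. base_revision is just a helper name I made up; it assumes the usual log layout where every entry starts with a header line like "r101 | user | date | 1 line" and the newest entry comes first:

```shell
#!/bin/sh
# Sketch: pull the branch's base revision out of `svn log --stop-on-copy`.
# base_revision is a hypothetical helper; it reads the log on stdin.

base_revision() {
    # Keep only the revision number from each entry header, then take
    # the last one - the log is newest-first, so that is the branch point.
    sed -n 's/^r\([0-9][0-9]*\) | .*/\1/p' | tail -n 1
}

# Example usage (needs network access and a trunk working copy):
#   BRANCH=http://glista.googlecode.com/svn/branches/feature-reminders
#   BASE=$(svn log --stop-on-copy "$BRANCH" | base_revision)
#   svn merge -r "$BASE:HEAD" "$BRANCH" .
```

The merge line at the end is the part where the base revision gets used: it applies everything that happened on the branch since it was created onto the working copy in the current directory.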

Neat!