Solve a CAPTCHA, help the world

I’ve just stumbled upon a great idea named “reCAPTCHA”. The idea is that you use their CAPTCHA challenges to protect your site from spammers and bots, or to hide people’s email addresses - but the cherry on top is that when people solve CAPTCHAs on your site, they also help digitize books by filling in words that OCR could not process.

According to the creators (from the site: “reCAPTCHA is a project of the School of Computer Science at Carnegie Mellon University”):

About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.

So they take the words OCR programs fail on (meaning: those are hard CAPTCHAs for bots to solve in the first place) and ask humans to solve them - and when they do, they use the words to fill in the OCR blanks. Now how cool is that? I’m definitely going to look deeper into this when I have a few free moments.

They even have API libraries in several languages, including PHP.

What I have always thought is now official

Ever tried looking up your name in Wikipedia? Check this out.
I knew it - and the title says it all.

The M in MVC

In the last few weeks I’ve been thinking about models a lot. Models, as in Model-View-Controller, are the most abstract part of this holy trinity and the hardest to frame out (one might say Models are the “Holy Spirit” of MVC). So what’s the best practice here, if there even is one?

Models represent data and provide the means to perform data-specific actions. In a sense this is exactly what objects are (as in OOP objects) - so one might say that simple container objects are what the model should be. But in the real world, you almost always need data persistence, which means having some kind of storage mechanism to read from or write to. In the PHP world that’s almost always a database (and almost always MySQL) but it doesn’t have to be: Models could be based on RSS feeds, XML or other serialized data formats, web services, and more. In fact, $_SESSION could (and should?) be a part of your model. So data-layer abstraction could be a nice thing!

It goes even further - think about a model class which uses several data layers, mixing RSS feeds with locally stored meta-data for example; or a Transaction class that, when saved, will both save local information in the DB and send an API call to PayPal, executing the transaction or fetching information from PayPal’s logs.
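To make the Transaction example a bit more concrete, here is a minimal sketch. It is written in Python rather than PHP purely for illustration, and every name in it (InMemorySource, FakeGateway, charge, and so on) is hypothetical: the gateway is a stand-in that only records calls, not PayPal’s real API.

```python
class InMemorySource:
    """Hypothetical stand-in for a storage layer (DB table, session, XML file...)."""

    def __init__(self):
        self._rows = {}

    def load(self, key):
        # Return a copy so callers cannot mutate the stored row by accident.
        return dict(self._rows.get(key, {}))

    def save(self, key, data):
        self._rows[key] = dict(data)


class FakeGateway:
    """Hypothetical stand-in for a remote payment API; it only records calls."""

    def __init__(self):
        self.calls = []

    def charge(self, transaction_id, amount):
        self.calls.append((transaction_id, amount))
        return "ok"


class Transaction:
    """A model that sits on top of two data layers at once."""

    def __init__(self, transaction_id, amount, store, gateway):
        self.id = transaction_id
        self.amount = amount
        self.store = store
        self.gateway = gateway

    def save(self):
        # One call from the application's point of view, two data layers underneath:
        # first the remote call, then the local record of its result.
        status = self.gateway.charge(self.id, self.amount)
        self.store.save(self.id, {"amount": self.amount, "status": status})
        return status
```

The controller only ever calls save(); whether that touches one storage mechanism or three is the model’s business.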

Well - getting back to my point, I was thinking about the best way to “frame-out” models and I have to say I didn’t come up with a good, practical solution. One of the conclusions I did reach, however, is that we are thinking completely wrong when designing our models. We (or at least I) have a tendency to design the model layer by starting from the storage and data access layers (the DB schema usually) and then going to the application layer. If you have ever used Propel, it does exactly the same (as all ORM attempts probably do) - you design a database schema, and then build your model classes around it. Then, whenever your application requires some complex relations or data access, it becomes hell and you need to hack things together to make it work (try things like efficient batch updates, or efficient JOINs).

This is obviously wrong. We should be building or at least designing our application first - designing the logic, planning what sort of data and what data-related operations each action will need, and then designing model classes accordingly. These classes in turn will be used to design the data storage layer. The end result will probably still be a set of classes for each DB table (or pseudo JOINed tables for that matter), but at run time it will be much more efficient and easier to use.
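As a rough illustration of going top-down, here is a sketch (again in Python, with entirely hypothetical names) that starts from what one page needs, writes that down as a single model method, and only then puts some storage behind it. The list-backed class is just a placeholder for whatever schema you end up designing.

```python
from collections import Counter
from typing import List, NamedTuple


class ArticleSummary(NamedTuple):
    """Exactly the data the front page needs, nothing more."""
    title: str
    comment_count: int


def render_front_page(queries, limit=2) -> List[str]:
    # Step 1: the application code is written first, against the model method
    # it wishes existed. It knows nothing about tables or JOINs.
    return [
        f"{a.title} ({a.comment_count} comments)"
        for a in queries.latest_with_comment_counts(limit)
    ]


class ListBackedArticles:
    """Step 2: a storage layer designed to answer that one question well."""

    def __init__(self, articles, comments):
        self._articles = articles  # list of (id, title), newest first
        self._comments = comments  # one article id per comment

    def latest_with_comment_counts(self, limit) -> List[ArticleSummary]:
        counts = Counter(self._comments)  # missing ids count as 0
        return [
            ArticleSummary(title, counts[article_id])
            for article_id, title in self._articles[:limit]
        ]
```

A database-backed version would implement the same method with a single aggregated query, and render_front_page would not change at all.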

I am not sure how this can be done, or even whether it is possible - consider it a theoretical idea. But think of the possibilities when you start from the application layer and go down - instead of being limited by your data access layer.

You’re more than welcome to share any thoughts (or objections).

Premature Optimization and The Web

I had an interesting conversation with Nir Yariv a couple of weeks ago in which we got to talk about the meaning of the well-known (and probably my favorite) software development motto: “Premature optimization is the root of all evil”, usually attributed to Donald Knuth (and sometimes to C. A. R. Hoare), and the special meaning it has in modern web development environments.

In my opinion, “Premature Optimization” is the very common practice of spending too much time optimizing something before we even know it needs optimization. Many developers, including myself, tend to seriously try to optimize things even when it’s not cost-effective or productive, just because we want to do things elegantly. But when exactly is optimization not premature? And what does it mean specifically in web development? That’s an interesting question, and the sort of question we techies tend to neglect too often.

I meet with lots of professional web development teams, in companies ranging from the smallest shops and startups to the largest web enterprises worldwide. Almost all of them have something in common: They have no time. They all work in market-driven (and too often marketing-driven) environments: priorities change daily and new feature requests keep coming in with very few quiet periods. There is no time (and sometimes no will from the management) to fix bugs, and especially no time to optimize. Beyond that, most of the applications they develop go through a different life-cycle than traditional software release cycles: the application is built, released once, and then (perhaps due to lack of time / resources / bad release management) it is patched and patched and patched. The application never goes through a second “release” - so when should the optimization phase be performed?

I’ve been thinking about this a lot, and I came up with an interesting conclusion: “Premature Optimization” in web applications means focusing on performance instead of on maintainability, modularity, flexibility and generally knowing your code well. I’m not saying performance is not important: you should develop with performance in mind. But it should not come at the expense of maintainability and modularity - because those will allow you to both fix bugs and scale easily when the time comes.

These thoughts are not revolutionary, but I think they make an important note, especially today when code generation is so popular, PHP source code to C extension conversion pops up again, and claims that source code documentation is overrated are heard (here and here).

With these conclusions in mind, I came up with a short (partial) list of tips for people who are starting to write a web application and want to know where to put their focus:

  • Write maintainable code - in other words, decide on coding standards and follow them. In many cases it also means go OOP - but not always.
  • Write modular code - Yes, OOP again, but I think that in modern web applications it comes down to one important concept: MVC. You don’t have to use a bloated MVC framework - but separate your tiers. It’s worth it.
  • Document - Being able to scale means being able to bring in new developers who can jump into the water as soon as possible. The more in-line documentation you have, the easier it will be for them to both understand how the application works and learn from other people’s experience.
  • Keep it Simple - If you’re going with a framework, make sure you know it well enough to hack it when it comes to it. Make sure you create (or use) a modular framework, that allows you to decouple tasks. Do things the UNIX way - small and simple tools that perform a single task and perform it well.
  • Be Careful with Code Generation - Again, just like with frameworks, code generation is a double-edged sword. Too much of it - and you won’t know what your application does. If you use some auto-generator, make sure you study its output well.

I’m sure this list can be extended, but my main point is - don’t overdo things. Do things in a way that lets you easily improve them when you need to, and release on time. You’ll thank yourself when you’re slashdotted.

Unofficial Response to Zend Rants

While this is not an official Zend blog, and it is definitely not my role at Zend to do marketing (thank god), I couldn’t avoid commenting on this post at Tony Bibbs’s blog, even if just to clear out my own team members and colleagues from any speck of evil-doing-corporation guilt. [more...]

Always the Aggressor

Usually I don’t read political blogs. But I came across this post in Henri Bergius’ blog (which is a very good blog), mainly because it got aggregated by Planet PHP for some reason. That led me to read this post in roozbeh’s blog - which was very interesting.

What I found interesting is that roozbeh, an Iranian open-source developer working on the FriBidi project (along with Arabs and Israelis AFAIK), does not seem to understand the Israeli mindset, and is worried about an Israeli attack on Iran in retaliation for the results of the war in Lebanon this past summer. I didn’t find any way to leave a comment in roozbeh’s blog - so I will leave my comments here in the hope of starting a real discussion.

Now I don’t know a lot about Iran - but this is what I do know about Israeli-Iranian relations:

[more...]

Bloody Weather

Thanks to the miracle of Wi-Fi, I can share the fact that I’m stuck in London Heathrow airport, waiting for my flight back home. It’s crazy as hell in here. It takes something like an hour just to get into the terminal, not to mention checking in and boarding.

My first leg to Frankfurt should have left an hour ago, and will probably be delayed by at least another hour. This of course means I will miss my connection to Tel-Aviv.

Does anybody know a good bench in the Frankfurt Main airport?

Marco on Holocaust Denial

I usually read Marco Tabini’s blog looking for PHP related stuff. Today however, I came across a very well written post by Marco on holocaust denial - apparently commenting on current events in Iran.

Being Israeli and Jewish, this is a very touchy subject of course. But beyond that, being a person who sees studying history from all its directions as a possible cure for the ignorance of mankind, I find holocaust denial an interesting and important subject.

[more...]

Welcome to my blog!

I finally did it, and started my own blog.

I’ve been planning this for some time now, but never had the time until now. Actually, even now I don’t have the time - I just got used to sleeping less ;)

I plan to write from time to time about things that go on in my life - a large part of this blog will probably be of the technical-php-and-open-source kind - but I hope to also do quite a lot of the more personal thoughts-and-experiences kind of blogging, assuming I will be brave enough and fluent enough.

Oh, and this blog is not meant to be a monolog - so if you have your own thoughts about something I write, please comment!

Cheers!