A Better WordPress Search

The search feature of WordPress does not deliver good results. The good news is that I know how to fix it. First, let’s talk about the current WordPress search.

As an example I will use the phrase “I will buy your idea”. I am going to search using the WordPress search tool, Google, and a replicable SQL query that could be turned into a WordPress plugin (or rolled into core).

A year and a half ago I wrote a post titled “I will buy your idea”. Obviously that should be the very first result. Hopefully everything after that is relevant.

Using the default WordPress search I get the following results:

  1. linkblog: Nov 15
  2. My Ten Holiday Season Flying Tips
  3. Database Tools for Non-Profit Organizations
  4. No Million Dollar Ideas
  5. Web 3.0 Coming Soon to a TV Near You
  6. DVDs With Lots of Dots

As you can see, the WordPress search failed miserably. The post I am looking for is not there and most of the posts are not even relevant. I think the “No Million Dollar Ideas” post was relevant. Other than that, the results are just posts that contain one of those words, listed in reverse chronological order (obviously “I” was stripped from the phrase).

Using Google search I get the following results:

  1. I will buy your idea
  2. My Ten Holiday Season Flying Tips
  3. [jaxn.org home page]
  4. Holiday Cell Phone Shopping Tips
  5. Hyper Local Tagging
  6. Personal Attention Tracking

Obviously better. The post I was looking for was the first result. The other results were pages that contained the entire phrase, because Googlebot came by after someone had commented on the original post, when its title was listed in the “Recent Comments” section of my sidebar. So, Google is better, but far from perfect.

Using a query against the database to represent my way of doing search I get the following results:

  1. I Will Buy Your Idea
  2. No Million Dollar Ideas
  3. Ship It! Go Buy It!
  4. WKRN Promises to Buy Me a Drink
  5. Taxing Food is Not a Progressive Idea
  6. How To Buy a Bottle of Wine as a Gift

That is obviously the best result set of the three. Every item is relevant and #1 and #2 are probably the most relevant posts I have ever written for that search term.

So how does it work? The basic idea is to give each row in the wp_posts table a score for the search query. Since we could never anticipate every search term, we have to do this in real time. The basic gist is that if the title matches exactly it gets a lot of points. If the title contains the full search term then it gets a little less. If the post content contains the full search term then it gets a little less still.

Then we start looking at individual words. First I strip out the “Stop Words”. (Stop Words are words search engines usually ignore. In this case we ignore I, will, and your.) Then I give a few more points for words in the title than for words in the body.

Sum up all the points and you have the score. Sort the results by score from highest to lowest and you have the most relevant search results. Can it be better? Absolutely. Currently it does not account for multiple occurrences of words. It also does not account for tag matches (which would score pretty high). Still, it produces the most relevant search results of the three. (I am sure this sounds familiar to several people I have worked with before.)
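The scoring described above can be sketched in a few lines of Python. This is only an illustration, not the SQL query used for the results in this post: the point values, the stop-word list, and the function names are all made up, and only the relative order of the weights comes from the description. It also assumes an exact title match replaces the "title contains the phrase" bonus rather than adding to it.

```python
import re

# Hypothetical weights; the post only gives their relative order.
EXACT_TITLE = 100
PHRASE_IN_TITLE = 50
PHRASE_IN_CONTENT = 25
WORD_IN_TITLE = 10
WORD_IN_CONTENT = 5

# A tiny illustrative stop-word list, not the one actually used.
STOP_WORDS = {"i", "will", "your", "a", "the", "to", "is", "of"}


def normalize(text):
    """Lowercase the text and reduce it to space-separated alphanumeric words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def contains_phrase(haystack, phrase):
    """True if the phrase appears in the haystack on whole-word boundaries."""
    return f" {phrase} " in f" {haystack} "


def score(query, title, content):
    """Score one post against the query."""
    q, t, c = normalize(query), normalize(title), normalize(content)
    points = 0
    if t == q:
        points += EXACT_TITLE
    elif contains_phrase(t, q):
        points += PHRASE_IN_TITLE
    if contains_phrase(c, q):
        points += PHRASE_IN_CONTENT
    title_words, content_words = set(t.split()), set(c.split())
    # Individual words, with stop words removed; title hits count for more.
    for word in set(q.split()) - STOP_WORDS:
        if word in title_words:
            points += WORD_IN_TITLE
        if word in content_words:
            points += WORD_IN_CONTENT
    return points


def search(query, posts):
    """Return (score, title) pairs for posts that scored above zero, best first."""
    scored = [(score(query, p["title"], p["content"]), p["title"]) for p in posts]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
```

Like the approach it sketches, this counts each word once per post, so repeated occurrences and tag matches are not rewarded.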

But is it fast? Absolutely. Sure, the SQL statement is long and it contains fulltext search, but there are no joins and nothing is too complex. I don’t have the biggest blog out there, but there are 2,510 rows in my wp_posts table. How fast was the query? 0.07 seconds.

Maybe I will work on this as a plugin on the plane today.

Update: I am going to work on statzen instead of this plugin. Maybe one day…


Movable Type Open Source: Who Cares?

It is official, Movable Type is open source. You can get all of the details here.

Richard MacManus posted this on ReadWriteWeb:

Importantly, it also means that Six Apart has finally removed the one major advantage that WordPress has had over Movable Type – that it is open source.

Open sourcing Movable Type doesn’t even come close to removing the major advantage WordPress has over Movable Type. You could always get Movable Type for free, and aside from plugins and themes very few WordPress users actually do anything with the source.

Movable Type has an archaic architecture and is too difficult to get up and running. WordPress may not be the prettiest code, but it sure is easy to install.


Short URLs

I am playing with moving jaxn.org to shorter permalinks. A long time ago I read that it is a bad idea to include dates in a permalink because it makes older posts look stale even if the content is relevant. I have also seen people talking about the importance of short URLs, since they are more portable (and are more likely to fit in Twitter).

Unfortunately, changing the permalink structure in WordPress seems to break all existing URLs. I thought I had read something about the new release of WordPress handling changing permalink structures gracefully.
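One way to keep old links alive is to map each old date-based path to its short form and redirect. A minimal sketch in Python, assuming the old structure was /YYYY/MM/DD/post-slug/ and the new one is /post-slug/ (both are assumptions for illustration, and `short_permalink` is a hypothetical helper, not part of WordPress):

```python
import re

# Assumed old structure: /YYYY/MM/DD/post-slug/ with an optional trailing slash.
OLD_PERMALINK = re.compile(r"^/\d{4}/\d{2}/\d{2}/(?P<slug>[^/]+)/?$")


def short_permalink(path):
    """Return the short path to redirect to, or None if the path is not an old-style permalink."""
    match = OLD_PERMALINK.match(path)
    if match is None:
        return None
    return "/" + match.group("slug") + "/"
```

A request handler could answer any path that yields a non-None result with a permanent redirect, so that existing links and search results keep working.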


I Stand Corrected

The other day I lamented the lack of tag capability in XMLRPC for WordPress 2.3. Mapping tags to mt_keywords in WordPress just made sense and it was a shame that the WP development team didn’t think of it.

The cool thing about it is that I was wrong. WordPress 2.3 has exactly the behavior that I wanted. I had just tested it before they added it. The final version of WP 2.3 allows me to attach tags to a post from MarsEdit using the Keywords field. Sweetness.

If you don’t see the keywords field in MarsEdit, click View -> Keywords Field. If you see tags on this post then it works.
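For anyone curious what a client sends, here is a rough sketch in Python of a `metaWeblog.newPost` call with tags in the `mt_keywords` field. It only builds the XML-RPC request body with the standard library and does not contact a server; the blog ID, credentials, and post text are placeholders, and `build_new_post_request` is a hypothetical helper.

```python
import xmlrpc.client


def build_new_post_request(blog_id, username, password, title, body, tags):
    """Serialize a metaWeblog.newPost call that carries tags in mt_keywords."""
    post = {
        "title": title,
        "description": body,              # metaWeblog uses "description" for the post body
        "mt_keywords": ", ".join(tags),   # tags sent as a comma-separated string
    }
    params = (blog_id, username, password, post, True)  # True = publish immediately
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


# Placeholder values, for illustration only.
request_xml = build_new_post_request(
    "1", "user", "secret", "I Stand Corrected", "Post body here", ["wordpress", "tags"]
)
```

To actually post, the same parameters could be passed to `xmlrpc.client.ServerProxy(url).metaWeblog.newPost(...)`, where `url` is the blog's XML-RPC endpoint.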


Not-so-extreme Makeover: blog edition

I have upgraded to WordPress 2.3 (though I had been running alpha, beta, and release candidates for a while). Now that the final release is out I decided to move to a new theme. I had heard that the new WordPress 2.3 default theme was going to be based on Sandbox. Unfortunately the timing didn’t work out and Kubrick is the default theme until 2.4. That didn’t stop me though.

There was a Sandbox Theme Competition that I had meant to enter but never got around to it. The winning theme (Sandpress) is pretty kick ass, but I wanted to try and keep things looking fairly consistent. Initially I was going to just make Sandbox look more like my old theme, but I decided to start from Sandpress instead.

Currently I am using a slightly modified version of Sandpress, but over the weeks and years it will become more and more customized (i.e. worse than the original). Thanks Aprit for the design. It is a step in the right direction for me.

Update: I have to say, I really like how jaxn.org looks now. Couple that with tags working over XMLRPC and I am pretty motivated to post. That is probably evidenced by the flood of posts yesterday.


WordPress 2.3 Tackles PolyURLism

Kudos to the WordPress team for tackling some of the problems of polyURLism.

From the release announcement:

“WWW or no-WWW? Based on your Blog Address, WordPress automattically redirects the other to your blog address. Partial post URLs should find and redirect to the full URL. Also, if you change the Post Slug, the old URL will redirect to the new one”

This evening I will do some number crunching on the statzen database to see how this is making a difference on jaxn.org.
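The WWW/no-WWW part of the announcement quoted above boils down to comparing the requested host with the configured blog address. A minimal sketch of that idea in Python, which is not WordPress's actual implementation (`canonical_redirect` and its behavior are assumptions for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(host):
    """Drop a leading "www." so the two host variants compare equal."""
    return host[4:] if host.startswith("www.") else host


def canonical_redirect(requested_url, blog_address):
    """Return the URL to redirect to, or None if no redirect is needed."""
    requested = urlsplit(requested_url)
    blog = urlsplit(blog_address)
    requested_host = (requested.hostname or "").lower()
    blog_host = (blog.hostname or "").lower()
    if requested_host == blog_host:
        return None  # already on the canonical host
    if strip_www(requested_host) != strip_www(blog_host):
        return None  # a different site entirely; leave it alone
    # Keep the path, query, and fragment; swap in the blog's scheme and host.
    return urlunsplit(
        (blog.scheme, blog.netloc, requested.path, requested.query, requested.fragment)
    )
```

With a blog address of http://jaxn.org, a request for the www variant of any page would be sent to the same path on the bare domain, so both spellings stop counting as separate URLs.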
