The search feature of WordPress does not deliver good results. The good news is that I know how to fix it. First, let’s talk about the current WordPress search.
As an example, I will use the phrase “I will buy your idea”. I am going to search with the WordPress search tool, with Google, and with a replicable SQL query that could be turned into a WordPress plugin (or rolled into core).
A year and a half ago I wrote a post titled “I will buy your idea”. Obviously that should be the very first result, and ideally everything after it should be relevant too.
Using the default WordPress search, I get the following results:
- linkblog: Nov 15
- My Ten Holiday Season Flying Tips
- Database Tools for Non-Profit Organizations
- No Million Dollar Ideas
- Web 3.0 Coming Soon to a TV Near You
- DVDs With Lots of Dots
As you can see, the WordPress search failed miserably. The post I am looking for is not there, and most of the posts are not even relevant. I think the “No Million Dollar Ideas” post was relevant. Other than that, the results were just posts that contained one of those words, listed in reverse chronological order (obviously “I” was stripped from the phrase).
Using Google search I get the following results:
- I will buy your idea
- My Ten Holiday Season Flying Tips
- [jaxn.org home page]
- Holiday Cell Phone Shopping Tips
- Hyper Local Tagging
- Personal Attention Tracking
This is obviously better: the post I was looking for was the first result. The other results were pages that contained the entire phrase only because Googlebot crawled them while someone’s comment on the original post was listed in the “Recent Comments” section of my sidebar. So Google is better, but far from perfect.
Running a query against the database that implements my approach to search, I get the following results:
- I Will Buy Your Idea
- No Million Dollar Ideas
- Ship It! Go Buy It!
- WKRN Promises to Buy Me a Drink
- Taxing Food is Not a Progressive Idea
- How To Buy a Bottle of Wine as a Gift
That is obviously the best result set of the three. Every item is relevant, and #1 and #2 are probably the most relevant posts I have ever written for that search term.
So how does it work? The idea is to give each row in the wp_posts table a score for the search query. Since we can never anticipate every search term, the scores have to be computed in real time. If the title matches the search term exactly, the post gets a lot of points. If the title contains the full search term, it gets a little less. If the post content contains the full search term, it gets a little less again.
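Here is a minimal Python sketch of those whole-phrase tiers. The point values (100, 50, 25) are hypothetical numbers of my own, since the post only says each tier is worth “a little less” than the one before it; treating the tiers as additive is also an assumption, as are the names `normalize` and `phrase_score`.

```python
# Hypothetical point values: the post only says each tier is worth
# "a little less" than the one above it.
EXACT_TITLE_POINTS = 100
TITLE_PHRASE_POINTS = 50
CONTENT_PHRASE_POINTS = 25


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def phrase_score(query: str, title: str, content: str) -> int:
    """Score one post on whole-phrase matches of the search term.

    The tiers are additive (an assumption): an exact title match also
    earns the "title contains the phrase" points.
    """
    q, t, c = normalize(query), normalize(title), normalize(content)
    if not q:
        return 0  # an empty query matches nothing
    score = 0
    if t == q:
        score += EXACT_TITLE_POINTS
    if q in t:
        score += TITLE_PHRASE_POINTS
    if q in c:
        score += CONTENT_PHRASE_POINTS
    return score
```

For the example search, the post titled “I Will Buy Your Idea” earns both title tiers, while a post that merely quotes the phrase in its body earns only the content tier.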
Next we look at individual words. First I strip out the “stop words” (words that search engines usually ignore; in this case I, will, and your). Then each remaining word earns points when it appears in the title or the body, with title matches worth slightly more than body matches.
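A sketch of the per-word step, under similar assumptions: the stop-word list is a tiny hypothetical stand-in (it only needs I, will, and your for this example), words are matched whole and case-insensitively with no stemming, and the 10/5 split is made up to reflect “title beats body”. The helper names are mine.

```python
import re

# Hypothetical stop-word list, kept tiny on purpose; a real one is
# much longer.
STOP_WORDS = {"a", "an", "and", "i", "of", "the", "to", "will", "your"}

TITLE_WORD_POINTS = 10   # hypothetical: a title hit is worth more...
CONTENT_WORD_POINTS = 5  # ...than a hit in the post body

WORD_RE = re.compile(r"[a-z0-9']+")


def words_in(text: str) -> set[str]:
    """Return the set of lower-cased words in text."""
    return set(WORD_RE.findall(text.lower()))


def keywords(query: str) -> list[str]:
    """Split the query into words, drop stop words, keep order, dedupe."""
    words = WORD_RE.findall(query.lower())
    return list(dict.fromkeys(w for w in words if w not in STOP_WORDS))


def word_score(query: str, title: str, content: str) -> int:
    """Award points once per keyword found in the title and/or body."""
    title_words, content_words = words_in(title), words_in(content)
    score = 0
    for word in keywords(query):
        if word in title_words:
            score += TITLE_WORD_POINTS
        if word in content_words:
            score += CONTENT_WORD_POINTS
    return score
```

With “I will buy your idea”, only “buy” and “idea” survive. Because matching is done against sets, a word that appears ten times scores the same as one appearance, which is the same multiple-occurrence limitation noted below.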
Sum up all the points and you have the score. Sort the results by score from highest to lowest and you have the most relevant search results. Can it be better? Absolutely. Currently it does not account for multiple occurrences of a word. It also does not account for tag matches (which would score pretty high). Still, it gives the most relevant results of the three. (I am sure this sounds familiar to several people I have worked with before.)
But is it fast? Absolutely. Sure, the SQL statement is long and it uses full-text search, but there are no joins and nothing is too complex. I don’t have the biggest blog out there, but there are 2,510 rows in my wp_posts table. How fast was the query? 0.07 seconds.
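The actual SQL is not in this post, so what follows is only a hypothetical reconstruction of the idea: a Python function that builds a single SELECT over wp_posts with one CASE expression per scoring rule, sums them, drops zero scores, and sorts from highest to lowest, with no joins. To keep it testable with Python’s built-in sqlite3 module, it uses LIKE substring matching instead of MySQL full-text search and sqlite3’s `?` placeholders. The point values (100/50/25 for the phrase, 10/5 per word), the stop-word list, the `post_status = 'publish'` filter, and the function names are all my assumptions.

```python
import sqlite3

# Hypothetical stop words and point values; none of these come from
# the original query.
STOP_WORDS = {"a", "an", "and", "i", "of", "the", "to", "will", "your"}
EXACT_TITLE, TITLE_PHRASE, CONTENT_PHRASE = 100, 50, 25
TITLE_WORD, CONTENT_WORD = 10, 5


def like_pattern(text: str) -> str:
    """Wrap text in % wildcards, escaping LIKE metacharacters with '!'."""
    escaped = text.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    return f"%{escaped}%"


def case(condition: str, points: int) -> str:
    """One scoring rule: `points` if the condition holds, else 0."""
    return f"CASE WHEN {condition} THEN {points} ELSE 0 END"


def build_search_sql(query: str) -> tuple[str, list]:
    """Build one SELECT that scores every published row of wp_posts."""
    phrase = " ".join(query.lower().split())
    if not phrase:
        raise ValueError("empty search query")
    words = list(dict.fromkeys(w for w in phrase.split() if w not in STOP_WORDS))

    # Whole-phrase tiers (additive).
    terms = [
        case("lower(post_title) = ?", EXACT_TITLE),
        case("lower(post_title) LIKE ? ESCAPE '!'", TITLE_PHRASE),
        case("lower(post_content) LIKE ? ESCAPE '!'", CONTENT_PHRASE),
    ]
    params = [phrase, like_pattern(phrase), like_pattern(phrase)]

    # Per-word tiers: a title hit is worth more than a body hit.
    for w in words:
        terms.append(case("lower(post_title) LIKE ? ESCAPE '!'", TITLE_WORD))
        terms.append(case("lower(post_content) LIKE ? ESCAPE '!'", CONTENT_WORD))
        params += [like_pattern(w), like_pattern(w)]

    score_expr = " + ".join(terms)
    sql = (
        "SELECT ID, post_title, score FROM ("
        f"SELECT ID, post_title, ({score_expr}) AS score"
        " FROM wp_posts WHERE post_status = 'publish'"
        ") AS scored WHERE score > 0 ORDER BY score DESC"
    )
    return sql, params
```

Given an open connection, `conn.execute(*build_search_sql("I will buy your idea"))` yields `(ID, post_title, score)` rows, best first. Note that `LIKE '%idea%'` also matches “ideas”, unlike whole-word matching. A MySQL version would likely replace some of the LIKE terms with `MATCH ... AGAINST` and use its driver’s placeholder style.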
Maybe I will work on this as a plugin on the plane today.
Update: I am going to work on statzen instead of this plugin. Maybe one day…



2 Comments
I’ve come to the same conclusion that WordPress’s search is junk. Any luck finding a suitable plugin, or any advice on modifying the default search code?
Any chance you’d post your SQL code for the rest of us to steal, err, I mean look at?