Sunday, January 17, 2010

Search Goodness

There's a new bane of my existence -- search. Yep . . . to "Google" . . . to find stuff based on random phrases that may or may not have anything to do with the thing I'm trying to find . . . to expect a document to appear at the top of the list even though I put in a word which appears nowhere in the document . . . to have the correct reference appear on one search engine, but not on yours . . . to have diverse audiences who look for the same thing using completely different references . . .

OK, by now you get the idea. From what I am reading on the Internet, I'm not the only one with this millstone hung around my neck. Let me tell you a little about what I have found. First, our site is essentially 25,000+ documents (PDF, Word, etc.) organized into about 1,000 web pages. We have a very diverse audience and are constantly involved in a number of high-priority projects spanning a great deal of search real estate. These documents change at a rate of as many as 50 per business day.

It's an interesting problem for a search engine. Here are some of the challenges we face:

  • The need to index documents shortly after they are published to the site (e.g., within an hour, not a day or a week).
  • The documents posted are not created specifically for the Internet so the authors don't see tagging them for search as part of their job.
  • We have a remarkably small staff publishing documents to the site for the volume they handle and, by the way, they also do other things during their work day.
  • Documents come from a number of internal and external sources for publishing.
  • Documents are often referred to by different titles than their official titles, especially for things like regulatory filings.
  • Budgets are constrained (aren't everyone's?).

Sound like I'm grousing? Well, maybe just a little. Mostly, I'm trying to work things out in my head. The situation is one that, I'm sure, is faced by search professionals everywhere. It's typified by statements like: "Can't find anything on your site", "Your search is unusable", and "I couldn't find the document on your site; went to Google/Bing/. . . and it popped right up". It's also typified by management that thinks any problem can be solved by technology. I'm not sure that's true.

So my first issue is trying to determine what people want to see -- "search goodness". The question is:

Given a search phrase, what do you expect to see?

Seems simple, doesn't it? It's not. Sure, if you sell products, have a promotion, or concentrate on an event, it can be much simpler. But if you're like us and exist in "random chaos", it's much more complex. The proliferation of search engines might give some clue to the complexity of the subject.

So what can be done about it? I'm starting by trying to ask "experts". By this I mean subject matter experts, but also those executives who pay the bills. When you put in "X", what do you expect to see? And I'm hoping for a more definitive answer than "the thing I'm looking for at the moment". It could be a daunting task. After all, there are whole businesses out there dedicated to getting your site ranked higher in the commercial search engines. It can't be all that simple.
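
One way those expert answers could be put to work, sketched here in Python as an assumption rather than anything we have actually built: write down each "when I put in X, I expect to see Y" as data, then check how many of the expected documents show up in the top few results. The queries, file names, and the fake_search stand-in below are all made up for illustration; a real version would call whatever search engine the site uses.

```python
# Hypothetical "expected results" collected from subject matter experts.
# The queries and file names are invented for illustration only.
EXPECTED = {
    "rate filing": {"2009-rate-filing.pdf"},
    "annual report": {"annual-report-2009.pdf", "annual-report-summary.pdf"},
}


def hits_in_top_k(results, expected, k=5):
    """Return the fraction of expected documents found in the first k results."""
    top = set(results[:k])
    return len(top & expected) / len(expected)


def score_engine(search, expected_by_query, k=5):
    """Run every expert query through `search` and score each one."""
    return {
        query: hits_in_top_k(search(query), expected, k)
        for query, expected in expected_by_query.items()
    }


def fake_search(query):
    """Stand-in for a real search call; returns a ranked list of file names."""
    canned = {
        "rate filing": ["press-release.pdf", "2009-rate-filing.pdf"],
        "annual report": ["annual-report-2009.pdf", "board-minutes.pdf"],
    }
    return canned.get(query, [])


scores = score_engine(fake_search, EXPECTED)
for query, score in scores.items():
    print(f"{query}: {score:.0%} of expected documents in the top 5")
```

A score well below 100% for a query the experts care about at least points to a specific phrase and a specific document to investigate, which is more to go on than "your search is unusable".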

My initial thought is that process will be the key to our eventual success. We need to get documents tagged correctly to ensure that they can be retrieved. This will start with the creation of the document and carry through to the creation of the final published version -- in our case usually PDF -- and the placement on the page (which also needs to be tagged correctly). The other part of this thought is that implementing process will be difficult. One of the major reasons is that content on our site is not produced specifically for the site, and thus authors don't view tagging it for search retrieval as part of their job.

More later as the quest continues . . .

 
 
