Dylan Tweney
Rough Drafts

Googling blogs: A proposal

As much as Google rocks, there is one area where it really sucks: Searching weblogs. That’s because it’s not particularly intelligent about separating or summarizing weblog entries, making it a pretty blunt tool for finding specific information on a blog.

The problem: Google looks at the web in terms of pages. That’s a problem for weblogs because oftentimes the index page may hold many posts — and the same posts are then repeated on archive pages. Sometimes it’s one post to a page, sometimes a whole month’s worth. There’s no standardization. When you do a Google search on someone’s weblog, the results are a mishmash of single pages and agonizingly long archive or index pages. As a result, you’re forced to repeat the same query using your browser’s Find feature (Ctrl-F in Internet Explorer) in order to zero in on the exact spot where the search term appears. Repeat Ctrl-F until you’ve exhausted the current page, then go back to Google and look at the next search result.

Never mind that Google’s index is often several weeks out of date while weblogs get updated daily, if not more often.

On top of that, Google’s default summaries aren’t very good at capturing the essence of a post. Since it doesn’t know the difference between a post and a page, that’s not surprising.

The upshot: As I’ve argued before, weblogs, in their current form, are great for recording information but really suck at information retrieval.

One possible solution: As it turns out, we do have a couple of data formats that understand the difference between a post and a page, include useful summary data, and even include handy pointers back to the exact archive location of a post. They’re called RSS and RDF (the latter meaning the RDF-based flavor of RSS, also known as RSS 1.0).

These syndication formats are used to aggregate news, but they could be useful indexing tools too. What if Google (or Daypop, once they can afford to buy a few new hard drives) collected RSS and RDF feeds — and then archived them in a searchable index?

Instead of news stories scrolling off into oblivion when they get to the bottom of a feed, they’d enter a permanent index where they could be used for information retrieval later.
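To make the idea concrete, here is a rough sketch in Python of what "archiving a feed" might look like. It assumes an RSS 2.0-style feed and a SQLite table, and everything in it (the sample feed, the `posts` table, the `open_archive` and `archive_feed` names) is made up for illustration. A real indexer would also have to fetch feeds over HTTP and cope with the RDF-flavored feeds, which use XML namespaces.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real crawler would download this periodically.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>Googling blogs</title>
      <link>http://example.com/archives/2002/10#post1</link>
      <description>Why page-based search is a blunt tool for weblogs.</description>
      <pubDate>Tue, 01 Oct 2002 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Feeds as an index</title>
      <link>http://example.com/archives/2002/10#post2</link>
      <description>Archiving RSS items instead of letting them scroll away.</description>
      <pubDate>Wed, 02 Oct 2002 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def open_archive(path=":memory:"):
    """Open (or create) the permanent archive; one row per post."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               link TEXT PRIMARY KEY,  -- pointer back to the archived post
               blog TEXT,
               title TEXT,
               summary TEXT,
               published TEXT          -- ISO date, e.g. 2002-10-01
           )"""
    )
    return conn


def count_posts(conn):
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


def archive_feed(conn, feed_xml):
    """Store every item in the feed and return how many were new."""
    before = count_posts(conn)
    channel = ET.fromstring(feed_xml).find("channel")
    blog = channel.findtext("title", default="")
    for item in channel.findall("item"):
        link = item.findtext("link")
        if not link:
            continue  # without a link there is nothing to point back to
        pub = item.findtext("pubDate")
        published = parsedate_to_datetime(pub).date().isoformat() if pub else None
        # Items already in the archive are skipped, so re-reading a feed is harmless.
        conn.execute(
            "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?, ?)",
            (link, blog, item.findtext("title", default=""),
             item.findtext("description", default=""), published),
        )
    conn.commit()
    return count_posts(conn) - before
```

Because each post is keyed by its link, the indexer can re-read the same feed as often as it likes: items that have already been archived are ignored, and items that later drop off the bottom of the feed stay in the table.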

The benefits: A search engine could let you do searches against the archived feeds — and could display the article summaries that are included in the feeds themselves, guaranteeing that these summaries would be appropriate and relevant.

You could display all matching results, along with their summaries (or full text, where available), on a single page, making it much easier to scan the results. And, you’d have the links back to the archival versions, where you could see each post in its full, formatted glory.

You could add search criteria such as date of publication, letting you retrieve all matching posts from a specific year or month.

You could search a single blog, you could search several specific blogs at once, or you could search all indexed blogs — and in each case, all matching results would appear on a single page.
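Here is a similarly rough sketch of the search side, again in Python over SQLite. It assumes archived posts sit in a table with `link`, `blog`, `title`, `summary` and `published` columns. The table layout, the sample rows and the `build_index` and `search` names are all invented for this example, and a plain `LIKE` match is standing in for whatever ranking a real search engine would use.

```python
import sqlite3

# Hypothetical archived posts: (link, blog, title, summary, published).
SAMPLE_POSTS = [
    ("http://a.example/1", "Blog A", "Googling blogs",
     "Page-based search is a blunt tool.", "2002-10-01"),
    ("http://a.example/2", "Blog A", "Lunch",
     "A sandwich review.", "2003-01-15"),
    ("http://b.example/1", "Blog B", "Feed archives",
     "A search index built from RSS items.", "2002-10-20"),
]


def build_index(posts):
    """Load the sample posts into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (link TEXT PRIMARY KEY, blog TEXT, "
        "title TEXT, summary TEXT, published TEXT)"
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", posts)
    return conn


def search(conn, term, blogs=None, start=None, end=None):
    """Return (published, blog, title, summary, link) rows, newest first.

    blogs: optional list of blog names; None searches every indexed blog.
    start, end: optional inclusive ISO dates (YYYY-MM-DD).
    Note: '%' and '_' inside term are treated as LIKE wildcards here.
    """
    sql = ("SELECT published, blog, title, summary, link FROM posts "
           "WHERE (title LIKE ? OR summary LIKE ?)")
    params = [f"%{term}%", f"%{term}%"]
    if blogs:
        sql += " AND blog IN (" + ", ".join("?" for _ in blogs) + ")"
        params.extend(blogs)
    if start:
        sql += " AND published >= ?"
        params.append(start)
    if end:
        sql += " AND published <= ?"
        params.append(end)
    sql += " ORDER BY published DESC"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    index = build_index(SAMPLE_POSTS)
    # Every match from every blog, with its summary and archive link, in one list.
    for published, blog, title, summary, link in search(index, "search"):
        print(f"{published}  [{blog}] {title}: {summary}\n    {link}")
```

Calling `search(index, "search", blogs=["Blog A"])` narrows the results to one blog, and passing `start="2002-10-01", end="2002-10-31"` restricts them to a single month. Since ISO dates sort correctly as plain text, the date filter needs nothing fancier than string comparison.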

Is anyone doing this now? I’d love to hear about it.
