My customers, they come to me looking for nourishment, a late-night snack, or maybe some light reading. They want to be fed and they want it now, and I wake from my slumber to give it to them. They walk away satisfied.

My name is Mike. I am a feedmaster. This is my story.

Late last night I took Chateau Blogsville live and I’ve been adding to the filters throughout today in order to tune the output. Suspiciously, this is what life is like for the analysts working in our SOC. =)

Lessons from tuning feeds periodically during the day:

  • I have a sizeable set of explicit blocks for quite a few terms coming from the search feeds. Even though I could build the search feeds with “NOT” values, I still had a bunch of trash that was more effectively deleted by a global junk screen.
  • I developed an “allow” filter based on keywords in the content. This is what I call the “relevancy filter”. In Chardonnay, it’s used for the dirty gray and gray feeds. In Eiswein, it’s used for everything.
  • For the time being, I’ve done more blacklisting on URLs than on keywords for the search feeds (dirty gray feed), making broad slashes through aol.com and myspace.com. Time will tell if this will be a fool’s game, since the spam blogs can come on pretty strong, and the only way to be sure is to nuke them from orbit.
  • I think I’ve pushed Pipes beyond what it can do. About every third time, I get a null result set (i.e., it times out). If you’re using a smart feed reader (I just make the feed a live bookmark in Firefox), it keeps the last version, and you don’t really know or care that your feed is outdated as long as it catches up sometime.
  • “Privacy” is the hardest thing to explicitly allow thanks to real estate, vacations, and dating. “Risk Management” comes in a strong second, thanks to banks, loans, and project management. Surprisingly, nobody but security people talks about BS7799.
  • I’ve roped in some really, really surprising content through the blog searches on Technorati and Google. What this means is that I find sites like The Technology Liberation Front, which I’m now a fan of. As much of a hassle as it is to filter the junk out of the search feeds, I think they definitely add something that a closed or by-invitation-only blog feed is missing. I’ll most likely add more feeds like this as I think them up.
  • Some of you will notice that at no point have I blacklisted the C-word (c*mpliance) but notice how it chokes itself to death nicely when you deny all but allow “risk management” and “penetration testing”?
  • There are a couple of terms that I deliberately did not add to the relevancy filter. Dollar for the person who names one, and the C-word doesn’t count.
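
For the curious, here is roughly how the three layers above (URL blacklist, global junk screen, relevancy "allow" filter) stack up, written as a minimal Python sketch rather than as the actual Pipes modules. The term lists and the names `filter_feed`, `keep`, and `host_is_blocked` are all made up for illustration; the real lists live in the pipe and are longer. It also shows why a term that is never blacklisted still drops out once everything is denied unless an allowed keyword matches.

```python
from urllib.parse import urlparse

# Hypothetical lists for illustration only; the real filters are longer.
BLOCKED_DOMAINS = {"aol.com", "myspace.com"}
JUNK_TERMS = {"real estate", "vacation", "dating"}
RELEVANT_TERMS = {"privacy", "risk management", "penetration testing", "bs7799"}


def host_is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def contains_any(text, terms):
    """Case-insensitive substring match against a set of terms."""
    text = text.lower()
    return any(term in text for term in terms)


def keep(item, apply_relevancy=True):
    """Decide whether one feed item survives all the filters."""
    text = item["title"] + " " + item["description"]
    if host_is_blocked(item["link"]):
        return False  # URL blacklist
    if contains_any(text, JUNK_TERMS):
        return False  # global junk screen
    if apply_relevancy and not contains_any(text, RELEVANT_TERMS):
        return False  # deny all unless an allowed keyword matches
    return True


def filter_feed(items, apply_relevancy=True):
    """Return only the items that pass the filters, in original order."""
    return [item for item in items if keep(item, apply_relevancy)]
```

Passing `apply_relevancy=False` mimics a feed that only gets the block lists, while the default mimics a feed that also goes through the relevancy filter.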

Chateau Blogsville is now officially open. I will replace the RSS icons with something better once my graphic designer gets them done.

4 Responses

  1.  Marcin Says:

    Hmm… let’s see,

    Schneier (lol. that should probably be blocked :P)
    export control
    incident response
    information assurance

  2.  dre Says:

    technorati and bloglines got nothing on google alerts. i might as well stop searching google and reading blogs completely

    oh btw i wish these were in opml format so that i could do a global search and replace to add support of rss-cache


  3.  rybolov Says:

    Pipes doesn’t do OPML, probably because with all the data munging you can do, it’s not a straight feed aggregator. You can, however, get the feed in email alerts. Or you can take what I have built, clone it, build your own, and subscribe to it however you want.

    security was too broad to add to the relevancy filter, but it’s implied everywhere else (google blogsearch for security +privacy -dating etc…)

    hack(s|er(s)|ing) was again, too broad. I don’t want to feed up yet another blog about RMS singing the free software song–“Join us now and share the software, you’ll be free, hackers, you’ll be FREEEeeeEEeeeEEEEEEE!”

    I think those are the only ones on your list I intentionally left off, so there are more yet to be revealed.

    However, if you watch the feed, you might see some of these terms or similar added in the near future. =)

  4.  rybolov Says:

    OK, I’m getting tired of the null results from pipes, so I’m thinking about caching the good results via cron and wget and hosting them on chateaublogsville.

    More to come, merry feedsters. =)
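
The caching idea in that last comment could look something like the sketch below. It is written in Python instead of wget, purely as an illustration: fetch the feed on a schedule, and only overwrite the hosted copy when the response actually contains items, so a null result or timeout leaves the last good version in place. The function names, the "has at least one item" test, and the file permissions are my assumptions, not a description of what ended up running.

```python
import os
import tempfile
import urllib.request
import xml.etree.ElementTree as ET


def has_items(feed_bytes):
    """True if the bytes parse as XML and hold at least one RSS item or Atom entry."""
    try:
        root = ET.fromstring(feed_bytes)
    except ET.ParseError:
        return False
    for element in root.iter():
        tag = element.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if tag in ("item", "entry"):
            return True
    return False


def fetch(url, timeout=30):
    """Download the feed; raises an OSError subclass on timeouts and URL errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def refresh_cache(url, cache_path, fetcher=fetch):
    """Replace the cached feed only when a fresh, non-empty result arrives.

    Returns True if the cache was updated, False if the old copy was kept.
    """
    try:
        data = fetcher(url)
    except OSError:
        return False  # network failure or timeout: keep the last good copy
    if not has_items(data):
        return False  # null result set: keep the last good copy
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(tmp_path, 0o644)  # assumption: the web server needs read access
    os.replace(tmp_path, cache_path)  # atomic swap, readers never see a partial file
    return True
```

A cron entry would then call `refresh_cache` with the pipe's feed URL and a path under the web root every so often; the `fetcher` parameter exists only so the logic can be exercised without hitting the network.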

