Massively Scaled Security Solutions for Massively Scaled IT

Posted October 16th, 2009 by rybolov

My presentation slides from SecTor 2009.  This was a really fun conference, and the Ontario people are really, really nice.

Presentation Abstract:

The US Federal Government is the world’s largest consumer of IT products and, by extension, one of the largest consumers of IT security products and services. This talk covers some of the problems with security on such a massive scale; how and why some technical, operational, and managerial solutions are working or not working; and how these lessons can be applied to smaller-scale security environments.

Posted in FISMA, NIST, Public Policy, Speaking, The Guerilla CISO, What Works | No Comments »

I’m on the OWASP Podcast

Posted October 1st, 2009 by rybolov

I sat down with Jim Manico a month or so ago when he was in DC and recorded a podcast for the OWASP Podcast.  It’s now live, check it out.

Posted in FISMA, NIST, Public Policy, Rants, Speaking, The Guerilla CISO | No Comments »

Where is Rybolov?

Posted September 21st, 2009 by rybolov

Been busy lately.  This is a quick rundown on where I’ll be over the next couple of months so you can stalk me.

  • October 5-7: SecTor, Toronto, ON, Canada.  I’ll be talking about “Massively Scaled Security Solutions for Massively Scaled IT” which is an allusion to the size of the US Federal Government IT budget and techniques that they use to manage it.  The Rybolov Layered Information Security Management Model seen here earlier weighs heavily into the presentation, as do a ton of other ideas trying to get people to understand that hazy information security management area above the enterprise.
  • November 6-7: DojoCon, Laurel, MD.  I’ll be talking about the “Current State of Compliance” which somewhere along the lines has a punchline of “It’s going to happen anyway, might as well drive the bus instead of being under the bus”.  There is also a compliance panel following my talk and I’ll be on it with Cyberhiker and Dan Philpott.
  • November 10-14: AppSec DC, Washington, DC.  I’ll be running amok making part of the conference work.  I’m not speaking at this one which is a good thing because, well, every time I start talking web apps and security it takes me back to all the bad code I wrote in the late 90’s.  But hey, didn’t we all?

So in between preparing slides, running amok as a volunteer, and the usual work-life imbalance, I haven’t had much free time lately to add to the blog.  Plenty of ideas and blog fodder are floating around inside my head.  After the conventions I’ll put up my materials for the rest of the world to pick on.

Posted in Speaking, The Guerilla CISO | 5 Comments »

Federal CIO Council’s Guidelines on Security and Social Media

Posted September 17th, 2009 by rybolov

I got an email today from the author who said that it’s now officially on the street: Guidelines for Secure Use of Social Media by Federal Departments and Agencies, v1.0.  I’m listed as a reviewer/contributor, which means that maybe I have some good ideas from time to time or that I know some people who know people.  =)

Abstract: The use of social media for federal services and interactions is growing tremendously, supported by initiatives from the administration, directives from government leaders, and demands from the public. This situation presents both opportunity and risk. Guidelines and recommendations for using social media technologies in a manner that minimizes the risk are analyzed and presented in this document.

This document is intended as guidance for any federal agency that uses social media services to collaborate and communicate among employees, partners, other federal agencies, and the public.

Posted in Odds-n-Sods, The Guerilla CISO | No Comments »

Risk Management and Crazy People, a Script Using Stock Characters

Posted September 10th, 2009 by rybolov

Our BSOFH meets a Crazy Homeless Guy on the street just outside the Pentagon City metro station.

Crazy Homeless Guy: (walks up to BSOFH) Can I ask you a question?

BSOFH: (Somewhat startled, nobody really talks to him unless they’re trying to sell him something) Uhhhh, sure.

Crazy Homeless Guy: You know that there are people who claim to be able to say… take that truck over there and just by moving their finger make it fly into the Washington Monument.  Don’t you think that this is a threat to national security?

BSOFH: (Realizes that Crazy Homeless Guy is crazy and homeless) Not necessarily, you see.  I would definitely classify it as a threat.  However, when you’re looking at threats from people, you have to look at means, opportunity, and motive.  Until you have all three, it’s more of an unrealized threat.

Crazy Homeless Guy: But what if these same guys could kill the President the same way, isn’t that a national threat?

BSOFH: Um, could be.  But then again, let’s look at a similar analogy:  firearm ownership.  Millions of people safely own weapons and yet there isn’t this huge upswell to shoot the President now is there?  Really, we have laws against shooting people and when somebody does that, we find them and put them in jail or *something*.  We don’t criminalize the threat, we criminalize the action.  Flicking a finger doesn’t kill people, psycho people kill people.

Crazy Homeless Guy: Or even if these same people could use the same amount of effort to kill everybody on the planet.  You know the <censored, I don’t like being sued by cults> people claim to have this ability.

BSOFH: (Jokingly, realizing that somebody has been taking 4chan too seriously) Well, I wouldn’t care too much because I would be… well, dead.  But yes, possibly.  But then again, since the dawn of the nuclear age and all through the Cold War we’ve had similar threats and people with capabilities created by technology instead of word study and the power of the human mind.  You have to look at these things from a risk standpoint.  While yes, these people have the capability to do something of high impact such as kill every human on the face of the earth, the track record of something like this happening is relatively small.  I mean, is there any historical record of a <censored, I don’t like being sued by cults> actually killing anybody through sheer force of their mind?  In other words, this is a very high impact, low probability event–something some people call a black swan event.  While yes, this is a matter of national security that these people potentially have this capability, we only have so many resources to protect things and we have our hands full dealing with risks that actually have occurred in recent history.  In other words, risk management would say that this event you’re speaking of is an acceptable risk because of more pressing risks.

Crazy Homeless Guy: (Obviously beaten into oblivion by somebody crazier than himself) Well, I’ve never thought about it that way.  I’m really scared by these people.  Hold me, BSOFH.

BSOFH: Um, how about no?  You’re a Crazy Homeless Guy after all.  I have to get back to work now.  Come hang out sometime if you want to talk some quantitative risk analysis and we’ll start attaching dollar figures to the risks of <censored, I don’t like being sued by cults> killing all of humanity.  Doesn’t that sound like fun?  If we can get you cleared to get into the building, we can have a couple of whiteboarding sessions to determine the process flow and maybe an 800-30-stylie risk assessment just to present our case to the DHS Psychic Warfare Division.

Crazy Homeless Guy: Uh, I gotta find a better corner to stand on.  Maybe over by 16th and Pennsylvania I can find somebody more sympathetic to my cause.

BSOFH: You’re crazy, man!

Crazy Homeless Guy: You’re crazy, too, man!

And the moral of the story is that no matter how crazy you think you are, somebody else will always show up to prove you wrong.  And yeah, black swan events where we all die are dumb to prepare for because we’ll all be dead–near total fatalities only matter if you’re one of the survivors.

This story is dedicated to Alex H, David M, and some guy named Bayes.

OMG It’s a Psychic Black Swan photo by gnuckx cc0.

Posted in BSOFH, Risk Management, The Guerilla CISO | 5 Comments »

Stress-Test Apache with Intent to Tune: BSOFH Tip for the Software Masochist

Posted August 28th, 2009 by rybolov

So I’ve been having some problems with my server for a month or so–periodically the number of apache servers would skyrocket and the box would get so overloaded (load of ~50 or so) that I couldn’t even run simple commands on it.  I would have to get into the hardware console and give the box a hard boot (a graceful reboot wouldn’t work).

Root cause is I’m a dork, but more about that later.

Anyway, I needed a way to troubleshoot and fix it.  The biggest problem I had was that the problem was very sporadic–sometimes it would be 2 weeks between crashes, other times it would be 3 times in one day.  This is really begging for a stress-test.  Looking on the Internet, I found a couple of articles about running a load-tester on apache and information on the tuning settings but not really much about a methodology (yeah yeah I work for a Big 4 firm, the word still makes me shudder even though it’s the right one to use here) to actually solve the problem of apache tuning.

So the “materials” I needed:

  • One server running apache.  Mine runs Apache2 under Debian Stable.  This is a little bit different from the average distro out there in that the process is apache2 and the command is apache2ctl where normally you would have httpd and apachectl.  If you try this at home on another distro, you’ll need to use the latter commands.
  • An apache tuning guide or 3.  Here’s the simplest/most straightforward one I’ve seen.
  • A stress-tester.  Siege is awesome for this.
  • Some simple shell commands: htop (top works here too), ps, grep, and wc.

Now for the method to my madness…

I ssh into my server using three different sessions.  On one I run htop.  Htop is a version of top that gives you a colored output and supports multiple processors.  The output without stress-testing looks something like this:

[Screenshot: htop output on the server with no load]

I keep one session free to edit files and do an emergency “killall apache2” if things get out of hand (and they will really quickly, I had to pull the plug about 20 times throughout this process).  I run a simple command on another ssh session to get a count of how many apache threads I have running:

rybolov@server:~$ ps aux | grep apache2 | grep start | wc -l
11

OK, so far so good.  I’ve got 11 threads running with no load and RAM usage of 190MB.  I needed the extra “grep start” because it removes the text editor I have open on apache2.conf and anything else I might be doing in the background.

I also killed apache, waited 10 seconds, and looked at the typical RAM use.  With no apache running, I use about 80MB just for the OS and everything else I’m running.  This means that I’m using 110MB of RAM for 11 apache threads, which means I’m using ~10MB of RAM for each apache thread.  Now that’s something important I can use.
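The per-thread figure above is just a subtraction and a division, so it can be sketched as a small shell function.  The name `ram_per_proc` is made up for illustration, and the inputs are the numbers measured above (190MB with apache up, 80MB with it stopped, 11 threads).  Shell arithmetic is integer-only, so the result is rounded down.

```shell
# Hypothetical helper: rough RAM per apache process, in MB.
# Usage: ram_per_proc TOTAL_USED_MB BASELINE_MB PROCESS_COUNT
ram_per_proc() {
    # (RAM used with apache running - RAM used without it) / number of processes
    echo $(( ($1 - $2) / $3 ))
}

ram_per_proc 190 80 11   # prints 10
```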

I took my tuning settings in apache2.conf (httpd.conf for most distros) and set them at the defaults listed in the tuning guide.  My Apache2 uses the prefork module, which strictly speaking spawns child processes rather than threads (I call them threads throughout; read the tuning guide for more info).  The settings became something like the following:

<IfModule prefork.c>
  StartServers            8
  MinSpareServers         5
  MaxSpareServers        20
  MaxClients            150
  MaxRequestsPerChild  1000
</IfModule>

Notice how the MaxClients is set at 150?  This will prove to be my downfall later.  Turns out that my server is RAM-poor for as much processor as it has or Wordpress is a RAM hog (or both, which is the case =)  ).  I’ll eventually upgrade my server, but since it’s a cloud server from Mosso, I pay by the RAM and drive space.

After each edit of apache2.conf, you need to give apache a configuration test and restart:

server:~# apache2ctl configtest
Syntax OK                        <- If something else comes back, fix it!!
server:~# apache2ctl restart
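
Since a bad edit followed by a restart takes the site down, one cautious habit is to chain the two commands so the restart only runs when the config test passes.  This is a minimal sketch of that habit rather than anything from the tuning guide: `safe_restart` is a name invented here, and it defaults to `apache2ctl` as on Debian (pass `apachectl` on other distros).

```shell
# Hypothetical wrapper: restart apache only if the config test passes.
# Usage: safe_restart [CONTROL_COMMAND]   (defaults to apache2ctl)
safe_restart() {
    ctl="${1:-apache2ctl}"
    # && short-circuits: if configtest fails, restart never runs
    "$ctl" configtest && "$ctl" restart
}
```

Run it as root the same way as the commands above, e.g. `safe_restart` or `safe_restart apachectl`.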

I’m now ready to stress-test using the default setup.  This is the awesome part.  First, I need to simulate a load.  I make a URL seedfile so that siege will bounce around between a handful of pages.  I make a file siege.urls.txt and put in a collection of URLs so that it looks like the following:

http://www.guerilla-ciso.com/
http://www.guerilla-ciso.com/about
http://www.guerilla-ciso.com/contact
http://www.guerilla-ciso.com/papers-and-presentations
....<about 20 lines deleted here, you get the point>
http://www.guerilla-ciso.com/page/2
http://www.guerilla-ciso.com/page/3
http://www.guerilla-ciso.com/page/4

I’m sure there is an efficient and fun way to make this, like say, a text-only sitemap or sproxy which is made by the same guy who does siege, but since I only needed about 30 urls, I just cut-n-pasted them off the blog homepage.
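
For the sitemap route, here is a rough sketch: pull every `<loc>` entry out of an XML sitemap and write one URL per line.  It assumes the site publishes a standard XML sitemap, which this post does not confirm.  It also assumes a grep that supports `-o` (GNU and BSD grep both do), and `sitemap_urls` is a name made up for the example.

```shell
# Hypothetical helper: read sitemap XML on stdin, print one URL per line.
sitemap_urls() {
    # grep -o emits each <loc>...</loc> match on its own line;
    # sed then strips the surrounding tags.
    grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Example (the sitemap location is an assumption, not taken from the post):
# curl -s http://www.guerilla-ciso.com/sitemap.xml | sitemap_urls > siege.urls.txt
```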

I fire up siege and give my webserver a thorough drubbing, running 50 connections for 10 minutes and using my url seedfile.  BTW, I’m running siege on the webserver itself, so there isn’t anything in the way of network latency.  <enter sinister laugh of evil as I sadistically torture my apache and the underlying OS>

server:~# siege -c 50 -t 600s -f siege.urls.txt
** SIEGE 2.66
** Preparing 50 concurrent users for battle.    <-The guy writing siege has a wicked sense of humor.
The server is now under siege...                <-Man the ramparts, Apache, they're coming for you!
HTTP/1.1 200   1.08 secs:   16416 bytes ==> /
HTTP/1.1 200   1.07 secs:   16416 bytes ==> /
....<about 2 bazillion lines deleted here, you get the idea>
HTTP/1.1 200   4.66 secs:    8748 bytes ==> /about
HTTP/1.1 200   3.92 secs:    8748 bytes ==> /about
Lifting the server siege...      done.

Transactions:                  61 hits   <-No, this isn't actual, I abbreviated the siege output
Availability:              100.00 %      <-with a ctrl-c just to get some results so I didn't
Elapsed time:                6.70 secs   <-have to scroll through all that output from the real test.
Data transferred:            0.87 MB
Response time:                3.27 secs
Transaction rate:            9.10 trans/sec
Throughput:                0.13 MB/sec
Concurrency:               29.75
Successful transactions:          61
Failed transactions:               0
Longest transaction:            5.61
Shortest transaction:            1.07

Now I watch the output of htop.  Under stress, the output looks something like this:

[Screenshot: htop output on the server under siege load]

Hmm, looks like I have a ton of apache threads soaking up all my RAM.  What happens is that in about 30 seconds, the OS starts swapping and the swap use just keeps growing until the OS is unresponsive.  This is a very interesting cascade failure because writing to swap incurs a load which makes the OS write to swap more.  Maybe I need to limit either the amount of RAM used per apache or limit the maximum number of threads that apache spawns.  The tuning guide tells us how…

There is one setting that is the most important in tuning apache: MaxClients.  This is the maximum number of child processes (prefork module) or threads (worker module) that apache will run to serve requests.  Looking at my apache tuning guide, I get a wonderful formula: ($SizeOfTotalRAM – $SizeOfRAMForOS) / $RAMUsePerThread = MaxClients.  So in my case, using the ~10MB per thread I measured earlier, (512 – 80) / 10 = 43.something.  Oops, this is a far cry from the 150 that comes as default.  I also know that the RAM/thread number I used was without any load on apache, so with a load on and generating dynamic content (aka Wordpress), I’ll probably use ~15MB per thread.
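
The formula is easy to keep around as a shell function.  This is a minimal sketch: `max_clients` is a made-up name, and the inputs are the numbers from this post (512MB total, 80MB for the OS, and the ~15MB per thread guessed for a loaded server).  Shell arithmetic is integer-only, so the result is rounded down, which is the safe direction here.

```shell
# Hypothetical helper: MaxClients estimate from the tuning-guide formula.
# Usage: max_clients TOTAL_RAM_MB OS_RAM_MB RAM_PER_PROCESS_MB
max_clients() {
    # (total RAM - RAM reserved for the OS) / RAM per apache process
    echo $(( ($1 - $2) / $3 ))
}

max_clients 512 80 15   # prints 28
```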

One other trick that I can use:  Since I think that what’s killing me is the number of apache threads, I can run with a reduced number of simultaneous connections and watch htop.  When htop shows that I’ve just started to write to swap, I can run my ps command to find out how many apache threads I have running.

rybolov@server:~$ ps aux | grep apache2 | grep start | wc -l
28

Now this is about what I expected:  With 28 threads going, I tipped over into using swap.  Reversing my tuning formula, I get (28 threads x 15 MB/thread) + 80 MB for OS = 500 MB used.  Hmm, this makes a lot of sense to me, since the OS starts swapping when you use ~480MB of RAM.
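
Spotting the moment the box tips into swap does not have to mean staring at htop.  As a sketch, assuming a Linux box where /proc/meminfo reports SwapTotal and SwapFree in kB, the swap in use can be computed like this (`swap_used_kb` is a name invented for the example):

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin,
# print swap currently in use, in kB (SwapTotal - SwapFree).
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }'
}

# Example polling loop (left commented out; stop it with ctrl-c):
# while sleep 5; do
#     echo "swap used: $(swap_used_kb < /proc/meminfo) kB," \
#          "apache processes: $(ps aux | grep apache2 | grep start | wc -l)"
# done
```

The first reading where the swap figure starts climbing, paired with the process count on the same line, gives the same tipping point as the manual method.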

So I go back to my prefork module tuning.

<IfModule mpm_prefork_module>
 StartServers          8
 MinSpareServers       5
 MaxSpareServers      10
 MaxClients           25
 MaxRequestsPerChild   2000
</IfModule>

I set MaxClients at 25 because 28 seems to be the tipping point, so that gives me a little bit of “wiggle room” in case something else happens at the same time I’m serving under a huge load.  I also tweaked some of the other settings slightly.

Then it’s time for another siege torture session.  I run the same command as above and watch the htop output.  With the tuning settings I have now, the server dips into swap about 120MB and survives the full 10 minutes.  I’m sure the performance is degraded somewhat by going into swap, but I’m happy with it for now because the server stays alive.  It wasn’t all that smooth, I had to do a little bit of trial and error first, starting with MaxClients 25 and working my way up to 35 under a reduced siege load (-c 25 -t 60s) to see what would happen, then increasing the load from siege (-c 50 -t 600s) and ratcheting MaxClients back down to 25.

And as far as me being a dork… well, aside from the huge MaxClients setting (That’s the default, don’t blame me), I set MaxRequestsPerChild to 100 instead of 1000, meaning that every 100 http requests I was rolling over and making a new thread.  That would lead to cascade failure under a load. (duh!)

Posted in Technical, The Guerilla CISO, What Works | 4 Comments »
