BSOFH: Memo for My Project Team

Posted January 7th, 2010 by

Dear Project Team

Effective immediately and due to recent events , you are forbidden to utter the following phrases:

Direct Connection. In our world, nothing connects directly.  I have many pieces of expensive kit between your webserver and the users out on the Internet.  They don’t connect directly at all, but when you use this phrase, we have to give the SOC Manager an adrenaline shot to get his heart restarted.  It’s a series of tubes with some valves in the way, get it?

What are Oracle CPUs. Look, one more time with this:  these are the quarterly patches that Oracle puts out.  No idea why they call them Critical Patch Updates except maybe because they’ve been reading their own “unbreakable” literature a bit too much.  I don’t care if you call them “Late to Supper” as long as you keep me happy by testing them in the lab as soon as they’re released.

System. Let’s just suffice it to say that in my world, a “System” is something different than what you call it.  Think 2 layers abstracted and larger than your idea.

Security Waiver. Please don’t ask the security staff directly about waivers.  They’ll only send you on a huge journey to circumnavigate a huge amount of paperwork.

Remote Access. Yep, we have it.  But look, you guys are database and applications geeks, leave the drawings to me because you keep drawing the Internet users inside of our network.

Missing. OK, so we have 200 laptops that we don’t know right now where they’re at.  But if we use the word “missing”, then I have to spin up the laptop SWAT team from US-CERT.  Henceforth and forever more throughout the world of IT, I am the person who can declare something as “missing”.  In the mean time, feel free to use the phrase “unaccounted for”.

Wireless, Bluetooth, WiFi. You need to know where I’m coming from on this one.  Whenever we have project meetings, there’s an auditor dialed into the phone call, just waiting for us to say any of these words.  Then they wake and pounce on us.  Mayhem ensues.

Financial Data. Yes, I understand you think of it as financial data but to me, your spreadsheet is a non-authoritative, non-source analytical tool for numbers that just happen to be derived from authoritative financial system sources.  When you claim that it’s financial data, you just made a ton of work in integrity controls that is just plain ludicrous.

Tons of Custom Code. When you talk to the user community, talk up your epic slaying of code dragons and the myriad pitfalls of doing so.  But when you talk to the security team, custom code implies that we need to do a ton of code review. The official phrase is “automation scripts to assist the users with their workflow” or “glue code to string together existing applications”.

Offshore Developers. I can barely get the security team to allow me to have developers at all, much less developers at a contractor site.  Yes, they might be people who happen to live not in the US who get paid to write code.  But when you talk to the auditor, we have a word for this stuff: COTS software.

Love you guys.  No, really, quit laughing.

–The BSOFH



Similar Posts:

Posted in BSOFH | 3 Comments »
Tags:

Stress-Test Apache with Intent to Tune: BSOFH Tip for the Software Masochist

Posted August 28th, 2009 by

So I’ve been having some problems with my server for a month or so–periodically the number of apache servers would skyrocket and the box would get so overloaded (load of ~50 or so) that I couldn’t even run simple commands on it.  I would have to get into the hardware console and give the box a hard boot (a graceful reboot wouldn’t work).

Root cause is I’m a dork, but more about that later.

Anyway, I needed a way to troubleshoot and fix it.  The biggest problem I had was that the problem was very sporadic–sometime it would be 2 weeks between crashes, other times it would be 3 times in one day.  This is so begging for a stress-test really badly.  Looking on the Internet, I found a couple of articles about running a load-tester on apache and information on the tuning settings but not really much about a methodology (yeah yeah I work for a Big 4 firm, the word still makes me shudder even though it’s the right one to use here) to actually solve the problem of apache tuning.

So the “materials” I needed:

  • One server running apache.  Mine runs Apache2 under Debian Stable.  This is a little bit different from the average distro out there in that the process is apache2 and the command is apache2ctl where normally you would have httpd and httpdctl.  If you try this at home, you’ll need to use the latter commands.
  • An apache tuning guide or 3.  Here’s the simplest/most straightforward one I’ve seen.
  • A stress-tester.  Siege is awesome for this.
  • Some simple shell commands: htop (top works here too), ps, grep, and wc.

Now for the method to my madness…

I ssh into my server using three different sessions.  On one I run htop.  Htop is a version of top that gives you a colored output and supports multiple processors.  The output without stress-testing looks something like this:

(Click for a life-size image)

I keep one session free to edit files and do an emergency “killall apache2” if things get out of hand (and they will really quickly, I had to pull the plug about 20 times throughout this process).   I run a simple command on another ssh session to get a count of how many apache threads I have running:

rybolov@server:~$ ps aux | grep apache2 | grep start | wc -l
11

OK, so far so good.  I’ve got 11 threads running with no load and RAM usage of 190MB.  I needed the extra “grep start” because it removes the text editor I have open on apache2.conf and anything else I might be doing in the background.

I also killed apache, waited 10 seconds, and looked at the typical RAM use.  With no apache running, I use about 80MB just for the OS and everything else I’m running.  This means that I’m using 110MB of RAM for 11 apache threads, which means I’m using ~10MB of RAM for each apache thread.  Now that’s something important I can use.

I took my tuning settings in apache2.conf (httpd.conf for most distros) (Apache2 uses the prefork module which uses threads, read the tuning guide for more info) and set them at the defaults listed in the tuning guide.  They became something like the following:

<IfModule prefork.c>
  StartServers            8
  MinSpareServers         5
  MaxSpareServers        20
  MaxClients            150
  MaxRequestsPerChild  1000
</IfModule>

Notice how the MaxClients is set at 150?  This will prove to be my downfall later.  Turns out that my server is RAM-poor for as much processor as it has or WordPress is a RAM hog (or both, which is the case =)  ).  I’ll eventually upgrade my server, but since it’s a cloud server from Mosso, I pay by the RAM and drive space.

After each edit of apache2.conf, you need to give apache a configuration test and reload:

server:~# apache2ctl configtest
Syntax OK                        <- If something else comes back, fix it!!
server:~# apache2ctl restart

I’m now ready to stress-test using the default setup.  This is the awesome part.  First, I need to simulate a load.  I make an url seedfile so that siege will bounce around between a handful of pages.  I make a file siege.urls.txt and put in a collection of urls so that it looks like the following:

http://www.guerilla-ciso.com/
http://www.guerilla-ciso.com/about
http://www.guerilla-ciso.com/contact
http://www.guerilla-ciso.com/papers-and-presentations
....<about 20 lines deleted here, you get the point>
http://www.guerilla-ciso.com/page/2
http://www.guerilla-ciso.com/page/3
http://www.guerilla-ciso.com/page/4

I’m sure there is an efficient and fun way to make this, like say, a text-only sitemap or sproxy which is made by the same guy who does siege, but since I only needed about 30 urls, I just cut-n-pasted them off the blog homepage.

I fire up siege and give my webserver a thorough drubbing, running 50 connections for 10 minutes and using my url seedfile.  BTW, I’m running siege on the webserver itself, so there isn’t anything in the way of network latency.  <enter sinister laugh of evil as I sadistically torture my apache and the underlying OS>

server:~# siege -c 50 -t 600s -f siege.urls.txt
** SIEGE 2.66
** Preparing 50 concurrent users for battle.    <-The guy writing siege has a wicked sense of humor.
The server is now under siege...                <-Man the ramparts, Apache, they're coming for you!
HTTP/1.1 200   1.08 secs:   16416 bytes ==> /
HTTP/1.1 200   1.07 secs:   16416 bytes ==> /
....<about 2 bazillion lines deleted here, you get the idea>
HTTP/1.1 200   4.66 secs:    8748 bytes ==> /about
HTTP/1.1 200   3.92 secs:    8748 bytes ==> /about
Lifting the server siege...      done.

Transactions:                  61 hits   <-No, this isn't actual, I abbreviated the siege output
Availability:              100.00 %      <-with a ctrl-c just to get some results so I didn't
Elapsed time:                6.70 secs   <-have to scroll through all that output from the real test.
Data transferred:            0.87 MB
Response time:                3.27 secs
Transaction rate:            9.10 trans/sec
Throughput:                0.13 MB/sec
Concurrency:               29.75
Successful transactions:          61
Failed transactions:               0
Longest transaction:            5.61
Shortest transaction:            1.07

Now I watch the output of htop.  Under stress, the output looks something like this:

(Click for a life-size image)

Hmm, looks like I have a ton of apache threads soaking up all my RAM.  What happens is that in about 30 seconds, the OS starts swapping and the swap use just keeps growing until the OS is unresponsive.  This is a very interesting cascade failure because writing to swap incurs a load which makes the OS write to swap more.  Maybe I need to limit either the amount of RAM used per apache or limit the maximum amount of threads that apache spawns.  The tuning guide tells us how…

There is one setting that is the most important in tuning apache, it’s MaxClients.  This is the maximum number of servers (with the worker module) or threads (prefork module).  Looking at my apache tuning guide, I get a wonderful formula: ($SizeOfTotalRAM – $SizeOfRAMForOS) / $RAMUsePerThread = MaxClients.  So in my case, (512 – 80) / 11 = 39.something.  Oops, this is a far cry from the 150 that comes as default.  I also know that the RAM/thread number I used was without any load on apache, so with a load on and generating dynamic content (aka WordPress) , I’ll probably use ~15MB per thread.

One other trick that I can use:  Since I think that what’s killing me is the number of apache threads, I can run with a reduced amount of simultaneous connections and watch htop.  When htop shows that I’ve just started to write to swap, I can run my ps command to find out how many apache threads I have running.

rybolov@server:~$ ps aux | grep apache2 | grep start | wc -l
28

Now this is about what I expected:  With 28 threads going, I tipped over into using swap.  Reversing my tuning formula, I get (28 threads x 15 MB/thread) +80 MB for OS = 500 MB used.  Hmm, this makes much sense to me, since the OS starts swapping when you use ~480MB of RAM.

So I go back to my prefork module tuning.

<IfModule mpm_prefork_module>
 StartServers          8
 MinSpareServers       5
 MaxSpareServers      10
 MaxClients           25
 MaxRequestsPerChild   2000
</IfModule>

I set MaxClients at 25 because 28 seems to be the tipping point, so that gives me a little bit of “wiggle room” in case something else happens at the same time I’m serving under a huge load.  I also tweaked some of the other settings slightly.

Then it’s time for another siege torture session.  I run the same command as above and watch the htop output.  With the tuning settings I have now, the server dips into swap about 120MB and survives the full 10 minutes.  I’m sure the performance is degraded somewhat by going into swap, but I’m happy with it for now because the server stays alive.  It wasn’t all that smooth, I had to do a little bit of trial and error first, starting with MaxClients 25 and working my way up to 35 under a reduce siege load (-c 25 -t 60s) to see what would happen, then increasing the load from siege (-c50 -t 600s) and ratcheting MaxClients back down to 25.

And as far as me being a dork… well, aside from the huge MaxClients setting (That’s the default, don’t blame me), I set MaxRequestsPerChild to 100 instead of 1000, meaning that every 100 http requests I was rolling over and making a new thread.  That would lead to cascade failure under a load. (duh!)



Similar Posts:

Posted in Technical, The Guerilla CISO, What Works | 5 Comments »

The BSOFH On Dorky, Auditor-Friendly Policies

Posted January 16th, 2008 by

Roger writes about his workplace instituting a bag-check on a Friday afternoon. My first though was “Gack, that’s part of the FISMA guidance? Somebody definitely was reading between the lines,” followed by, “I wonder how much miscarriage of security is conducted by people who claim to be the long-lost intellectual progeny of Ron and Marianne (Ron Ross and Marianne Swanson from NIST, work with me here)”. Then I remembered my own security strangeness and laughed….

So a couple of years ago I was in a meeting between my physical security guy and an auditor from the government. I got there a couple of minutes late so I didn’t get introduced. No biggie, my guy had everything in control and had done most of the work with this auditor already. A tip-off should have been that I was the only guy in the room wearing a suit, thereby identifying myself as some kind of manager, but alas for our auditor wasn’t that bright.

But then a problem sprung up: it all revolves around physical access policy and procedure. I had a procedure that said that all employees, contractors, and visitors will badge in EVERY time they enter the building. OK, some of you should be saying a big “DUH!” at this point, and you would be right. Anyway, the auditor didn’t like that. They wanted a specific policy line that says “When you come into the building after a fire drill, you should all badge back in.”

I watched my physical security guy try to rationalize the finding away. “We already say that here in the general procedure,” he said. He drew a Ven diagram on the white board–“See, fire drill is part of ‘every'”. The auditor just wasn’t buying it.

As a last-ditch attempt, I stepped in with the classic contractor phrase: “Where does this requirement come from?” The auditor looked at me and not taking the hint that A) I know what I’m doing, B) I teach this stuff and C) I’m the guy in the suit, you would think I was important in some way; replied “Well, it comes from NIST. You see, they have this book of requirements called 800-53 and it says that you have to have a process to badge back in after a fire drill.”

At that point, I realized the situation. Life had handed me a bozo and it was easier to write a one-line correction than it was to try to educate them on the error of their ways and ask them to show me where it says that in SP 800-53.

So my advice to Roger: One afternoon checking bags (yay, my favorite activity to do in my “spare time”!) is sometimes easier than trying to educate your auditor.

And watch out for bozos. They’ll wear you down to a nub. =)



Similar Posts:

Posted in BSOFH, FISMA, What Doesn't Work | 5 Comments »

A New Take on Continuous Controls Monitoring

Posted June 10th, 2010 by

Some days I feel like all this “continuous monitoring” talk around the beltway is just really a codeword for “buy our junk”, much like the old standby “defense in depth”, only instead of firewalls and IDS, it’s desktop and server configuration management.  Even better that it works for both products and services.  The BSOFH in me likes having a phrase like “Near Real-Time Continuous Compliance Monitoring” which can mean anything from “tying thermite grenades to the racks in case of being captured” to “I think I’ll make a ham sandwich for lunch and charge you for the privilege”.

Anyway, our IKANHAZFIZMA lolcats have finally found a control worth monitoring:  the world’s supply of overstuffed cheeseburgers.  This continuous monitoring thing is serious business, just like the Internets.

kontinuus monitoring i kan get behind!



Similar Posts:

Posted in Uncategorized | 1 Comment »
Tags:

IKANHAZFIZMA Finds Caution Tape

Posted January 7th, 2010 by

Ah yes, the BSOFH is deep down inside every security manager doing all the things that we wish we could.  And so today we present a BSOFH in lolcat form.

For more BSOFH, check out posts here on guerilla-ciso and on layer8.

kawshun i iz bsofh kitteh



Similar Posts:

Posted in BSOFH, IKANHAZFIZMA | 2 Comments »
Tags:

Risk Management and Crazy People, a Script Using Stock Characters

Posted September 10th, 2009 by

Our BSOFH meets a Crazy Homeless Guy on the street just outside the Pentagon City metro station.

Crazy Homeless Guy: (walks up to BSOFH) Can I ask you a question?

BSOFH: (Somewhat startled, nobody really talks to him unless they’re trying to sell him something) Uhhhh, sure.

Crazy Homeless Guy: You know that there are people who claim to be able to say… take that truck over there and just by moving their finger make it fly into the Washington Monument.  Don’t you think that this is a threat to national security?

BSOFH: (Realizes that Crazy Homeless Guy is crazy and homeless) Not necessarily, you see.  I would definitely classify it as a threat.  However, when you’re looking at threats from people, you have to look at motives, opportunity, and motives.  Until you have all three, it’s more of an unrealized threat.

Crazy Homeless Guy: But what if these same guys could kill the President the same way, isn’t that a national threat?

BSOFH: Um, could be.  But then again, let’s look at a similar analogy:  firearm ownership.  Millions of people safely own weapons and yet there isn’t this huge upswell to shoot the President now is there?  Really, we have laws against shooting people and when somebody does that, we find them and put them in jail or *something*.  We don’t criminalize the threat, we criminalize the action.  Flicking a finger doesn’t kill people, psycho people kill people.

Crazy Homeless Guy: Or even if these same people could use the same amount of effort to kill everybody on the planet.  You know the <censored, I don’t like being sued by cults> people claim to have this ability.

BSOFH: (Jokingly, realizing that somebody has been taking 4chan too seriously) Well, I wouldn’t care too much because I would be… well, dead.  But yes, possibly.  But then again, since the dawn of the nuclear age and all through the Cold War we’ve had similar threats and people with capabilities created by technology instead of word study and the power of the human mind.  You have to look at these things from a risk standpoint.  While yes, these people have the capability to do something of high impact such as kill every human on the face of the earth, the track record of something like this happening is relatively small.  I mean, is there any historical record of a <censored, I don’t like being sued by cults> actually killing anybody through sheer force of their mind?  In other words, this is a very high impact, low probability event–something some people call a black swan event.  While yes, this is a matter of national security that these people potentially have this capability, we only have so many resources to protect things and we have our hands full dealing with risks that actually have occured in recent history.  In other words, risk management would say that this event you’re speaking of is an acceptable risk because of more pressing risks.

Crazy Homeless Guy: (Obviously beaten into oblivion by somebody crazier than himself) Well, I’ve never thought about it that way.  I’m really scared by these people.  Hold me, BSOFH.

BSOFH: Um, how about no?  You’re a Crazy Homeless Guy after all.  I have to get back to work now.  Come hang out sometime if you want to talk some quantitative risk analysis and we’ll start attaching dollar figures to the risks of <censored, I don’t like being sued by cults> killing all of humanity.  Doesn’t that sound like fun?  If we can get you cleared to get into the building, we can have a couple of whiteboarding sessions to determine the process flow and maybe an 800-30-stylie risk assessment just to present our case to the DHS Psychic Warfare Division.

Crazy Homeless Guy: Uh, I gotta find a better corner to stand on.  Maybe over by 16th and Pennsylvania I can find somebody more sympathetic to my cause.

BSOFH: You’re crazy, man!

Crazy Homeless Guy: You’re crazy, too, man!

And the moral of the story is that no matter how crazy you think you are, somebody else will always show up to prove you wrong.  And yeah, black swan events where we all die are dumb to prepare for because we’ll all be dead–near total fatalities only matter if you’re one of the survivors.

This story is dedicated to Alex H, David M, and some guy named Bayes.

OMG It’s a Psychic Black Swan photo by gnuckx cc0.



Similar Posts:

Posted in BSOFH, Risk Management, The Guerilla CISO | 5 Comments »
Tags:

« Previous Entries


Visitor Geolocationing Widget: