Stress-Test Apache with Intent to Tune: BSOFH Tip for the Software Masochist

Posted August 28th, 2009

So I've been having some problems with my server for a month or so: periodically the number of Apache processes would skyrocket and the box would get so overloaded (load average of ~50) that I couldn't even run simple commands on it. I would have to get into the hardware console and give the box a hard boot (a graceful reboot wouldn't work).

Root cause is I’m a dork, but more about that later.

Anyway, I needed a way to troubleshoot and fix it. The biggest problem was that the issue was very sporadic; sometimes it would be 2 weeks between crashes, other times it would happen 3 times in one day. This was begging for a stress test. Looking on the Internet, I found a couple of articles about running a load-tester against Apache and some information on the tuning settings, but not much about a methodology (yeah yeah, I work for a Big 4 firm, the word still makes me shudder even though it's the right one to use here) for actually solving the problem of Apache tuning.

So the “materials” I needed:

  • One server running Apache.  Mine runs Apache2 under Debian Stable.  This is a little different from the average distro out there in that the process is apache2 and the control command is apache2ctl, where normally you would have httpd and apachectl.  If you try this at home on another distro, substitute those commands instead.
  • An apache tuning guide or 3.  Here’s the simplest/most straightforward one I’ve seen.
  • A stress-tester.  Siege is awesome for this.
  • Some simple shell commands: htop (plain top works here too), ps, grep, and wc.  (A one-line install sketch for the non-stock tools follows this list.)
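On Debian, the only pieces that probably aren't already on the box are siege and htop, and both live in the standard repositories (a sketch; the package names are the stock Debian ones, so adjust for your distro):

server:~# apt-get install siege htop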

Now for the method to my madness…

I ssh into my server using three different sessions.  On one I run htop.  Htop is a version of top that gives you a colored output and supports multiple processors.  The output without stress-testing looks something like this:

(Screenshot: htop output at idle.)

I keep one session free to edit files and do an emergency "killall apache2" if things get out of hand (and they will, really quickly; I had to pull the plug about 20 times throughout this process).  In the third session I run a simple command to get a count of how many Apache processes I have running:

rybolov@server:~$ ps aux | grep apache2 | grep start | wc -l
11

OK, so far so good.  I've got 11 Apache processes running with no load and RAM usage of 190MB.  I need the extra "grep start" because it filters out the text editor I have open on apache2.conf and anything else I might be doing in the background.

I also killed Apache, waited 10 seconds, and looked at the typical RAM use.  With no Apache running, I use about 80MB just for the OS and everything else I'm running.  That means 110MB of RAM for 11 Apache processes, or roughly 10MB of RAM per process.  Now that's something important I can use.
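If you'd rather have the machine do that arithmetic, something like the following gets you in the same ballpark (a sketch; ps -C matches on the process name, so it skips the open text editor without the grep trick, and RSS counts shared pages more than once, so treat the per-process number as a rough upper bound):

rybolov@server:~$ ps -C apache2 -o rss= | awk '{ n++; kb += $1 } END { if (n) printf "%d processes, ~%.1f MB each\n", n, kb/1024/n }'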

I took my tuning settings in apache2.conf (httpd.conf for most distros) and set them to the defaults listed in the tuning guide.  (Apache2 here uses the prefork MPM, which serves each request from its own child process; read the tuning guide for more info.)  They became something like the following:

<IfModule prefork.c>
  StartServers            8
  MinSpareServers         5
  MaxSpareServers        20
  MaxClients            150
  MaxRequestsPerChild  1000
</IfModule>

Notice how MaxClients is set at 150?  That will prove to be my downfall later.  Turns out that my server is RAM-poor for the amount of processor it has, or WordPress is a RAM hog (or both, which is the case =)  ).  I'll eventually upgrade my server, but since it's a cloud server from Mosso, I pay by the RAM and drive space.

After each edit of apache2.conf, you need to give Apache a configuration test and a restart:

server:~# apache2ctl configtest
Syntax OK                        <- If something else comes back, fix it!!
server:~# apache2ctl restart
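One way to keep yourself honest here is to chain the two so the restart only fires if the syntax check passes:

server:~# apache2ctl configtest && apache2ctl restart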

I'm now ready to stress-test the default setup.  This is the awesome part.  First, I need to simulate a load.  I make a URL seedfile so that siege will bounce around between a handful of pages.  I create a file called siege.urls.txt and put in a collection of URLs so that it looks like the following:

http://www.guerilla-ciso.com/
http://www.guerilla-ciso.com/about
http://www.guerilla-ciso.com/contact
http://www.guerilla-ciso.com/papers-and-presentations
....<about 20 lines deleted here, you get the point>
http://www.guerilla-ciso.com/page/2
http://www.guerilla-ciso.com/page/3
http://www.guerilla-ciso.com/page/4

I'm sure there is an efficient and fun way to build this, like a text-only sitemap or sproxy (written by the same guy who does siege), but since I only needed about 30 URLs, I just cut-n-pasted them off the blog homepage.
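If cut-n-paste offends you, a rough one-liner along these lines will scrape the internal links off the homepage into a seedfile (a sketch only; it assumes curl is installed and that the links in the page are absolute URLs):

server:~# curl -s http://www.guerilla-ciso.com/ | grep -o 'http://www.guerilla-ciso.com/[^"<> ]*' | sort -u > siege.urls.txt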

I fire up siege and give my webserver a thorough drubbing: 50 concurrent connections for 10 minutes, using my URL seedfile.  BTW, I'm running siege on the webserver itself, so network latency isn't a factor.  <enter sinister laugh of evil as I sadistically torture my apache and the underlying OS>

server:~# siege -c 50 -t 600s -f siege.urls.txt
** SIEGE 2.66
** Preparing 50 concurrent users for battle.    <-The guy writing siege has a wicked sense of humor.
The server is now under siege...                <-Man the ramparts, Apache, they're coming for you!
HTTP/1.1 200   1.08 secs:   16416 bytes ==> /
HTTP/1.1 200   1.07 secs:   16416 bytes ==> /
....<about 2 bazillion lines deleted here, you get the idea>
HTTP/1.1 200   4.66 secs:    8748 bytes ==> /about
HTTP/1.1 200   3.92 secs:    8748 bytes ==> /about
Lifting the server siege...      done.

Transactions:                  61 hits   <-No, this isn't actual, I abbreviated the siege output
Availability:              100.00 %      <-with a ctrl-c just to get some results so I didn't
Elapsed time:                6.70 secs   <-have to scroll through all that output from the real test.
Data transferred:            0.87 MB
Response time:                3.27 secs
Transaction rate:            9.10 trans/sec
Throughput:                0.13 MB/sec
Concurrency:               29.75
Successful transactions:          61
Failed transactions:               0
Longest transaction:            5.61
Shortest transaction:            1.07

Now I watch the output of htop.  Under stress, the output looks something like this:

(Screenshot: htop output under load.)

Hmm, looks like I have a ton of Apache processes soaking up all my RAM.  What happens is that within about 30 seconds the OS starts swapping, and swap use just keeps growing until the OS is unresponsive.  It's a classic cascade failure: swapping slows every request down, requests pile up, Apache spawns more children to handle them, the new children need RAM that isn't there, and the OS swaps even harder.  I need to limit either the amount of RAM used per Apache process or the maximum number of processes that Apache spawns.  The tuning guide tells us how…

There is one setting that matters most in tuning Apache: MaxClients.  This is the maximum number of child processes (prefork MPM) or serving threads (worker MPM) that Apache will run at once.  Looking at my Apache tuning guide, I get a wonderful formula: ($SizeOfTotalRAM – $SizeOfRAMForOS) / $RAMUsePerProcess = MaxClients.  So in my case, (512 – 80) / 11 = 39.something.  Oops, that's a far cry from the default of 150.  I also know that the RAM-per-process number I measured was with no load on Apache, so under load and generating dynamic content (aka WordPress), I'll probably use ~15MB per process.
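For the arithmetic-averse, the same back-of-the-envelope check works fine in the shell (numbers are the ones from this post, using the rough ~15MB loaded estimate rather than the idle figure):

server:~# echo $(( (512 - 80) / 15 ))
28

Which, not coincidentally, is right around the tipping point I find below.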

One other trick I can use: since I think what's killing me is the number of Apache processes, I can run with a reduced number of simultaneous connections and watch htop.  When htop shows that I've just started to write to swap, I run my ps command to find out how many Apache processes are running.

rybolov@server:~$ ps aux | grep apache2 | grep start | wc -l
28

Now this is about what I expected: with 28 processes going, I tipped over into swap.  Reversing my tuning formula, I get (28 processes x 15 MB/process) + 80 MB for the OS = 500 MB used.  That makes sense, since the OS starts swapping when I use ~480MB of RAM.
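Rather than eyeballing htop for the exact moment the box tips over, a small logging loop in the spare ssh session will timestamp it for you (a sketch; it leans on free and the procps version of ps, both stock on Debian):

# Log the apache2 process count and swap usage every 5 seconds; ctrl-c to stop.
while true; do
  printf '%s  apache2=%s  ' "$(date +%T)" "$(ps -C apache2 --no-headers | wc -l)"
  free -m | awk '/^Swap:/ { print "swap_used=" $3 "MB" }'
  sleep 5
done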

So I go back to my prefork module tuning.

<IfModule mpm_prefork_module>
 StartServers          8
 MinSpareServers       5
 MaxSpareServers      10
 MaxClients           25
 MaxRequestsPerChild   2000
</IfModule>

I set MaxClients at 25 because 28 seems to be the tipping point; that gives me a little "wiggle room" in case something else happens while I'm serving under a huge load.  I also tweaked some of the other settings slightly.

Then it's time for another siege torture session.  I run the same command as above and watch the htop output.  With these tuning settings, the server dips about 120MB into swap and survives the full 10 minutes.  I'm sure performance degrades somewhat once it touches swap, but I'm happy with it for now because the server stays alive.  It wasn't all that smooth; there was a bit of trial and error first, starting with MaxClients 25 and working my way up to 35 under a reduced siege load (-c 25 -t 60s) to see what would happen, then increasing the siege load (-c 50 -t 600s) and ratcheting MaxClients back down to 25.
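If you want to make that trial-and-error a bit more repeatable, a loop like this steps the concurrency up and keeps just the summary lines for comparison (a sketch; stderr is merged into the pipe because siege writes at least part of its report there):

# Short siege runs at increasing concurrency; keep only the interesting numbers.
for c in 10 25 35 50; do
  echo "== $c concurrent users =="
  siege -c "$c" -t 60s -f siege.urls.txt 2>&1 | grep -E 'Availability|Response time|Transaction rate|Failed'
done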

And as far as me being a dork… well, aside from the huge MaxClients setting (that's the default, don't blame me), I had set MaxRequestsPerChild to 100 instead of 1000, meaning that after every 100 HTTP requests Apache was killing that child and forking a new one.  Under load, that constant respawning is exactly the kind of thing that feeds the cascade failure. (duh!)




OMB Wants a Direct Report

Posted August 28th, 2009

The big news in OMB's M-09-29, FY 2009 Reporting Instructions for the Federal Information Security Management Act and Agency Privacy Management, is that instead of fiddling with document files, reporting will now be done directly through an online tool. This has been covered elsewhere, and it is the one big change since last year.  However, having less paper in the paperwork is not the only change.

Piles of Paper photo by °Florian.

So what will this tool be like? It is hard to tell at this point. Some information will be entered directly, but the system appears designed to accept uploads of some documents, such as those supporting M-07-16. As with the spreadsheets used for FY 2008, there will be separate questions for the Chief Information Officer, the Inspector General, and the Senior Agency Official for Privacy. Microagencies will still have abbreviated questions to fill out. Additional information on the automated tool, including full instructions and a beta version, will be available in August 2009.

Given that the required information has changed very little, the automated system is unlikely to significantly ease the reporting burden. It appears primarily designed to ease the data-processing requirements for OMB. With Excel spreadsheets no longer holding the data, many concerns relating to file versions, data aggregation, and analysis are greatly eased.

It is worth noting that a common outcome of re-engineering a system for efficiency is that managers look for ways to use the new efficiency. What does this mean? Now that OMB can easily analyze data that previously took a great amount of effort to process, they may want to improve what is reported. A great deal has been said over the years about the inefficiencies of the current reporting regime. This may be OMB's opportunity to start collecting more information that better reflects agencies' actual security posture. This is pure speculation, and other factors, such as the reporting burden on agencies, may moderate OMB's next steps, but it is worth consideration.

One pleasant outcome of the implementation of this new automated tool is that the reporting deadline has been pushed back to November 18, 2009.

Agencies are still responsible for submitting document files to satisfy M-07-16; the automated tool does not appear to allow direct input of this information. However, the document requirements are slightly different. The breach notification policy document need only be submitted if it has changed. It is no longer sufficient to simply report progress on eliminating SSNs and reducing PII; an implementation plan and a progress update must be submitted. The requirement for a policy document covering rules of behavior and consequences has been removed.

In addition to the automated tool there are other, more subtle changes to OMB’s FY 2009 reporting. Let’s step through them, point by point:

10. It is reiterated that NIST guidance is required. This point has been expanded to state that, for legacy systems, agencies have one year to come into compliance with new material in NIST documents. For new systems, agencies are expected to be in compliance upon system deployment.

13 & 15. Wording indicating that disagreements on reports should be resolved prior to submission and that the agency head’s view will be authoritative have been removed. This may have been done to reduce redundancy as M-09-29’s preface indicates agency reports must reflect the agency head’s view.

52. The requirement for a central web page with working links to agency PIAs and SORNs published in the Federal Register has been removed.

A complete side-by-side comparison of changes between the two documents is available at FISMApedia.org.

All in all, the changes to OMB's guidance this year will not change agencies' reporting burden significantly. And that may not be a bad thing.




Note to the Data People: Give us Some Raw InfoSec Data

Posted August 24th, 2009

We have all these data wonks running around now in the information security field thanks to a couple of people (Jaquith, Shostack, Stewart, and our friends at Verizon Business) who brought us some books and some data.

Well, earlier this year the Government started a website called Data.gov.  This is much awesomeness.  Viva Las Transparency!  However, it's missing something very relevant to my interests: information security management data.

So, I want people to go to data.gov’s “request a dataset” page and request the following:

Complete responses from the Departments and Agencies to the FISMA reporting requirements for FY2004-2009 based on OMB Memoranda 04-25, 05-15, 06-20, 07-19, 08-21, and 09-29.

Raw incident data for years 2005-2007 as reported to OMB and summarized in their report to Congress on FY2007 FISMA performance and published at http://www.whitehouse.gov/omb/inforeg/reports/2007_fisma_report.pdf

Raw incident data for years 2007 and later in any type and format similar to the Verizon Data Breach Incident Report available at http://www.verizonbusiness.com/resources/security/reports/2009_databreach_rp.pdf

This information is necessary for researchers to study the effectiveness of information security management techniques and regulatory schemes, and for industry to propose changes to national-level information security management frameworks and legislation such as FISMA.  Most of this information has already been released in summary form to Congress, and releasing the complete dataset on data.gov would greatly aid the information security community.

It might be a fool’s errand at this point, but it doesn’t hurt to ask, and it only takes a couple of minutes to do.  =)




A Layered Model for Massively-Scaled Security Management

Posted August 24th, 2009

So we all know the OSI model by heart, right?   Well, I'm offering up my own model of technology management. Really, at this stage I'm looking for feedback.

  • Layer 7: Global Layer. This layer is regulated by treaties with other nation-states or international standards.  I fit cybercrime treaties in here, along with the RFCs that make the Internet work.  Problem is that security hasn't really reached up to this level much, unless you want to count multinational vendors and top-level CERT coordination centers like CERT-CC.
  • Layer 6: National-Level Layer. This layer is an aggregation of Federations and industries and primarily consists of Federal law and everything lumped into a “critical infrastructure” bucket.  Most US Federal laws fit into this layer.
  • Layer 5: Federation/Community Layer. What I mean by this layer is an industry federated or formed into some sort of community.  Think major verticals such as energy supply.  It's not a coincidence that this layer lines up with DHS's critical infrastructure and key resources breakdown, but it can also refer to self-regulated industries, such as those governed by PCI-DSS or NERC.
  • Layer 4: Enterprise Layer. Most security thought, products, and tools are focused on this layer and the layers below.  This is the realm of the CSO and CISO and roughly equates to a large corporation.
  • Layer 3: Project Layer. Collecting disparate technologies and data into a single piece, such as a LAN/WAN, a web application project, etc.  In the Government world, this is where the Information System Security Officer (ISSO) or the System Security Engineer (SSE) lives.
  • Layer 2: Integration Layer. Hardware, software, and firmware combine to become products and solutions; this layer is focused primarily on engineering.
  • Layer 1: Code Layer. Down into the code that makes everything work.  This is where the application security people live.

There are tons of ways to use the model.  I'm thinking each layer has a set of characteristics like the following:

  • Scope
  • Level of centralization
  • Responsiveness
  • Domain expertise
  • Authority
  • Timeliness
  • Stakeholders
  • Regulatory bodies
  • Many more that I haven’t thought about yet

Chocolate Layer Cake photo by foooooey.

My whole point for this model is that I'm going to try to use it to describe the levels at which a particular problem resides and to stimulate discussion on the appropriate level at which to solve it.  For instance, take a technology and you can trace it up and down the stack. Say Security Event and Incident Monitoring:

  • Layer 7: Global Layer. Coordination between national-level CERTs in stopping malware and hacking attacks.
  • Layer 6: National-Level Layer. Attack data from Layer 5 is aggregated and correlated to respond to large incidents on the scale of Cyberwar.
  • Layer 5: Federation/Community Layer. Events are filtered up from Layer 4 and only the confirmed events of interest are correlated to determine trends.
  • Layer 4: Enterprise Layer. Events are aggregated by a SIEM with events of interest flagged for response.
  • Layer 3: Project Layer. Logs are analyzed in some manner.  This is most likely the highest layer in the model that we do consistently today.
  • Layer 2: Integration Layer. Event logs have to be written to disk and stored for a period of time.
  • Layer 1: Code Layer. Code has to be programmed to create event logs.

I do have an ulterior motive.  I created this model because most of our security thought, doctrine, tools, products, and solutions work at Layer 4 and below.  What we need is discussion on Layers 5 and above, because when we try to create massively-scaled security solutions, we run into a drought of information on what to do above the Enterprise.  There are other bits of doctrine I want to bring up, like trying to solve any problem at the lowest level for which it makes sense.  In other words, we can use the model to propose changes to the way we manage security.  Say we have a problem like the lack of data on data breaches: when we say we need a Federal data breach law, we're saying that because of the scope, the amount of responsibility, and the competing interests at Layer 5, we need a solution at Layer 6.  In any case, we should start at the bottom and work our way up the model until we find an adequate scope and scale.

So, this is my question to you, Internet: have I just reinvented enterprise public policy, IT architecture (Federal Enterprise Architecture) and business blueprinting, or did I create some kind of derivative view of technology, security, and public policy that I can now use?




Save a Kitten, Write SCAP Content

Posted August 7th, 2009

Apparently I’m the Internet’s SCAP Evangelist according to Ed Bellis, so at this point all I can do is shrug and say “OK, I’ll teach people about SCAP”.

Right now there is a "pretty OK" framework for SCAP.  I.e., we have published standards, and there are some SCAP-certified tools out there to do patch and vulnerability management.

What's missing right now is SCAP content.  I don't think this is going to get solved en masse; it's more that there needs to be an awareness campaign directed at end-users, vulnerability researchers, and people who write small-ish tools.

So I sat around at home trying to figure out how to get people to use and write more SCAP content, and finally settled on "Every time you use SCAP content, a kitten runs free".

Anyway, this is a presentation I gave at my local OWASP chapter.




Random Thoughts on “The FISMA Challenge” in eHealthcare

Posted August 4th, 2009

OK, so there's this article being bounced all over the place.  The basic synopsis is that FISMA is keeping the government from doing any kind of electronic health records work, because FISMA requirements extend to health care providers and researchers when they take data from the Government.

Read one version of the story here.

So the proposed solution is that we can't extend FISMA to eHealthcare when the data is owned by the Government, because that security management stuff gets in the way.  This post is about why they're wrong, and right, but not in the places they think they are.

Government agencies need to protect the data they have by providing "adequate security".  I've covered this in a bazillion places already.  Somewhere along the line we let the definition of adequate security come to mean "you have to play by our rulebook", which is complete and utter bunk.  The framework is an expedient and a level-setting exercise across the government.  It's not meant to be one-size-fits-all; it's meant to be tailored to each individual case.

The Government Information Security Trickle-Down Effect is a name I use for FISMA/NIST Framework requirements being transferred from the Government to service providers, whether they're in healthcare or IT or making screws that sometimes end up on B-2 bombers.  It will hit you if you take Government data, but only because you have no regulatory framework of your own with which you can demonstrate that you provide "adequate security".  In other words, if you provide a demonstrable level of data protection equal to or better than what the Government provides, then you should reasonably be able to take the Government's data; it's a matter of finding the right "esperanto" to demonstrate your security foo.

If only there were a regulatory scheme already in place that we could hold the healthcare industry to.  Oh wait, there is: HIPAA.  Granted, HIPAA doesn't really have a lot of teeth and its effects are only arguably demonstrable, but it does fill most of the legal requirement to provide "adequate security", and that's the important thing; more importantly, it's what FISMA actually requires.

And this is my problem with this whole string of articles: the power vacuum has claimed eHealthcare.  Seriously, there should be somebody in charge of the data who can make a decision about what kinds of protections they want for it.  In this case, there are plenty of people with different ideas on what that level of protection should be, so they are asking OMB for an official ruling.  If you go to OMB asking for their guidance on applying FISMA to eHealthcare records, you get what you deserve, which is a "Yes, it's in scope; how do you think you should do this?"

So what the eHealthcare people really are looking for is a set of firm requirements from their handlers (aka OMB) on how to hold service providers accountable for the data they are giving them.  This isn't a binary question of whether FISMA applies to eHealthcare data (yes, it does); it's a question of "how much is enough?" or even "what level of compensating controls do we need?"

But then again, we're beaten down by compliance at this point.  I know I feel it from time to time.  After you've been beaten badly for years, all you want is for the batterer to tell you what you need to do so the hurting will stop.

So for the eHealthcare agencies, here is a solution for you.  In your agreements/contracts to provide data to the healthcare providers, require the following:

  • Provider shall produce annual statements of HIPAA compliance
  • Provider shall be certified under a security management program such as ISO 27001, SAS-70 Type II, or even PCI-DSS
  • Provider shall report any incident resulting in a potential data breach of 500 or more records within 24 hours
  • Provider shall accept financial penalties for data breaches based on the number of records involved
  • Provider shall allow the Government to perform risk assessments of their data protection controls

That should be enough in the way of compensating controls to provide "adequate security" for your eHealthcare data.  You can even put a line through any of these that are too draconian or high-cost.  Take that to OMB, tell them how you're doing it, and ask whether they would really like to spend the taxpayers' money doing anything other than this.

Case Files and Medical Records photo by benuski.



