Posted by Ian Holsman
Mon, 28 Jan 2008 19:33:00 GMT
AOL just released an internal performance tool called Pagetest.
It’s an IE plugin which breaks down how long the individual components of a page take to load. It is very similar to the ItScales stuff I did years ago (but much better).
Gratz Dave, Carson, & Pat!
Posted in monitoring | Tags aol, performance | 1 comment
Posted by Ian Holsman
Sun, 09 Jul 2006 15:57:45 GMT
Perfmon is a tool to help you diagnose your performance and QA issues within your Django application.
I’ve decided to charge $20 for it.
People using it for debugging open source projects can get it for free.
Is this a sell out? I hope people don’t see it this way.
Posted in Development | Tags django, monitoring, performance | 4 comments | no trackbacks
Posted by Ian Holsman
Thu, 29 Jun 2006 04:18:00 GMT
Two posts arrived on the memcached list which might be interesting to performance people.
The first was a comparison, The fastest language binding, of which ‘P’ language performed better. Note that the PHP version actually uses libmemcache, a ‘C’ library, which goes some way towards explaining the wild disparity in speeds.
The second, more interesting one (to me) was the discussion of how Digg switched from MySQL to memcached for storing sessions with v3 of their interface, after a hardware crash on their MySQL server.
Others mentioned using InnoDB for this instead of MyISAM, with the biggest issue being clearing out expired sessions (which memcached does for you with less overhead), but storing the sessions in the database still suffered due to OS contention.
Of course with Django you can choose either one to cache your stuff, but the session data itself is stored directly in the database... looks like I have a weekend project ;-)
Posted in Development | Tags django, memcached, mysql, performance, PHP | 4 comments | no trackbacks
Posted by Ian Holsman
Sun, 18 Dec 2005 16:52:00 GMT
Here’s a heartfelt request for someone else to do.
Wouldn’t it be wonderful if Akamai or someone else with large fat pipes offered to host some of the more common JavaScript libraries, instead of every company forcing you to download the same thing off their server?
Or better yet, if Mozilla could have a way to integrate them into their regular download.
Say your page wants to use dojo 0.2. It would include a reference to something like:
<script src='cacheit:dojotoolkit.org/0.2/js/dojo.js'></script>
and then the user would have it cached on their local machine, ready for the next application which uses it.
You could even use the regular Mozilla update check to get new versions of it,
and possibly CDNs like nyud.net or Akamai (maybe Google?) to help spread the love.
From what I can see there are 3 or 4 popular JavaScript frameworks, so you’re talking about 30-40MB of disk space (you might need multiple versions). If you cache it right the traffic would be minor (per user, a 304 once a day plus 100-200k when a new release hits). As long as the JavaScript libraries are quite stable it should be fine, and it would enhance the user experience for everyone using them.
Posted in General | Tags performance | 5 comments | 1 trackback
Posted by Ian Holsman
Thu, 08 Dec 2005 13:17:00 GMT
In Don’t scale: 99.999% uptime is for Wal-Mart the author mentions that 37signals is quite happy with 98% uptime, and that the cost of increasing uptime isn’t worth it.
Here is a brief summary of what an extra ‘9’ will give you as far as uptime goes. (As a rule of thumb, each extra nine adds an extra zero to the end of the price it will cost you to get there.)
| Uptime | Time lost in a year |
| 98% | 7.3 days |
| 99.0% | 3.7 days |
| 99.9% | 8.8 hours |
| 99.99% | 53 minutes |
| 99.999% | 5 minutes |
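The figures in the table are simply the fraction of a year that the service is allowed to be down. A minimal sketch of that arithmetic, assuming a 365-day year (the function names are made up for illustration):

```python
# Downtime per year implied by an uptime percentage.
# Assumption: a 365-day year; leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_hours(uptime_percent: float) -> float:
    """Hours per year a service can be down at the given uptime level."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


def humanize(hours: float) -> str:
    """Render a duration in days, hours or minutes, whichever fits best."""
    if hours >= 24:
        return f"{hours / 24:.1f} days"
    if hours >= 1:
        return f"{hours:.1f} hours"
    return f"{hours * 60:.1f} minutes"


if __name__ == "__main__":
    for level in (98.0, 99.0, 99.9, 99.99, 99.999):
        print(f"{level}% uptime -> {humanize(downtime_hours(level))} lost per year")
```

Each extra nine divides the allowed downtime by ten, which is why the last rows of the table drop from hours to minutes.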
Personally I think uptime is more a measure of reliability and redundancy than scalability, and I would be sceptical when people talk about uptime.
Why? Well.. what is uptime? In most cases it means that a service is up and handling requests.
What it doesn’t measure (and hence doesn’t tell you):
- How responsive that service is. People will stop using your service if it is too slow. Uptime does not measure this.
- *When* it was down. Having something go down at 3AM is not the same as it being down at 3PM. While the world is global, most people only care about the USA. Uptime does not know when your core business hours are.
- when something is partially down. Do you define yourselves as being ‘up’ when only half your site is functioning?
I think companies should define a metric more along the lines of:
the time taken to complete XXXX operation, between the hours 9AM and 9PM.
and then combine these timings into a weighted average. The weights being how important that operation is to your core business.
measure & monitor that. not uptime.
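A rough sketch of such a metric, assuming you already log a timestamp, an operation name and a duration for each request. The operation names and weights here are invented for illustration; yours would come from what matters to your business.

```python
from datetime import datetime

# Hypothetical weights: how important each operation is to the core business.
WEIGHTS = {"login": 1.0, "search": 3.0, "checkout": 6.0}


def in_core_hours(ts: datetime, start_hour: int = 9, end_hour: int = 21) -> bool:
    """True if the sample was taken between 9AM and 9PM (local time assumed)."""
    return start_hour <= ts.hour < end_hour


def weighted_response_time(samples, weights=WEIGHTS):
    """Weighted average of per-operation mean response times.

    samples: iterable of (timestamp, operation, seconds) tuples.
    Samples outside core hours, or for unweighted operations, are ignored.
    Returns None when nothing qualifies.
    """
    totals, counts = {}, {}
    for ts, op, seconds in samples:
        if op in weights and in_core_hours(ts):
            totals[op] = totals.get(op, 0.0) + seconds
            counts[op] = counts.get(op, 0) + 1
    if not totals:
        return None
    weight_sum = sum(weights[op] for op in totals)
    weighted = sum(weights[op] * totals[op] / counts[op] for op in totals)
    return weighted / weight_sum
```

A slow checkout at 3PM moves this number a lot; a slow checkout at 3AM does not move it at all, which is the behaviour a plain uptime percentage cannot give you.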
Have a look at Grab perf for an example of this. Stephen measures the response time as well as availability.
Posted in Business Related | Tags monitoring, performance, startups | 2 comments | 1 trackback
Posted by Ian Holsman
Wed, 07 Dec 2005 15:13:00 GMT
In Jeremy’s Article:
Web 2.0 Companies NEED To Scale,
he highlights a certain process that he believes most startups follow.
Yeah. Here’s their process:
- 1. Start with a handful of users. This is too much for ded box.
- 2. Move to dedicated server.
- 3. Add a few more users til they’re at 100. This is too much for one box.
- 4. Add more hardware. It’s obvious this isn’t enough.
- 5. Recode.
Here is my view of how it works (albeit never having worked in a startup, but having worked on integrating a couple of small-to-large ones).
- 0. get your savings together
- 1. build a prototype, show it to some friends and put your hand out for some $$$
- 2. with something functional, approach the VCs
- 3. use VC cash to just buy more machines, and concentrate on adding features to bring in the money, not ‘optimized’ solutions
- 4. concentrate on your growth rate. you need to have this as high as possible, not your user base
- 5. once you have become a ‘market leader’ and have series ‘B’ funding, then concentrate on employing a person to tune your application, or just buy some new machines which are 2-3 times faster than the ones you originally had
OR
- 5. Flip it
- 6. Let the buyer integrate the application into their infrastructure, processes and standards. From my experience this will double the user base and increase the growth rate as they put your application on their scalable services, and optimize the unscalable bits by replacing them with their existing stuff, or rewriting them. Maybe it’s just me, but the value of the acquisition isn’t the technology, it’s the user base.
- 7. Wait a year till your options vest and repeat/buy a yacht
If I was starting my own company, I would be concentrating on increasing the value of it ASAP, and I would do it in two ways: increasing the features and the user base as fast as possible, making it as hard as possible for anyone else to enter just after me.
as long as the user experience is acceptable, I would not be concerned about scalability in the slightest, especially if spending $5k to buy another box will make the problem go away for 3-6 months/until I can get some more funding.
This article can be summarised as “I would rather have a developer working on something which will increase the money coming in today, rather than having him work on something which will increase the money coming in a month from now, as I don’t know WTF will be happening in a month”
Tags performance, startup | 5 comments | 1 trackback
Posted by Ian Holsman
Wed, 21 Sep 2005 18:01:00 GMT
The first thing I did was reduce the number of active threads on the webserver, and that reduced the process creation rate by half, which was a good start.
Now this is where statistics lie, or to put it more precisely, don’t tell you what you think they are telling you.
I first looked at 5 minute averages and assumed that the load was constant throughout the 5 minutes.. wrong. As this is a monitoring machine, it has lots of agents pushing data to it at regular intervals.
the main culprit had this embedded into the code.
# Wait for the next INTERVAL boundary on the wall clock
# (every agent therefore wakes up at the same instant)
sleep($INTERVAL - (time() % $INTERVAL));
This had the effect of turning 815 machines into a flash crowd every 5 minutes (not even being delayed by the time it took to complete the previous post, which would have had the effect of dispersing the flash crowd over time). Every 5 minutes the poor webserver would receive 815 posts, go into swap hell, core dump a bit, and recover in time for the next lot.
The solution to this one was to wait a random interval before actually sending the data, and then hit the above syncing sleep, so we still get stats from the same time period, but they are sent at some point during the period instead of the moment we collect them.
If you plan on doing this, remember to record the stats (and their timestamp) before the random sleep interval, not afterwards, or your counters will get all messed up.
you want
#hits = (#hits-end - #hits-start )/ ($timeend - $timestart)
not
#hits = (#hits-end - #hits-start)/ ($timeend+randombit - $timestart)
especially when the randombit gets large
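Here is a minimal sketch of the fixed agent loop, written in Python rather than the Perl the agents actually used. The function names and the injected `read_counter`, `send`, `clock` and `sleep` callables are hypothetical, there only to make the ordering explicit: sample first, then jitter, then send, then resync.

```python
import random

INTERVAL = 300  # seconds; the agents in this post reported every 5 minutes


def seconds_to_next_boundary(now: float, interval: int = INTERVAL) -> float:
    """Time left until the next wall-clock multiple of `interval`.

    Same arithmetic as the syncing sleep in the original agent.
    """
    return interval - (now % interval)


def pick_jitter(interval: int = INTERVAL, rng=random) -> float:
    """Random delay in [0, interval) so the agents do not all post at once."""
    return rng.random() * interval


def hit_rate(hits_start, hits_end, time_start, time_end) -> float:
    """Hits per second between two counter readings.

    Both timestamps must be the moments the counters were read,
    NOT the moments the data was eventually sent.
    """
    return (hits_end - hits_start) / (time_end - time_start)


def agent_cycle(read_counter, send, clock, sleep, prev):
    """One reporting cycle; `prev` is (hits, timestamp) from the last cycle."""
    hits, sampled_at = read_counter(), clock()  # sample before any delay
    rate = hit_rate(prev[0], hits, prev[1], sampled_at)
    sleep(pick_jitter())                        # spread the posts over the period
    send(rate, sampled_at)                      # report with the sample time
    sleep(seconds_to_next_boundary(clock()))    # resync for the next sample
    return hits, sampled_at
```

Because the jitter is always shorter than the interval, the resync sleep still lands every agent on the same boundary for the next sample, so the stats stay comparable while the posts themselves arrive spread out.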
oh.. in other news.. I have switched my RSS feed to use feedburner. Current subscriptions are unaffected.. only new ones.
Tags monitoring, performance | no comments | no trackbacks
Posted by Ian Holsman
Tue, 20 Sep 2005 19:45:00 GMT
So.. people at work have been complaining about one of my monitoring servers continually freaking out, and being slow..
so I thought.. why not open up a 2nd port with just mod_perl replacing some dodgy CGIs, freeing up some connections on the original port and making it snappier at the same time.. win-win.
This is what I wake up to ;(.

The machine has gone berserk. For some reason a ‘stable’ perl program running via persistent perl had none of these issues; change it to mod_perl .. and wow, less than 24 hours before I need to bounce it.
Now.. if I didn’t have procallator running on the box I would still be pulling my hair out trying to figure out what was going on.. at least now I have a clue what the problem looks like before the machine hangs.
I’m still not 100% confident that it is directly related to my change.. now that the ‘display’ server is faster, it got hammered harder by other things ;( I’m sure this will provide a week of hair pulling.
oh.. all the perl code does is write a file to disk, and it’s been running for years..so it is an interaction somewhere.
Posted in monitoring | Tags monitoring, performance | no comments | no trackbacks