Archive for 8th April 2009

WTF is EMT?

EMT provides an easy way to gather common system performance metrics, along with a simple plugin-based interface for collecting custom application-specific metrics. This data can be viewed on the servers that are collecting it or, through the output handler interface, sent to centralized servers.

I started building EMT because it was very difficult to do ad hoc comparisons of performance metrics when trying to diagnose why systems are or were overloaded. Often the existing monitoring was only set up to gather data from the operating system, such as CPU or network usage. The solution was always to build something quickly by hand, run it for a while, and hope it gathered something meaningful. That’s OK for solving one problem, but what about the next time?

EMT runs a set of very lightweight scripts out of cron that in turn execute commands and parse data from their output. The data can be stored locally (highly recommended), shipped off to be aggregated on central servers, or both. The power of EMT is really in the local storage. The emt_view command can be used to compare any or all metrics from the system to look for patterns. Example usage can be found in the manual. I highly recommend checking it out.
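
To make the run-a-command, parse-it, store-it-locally idea concrete, here is a rough Python sketch of the general pattern. This is not EMT's code or its plugin interface: the function names (parse_loadavg, collect, store), the SQLite table layout, and the choice of /proc/loadavg as the data source are all assumptions made up for illustration.

```python
import sqlite3
import subprocess
import time


def parse_loadavg(text):
    """Turn the contents of /proc/loadavg into named metrics (hypothetical names)."""
    one, five, fifteen = text.split()[:3]
    return {
        "load_1m": float(one),
        "load_5m": float(five),
        "load_15m": float(fifteen),
    }


def collect(command, parser):
    """Run a command and hand its stdout to a parser function."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parser(result.stdout)


def store(conn, metrics, timestamp=None):
    """Append one sample per metric to a local SQLite table (assumed schema)."""
    ts = int(time.time()) if timestamp is None else timestamp
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics (ts INTEGER, name TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [(ts, name, value) for name, value in metrics.items()],
    )
    conn.commit()
```

A script run from cron could then open a local database with sqlite3.connect and call store(conn, collect(["cat", "/proc/loadavg"], parse_loadavg)) on each run. Because every sample lands in one local table with a timestamp, comparing arbitrary metrics over the same time window later is a simple query.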

The installation instructions say to grab the source from svn and use the included script to build an RPM. That’s still the recommended way, but I’ve created a snapshot of the source and a starter RPM for this blog entry. In the future I’ll set up a more defined release process.

Now for the bad news. EMT is still under heavy development. The view code and how instances work are going to change a lot before I release it. Things like viewing data from different instances of running programs don’t work yet. That being said, it shouldn’t destroy your servers or steal your children if you decide to run it in production. Proven Scaling and a few of our clients have been using it for quite some time with very few issues. If there is enough feedback I’ll bump the version to 0.3 and start stabilizing it for a real release.

If you want to become involved with the project, there is a Google Code page and a Google Group for discussion. I’ll post development updates to the group page. If you find any issues, please report them on the Google Code issue page.