Wednesday, 10 November 2010

More on software RAID 10: Benchmarking the heck out of it

Background
I already touched on RAID in a previous blog post of mine. I have been doing a lot of (software) RAID 10 installs lately, mainly on openSUSE, which I use for workstations and servers. openSUSE seems to have some bugs when installing on RAID 10, at least on Dell Precision T3500 machines, but they can be worked around with some command-line magic. Anyway, RAID 10 is a setup of at least four hard drives where your data is both striped and mirrored, which in my mind is a nice balance between data preservation and performance. RAID 10 lets you use half of the available disk space, while the other half is used for mirroring. A four-drive RAID 10 also has roughly a 66% chance of surviving a simultaneous two-drive failure: of the six possible pairs of failed drives, only the two pairs that form a single mirror are fatal.
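The 66% figure can be checked with a little arithmetic. The following is a minimal sketch, assuming the classic arrangement where the drives form two-way mirror pairs; the function name is made up for illustration.

```shell
# Percent chance (rounded down) that an N-drive RAID 10 built from
# two-way mirror pairs survives two simultaneous drive failures.
# Assumption: N is even and at least 4.
raid10_two_failure_survival() {
    n=$1
    total=$(( $n * ($n - 1) / 2 ))   # all ways to pick 2 failed drives
    fatal=$(( $n / 2 ))              # both failures hit the same mirror pair
    echo $(( ($total - $fatal) * 100 / $total ))
}

raid10_two_failure_survival 4   # prints 66 (4 of the 6 combinations survive)
```

With more drives the odds improve, because a second failure is less likely to land on the partner of the first failed drive.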

Before creating my first RAID 10 array for my desktop at work, I read up on RAID 10 performance and the settings that affect it. From several blog and forum posts I got the impression that the F2 (far) layout with a chunk size of around 256 KB to 512 KB would be the best-performing setup. I now realize that it is not that simple and that the result varies greatly with the usage scenario. After that realization I decided to test different ways of creating a 4-disk RAID 10. I wrote a script that loops through (almost) all ways of creating a software RAID 10 and runs some tests on each array to help me decide on the optimal settings for my usage scenario.

Testing
As I mentioned in the previous paragraph, I wrote a small and primitive shell script that creates the software RAID 10 array automatically, tests it to the extent I wanted and writes down the results for analysis. To give some idea of the scale: the IOzone tests alone took almost a week to run on my test bench, and I ended up with 1500 data points (1500 different variations of the RAID 10 setup). Some of the variables used in the testing were the filesystem (ext3, ext4 etc.), the RAID layout (n2/f2), the chunk size (64 KB ... 2 MB), and the filesystem stride and stripe width.
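To make the variable space more concrete, here is a minimal sketch of such a loop. It is not the author's script: it only prints the commands instead of running them, the device names and /dev/md0 are placeholders, and the stride and stripe width are derived from the chunk size with the usual rule of thumb (assuming 4 KB filesystem blocks and two data-bearing disks in a 4-disk array), whereas the real benchmark may have varied them independently.

```shell
# Placeholders: adjust to your own partitions before using any output.
DEVICES="/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1"
BLOCK_KB=4     # assumed filesystem block size in KB
DATA_DISKS=2   # 4 disks with 2 copies of the data leave 2 data-bearing disks

# Print one mdadm and one mkfs command per layout/chunk combination.
print_variants() {
    for layout in n2 f2; do
        for chunk in 64 128 256 512 1024 2048; do
            stride=$(( $chunk / $BLOCK_KB ))
            stripe_width=$(( $stride * $DATA_DISKS ))
            echo "mdadm --create /dev/md0 --level=10 --layout=$layout --chunk=$chunk --raid-devices=4 $DEVICES"
            echo "mkfs.ext4 -E stride=$stride,stripe-width=$stripe_width /dev/md0"
        done
    done
}

print_variants
```

Even this reduced loop yields 12 array variants; multiplying by several filesystems and mount options quickly gets into the hundreds.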

The results are not exactly scientific-grade data, but they are sufficient for me to draw some conclusions on the matter. You should not use them as absolute proof of anything. I am the first to admit that my scripting skills and my knowledge of filesystems and RAID settings are limited.

The script runs several kinds of tests that happen to interest me personally; it would be quite trivial to add more. IOzone is the one I already mentioned. The script also runs the hdparm read timing test, Compilebench and something called Custom. The Custom test measures a real-life usage scenario here at work, simulating what my continuous integration servers do day in, day out: it times how long it takes to make a local clone of a Mercurial SCM repository onto the RAID and then build our Qt project according to our build steps (configuration, cleaning, compilation and so on). The Custom step obviously needs to be disabled or modified to meet one's own needs.
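A step like Custom boils down to timing an arbitrary command. The helper below is a minimal sketch of that idea, not code from raid_benchmark.sh; the function name, the CSV-style output and the paths in the commented examples are assumptions.

```shell
# Run a command, discard its output, and print "label,seconds,exit_status".
time_step() {
    label=$1
    shift
    start=$(date +%s)
    if "$@" >/dev/null 2>&1; then status=0; else status=$?; fi
    end=$(date +%s)
    echo "$label,$(( $end - $start )),$status"
}

# Harmless demonstration:
time_step "sleep-demo" sleep 1

# Real measurements could look like this (hypothetical paths, run as root):
#   time_step "hdparm" hdparm -tT /dev/md0
#   time_step "clone"  hg clone /path/to/repo /mnt/raid/repo
```

Whole-second resolution is crude, but for steps that take minutes, such as a full build, it is usually enough to compare configurations.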

The Script
The script is available here: raid_benchmark.sh.
Keep in mind that you use it at your own risk and will need to modify it to some extent to suit your own needs. The most important point is that it is only implemented for SUSE- and Debian-based distributions, of which only openSUSE has been tested.

Benchmark results will follow at a later date.



Thursday, 16 September 2010

Using RAID and S.M.A.R.T. to save yourself from data loss and lots of grief

Lately I have been configuring SMART monitoring on a lot of my servers and workstations, both at work and at home. Especially when combined with RAID, SMART can help you avoid the disaster of losing your data when your hard drives break.

Short primer
RAID means "Redundant Array of Independent/Inexpensive Disks" and is mostly thought of as something you do at the hardware level, via a separate RAID controller or a motherboard that supports RAID. RAID is used either to improve the performance of disk operations, which are the biggest bottleneck in modern systems most of the time, or to provide redundancy for when you lose a drive to hardware failure.

With Linux you can use something called 'software RAID', which requires neither an expensive RAID controller nor support from the motherboard. All you basically need is more than one hard drive. With two hard drives you can already set up RAID0 or RAID1.
RAID0 means that you distribute (stripe) your data across two or more drives and get a large performance benefit. The downside is that if you lose any one disk to HW failure, you lose all the data on all the disks.
RAID1 is quite the opposite of RAID0: it mirrors the data on the first drive to the second drive. Write operations are somewhat slower, but the data is now duplicated and thus safe from the failure of a single drive.
There are several other RAID levels too, like RAID5, which requires a minimum of 3 drives and provides some redundancy in case of disk breakage along with some performance improvement. My personal favorite currently is RAID10, which is a combination of RAID0 and RAID1, meaning that you get quite good redundancy in case of failures and quite a nice performance boost as well. The downside of RAID10 is that you need 4 drives to get started and 'lose' two of them to mirroring.
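The capacity trade-off between the levels mentioned above can be summed up in a few lines. This is a small sketch, assuming identical drives and the standard definitions of each level; the function name is made up for illustration.

```shell
# Print how many drives' worth of usable space a RAID level gives
# for an array of N identical drives.
usable_drives() {
    level=$1
    n=$2
    case "$level" in
        0)  echo "$n" ;;              # striping only, no redundancy
        1)  echo 1 ;;                 # every drive holds the same data
        5)  echo $(( $n - 1 )) ;;     # one drive's worth goes to parity
        10) echo $(( $n / 2 )) ;;     # half the drives hold mirror copies
        *)  echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}

echo "RAID0,  2 drives: $(usable_drives 0 2) usable"
echo "RAID1,  2 drives: $(usable_drives 1 2) usable"
echo "RAID5,  3 drives: $(usable_drives 5 3) usable"
echo "RAID10, 4 drives: $(usable_drives 10 4) usable"
```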

For more information about RAID levels, check this wikipage. For information on how to create a RAID array the hard way, check out Linux Journal's article (remember that it is easiest to create the array during installation with the distro's installer).

SMART (or actually S.M.A.R.T.), on the other hand, means "Self-Monitoring, Analysis, and Reporting Technology", a technology that most, if not all, modern hard drives support. With SMART-enabled drives you can gather information straight from the individual drive(s) and use that information to predict when a disk is about to fail, of old age or otherwise. Hard drives have gotten a lot better since the infamous times of the IBM 'Deathstar' drives, but the fact remains: hard drives will die of old age sooner or later. If you can get an advance warning of this impending doom, you have time to make the necessary preparations, like making backups or even replacing the drive before you run the risk of losing your data.

For more information, check out the excellent article by Linux Journal.

Getting started
I am not going to show you how to create a RAID system. It is quite easy to do when you install your operating system; the openSUSE installer, for example, provides a nice tool for creating the RAID on which to install the system.

Note! You will need superuser (root) privileges for (almost) all the steps below.

Step 0
Anyone can start using SMART right away, as it requires neither special hardware nor special setup. Some systems might require that you enable SMART support in your computer's BIOS, so check that first. The next step is to make sure you have the package called "smartmontools" installed. Once you do, you can try running a command like this:

smartctl -a /dev/sda

This should output a lot of information about your hard drive, but if you have not enabled SMART testing, this information will be of little use. Next you need to make SMART testing and data collection automatic and regular.
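Before automating anything it can be useful to poke at a drive by hand. The sketch below only prints the commands unless DRY_RUN is set to 0; the smartctl options are standard ones, while the run wrapper, the function name and the DRY_RUN variable are made up for illustration.

```shell
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

smart_first_steps() {
    dev=$1
    run smartctl -i "$dev"            # identity info, shows whether SMART is supported and enabled
    run smartctl -s on "$dev"         # enable SMART on the drive
    run smartctl -H "$dev"            # overall health verdict
    run smartctl -t short "$dev"      # start a short self-test in the background
    run smartctl -l selftest "$dev"   # self-test log, worth checking once the test has finished
}

smart_first_steps /dev/sda
```

Running it with DRY_RUN=0 as root executes the commands for real; the short self-test typically takes a couple of minutes before its result shows up in the log.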

Step 1
The first step is to edit the SMART daemon's config file, usually /etc/smartd.conf. Open it in your favourite text editor and comment out anything that already exists there, usually this line:

DEVICESCAN -d removable

Now you can add configuration for your drives at the end of the file. Here is an example from my workstation, which has 4 drives in RAID10:

#Run every Sunday offline, Saturday Long and every evening conveyance and morning short test
/dev/sda \
-H \
-l error -l selftest \
-s (O/../../7/02|L/../../6/02|C/../.././20|S/../.././01) \
-m NotUsedNow -M exec /usr/local/bin/smartd.sh -M once

#Run every Sunday offline, Saturday Long and every evening conveyance and morning short test
/dev/sdb \
-H \
-l error -l selftest \
-s (O/../../7/05|L/../../6/04|C/../.././21|S/../.././02) \
-m NotUsedNow -M exec /usr/local/bin/smartd.sh -M once

#Run every Sunday offline, Saturday Long and every evening conveyance and morning short test
/dev/sdc \
-H \
-l error -l selftest \
-s (O/../../7/08|L/../../6/06|C/../.././22|S/../.././03) \
-m NotUsedNow -M exec /usr/local/bin/smartd.sh -M once

#Run every Sunday offline, Saturday Long and every evening conveyance and morning short test
/dev/sdd \
-H \
-l error -l selftest \
-s (O/../../7/11|L/../../6/08|C/../.././23|S/../.././04) \
-m NotUsedNow -M exec /usr/local/bin/smartd.sh -M once

In the above configuration, tests that take a long time to complete are run during weekends and shorter tests are run daily. You can change the starting hour by modifying the last number of each expression.

For example, the part "O/../../7/11" (from the /dev/sdd entry) follows the pattern "T/MM/DD/d/HH". The first letter is the test type (Offline, Long, Conveyance or Short) and the remaining fields schedule the test: month, day of month, day of week and hour, where a dot matches any digit. In this particular case the Offline test runs on every 7th weekday (Sunday) at 11:00 (AM, for those with the 12-hour time disability).
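If the field order is hard to remember, a tiny helper can spell out a single schedule expression. This is only an illustrative sketch (the function name is made up), and it handles one T/MM/DD/d/HH expression at a time, not the whole parenthesised alternation.

```shell
# Split one schedule expression into its named fields.
# The body runs in a subshell, so the IFS and "set -f" changes do not leak out.
describe_schedule() (
    set -f      # no filename expansion while splitting
    IFS=/
    set -- $1
    case "$1" in
        O) type="offline" ;;
        L) type="long" ;;
        C) type="conveyance" ;;
        S) type="short" ;;
        *) type="unknown" ;;
    esac
    echo "type=$type month=$2 day=$3 weekday=$4 hour=$5"
)

describe_schedule "O/../../7/11"   # prints: type=offline month=.. day=.. weekday=7 hour=11
```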

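The second interesting part of the above configuration is "-M exec /usr/local/bin/smartd.sh", which tells smartd what to do in case of problems. Here it is set up to run the script /usr/local/bin/smartd.sh, which you will create in Step 2, instead of the default mail command. The -m directive still has to be present for -M exec to take effect, hence the placeholder address "NotUsedNow".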

Lastly, create the above configuration only for the hard drives you actually have in your system. As mentioned, I have four disks (sda, sdb, sdc and sdd) and you might have fewer. To see your drives you can use the following command:

ls /dev/sd?
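Typing four nearly identical entries by hand is error-prone, so the entries can also be generated. The following is a sketch under assumptions: the function name is made up, and the hour offsets simply reproduce the staggering used in the example configuration above, which only works for up to four drives because the conveyance test hour would otherwise run past 23.

```shell
# Print one smartd.conf entry per drive, staggering the start hours so
# that the drives do not all run their self-tests at the same time.
smartd_entries() {
    if [ "$#" -gt 4 ]; then
        echo "this staggering scheme only covers up to 4 drives" >&2
        return 1
    fi
    i=0
    for dev in "$@"; do
        printf '%s \\\n' "$dev"
        printf '%s \\\n' "-H"
        printf '%s \\\n' "-l error -l selftest"
        printf '%s (O/../../7/%02d|L/../../6/%02d|C/../.././%02d|S/../.././%02d) \\\n' \
            "-s" $(( 2 + 3 * $i )) $(( 2 + 2 * $i )) $(( 20 + $i )) $(( 1 + $i ))
        printf '%s\n\n' "-m NotUsedNow -M exec /usr/local/bin/smartd.sh -M once"
        i=$(( $i + 1 ))
    done
}

smartd_entries /dev/sda /dev/sdb
```

Calling it as smartd_entries /dev/sd? would cover every drive the glob matches; review the output before appending it to /etc/smartd.conf.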

Step 2

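Again with your text editor, create the file /usr/local/bin/smartd.sh, paste in the following and replace the example email address with your own:

#!/bin/bash
# Run by smartd through the "-M exec" directive; smartd passes the warning text in SMARTD_MESSAGE.
LOGFILE="/var/log/smartd.log"
echo -e "$(date)\n$SMARTD_MESSAGE\n" >> "$LOGFILE"
mail -s "SMART warning on $(hostname)" myemail.address@somehost.com < "$LOGFILE"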

After you have created the file above, make it executable by running the following command:

chmod +x /usr/local/bin/smartd.sh

Extra Step for Ubuntu users
Edit the following file /etc/default/smartmontools and uncomment the following line:

start_smartd=yes

This will make the SMART daemon start automatically during boot.

Step 3
Either restart the SMART service or reboot your computer. Restarting the service works like this in openSUSE:

/etc/init.d/smartd restart

Or in Ubuntu:

/etc/init.d/smartmontools restart


That's it, almost
Now, in theory, you will get an email if SMART detects problems with your hard drives, but that requires your mail daemon to be in working order. You can test this simply by executing the script we created:

/usr/local/bin/smartd.sh

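You should receive an email containing smartd.log, which at this point holds little more than a timestamp, since SMARTD_MESSAGE is empty when the script is run by hand.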
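For a slightly more realistic test you can hand the script a fake warning the same way smartd would, through the SMARTD_MESSAGE environment variable. This is a minimal sketch; the function name, the SMARTD_HANDLER variable and the message text are made up for illustration.

```shell
# Path of the notification script; override SMARTD_HANDLER to try a copy elsewhere.
SMARTD_HANDLER=${SMARTD_HANDLER:-/usr/local/bin/smartd.sh}

# Run the handler once with a fake warning in its environment.
test_handler() {
    if [ ! -x "$SMARTD_HANDLER" ]; then
        echo "not executable: $SMARTD_HANDLER" >&2
        return 1
    fi
    SMARTD_MESSAGE="Test message: pretend /dev/sda is failing" "$SMARTD_HANDLER"
}

# Usage (as root, this sends a real email):
#   test_handler
```

smartd itself also has a "-M test" directive, which triggers the configured notification once when the daemon starts; adding it temporarily to one drive entry tests the whole chain, including the daemon.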

If your email notification is not working, you probably need to enable the mail daemon, but that is quite distro-specific and you will need to figure it out yourself.

Another option is to use some kind of monitoring software instead. Ubuntu and Debian users can use the Smart-notifier application, and KDE users can use a Plasmoid called Plasmart.

Hopefully that will some day save you from the disaster of losing your precious files. In the meantime, remember that neither RAID nor SMART replaces backups. They only safeguard you against hardware failures; you still need traditional backups to protect against user, application and operating system errors!

Tuesday, 4 May 2010

The Dreaded Command-line

or How I Learned to Stop Worrying and Love the Zypper.


There seems to be a certain amount of FUD (Fear, Uncertainty and Doubt) floating around about command-line interfaces (CLI), or simply about the dreaded "command-line". What most people don't realize is that one is included in every major operating system out there today (yes, even your shiny Mac OS X and Windows 7 have it). And it is not just "in there": it is a very important tool for any aspiring power user, because the command line enables scripting (more or less), and scripting enables you to do anything!

I do agree that scripting has some learning curve to it and that most of the time GUI applications are more intuitive than their command-line brothers, but that's not really the point. In any modern operating system, be it Windows, OS X or most Linux distributions (openSUSE being a great example), you can go about clicking your mouse around the GUI until your forefinger bleeds without ever needing to touch the CLI if you don't want to. But a proper CLI enables you to accomplish so much more. This holds especially true in any Unix-related OS like OS X and Linux; the Windows CLI, while slowly getting better, cannot hold a candle to them. Many times you can accomplish much more complex tasks with the CLI than you could with a GUI app, and once you introduce some simple bash scripting you can easily automate those tasks and run them faster and more easily the next time you need them.

And why all the ranting for the command-line interface?
Simply because next I am going to show how to upgrade your openSUSE 11.1 installation to openSUSE 11.2 using a command-line tool called zypper. While the process may seem intimidating, it is actually quite straightforward and requires only a little skill. All you need is an active internet connection; you do not need to burn any images to CDs or anything of the sort. The same can also be achieved with GUI programs, but in my experience the CLI way works better, at least for me.

The screencast is in two parts because the maximum length of a YouTube video is 10 minutes. The process itself takes quite a lot of time: you first need to update the existing openSUSE 11.1 installation with the latest patches and then download over 1 gigabyte worth of updated packages for openSUSE 11.2.
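Since the videos are not reproduced in text form, here is a sketch of the commonly documented zypper upgrade procedure, which may differ in detail from what the screencast shows. The helper name switch_repos is made up; it rewrites the release number in the repository definitions under /etc/zypp/repos.d, and the zypper commands around it are listed as comments to be run as root.

```shell
# Replace the old release number with the new one in every *.repo file
# of a directory (default: /etc/zypp/repos.d).
switch_repos() {
    old=$1
    new=$2
    dir=${3:-/etc/zypp/repos.d}
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')   # escape dots for sed
    for f in "$dir"/*.repo; do
        [ -f "$f" ] || continue
        sed "s/$old_re/$new/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Whole sequence, run as root:
#   zypper refresh && zypper update   # bring 11.1 fully up to date first
#   switch_repos 11.1 11.2            # point the repositories at 11.2
#   zypper refresh                    # fetch the 11.2 repository metadata
#   zypper dup                        # distribution upgrade
```

Third-party repositories may not have an 11.2 counterpart at the rewritten URL, so it is worth checking the output of zypper refresh and disabling any repository that fails before running zypper dup.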


Part 1:


Notice, for maximum readability please watch this video in 720p HD mode in fullscreen.



Part 2:


Notice, for maximum readability please watch this video in 720p HD mode in fullscreen.