Monday, September 21, 2009

Sun Storage 7410

I mentioned earlier that Sun seems to be offering the same old SAN as EMC. This is the case for their 6580 but not for their 7410. I mentioned two problems with the EMC and below is how Sun addresses them.
  • LUN Tetris: The Sun Storage 7410 uses ZFS to stripe data across all SATA drives, serving blocks in the case of iSCSI or files in the case of NFS. This is similar to how NetApp uses WAFL. If you configure ZFS NSPF (no single point of failure) with a minimum of two drawers of disk, then redundant copies are written across drawers so you can lose more drives, possibly an entire shelf (see the sketch after this list).
  • SP Bottleneck: Sun has the same problem here: one smart SP and several disk-only drawers. However, a 7410 maxes out at 288T.
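In raw ZFS terms, an NSPF layout mirrors each vdev across shelves so that every mirror keeps a surviving half if a whole shelf dies. A minimal sketch, assuming two shelves whose disks show up as c1* and c2* (device names are hypothetical; the 7410 itself sets this up through its own management UI):

    zpool create tank mirror c1t0d0 c2t0d0 \
                      mirror c1t1d0 c2t1d0 \
                      mirror c1t2d0 c2t2d0   # each mirror spans both shelves
    zpool status tank                        # verify layout: shelf 1 can fail entirely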
Other 7410 features:
  • Contains solid-state flash drives, which ZFS manages to improve performance where it's needed; this is similar in spirit to Compellent's automated tiered storage (see the sketch after this list).
  • Does not offer Fibre Channel access, so the only way to read blocks from it is iSCSI.
  • All extra features (snapshots, clones, etc.) are included without separate licenses (unlike NetApp).
  • Allows admins to SSH into the Solaris-based SPs for management, or to use a Web (Ajax) client.
  • A VirtualBox simulator lets you try out the management system.
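On the flash point: in plain ZFS terms, the 7410's hybrid storage pool adds flash as a read cache (L2ARC) and as a separate intent log, rather than as a distinct data tier. A rough sketch with hypothetical device names:

    zpool add tank cache c3t0d0   # SSD becomes an L2ARC read cache
    zpool add tank log c3t1d0     # SSD intent log speeds up synchronous writes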

Saturday, September 19, 2009

NetApp

I mentioned earlier that NetApp seems to be offering the same old SAN as EMC. This is only partially true. I mentioned two problems with the EMC and below is how NetApp addresses them.
  • LUN Tetris: NetApp stripes either SATA or SAS drives into large RAID 6 (RAID-DP) groups and aggregates them together so that you have large storage pools of a single disk type, e.g. one for SATA and one for SAS. You can then compute the available disk by subtracting what's in use from the aggregate total (see the sketch after this list). You can also run nothing but SATA and let the large stripe address the performance problems associated with SATA, though getting IOPS stats first is a good idea; otherwise a small pool of SAS could address the need for fast disk.
  • SP Bottleneck: NetApp has the same problem here: one smart SP and several disk-only drawers. They do offer PAM modules, which add up to 256G of cache (depending on which head) to increase performance. When upgrading SPs, remember that swapping out an SP is easier if you have zoned by port number, not WWN.
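Here is a rough sketch of that pooling model in ONTAP 7-mode syntax (the aggregate, volume, and LUN names are made up, and options vary by release, so treat this as illustrative rather than exact):

    aggr create sata_aggr -t raid_dp -r 16 32      # one big pool: 32 SATA disks in RAID-DP groups of 16
    df -A sata_aggr                                # aggregate totals; subtract usage per project
    vol create projvol sata_aggr 2t                # carve a flexible volume out of the pool
    lun create -s 500g -t linux /vol/projvol/lun0  # a block LUN for an iSCSI client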
A FAS3140 maxes out at 420T of raw capacity with all SATA, so SP upgrades seem far away at my organization's current usage. Regardless, this design is not as elegant as grid storage, and if I'm going to consider it they'll need to offer a good price.

Tuesday, September 15, 2009

Google Data Liberation Front

Is this for real? If so, I give Google credit for their Data Liberation Front.

Monday, September 14, 2009

Thin Provisioning

According to IBM's XIV Redbook (page 3), thin provisioning is "the capability to allocate storage to applications on a just-in-time and as needed basis". Wikipedia has more to say. Storage vendors make this sound much better than it is. Sure, you get more blocks when you need them; the problem is that you now have to get your filesystem to use them. Repeat: you don't just run 'df' before and after and say "oh, I grew the LUN, I'm done". How does this behave on *nix file systems?

If you're using ext3 on top of LVM, then it seems you don't even need thin provisioning from your SAN. You could just add a new LUN, add it to the volume group behind the volume you want to grow, extend the logical volume, and run ext2online to grow the ext3 filesystem into the new space. I've done this a few times and it's worked fine, but it was, as Ben Rockwood said, sucky. Ben's blog mentions how ZFS can make this process less painful: ZFS and Thin Provisioning. Aside from needing to know what you're doing when 'df' lies to you, this looks handy. If my SAN allocates the extra space easily and ZFS can just pick it up and run with it, then SAN-based thin provisioning seems worthwhile. Looks like I'm going to have to test this feature with ZFS. I'll post an update as I learn more.
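For reference, the LVM path looks roughly like this, assuming a volume group vg0, a logical volume data, and a newly presented LUN at /dev/sdc (all names hypothetical):

    pvcreate /dev/sdc                 # label the new LUN for LVM
    vgextend vg0 /dev/sdc             # add it to the volume group
    lvextend -L +500G /dev/vg0/data   # grow the logical volume
    ext2online /dev/vg0/data          # grow the mounted ext3 filesystem into the new space

On newer distributions, resize2fs can do the online grow as well.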

Update: Someone who used to work for XIV told me a little about how the thin provisioning system works. If I've understood correctly, the scenario is (a ZFS-flavored sketch follows the list):

  • If a project requires x TB over the course of three years, but only y TB this year, then thin provision x TB such that y TB can be accessed now.
  • When you create a file system (this includes ext3) on top of that project's LUN, you will see x TB (even though only y TB is really there). Thus, the inode table will be built to address blocks which are not yet there, and 'df' will lie to you.
  • As long as x TB becomes physically available at some point (perhaps from your total SAN), blocks will be allocated on demand and the file system won't have a problem.
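You can emulate this behavior with a sparse ZFS volume, which makes the "df lies" effect easy to see. A minimal sketch with hypothetical pool and volume names:

    zfs create -s -V 10t tank/project             # sparse zvol: the client sees 10T (x TB) up front
    zfs get volsize,refreservation tank/project   # volsize is 10T, but no space is reserved
    # after exporting the zvol as an iSCSI LUN to a Linux client:
    mkfs.ext3 /dev/sdc                            # inode tables are sized for the full 10T
    df -h /mnt/project                            # reports ~10T even if the pool holds far less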
The benefits I see of doing this are:
  • This can save you money if you're planning to purchase x TB within the next three years, but know that you can only afford y TB today
The problems I can see are:
  • If you don't get those extra disks before the users run out of road (hey, you made the road look longer than it was), then you'll have problems.
  • When you physically reach x TB and fill it, you are back to the original problem: you will have to use ext2online or some other method to grow the filesystem.

Sunday, September 13, 2009

Grid Storage

My organization has managed two EMC Clariion cx3-20s for three years. We have had some problems with their overall design, which I'll list below. I'll also list some vendors with the same problems and show some that use grid storage to avoid them.

EMC problems

LUN Tetris

We have a mix of LUNs of approximately two types:
  • Fast and small
    • Used only by Databases and Email
    • Currently 10T, not likely to grow fast
    • 146G 15k FC in RAID10
  • Fast enough and large
    • all other applications (Live VM images, Web roots, Home dirs, File svcs, Archives, etc)
    • Currently 40T, growing by about 10T a year
    • 1T 7200 SATA in RAID5
We have many LUNs of the two types above, each striped across a number of disks; visualized, they would look like the end of a Tetris game with differing colors and shapes. The varying colors and shapes represent LUNs of different sizes, meta-LUNs, RAID types, disk types, etc. Some empty space represents unused capacity that is too small to be of use. When a new project comes along and requires some space, we analyze our Tetris game and consider the best way to accommodate the request.

Service Processor bottleneck

Most SANs have Service Processors (SPs), which are computers that run an OS: EMC runs FLARE, NetApp runs Data ONTAP (a BSD derivative), etc. The SPs can be thought of as servers which pass block change operations from clients to the block devices connected directly to them. In EMC's case, this connection is implemented as daisy-chained copper SCSI cables running to several drawers of disks. The cx3-20 can hold eight drawers. We want to add an extra drawer this year, but we face an additional cost to upgrade to a cx3-40, which is basically a new SP that can hold 16 drawers. So, every few years you must upgrade the SP. In our case EMC wants us to buy a cx4 instead of a cx3.

Same old SAN

I'm probably oversimplifying the comparison of these products, but since they share the problems I listed above, to me they look like the same old SAN. I'm going to speak with sales reps from each of these companies and let them tell me about other products they offer, so that I might update this page and list them as offering grid storage.

Grid Storage

There are new grid-storage-based systems which don't have these problems. The basic idea is that rather than having one smart SP and several dumb drawers, each drawer is smart and is known as a node. In IBM's XIV each node is an individual server made from commodity parts: one quad-core Intel CPU, 8G of RAM, twelve 1T SATA drives, and a stripped-down Linux-based OS. These servers are networked to speak with each other via 10G Ethernet instead of daisy-chained SCSI. Each portion of data written to any particular LUN is split across all of the disks, and the large stripe helps the SATA perform as well as fast disk. Redundant portions are also written, so that one can lose up to three nodes in a six-node system. Relative to the problems posed in the beginning we have:
  • LUN Tetris: The only property of an XIV LUN is size. Every LUN has the same speed, which is fast. Every LUN is made of commodity SATA. Keep a tally of the total size and subtract the requested size for each new project.
  • SP Bottleneck: Since each node, or drawer, is an SP, storage and processing scale at the same rate. There is no sudden need to upgrade the SP during an expansion.
I am trying to build a list of vendors which use grid storage to serve block devices (IBM was the first vendor I found doing exactly this, so my description above is biased towards them). NEC's HYDRAstor and Isilon use grid storage, except they serve NFS volumes. Please post a comment if you know of storage vendors doing something similar.

Tuesday, September 8, 2009

cleversafe.org

Cleversafe has a Dispersed Storage system.

Thursday, September 3, 2009

Searching for Storage

I'm getting started on rethinking how to handle storage, and on cheaper ways to deal with what I already have. Some things on my mind include:
  • Backblaze has a nice article, Petabytes on a Budget.
  • IBM is going to tell me how great their XIV SAN is. We'll see.
  • EMC support contracts are too expensive. BL Trading supposedly does it for less. We'll see.