Deconstructing Windows Azure 1.0

Within four years every issue in this email was resolved except one. And if you squint, you might see the glint of a Snowflake in his eye.


From: Erik Gavriluk
Sent: Saturday, February 27, 2010 8:00 AM
To: Bob Muglia
Subject: Longer Feedback on Azure 1.0

Background

I have a straightforward web app that requires serving one billion pages daily. These pages are small (most under 1k) and not particularly dynamic (a few million change per day). Structured data storage growth is 5GB year over year. The application started as ASP.NET on IIS6 storing data in PostgreSQL. It took three days to write and was deployed unchanged for nearly three years.

Last year I spent four months rewriting it in Google App Engine. I completed it but did not ship due to lack of confidence in the underlying tech. I spent the last three weeks investigating a rewrite to Azure.

Compared to the three days I spent writing the original version, either I’ve gotten dumber or things have gotten very, very complicated in the cloud.

My Vision for the Cloud

Obviously I’m looking to any cloud provider to outsource as much of the infrastructure and maintenance as possible. Microsoft has done a great job here.

I also want to pay as little as possible for my utility computing. It’s virtually impossible to model costs before coding and deploying, but fortunately I’m a wide-eyed entrepreneur and the scalability promises make it easy to sell myself on most anything. I give Microsoft generally positive marks on pricing with a caveat that much of it didn’t seem to align with my app’s specific needs.

As a developer, I want to spend time on the features of my app. I want to write as little glue code as possible, particularly as it relates to platform-specific scaling issues (this includes data access). Here, the initial Azure release falls a bit short. Moreover, the roadmap is not clear (although Microsoft people seem oddly chatty in whitepapers and forum posts about unannounced features).

Vendor lock-in isn’t a big concern for me. Let’s face it: moving a cloud app is a nightmare. I’ve read articles where people talk about hosting on App Engine for breakfast, moving to Rackspace for lunch, and deploying on Amazon for dinner. It just doesn’t work like that. If you have enough data to justify a true cloud coding effort, you have enough data and other dependencies that you’re pretty much stuck. That said, I entered my technical analysis strongly predisposed to both Microsoft products and Azure hosting and, strangely enough, it was the lock-in issue that pushed me to Amazon.

I discuss these issues in more detail below.

Pricing

Perhaps utility computing is destined to become a 4% margin business regulated by the government. So I put on my Warren Buffett hat (it doesn’t fit) when analyzing offerings from service providers. I do this in an attempt to figure out where their skills lie. Detailed pricing implies sharp technology which implies future cost control.

A quick model of my application costs indicates that Rackspace is on drugs, Amazon is very complicated, and Microsoft has definitely oversimplified.

I was left feeling that Microsoft’s pricing model is optimized for presentation to corporate customers and less to small business owners or entrepreneurs. I don’t know if this is intentional or a side effect of your organization, but you might want to consider repositioning for other markets. BizSpark for Azure?

In fact, my first impression of Azure overall was confusion on pricing and technology. Maybe I’m guilty of over-thinking it, but costs are a huge consideration not so much for my business as for my software architecture. As a technology-driven business I want my cost structure to align with Microsoft’s technology path and I was willing to bend my development efforts to suit. Much like my aborted Google App Engine experiment, I found few opportunities for synergy with Microsoft tech. I was left to assume key features would appear, along with price cuts and volume/partner tiers. But Azure did not make a strong case regarding any value add of being part of the Microsoft ecosystem. In short, it does not yet feel like a platform.

I know AppFabric solves some of this, but it does so by creating a different, sales-based ecosystem. Here I’m talking about the technical side, the pure pleasure of deploying for a platform which comes from the solid feeling of having your technical needs align with the offerings of your platform partner. By way of example, Apple also struggles here. Developing for the iPhone is hideous; the app store, though wildly successful, doesn’t change that.

Regarding specific pricing: the shocker for me was the $9 SQL server. Yes, you got my attention! Of course I have 5GB of data growing at 5GB a year so having the choice of only a 1GB datastore or a 10GB one was unfortunate.

So on to storage. Microsoft has chosen a flat rate of $1/MM for storage transactions. Amazon offers operations which vary from 10x as much ($10/MM for an S3 put) to 10x less (10 cents/MM for EBS). At billions-of-objects scale this matters. Also the lack of availability and pricing on Velocity caching (and my fear that it would be lumped into the “storage” bin) scared me. Since I touch storage billions of times a day I was able to craft clever designs on Amazon that drastically beat Microsoft’s pricing. In short: $200/day versus $2000.
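As a rough sketch of that arithmetic: the per-million rates below are the ones quoted above, while the figure of two billion storage transactions per day is purely an assumption, picked because it reproduces the $2000 versus $200 split. The real workload mix is not stated here.

```python
# Back-of-the-envelope storage transaction costs.
# Rates are the per-million figures quoted in the text (2010 prices).
# TRANSACTIONS_PER_DAY is an assumed workload, not a measured one.

RATES_PER_MILLION = {
    "Azure storage (flat)": 1.00,
    "Amazon S3 put": 10.00,
    "Amazon EBS I/O": 0.10,
}

TRANSACTIONS_PER_DAY = 2_000_000_000  # assumption: "billions of times a day"


def daily_cost(transactions: int, dollars_per_million: float) -> float:
    """Return the daily bill in dollars for a given transaction count."""
    return transactions / 1_000_000 * dollars_per_million


if __name__ == "__main__":
    for name, rate in RATES_PER_MILLION.items():
        cost = daily_cost(TRANSACTIONS_PER_DAY, rate)
        print(f"{name}: ${cost:,.2f}/day")
```

The point of the sketch is only that a flat rate leaves nothing to design around, while a 100x spread between the cheapest and most expensive operation rewards routing traffic to the cheap one.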

Looking at it another way: Amazon’s pricing implies that it costs more to serve a file than to just load it off disk; likewise it costs more to write and replicate an object than just read it. S3 is in line with this reality and charges 10x the storage cost for a read or 100x the storage cost for a write. This in turn drives application/platform growth along those lines and would seem to create a more sustainable ecosystem as well as provide predictable ROI for their service.

Note that storage costs can be controlled either by app logic (multi-level caching) or externally (via caching proxies). Unfortunately Microsoft doesn’t offer a solution for either at the moment.

And just a quick note on compute costs. A three year reserved lease of Windows Server 2008 on Amazon is $350 upfront plus $36/month = $1646. Microsoft is $87.60/month = $3153. As a developer, where am I likely to leave a test box running while competing technologies ebb and flow? I can run two Amazon EC2 Windows servers behind an Amazon load balancer for just slightly more than the cost of a single Azure server. Again, something seems fishy with the pricing.
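The three-year totals above can be checked with a few lines. This is a minimal sketch using only the prices quoted in this paragraph; the load balancer fee is not quoted, so it is left out, and the variable names are illustrative.

```python
# Three-year compute cost comparison using the prices quoted in the text.
MONTHS = 36

AMAZON_UPFRONT = 350.00  # reserved-instance fee, dollars
AMAZON_MONTHLY = 36.00   # dollars per month
AZURE_MONTHLY = 87.60    # dollars per month, no upfront fee


def three_year_total(upfront: float, monthly: float, months: int = MONTHS) -> float:
    """Total cost of one server over the lease period."""
    return upfront + monthly * months


amazon_total = three_year_total(AMAZON_UPFRONT, AMAZON_MONTHLY)
azure_total = three_year_total(0.0, AZURE_MONTHLY)
two_amazon_total = 2 * amazon_total  # excludes the load balancer (price not quoted)

if __name__ == "__main__":
    print(f"One Amazon server:  ${amazon_total:,.2f}")
    print(f"One Azure server:   ${azure_total:,.2f}")
    print(f"Two Amazon servers: ${two_amazon_total:,.2f}")
```

Two Amazon servers come to $3,292 against $3,153.60 for one Azure server, a gap of $138.40 before the load balancer is added.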

Tech – Storage

Although my data is a great fit for a relational database, I successfully jammed it into the key/value model on App Engine. This was really, really hard and frankly I’m not a fan of the model. I also coded it in Python and, as a financial application, the lack of type safety (or even a currency type) was one of many hints that I was using the wrong tool for the job.

Still, I was surprised by what I found with Azure Table Service. Aside from charging more than Google or Amazon, it has fewer features. No indexing, No sort, No kidding? Also, Amazon SimpleDB has a CPU cost component whereas Table Service is fixed cost even though it’s doing these crazy linear scans across unbounded data. The whole thing seems off to me.

Table Service is a quickie implementation of the BigTable paper and ignores all the other work done to make it usable for applications. Google isn’t winning any awards here either (they just got cursors working last week on App Engine) but I wasn’t thrilled with the limitations across the board.

Yes, I know how to solve this today but it creates incredibly-fragile code. Even with the overhead of managing my own keys and duplicating rows (not required with App Engine or Amazon SimpleDB) and therefore doubling my storage costs, I still had performance issues lingering due to full table scans. Then I saw a tech article stating that the key length was limited to MAX_PATH (260 characters) and frankly at this point assumed I was stuck in a bad dream circa 1989 and turned off my computer.
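To make the "managing my own keys and duplicating rows" workaround concrete, here is a minimal sketch. It uses an in-memory stand-in for a table that can only look rows up by a (partition key, row key) pair. It is not the Azure Table Service API, and the account/email schema is hypothetical.

```python
# Illustrative only: an in-memory stand-in for a key/value table that can
# look rows up solely by (partition_key, row_key). NOT the Azure API.
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]
Row = Dict[str, str]


class KeyValueTable:
    def __init__(self) -> None:
        self._rows: Dict[Key, Row] = {}

    def put(self, partition_key: str, row_key: str, row: Row) -> None:
        self._rows[(partition_key, row_key)] = dict(row)

    def get(self, partition_key: str, row_key: str) -> Optional[Row]:
        return self._rows.get((partition_key, row_key))

    def __len__(self) -> int:
        return len(self._rows)


def save_account(table: KeyValueTable, account_id: str, email: str) -> None:
    """Write the same row twice so it can be found by id or by email."""
    row = {"account_id": account_id, "email": email}
    table.put("by_id", account_id, row)
    # Hand-maintained "index" copy: doubles storage, avoids a full scan.
    table.put("by_email", email.lower(), row)


if __name__ == "__main__":
    table = KeyValueTable()
    save_account(table, "42", "Someone@example.com")
    print(table.get("by_email", "someone@example.com"))
    print(len(table), "rows stored for one account")
```

The fragility is visible even at this size: the two writes are not atomic, and every update or delete has to remember to touch both copies.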

I also find table storage to be terrifying from a backup/snapshot perspective. Amazon, Microsoft and Google all offer nothing. I simply do not understand this. Solve this and I might even be willing to write 15 lines of code instead of 1 to add a row to a table.

Tech – Data Access

I wrote some code to talk to the Table Service and found it, well, overwhelming at worst and inelegant at best. Unlike everyone I know writing scalable apps, I’m a huge fan of type safety and C# and frankly with the right tools this should be really easy. But I’d encourage somebody in Redmond to play with Python and MongoDB for an hour and maybe make a video to show around the company because I think you’ve totally lost the rabbit here.

ADO.NET was okay (it’s how I originally built my app). But LINQ to SQL solved it, really. I simply do not get the Entity Framework stuff and I’m certainly not willing to pay the performance penalty on a cloud computing app. Please do not kill LINQ to SQL.

Frankly I was expecting something like Hibernate in the cloud. As I was hand-crafting classes deriving from classes deriving from sample code in order to do a LINQ insert into an Azure table, I was reminded that the same code path in Hibernate/Java doesn’t even require a class. When Java doesn’t require a class but you require four obviously something has gone amiss.

If you get the architecture right you don’t need to build the Visual Studio wizard to call the generator that creates the classes that call the platform data access layer that calls the library that talks to the REST service and doesn’t crash provided the URL isn’t over MAX_PATH characters. This can be done at the CLR level and I swear David Stutz and I talked about doing this with OFS (aka Hibernate) and VB 15 years ago. I think I even coded it.

I eventually settled on using Amazon RDS (MySQL) via LINQ accessed via Mindscape LightSpeed (a great product, please do not buy them). C#, type safety, even a decimal type! I don’t have to type a lot to read and write rows and LightSpeed even offers an integrated memcache that I’ll try not to use. Hosted on Amazon I get backups for free, snapshot backups at the push of a button, and Amazon will give me a managed server with 68GB of RAM if I need one.

If I’m going to pay a premium for outsourced servers, knowing that I can store my worst-case year-2020 database in RAM on a hosted instance today helps me sleep at night. And oh yeah, backups.

I really don’t think I’m that weird in my data or coding requirements. I’m also definitely not a fan of MySQL. But it just works and frankly I’m just as surprised I arrived on this square as you are.

Tech – Cache

As I stated before, getting storage and scalability under control involves, for me, lots of caching.

When I built the Google App Engine monstrosity, I solved the cache problem by adding thousands of lines of code to manage memcache locally in the app. Many web apps have code littered with this nonsense and it really isn’t a proper solution. The kicker for me was knowing that I had no control over the RAM pool which meant Google could effectively charge me as much as they want by reducing the memory available to my application.

I never want to deal with this kind of code again. But if I have to deal with this kind of code, I’m going to make sure I can tune cache sizes. Whether it’s inside the app or inside the datacenter, I simply require this control before I can ship.
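For illustration, the kind of control meant here is nothing exotic. Below is a minimal sketch of a least-recently-used cache whose size is set, and can be changed, by whoever runs it. The class and method names are hypothetical and do not come from any cloud provider's API.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BoundedCache:
    """Least-recently-used cache whose capacity the operator controls."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._items: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        self._evict()

    def resize(self, max_entries: int) -> None:
        """Tune the cache size at runtime, evicting if it shrinks."""
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._evict()

    def _evict(self) -> None:
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop least recently used

    def __len__(self) -> int:
        return len(self._items)
```

With a hosted cache pool, the equivalent of `max_entries` is decided by the provider, which is exactly the knob this section argues the customer needs to own.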

Simplest solution for me is to run a pair of caching proxies (Linux hosted) behind a load balancer (Amazon ELB). Also for the cost of a duplicate write to Azure Storage I can duplicate the object in Amazon S3 as the ultimate failover. This gives me robustness and CDN and even serves the file directly over http for the same price you’re charging me just to add a node to a B-Tree. I don’t get it.

Tech – Missing Pieces

This was longer but I looked at the various Azure UserVoice sites and it seems like people are yelling about mostly the right stuff. My personal list since I’ve got your ear:

  • SQL spatial types
  • A promise never to regress Server functionality on Azure (e.g., SQL spatial types) as this fragments both platforms
  • Reliable email delivery (nobody offers this in the cloud)
  • DNS
  • Caching proxies, load balancers
  • Repeat: there are so many cases where having control over the load balancer and cache layers saves money and increases uptime
  • Snapshot backups for everything: storage, SQL, disk, servers
  • Rollback of service packs: imagine a world where Windows Update even updates PHP for me, now think about making that robust. Scary thought, right? Well, this has worked in Linux for ten years with enterprise-level support to boot.
  • Something equivalent to Amazon DevPay so I can bill out compute and storage (note that they assume the risk for collecting receivables)
  • Amazing next-generation caching solution (from ETags down to SQL across to tables and storage and blobs, integrating with proxies and load balancers)

Frankly as both a developer and a shareholder I’m less excited by Microsoft’s investment in infrastructure services unless the scalability and systems management R&D pays platform dividends with traditional Microsoft Server tools. I was happy to see it folded into your organization for this reason.

Sure, I want non-Windows servers in the same datacenter and I probably even want slice-style shared VM hosting to run dinky ASP.NET apps and small cron jobs. But I can’t imagine you have much interest in offering that and probably rightly so. But this realization also pushed me toward Amazon! As an agile entrepreneur, I can’t risk another gap like 2005-2009 where most of the emergent technologies didn’t run on Windows Server or talk cleanly to .NET.

One way to solve this (and it solves some of the load balancing/caching issues above as well) would be to arrange a peer bandwidth deal with Amazon. I don’t know if this is possible or practical, I just know I want free bandwidth and low-latency between AWS and Azure. If you can’t do that deal for technology, security or business reasons, then you really need to make the long list of everything that Amazon does well and make sure you can do it too. That’s a 5 year catchup effort. Oh yeah, I want integrated monitoring and reporting, too.

Rants

This section was originally blank but then I remembered I have a reputation to protect. Before I start I’ll again state that I am very impressed with Azure 1.0. And in terms of tools, servers, and language support, boy have you got an amazing thing to build upon.

Punchlist of things that caught my attention:

  • Surprising number of dead links on MSDN
  • Beta-clutter. Maybe mark your beta forums NOINDEX? Have a problem, get an error, Bing it = thousands of useless results that don’t match the shipping version
  • I reiterate that Table Service is embarrassing
  • .NET data access in general seems awkward and heavyweight compared to everything else on the planet
  • I do NOT like Visual Studio 2010
  • Azure account management and signup is bad. Tiny fonts, weird logins, bad marketing copy, lots of typos
  • Uninstalling Visual Studio I discovered I had four full installations of SQL Server running on my week-old dev machine
  • Speaking of which, how ’bout that SQL Server installer? What have you got, like 30 guys on that? Intuitively I know it’s just copying files, adding some configuration information, and possibly running a stored procedure or two but man they make it look hard and impressive
  • Documentation is rough. Samples are wonky.
  • Easier to talk to Amazon with their C# libraries than to talk to Azure with Microsoft’s libraries!
  • Lack of Velocity and session support was unfortunate for the launch

Only three items deserve a longer treatment. One was downloading Visual Studio installers in .iso format. Given all the compression, packaging and distribution technologies Microsoft has, I was fairly surprised that you picked the one that Windows can’t mount. I fought for this feature when I worked under Rob Glaser in 1992 and I can’t believe Windows 7 still doesn’t do it. The three letters in the extension itself imply the format is reasonably documented and that any licensing issues are containable. What say you Microsoft? Or just don’t use .iso as the only people who use it on the web are the Microsoft Developer Group and Russian pirate movie distribution rings.

The second gripe: deploying an instance and debugging a remote instance on Azure is painful. I’m sure you’re aware, but it takes way too long to spin up a new role. Also, remote logging and debugging need to be improved. I’m building apps with callbacks from other web services (PayPal, Facebook) and the only real solution was to spin up an IIS instance at Amazon to develop on. At that point I decided to talk to Amazon storage, then started to question my whole platform and scripting language, which really isn’t what you want. Possible solutions here: 1) Allow debugging against at least the staging role 2) Allow me to deploy the Azure dev framework to another server (currently locked out) 3) Provide some new shiny WPF debugging tools that run against Azure that will probably be partially-translucent, slow, and make me grumpy.

The third gripe is broader and relates to the pricing and availability of developer tools. The “Express” editions are fine but I’m going to push you to go further. Every Unix/Linux box, from my Mac today back to the NeXT workstation sitting on my desk in building 6 in 1994, ships with a full set of developer tools. Apple puts them in the box with the OS alongside their iLife consumer toys. Ubuntu installs them before they even copy over the knockoff Minesweeper game.

Here’s a wild idea: make Visual Studio 2008 Professional free after 2010 ships. I sort of understood the Express versions back when downloads were slow, but that’s the only justification I can come up with now. At a minimum add the resource editor back to Visual C++ Express so people can actually write a Windows app with it (sheesh!), buff everything up with more platform stuff (include the DirectX, Game, Facebook and Azure SDKs), and enable add-in support for third party tools. I also think charging for Expression is a terrible idea; just give it away and be glad anyone is willing to wrestle with it.

There are just so many SKUs and they cost so much money. It really feels out of touch. How many Visual Studio SKUs would you guess there are? Surprise! http://www.cdw.com/shop/search/results.aspx?key=microsoft+visual+studio&searchscope=All&sr=1 shows 1,122 options. I know you guys are making a billion dollars a week but is this really how it’s done?

If Azure and Silverlight (… really?) are the future, then bite the bullet and make the tools free. You can keep Visual Studio Premium and Team System but I’d urge you to rethink the way this all plays out. A quick scan of eBay cross-referenced to Twitter IDs shows various millionaire entrepreneurs buying the Academic version of Visual Studio, and frankly you should be honored they’re going to the trouble to steal it. Meanwhile my new Dell came with a copy of Python which appears to have been pre-loaded with other Google tools.

Ten years ago every software company (er, “ISV”) in my hedge-fundy neighborhood would happily buy Microsoft tools and train people on them. It’s not like that any more. And today’s hobbyists are tomorrow’s entrepreneurs so I think you need to figure out how to let them play a bit with Azure as well. Yeah, this basically means hosting a bunch of wonky web apps for kids for free. But if Google is doing it, you need to respond.

Let me push this to an illogical extreme: I think developer tools should ship with every Dell (uninstalled) and be one click away on the “Turn Windows Features On or Off” box. At first yeah, this sounds silly. But look at some of the other nonsense in there! I’d argue that dev tools have just as wide an audience as half that stuff and they’re 1000x more strategic.

So that’s it. In a month I’ll have my ASP.NET app chugging with a managed MySQL database on Amazon. Front-ended by a couple of wonky Linux caching servers, both of which serve some significant chunk of the web’s pages every day, neither of which has much documentation in English.

BUT — I’m one connection string away from talking to Microsoft SQL Server again. Likewise I’m one “yum update” away from hosting everything on a Linux box running Mono if you futz with my LINQ.

Cost, complexity, cache layers, and great backup/snapshot support are the key decision factors for me. A bandwidth arrangement with Amazon would add huge value and provide flexibility while you figure out fancy caching architectures. Please don’t break or regress my dev tools any further and oh yeah, I want them for free!

–Erik


From: Bob Muglia
Sent: Saturday, February 27, 2010 3:10 PM
To: ‘Erik Gavriluk’ (EXTERNAL)
Subject: RE: Longer Feedback on Azure 1.0

Thanks Erik!

This is great feedback. It will be broadly forwarded.

Windows Azure is still really new – it feels like NT 3.1 to me. There is so much left to do. We’ll definitely address many of the things you gripe about, including pricing.

Thanks again,

bob