“Unrecognized option: -files” in hadoop streaming job

I was recently working on an Elastic MapReduce streaming setup that required copying a few additional Python files to the nodes alongside the mapper and reducer.

After much trial & error, I ended up using the following .NET AWS SDK code to accomplish the file upload:

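var mapReduce = new StreamingStep {
    Inputs = new List<string> { "s3://<bucket>/input.txt" },
    Output = "s3://<bucket>/output/",
    Mapper = "s3://<bucket>/mapper.py",
    Reducer = "s3://<bucket>/reducer.py",
}.ToHadoopJarStepConfig();

// Append the extra Python modules as a -files argument
mapReduce.Args.Add("-files");
mapReduce.Args.Add("s3://<bucket>/python_module_1.py,s3://<bucket>/python_module_2.py");

var step = new StepConfig {
    Name = "python_mapreduce",
    ActionOnFailure = "TERMINATE_JOB_FLOW",
    HadoopJarStep = mapReduce
};

// Then build & submit the RunJobFlowRequest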

This generated the rather odd error:

ERROR org.apache.hadoop.streaming.StreamJob (main): Unrecognized option: -files

Odd, because -files most certainly is an option.

Some prolonged googling later, I discovered that the -files option needs to come first, ahead of the streaming-specific options. However, StreamingStep doesn’t give me any way to change the order of the arguments – or does it?

I eventually realised I was being a bit dense. ToHadoopJarStepConfig() is a convenience method that just generates a regular JarStep… which exposes the args as a List. Change the code to this:

mapReduce.Args.Insert(0, "-files");
mapReduce.Args.Insert(1, "s3://<bucket>/python_module_1.py,s3://<bucket>/python_module_2.py");

and everything is awesome.
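The ordering rule can also be applied mechanically. Below is a minimal Python sketch (not part of the original .NET code) that moves Hadoop's generic options to the front of an argument list. The function name `hoist_generic_options` is made up for illustration, and it assumes each generic option is passed as a separate token followed by its value.

```python
# Hadoop generic options; each is assumed to be followed by exactly one value
# token, and all of them must precede the streaming options (-input, -mapper, ...).
GENERIC_OPTIONS = {"-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives"}


def hoist_generic_options(args):
    """Return a copy of args with generic option/value pairs moved to the front.

    Relative order is preserved within both groups.
    """
    generic, rest = [], []
    i = 0
    while i < len(args):
        if args[i] in GENERIC_OPTIONS and i + 1 < len(args):
            generic.extend(args[i:i + 2])  # the option and its value
            i += 2
        else:
            rest.append(args[i])
            i += 1
    return generic + rest


streaming_args = [
    "-input", "s3://<bucket>/input.txt",
    "-output", "s3://<bucket>/output/",
    "-mapper", "s3://<bucket>/mapper.py",
    "-reducer", "s3://<bucket>/reducer.py",
    "-files", "s3://<bucket>/python_module_1.py,s3://<bucket>/python_module_2.py",
]
ordered_args = hoist_generic_options(streaming_args)
```

After the call, `ordered_args` starts with `-files` and its value, followed by the streaming options in their original order.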

Basic Auth with a Web API 2 IAuthenticationFilter

MVC5/Web API 2 introduced a new IAuthenticationFilter (as opposed to the IAuthorizationFilter we needed to dual-purpose in the past), as well as a substantial overhaul of the user model with ASP.NET Identity. Unfortunately, the documentation is abysmal, and all the blog articles focus on the System.Web.Mvc.Filters.IAuthenticationFilter, not the System.Web.Http.Filters.IAuthenticationFilter, which is clearly something entirely different.

We had a project where we needed to support a Basic-over-SSL authentication scheme on the ApiControllers for a mobile client, as well as Forms auth for the MVC controllers running the admin interface. We were keen to leverage the new Identity model, mostly as it appears to be a much more coherent design than the legacy hodgepodge we’d used previously. This required a fair bit of decompilation and digging, but I eventually came up with something that worked.

Below is an excerpt of the relevant parts of our BasicAuthFilter class – it authenticates against a UserManager<T> (which could be the default EF version) and creates a (role-less) ClaimsPrincipal if successful.

public async Task AuthenticateAsync(HttpAuthenticationContext context, CancellationToken cancellationToken)
{
    var authHeader = context.Request.Headers.Authorization;
    if (authHeader == null || authHeader.Scheme != "Basic")
        context.ErrorResult = Unauthorized(context.Request);
    else
    {
        // Split on the first colon only, so that passwords may contain ':'
        string[] credentials = ASCIIEncoding.ASCII.GetString(Convert.FromBase64String(authHeader.Parameter)).Split(new[] { ':' }, 2);

        if (credentials.Length == 2)
        {
            using (var userManager = CreateUserManager())
            {
                var user = await userManager.FindAsync(credentials[0], credentials[1]);
                if (user != null)
                {
                    var identity = await userManager.CreateIdentityAsync(user, "BasicAuth");
                    context.Principal = new ClaimsPrincipal(new ClaimsIdentity[] { identity });
                }
                else
                    context.ErrorResult = Unauthorized(context.Request);
            }
        }
        else
            context.ErrorResult = Unauthorized(context.Request);
    }
}

public Task ChallengeAsync(HttpAuthenticationChallengeContext context, CancellationToken cancellationToken)
{
    context.Result = new AddBasicChallengeResult(context.Result, realm);
    return Task.FromResult(0);
}

private class AddBasicChallengeResult : IHttpActionResult
{
    private IHttpActionResult innerResult;
    private string realm;

    public AddBasicChallengeResult(IHttpActionResult innerResult, string realm)
    {
        this.innerResult = innerResult;
        this.realm = realm;
    }

    public async Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
    {
        var response = await innerResult.ExecuteAsync(cancellationToken);
        // Only add the challenge header to 401 responses
        if (response.StatusCode == HttpStatusCode.Unauthorized)
            response.Headers.WwwAuthenticate.Add(new AuthenticationHeaderValue("Basic", String.Format("realm=\"{0}\"", realm)));
        return response;
    }
}

Note that you’ll need to use config.SuppressDefaultHostAuthentication() in your WebApiConfig in order to prevent redirection from unauthorised API calls.
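The header handling in the filter doesn't depend on Web API, so it can be sketched on its own. The following Python sketch is not part of our filter; `parse_basic_header` and `basic_challenge` are names made up for illustration. It shows decoding a Basic Authorization header and building the matching WWW-Authenticate challenge. It assumes UTF-8 credentials and treats the scheme name case-insensitively.

```python
import base64
import binascii


def parse_basic_header(header_value):
    """Return (username, password) from an Authorization header value.

    Returns None if the header is missing, uses another scheme, or is malformed.
    """
    scheme, _, parameter = (header_value or "").partition(" ")
    if scheme.lower() != "basic" or not parameter:
        return None
    try:
        decoded = base64.b64decode(parameter.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # Split on the first colon only: the password itself may contain ':'
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


def basic_challenge(realm):
    """Build the WWW-Authenticate value sent back with a 401 response."""
    return 'Basic realm="{0}"'.format(realm)


token = base64.b64encode(b"alice:pa:ss").decode("ascii")
credentials = parse_basic_header("Basic " + token)
```

Here `credentials` is the pair `("alice", "pa:ss")`. Anything that fails to parse comes back as `None`, which corresponds to the Unauthorized branches in the filter.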

Build Server Traffic Lights

I’ve wanted real build server traffic lights since I first implemented a Continuous Integration server in the mid 2000s. In those days, the trendy thing to do was to hook up red & green lava lamps to your build server, but CCTray’s red/green/yellow status indicators always seemed better suited to traffic lights. However, it was something that always got put in the ‘someday’ pile. More recently, I’d become interested in hardware automation platforms like Arduino, and it seemed like an ideal first project, so I dusted off the concept.

Obtaining the traffic light unit itself was relatively straightforward – in WA, the old style incandescent units are being progressively replaced with LEDs, so the reasoning was there’d be some that are superfluous to requirements. A few phone calls later, I managed to track down the contractor handling the replacement and do a beverage-related deal for a second-hand traffic light. The hardest part was actually explaining what I intended to do with it!

These traffic light units don’t contain any switching logic or complex electronics at all – they have a 240VAC feed for each light, with industrial grade internal transformers stepping down to 10V and driving 20W high-pressure bulbs. I’d seen reports that the standard bulbs were too bright for indoor use, but a test run showed it was probably just okay, and it was certainly much simpler to keep the lighting as-is while I got the rest of the hardware working.

The intention was to run the lights as a networked device (rather than a USB one, requiring an active host computer), as this would enable more flexibility in installation. I ordered an Arduino Ethernet and relay shield from Little Bird Electronics, and set about coding the controller software.

The code is available online here – it’s adapted from a similar project by Dirk Engels. The Arduino runs a web server that serves a page displaying the current status of the light, as well as buttons to control the light and RESTful control URLs to provide build server integration. My main changes to the design were:

  • Integration of a DHCP library, to remove the hard-coded IP address and make it possible to move the light between networks without reprogramming.
  • Bonjour support, to advertise the light at ‘traffic-light.local’ and remove any requirement for DNS entries/DHCP reservations on the network.
  • A failover mode that flashes amber if the light has not heard from the build server in over 5 minutes. This mimics real world behaviour and seemed more appropriate than turning off or displaying the last known state indefinitely.
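The failover behaviour boils down to a small piece of timing logic. The sketch below is a Python model of that logic, not the Arduino code itself. The class name, method names and the one-second flash period are assumptions for illustration; only the five-minute threshold comes from the design above.

```python
FAILOVER_AFTER_SECONDS = 5 * 60  # no contact from the build server for over 5 minutes
FLASH_PERIOD_SECONDS = 1.0       # assumption: amber is lit for the first half of each period


class TrafficLight:
    def __init__(self, now):
        self.state = "off"
        self.last_contact = now

    def set_state(self, state, now):
        """Record a state change requested by the build server."""
        self.state = state
        self.last_contact = now

    def lit_lamp(self, now):
        """Return the lamp that should be lit at time `now`, or None for all off."""
        silent_for = now - self.last_contact
        if silent_for > FAILOVER_AFTER_SECONDS:
            # Failover: flash amber until the build server is heard from again
            phase = silent_for % FLASH_PERIOD_SECONDS
            return "amber" if phase < FLASH_PERIOD_SECONDS / 2 else None
        return None if self.state == "off" else self.state


light = TrafficLight(now=0.0)
light.set_state("green", now=10.0)
```

Passing the clock in as `now` keeps the logic easy to test without waiting five minutes.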

Wiring in the controller was pretty simple – the 240V mains feed powers the 9V DC power supply for the Arduino, as well as the 10V transformers for the lights via the relay shield. Initially these were switched on the high-voltage side, but the inrush current appeared to play havoc with small switch-mode power supplies (i.e. phone chargers) on the same circuit, so I rewired to switch on the low-voltage side. This also allowed me to remove two of the transformers and freed up some internal space; I ended up being able to neatly mount the controller on one of the unused transformer brackets.

Obviously the light needed a pole; I constructed one using galvanised fence post and some sub-par oxy welding. I would have liked to run the wiring down inside the pole, but unfortunately the size of the mains plug was going to make this difficult (given I wanted the light to stay easily removable). A few coats of suitable yellow paint and it was good to go.

After installing the light in the office, we developed a small PowerShell script to query the build server and update the light. It’s had a significant benefit in putting the build status unavoidably in front of the developers, and the builds have become noticeably more ‘green’ than they have been for some time.
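Our script isn't reproduced here, but the status-to-colour decision is simple enough to sketch. The Python below assumes the build server exposes a CCTray-style XML feed (a `Projects` element containing `Project` elements with `activity` and `lastBuildStatus` attributes). That feed format, the function name and the sample data are assumptions for illustration. The call to the light's control URL is left out, as the URL scheme depends on the controller code.

```python
import xml.etree.ElementTree as ET


def light_colour(cctray_xml):
    """Map a CCTray-style project feed to 'red', 'yellow' or 'green'."""
    projects = ET.fromstring(cctray_xml).findall("Project")
    # Any broken build wins over everything else
    if any(p.get("lastBuildStatus") in ("Failure", "Exception") for p in projects):
        return "red"
    # Otherwise show yellow while something is still building
    if any(p.get("activity") == "Building" for p in projects):
        return "yellow"
    return "green"


SAMPLE_FEED = """<Projects>
  <Project name="web" activity="Sleeping" lastBuildStatus="Success" />
  <Project name="api" activity="Building" lastBuildStatus="Success" />
</Projects>"""

colour = light_colour(SAMPLE_FEED)
```

With the sample feed above, `colour` is `"yellow"`, because nothing is broken but one project is still building.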

There are a few areas I’d design differently if I did it again:

  • Use a hardware flasher circuit for the failover mode (via the fourth relay) – the software flasher works okay, but there’s a noticeable stutter in the flashes if the controller is doing something else (like responding to a web request). I’m not enough of a hardware whiz to build one of these though.
  • Install bulkhead RJ45 & 3-pin PC power connections on the traffic light housing, so that the cables are detachable – this would permit variable cable lengths and potentially allow routing inside the pole.
  • Use low-wattage bulbs rather than the specialised 20W high pressure bulbs – the traffic light is a bit bright straight-on. Unfortunately the existing bulb holders have a unique bayonet mount and they’d need to be replaced with something else (e.g. automotive BA15S).

EntityPropertyMappingAttribute duplicated between assemblies

I was working on an entity class for an OData endpoint when I ran across the following doozy:

The type ‘System.Data.Services.Common.EntityPropertyMappingAttribute’ exists in both ‘…Microsoft.Data.OData.dll’ and ‘…System.Data.Services.Client.dll’

It looks like Microsoft has duplicated this type (plus a couple of others) between two different assemblies – in this instance I ran across it with the Azure.Storage package.

Thankfully, Jon Skeet to the rescue! To resolve:

  1. Select the System.Data.Services.Client reference and open the properties dialog
  2. Under ‘Aliases’, change ‘global’ to ‘global,SystemDataServicesClient’
  3. Add the following code at the top of the offending entity file:
extern alias SystemDataServicesClient;
using SystemDataServicesClient::System.Data.Services.Common;

You’ll also need to delete your other using System.Data.Services.Common, but at that point you should be compiling again.

Azure AD Single Sign On with multiple environments (Reply URLs)

As part of an effort to move some internal applications to the cloud (sorry, The Cloud™), I recently went through the process of implementing Azure AD single sign on against our Office365 tenant directory. Working through the excellent MSDN tutorial, I hit the following (where it was describing how to reconfigure Azure AD to deploy your app to production):

Locate the REPLY URL text box, and enter there the address of your target Windows Azure Web Site (for example, https://aadga.windowsazure.net/). That will let Windows Azure AD to return tokens to your Windows Azure Web Site location upon successful authentication (as opposed to the development time location you used earlier in the thread). Once you updated the value, hit SAVE in the command bar at the bottom of the screen.

Wait, what? This appears to imply Azure AD can’t authenticate an application in more than one environment (e.g. if you want to run a production & test environment, or, I don’t know, RUN IT LOCALLY) without setting up duplicate Azure applications and making fairly extensive changes to the web.config. Surely there’s a better way?

I noticed that the current version of the Azure management console allows for multiple Reply URL values.

However, just adding another URL didn’t work – the authentication still only redirected to the topmost value.

The key was the \\system.identityModel.services\federationConfiguration\wsFederation@reply attribute in web.config – adding this attribute sent through the reply URL and allowed authentication via the same Azure AD application from multiple environments, with only relatively minor web.config changes.

As the simplest solution, here’s an example Web.Release.config transform – more advanced scenarios could involve scripting XML edits during a build step to automatically configure by environment.

      <wsFederation reply="<<your prod url>>" xdt:Transform="SetAttributes" />
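For the scripted variant, a build step only needs to set one attribute. Here is a minimal Python sketch of that edit, assuming a web.config with a single wsFederation element. The function name, sample config and example.invalid URLs are placeholders for illustration.

```python
import xml.etree.ElementTree as ET


def set_reply_url(web_config_xml, reply_url):
    """Return the config XML with wsFederation's reply attribute set to reply_url."""
    root = ET.fromstring(web_config_xml)
    elements = list(root.iter("wsFederation"))
    if len(elements) != 1:
        raise ValueError("expected exactly one <wsFederation> element, found %d" % len(elements))
    elements[0].set("reply", reply_url)
    return ET.tostring(root, encoding="unicode")


# Placeholder config: the issuer and realm values are illustrative only
SAMPLE_CONFIG = """<configuration>
  <system.identityModel.services>
    <federationConfiguration>
      <wsFederation passiveRedirectEnabled="true" issuer="https://login.example.invalid/wsfed" realm="https://example.invalid/app" requireHttps="true" />
    </federationConfiguration>
  </system.identityModel.services>
</configuration>"""

updated_config = set_reply_url(SAMPLE_CONFIG, "https://test.example.invalid/")
```

Note that ElementTree drops comments and the XML declaration when it re-serialises. A real build step may prefer a tool that preserves them.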

Testing a SignalR application using WebDriver in IE9

I was having problems testing a SignalR application in IE9 using the Selenium IEDriverServer – the WebDriver instance would block indefinitely after navigating to a page that starts a hub connection.

SignalR Issue 293 seems to imply that the IE WebDriver is not compatible with foreverFrame – it never recognises that the page has finished loading. Changing the hub connection code to:

$.connection.hub.start({ transport: ['longPolling', 'webSockets'] });

fixed the issue and made IE9 testable again via WebDriver.


We were recently trying to build basic unit tests for the controller actions on an MVC4 + RavenDB application, and having problems attempting to mock the IDocumentSession. The RavenDB people consistently say not to do that, and that running an EmbeddedDocumentStore solves every unit test problem under the sun and then makes you breakfast. However, we tried it and weren’t really happy with the code-to-value ratio.

The process is typically:

  1. Create an EmbeddedDocumentStore
  2. Create all your indexes
  3. Create a session, load your test document set, and save changes
  4. Wait for indexing to complete (e.g. by registering a custom IQueryListener that modifies all queries to wait for non-stale results)
  5. Create & inject your session
  6. Run your tests

This approach requires a lot of setup, the tests are slow, and the package dependencies on your test project are considerable, where all we really wanted to accomplish was to return a specific result set in response to a specific method call on the IDocumentSession.

The most immediate problem you run into when mocking IDocumentSession is returning a useable IRavenQueryable<T>. In case anyone else is brave enough to risk the scorn of Ayende, below is my implementation of a FakeRavenQueryable<T> class that wraps a generic IQueryable<T>:

public class FakeRavenQueryable<T> : IRavenQueryable<T>
{
    private IQueryable<T> source;

    public RavenQueryStatistics QueryStatistics { get; set; }

    public FakeRavenQueryable(IQueryable<T> source, RavenQueryStatistics stats = null)
    {
        this.source = source;
        QueryStatistics = stats;
    }

    public IRavenQueryable<T> Customize(Action<Raven.Client.IDocumentQueryCustomization> action)
    {
        return this;
    }

    public IRavenQueryable<T> Statistics(out RavenQueryStatistics stats)
    {
        stats = QueryStatistics;
        return this;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return source.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return source.GetEnumerator();
    }

    public Type ElementType
    {
        get { return typeof(T); }
    }

    public System.Linq.Expressions.Expression Expression
    {
        get { return source.Expression; }
    }

    public IQueryProvider Provider
    {
        get { return new FakeRavenQueryProvider(source, QueryStatistics); }
    }
}

public class FakeRavenQueryProvider : IQueryProvider
{
    private IQueryable source;
    private RavenQueryStatistics stats;

    public FakeRavenQueryProvider(IQueryable source, RavenQueryStatistics stats = null)
    {
        this.source = source;
        this.stats = stats;
    }

    public IQueryable<TElement> CreateQuery<TElement>(System.Linq.Expressions.Expression expression)
    {
        return new FakeRavenQueryable<TElement>(source.Provider.CreateQuery<TElement>(expression), stats);
    }

    public IQueryable CreateQuery(System.Linq.Expressions.Expression expression)
    {
        var type = typeof(FakeRavenQueryable<>).MakeGenericType(expression.Type);
        return (IQueryable)Activator.CreateInstance(type, source.Provider.CreateQuery(expression), stats);
    }

    public TResult Execute<TResult>(System.Linq.Expressions.Expression expression)
    {
        return source.Provider.Execute<TResult>(expression);
    }

    public object Execute(System.Linq.Expressions.Expression expression)
    {
        return source.Provider.Execute(expression);
    }
}

It can be returned from a mocked IDocumentSession using code like the following (using Moq in this case):

    Mock<IDocumentSession> session = new Mock<IDocumentSession>();
    session.Setup(s => s.Query<Product>()).Returns(
        new FakeRavenQueryable<Product>(
            products.AsQueryable(), // products: your in-memory test list (placeholder name)
            new RavenQueryStatistics { TotalResults = 100 }
        )
    );

    // inject your session & run your tests here
    RavenQueryStatistics stats;
    var results = session.Object.Query<Product>().Where(p => p.Id == "Products/1")
        .Statistics(out stats);

It won’t make you breakfast, give you full access to the advanced session methods, or tell you if you’re passing unsupported expressions, but it will allow you to mock out simple queries in your application, and the Linq methods all work over your test list as you’d expect (even In!)

Using NuGet packages in Visual Studio 2012 project templates

Anyone creating Visual Studio project templates these days should be using NuGet package references for project dependencies: it saves you having to update your templates every time a dependency changes, and follows a standard & generally accepted binary dependency approach.

If you’re distributing your project template as a VSIX package (strongly recommended) the NuGet docs here specify the preferred approach is to include the .nupkg files within the VSIX, but the instructions specify the addition of a ‘CustomExtension’ element to the .vsixmanifest file that is no longer valid in v2 of the VSIX schema (the default in Visual Studio 2012). I spent a considerable period of time attempting to work out what the v2 equivalent of CustomExtension was, but to cut a long story short, you don’t need to make any changes to the .vsixmanifest — it’s enough to include all of the packages in the VSIX under a ‘Packages’ directory.

Following are the steps I used to create a working 2012 VSIX project template:

  1. Download and install the Visual Studio 2012 SDK, if you haven’t done so already.
  2. Create a solution and add a ‘C# Project Template’ project and a ‘VSIX Project’ project (under ‘Visual C# > Extensibility’). It amuses me no end that there’s a project template project template. I am easily amused though.
  3. Set up your project template the way you want it. I find it easier to create a temporary project, use ‘File > Export Template…’ and then unzip the package and copy the relevant bits across.
  4. Update your .vstemplate with the ‘WizardExtension’ and ‘WizardData’ elements like the following:
      <WizardExtension>
        <Assembly>NuGet.VisualStudio.Interop, Version=, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a</Assembly>
      </WizardExtension>
      <WizardData>
        <packages repository="extension" repositoryId="<!-- your VSIX Product Id -->">
          <!-- your NuGet package dependencies go here, eg: -->
          <package id="Moq" version="4.0.10827" />
          <package id="xunit" version="1.9.1" />
          <package id="xunit.extensions" version="1.9.1" />
        </packages>
      </WizardData>
  5. Double-click on the source.extension.vsixmanifest to open it in the designer, and add a new Asset. Select ‘Microsoft.VisualStudio.ProjectTemplate’ as the type, “a project in the current solution” as the source, and then choose your template project. See this blog post for more info about this approach.
  6. Create a folder under your VSIX project called ‘Packages’ and drag the .nupkg files (referenced in the .vstemplate above) into this folder. Set the Build Action on the files to ‘Content’ and ‘Include in VSIX’ to ‘True’.
  7. Build the solution – it will produce a VSIX that includes the NuGet dependencies. You’re done!
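An easy mistake in the steps above is listing a package in the .vstemplate without adding the matching .nupkg to the ‘Packages’ folder. The Python sketch below checks for that. The function names are made up for illustration, and it relies on NuGet's `<id>.<version>.nupkg` file naming. The sample template is a cut-down placeholder, not a complete .vstemplate.

```python
import xml.etree.ElementTree as ET


def expected_nupkg_files(vstemplate_xml):
    """List the .nupkg file names implied by the <package> elements in a .vstemplate."""
    names = []
    for element in ET.fromstring(vstemplate_xml).iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip any XML namespace prefix
        if local_name == "package":
            names.append("{0}.{1}.nupkg".format(element.get("id"), element.get("version")))
    return names


def missing_packages(vstemplate_xml, files_in_packages_dir):
    """Return expected .nupkg names that are absent from the given file list."""
    present = {name.lower() for name in files_in_packages_dir}
    return [n for n in expected_nupkg_files(vstemplate_xml) if n.lower() not in present]


SAMPLE_TEMPLATE = """<VSTemplate xmlns="http://schemas.microsoft.com/developer/vstemplate/2005">
  <WizardData>
    <packages repository="extension" repositoryId="placeholder-id">
      <package id="Moq" version="4.0.10827" />
      <package id="xunit" version="1.9.1" />
    </packages>
  </WizardData>
</VSTemplate>"""

missing = missing_packages(SAMPLE_TEMPLATE, ["Moq.4.0.10827.nupkg"])
```

In practice the file list would come from something like `os.listdir` on the Packages folder. Here `missing` contains only the xunit package.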

It’s more correct to also add the NuGet Package Manager extension as a dependency to your VSIX, although I don’t usually bother for our internal templates.

One of the great things about VSIX distribution is that you can bundle multiple project & item templates into a single, easily distributable package (just add more assets to the vsixmanifest).

Office365 password reset doesn’t work on Telstra phones

We’ve had persistent problems getting password reset SMSes via the Office365 administrator password reset functionality. Some to-ing & fro-ing with Microsoft support has eventually uncovered that their SMSes are being rejected by Telstra due to an ‘invalid format’.

They claim to be looking into it, but in the meantime if you’re an Office365 administrator and have a Telstra mobile, make sure you have a backup administrator account.

Replacing Twitter – easier said than done

The normally unflappable Brent Simmons suggests we replace twitter with ‘nothing’. This is an interesting yet logistically improbable approach — I’ve tried to enumerate the challenges here, along with the reasons why I think it won’t happen.

Brent’s suggestion revolves around replacing twitter status updates with RSS feeds, extending this with a published ‘following’ list, and relying on a nascent third party service(s) for searching & mentions.

The challenges:

  • RSS hosting: your feed and following list need to be hosted somewhere publicly accessible. Geeks can easily sort out hosting; ‘normal people’ are going to have to rely on blogging platforms and other services.
  • Identity: twitter enforces unique usernames (and restricts wilful impersonation); the best you can hope for with RSS is a unique URL. Impersonation is likely to be an issue.
  • API: posting twitter updates from third party applications involves a single API endpoint, a consistent authentication mechanism, and a clearly documented posting API. Distributed feed hosting will make it difficult for third-party applications to post.
  • Front end: I suspect a single, canonical, user-friendly web interface is more important than is immediately obvious. Certainly the service would struggle to establish acceptance and mindshare without one.
  • Searching & mentions: A complete solution is effectively going to involve indexing every RSS feed on the internet, in real time. This will beget considerable storage and bandwidth costs, let alone the engineering expertise required to build it.
  • Governance: search, mention and other interactions involve consistent application of mutually agreed logic. In a distributed scenario, a working group or standards body would need to publish & enforce the protocol.

None of these challenges are insurmountable, but I doubt if any can be properly overcome without the resources of a well-funded (i.e. commercial) organisation. Indexing/searching in particular is going to be expensive. A successful service is going to provide hosting, identity, a good API, value-adding development etc — which will need to be funded somehow — at which point you have another twitter.

My belief is that twitter as it currently exists is the result of innumerable obvious & non-obvious market forces, and despite many people desiring otherwise, it’s the optimum solution based on the current circumstances. That’s not to say it can’t be disrupted by a different model in the future (and it’s certainly incumbent upon us to explore this, as Brent has, in the hope we’ll find it), but I don’t think this is it.