“Unrecognized option: -files” in hadoop streaming job

I was recently working on an Elastic MapReduce Streaming setup, that required copying a few required Python files to the nodes in addition to the mapper/reducer.

After much trial & error, I ending up using the following .NET AWS SDK code to accomplish the file upload:

var mapReduce = new StreamingStep {
    Inputs = new List<string> { "s3://<bucket>/input.txt" },
    Output = "s3://<bucket>/output/",
    Mapper = "s3://<bucket>/mapper.py",
    Reducer = "s3://<bucket>/reducer.py",
}.ToHadoopJarStepConfig();

mapReduce.Args.Add("-files");
mapReduce.Args.Add("s3://<bucket>/python_module_1.py,s3://<bucket>/python_module_2.py");

var step = new StepConfig {
    Name = "python_mapreduce",
    ActionOnFailure = "TERMINATE_JOB_FLOW",
    HadoopJarStep = mapReduce
};

// Then build & submit the RunJobFlowRequest

This generated the rather odd error:

ERROR org.apache.hadoop.streaming.StreamJob (main): Unrecognized option: -files

Odd, because -files most certainly is an option.

Prolonged googling later, and I discovered that the -files option needs to come first. However, StreamingStep doesn’t give me any way to change the order of the arguments – or does it?

I eventually realised I was being a bit dense. ToHadoopJarStepConfig() is a convenience method that just generates a regular JarStep… which exposes the args as a List. Change the code to this:

mapReduce.Args.Insert(0, "-files");
mapReduce.Args.Insert(1, "s3://<bucket>/python_module_1.py,s3://<bucket>/python_module_2.py");

and everything is awesome.

Basic Auth with a Web API 2 IAuthenticationFilter

MVC5/Web API 2 introduced a new IAuthenticationFilter (as opposed the the IAuthorizationFilter we needed to dual-purpose in the past), as well as a substantial overhaul of the user model with ASP.NET Identity. Unfortunately, the documentation is abysmal, and all the blog articles focus on the System.Web.Mvc.Filters.IAuthenticationFilter, not the System.Web.Http.Filters.IAuthenticationFilter, which is clearly something entirely different.

We had a project where we needed to support a Basic-over-SSL authentication scheme on the ApiControllers for a mobile client, as well as Forms auth for the MVC controllers running the admin interface. We were keen to leverage the new Identity model, mostly as it appears to be a much more coherent design than the legacy hodgepodge we’d used previously. This required a fair bit of decompilation and digging, but I eventually came up with something that worked.

Below is an excerpt of the relevant parts of our BasicAuthFilter class – it authenticates against a UserManager<T> (which could be the default EF version) and creates a (role-less) ClaimsPrincipal if successful.

public async Task AuthenticateAsync(HttpAuthenticationContext context, CancellationToken cancellationToken)
{
    var authHeader = context.Request.Headers.Authorization;
    if (authHeader == null || authHeader.Scheme != "Basic")
        context.ErrorResult = Unauthorized(context.Request);
    else
    {
        string[] credentials = ASCIIEncoding.ASCII.GetString(Convert.FromBase64String(authHeader.Parameter)).Split(':');

        if (credentials.Length == 2)
        {
            using (var userManager = CreateUserManager())
            {
                var user = await userManager.FindAsync(credentials[0], credentials[1]);
                if (user != null)
                {
                    var identity = await userManager.CreateIdentityAsync(user, "BasicAuth");
                    context.Principal = new ClaimsPrincipal(new ClaimsIdentity[] { identity });
                }
                else
                    context.ErrorResult = Unauthorized(context.Request);
            }
        }
        else
            context.ErrorResult = Unauthorized(context.Request);
    }
}

public Task ChallengeAsync(HttpAuthenticationChallengeContext context, CancellationToken cancellationToken)
{
    context.Result = new AddBasicChallengeResult(context.Result, realm);
    return Task.FromResult(0);
}

private class AddBasicChallengeResult : IHttpActionResult
{
    private IHttpActionResult innerResult;
    private string realm;

    public AddBasicChallengeResult(IHttpActionResult innerResult, string realm)
    {
        this.innerResult = innerResult;
        this.realm = realm;
    }

    public async Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
    {
        var response = await innerResult.ExecuteAsync(cancellationToken);
        if (response.StatusCode == HttpStatusCode.Unauthorized)
            response.Headers.WwwAuthenticate.Add(new AuthenticationHeaderValue("Basic", String.Format("realm=\"{0}\"", realm)));
        return response;
    }
} 

Note that you’ll need to use config.SuppressDefaultHostAuthentication() in your WebApiConfig in order to prevent redirection from unauthorised API calls.

Build Server Traffic Lights

Traffic Light

I’ve wanted real build server traffic lights since I first implemented a Continuous Integration server in the mid 2000s. In those days, the trendy thing to do was to hook up red & green lava lamps to your build server, but CCTray’s red/green/yellow status indicators always seemed better suited to traffic lights. However, it was something that always got put in the ‘someday’ pile. More recently, I’d become interested in hardware automation platforms like Arduino, and it seemed like an ideal first project, so I dusted off the concept.

Obtaining the traffic light unit itself was relatively straightforward – in WA, the old style incandescent units are being progressively replaced with LEDs, so the reasoning was there’d be some that are superfluous to requirements. A few phone calls later, I managed to track down the contractor handling the replacement and do a beverage-related deal for a second-hand traffic light. The hardest part was actually explaining what I intended to do with it!

These traffic light units don’t contain any switching logic or complex electronics at all – they have a 240VAC feed for each light, with industrial grade internal transformers stepping down to 10V and driving 20W high-pressure bulbs. I’d seen reports that the standard bulbs were too bright for indoor use, but a test run showed it was probably just okay, and it was certainly much simpler to keep the lighting as-is while I got the rest of the hardware working.

Traffic Light 3The intention was to run the lights as a networked device (rather than a USB one, requiring an active host computer), as this would enable more flexibility in installation. I ordered an Arduino Ethernet and relay shield from Little Bird Electronics, and set about coding the controller software.

Traffic Light UI

The code is available online here – it’s adapted from a similar project by Dirk Engels. The Arduino runs a web server that serves a page displaying the current status of the light, as well as buttons to control the light and RESTful control URLs to provide build server integration. My main changes to the design were:

  • Integration of a DHCP library, to remove the hard-coded IP address and make it possible to move the light between networks without reprogramming.
  • Bonjour support, to advertise the light at ‘traffic-light.local’ and remove any requirement for DNS entries/DHCP reservations on the network.
  • A failover mode that flashes amber if the light has not heard from the build server in over 5 minutes. This mimics real world behaviour and seemed more appropriate than turning off or displaying the last known state indefinitely.

Traffic Light 2

Wiring in the controller was pretty simple – the 240V mains feed powers the 9V DC power supply for the Arduino, as well as the 10V transformers for the lights via the relay shield. Initially these were switched on the high-voltage side, but the inrush current appeared to play havoc with small switch-mode power supplies (i.e. phone chargers) on the same circuit, so I rewired to switch on the low-voltage side. This also allowed me to remove two of the transformers and freed up some internal space; I ended up being able to neatly mount the controller on one of the unused transformer brackets.

Traffic Light 4Obviously the light needed a pole; I constructed one using galvanised fence post and some sub-par oxy welding. I would have liked to run the wiring down inside the pole, but unfortunately the size of the mains plug was going to make this difficult (given I wanted the light to stay easily removable). A few coats of suitable yellow paint and it was good to go.

After installing the light in the office, we developed a small powershell script to query the build server and update the light. It’s had a significant benefit in putting the build status unavoidably in front of the developers, and the builds have become noticeably more ‘green’ than they have been for some time.

There are a few areas I’d design differently if I did it again:

  • Use a hardware flasher circuit for the failover mode (via the fourth relay) – the software flasher works okay, but there’s a noticeable stutter in the flashes if the controller is doing something else (like responding to a web request). I’m not enough of a hardware whiz to build one of these though.
  • Install bulkhead RJ45 & 3-pin PC power connections on the traffic light housing, so that the cables are detachable – this would permit variable cable lengths and potentially allow routing inside the pole.
  • Use low-wattage bulbs rather than the specialised 20W high pressure bulbs – the traffic light is a bit bright straight-on. Unfortunately the existing bulb holders have a unique bayonet mount and they’d need to be replaced with something else (e.g. automotive BA15S).

EntityPropertyMappingAttribute duplicated between assemblies

I was working on an entity class for an OData endpoint when I ran across the following doozy:

The type ‘System.Data.Services.Common.EntityPropertyMappingAttribute’ exists in both ‘…Microsoft.Data.OData.dll’ and ‘…System.Data.Services.Client.dll’

It looks like Microsoft has duplicated this type (plus a couple of others) between two different assemblies – in this instance I ran across it with the Azure.Storage package.

Thankfully, Jon Skeet to the rescue! To resolve:

  1. Select the System.Data.Services.Client reference and open the properties dialog
  2. Under ‘Aliases’, change ‘global’ to ‘global,SystemDataServicesClient’
  3. Add the following code at the top of the offending entity file:
extern alias SystemDataServicesClient;
using SystemDataServicesClient::System.Data.Services.Common;

You’ll also need to delete your other using System.Data.Services.Common, but at that point you should be compiling again.

Azure AD Single Sign On with multiple environments (Reply URLs)

As part of an effort to move some internal applications to the cloud (sorry, The Cloud™), I recently went through the process of implementing Azure AD single sign on against our Office365 tenant directory. Working through the excellent MSDN tutorial, I hit the following (where it was describing how to reconfigure Azure AD to deploy your app to production):

Locate the REPLY URL text box, and enter there the address of your target Windows Azure Web Site (for example, https://aadga.windowsazure.net/). That will let Windows Azure AD to return tokens to your Windows Azure Web Site location upon successful authentication (as opposed to the development time location you used earlier in the thread). Once you updated the value, hit SAVE in the command bar at the bottom of the screen.

Wait, what? This appears to imply  Azure AD can’t authenticate an application in more than one environment (eg if you want to run a production & test environment, or, I don’t know, RUN IT LOCALLY) without setting up duplicate Azure applications and making fairly extensive changes to the web.config. Surely there’s a better way?

I noticed that the current version of the Azure management console allows for multiple Reply URL values:
Azure AD Reply URLs

However, just adding another URL didn’t work – the authentication still only redirected to the topmost value.

The key was the \\system.identityModel.services\federationConfiguration\wsFederation@reply attribute in web.config – adding this attribute sent through the reply URL and allowed authentication via the same Azure AD application from multiple environments, with only relatively minor web.config changes.

As the simplest solution, here’s an example Web.Release.config transform – more advanced scenarios could involve scripting xml edits during a build step to automatically configure by environment.

 <system.identityModel.services>
    <federationConfiguration>
      <wsFederation reply="<<your prod url>>" xdt:Transform="SetAttributes" />
    </federationConfiguration>
  </system.identityModel.services>

Testing a SignalR application using WebDriver in IE9

I was having problems testing a SignalR application in IE9 using the Selenium IEServerDriver – the WebDriver instance would block indefinitely after navigating to a page that starts a hub connection.

SignalR Issue 293 seems to imply that the IE WebDriver is not compatible with foreverFrame – it never recognises that the page has finished loading. Changing the hub connection code to:


$.connection.hub.start({ transport: ['longPolling', 'webSockets'] });

fixed the issue and made IE9 testable again via WebDriver.

FakeRavenQueryable<T>

We were recently trying to build basic unit tests for the controller actions on an MVC4 + RavenDB application, and having problems attempting to mock the IDocumentSession. The RavenDB people consistently say not to do that, and that running an EmbeddedDocumentStore solves every unit test problem under the sun and then makes you breakfast. However, we tried it and weren’t really happy with the code-to-value ratio.

The process is typically:

  1. Create an EmbeddedDocumentStore
  2. Create all your indexes
  3. Create a session, load your test document set, and save changes
  4. Wait for indexing to complete (eg by registering a custom IQueryListener that modifies all queries to wait for non-stale results)
  5. Create & inject your session
  6. Run your tests

This approach requires a lot of setup, the tests are slow, and the package dependencies on your test project are considerable, where all we really wanted to accomplish was to return a specific result set in response to a specific method call on the IDocumentSession.

The most immediate problem you run into when mocking IDocumentSession is returning a useable IRavenQueryable<T>. In case anyone else is brave enough to risk the scorn of Ayende, below is my implementation of a FakeRavenQueryable<T> class that wraps a generic IQueryable<T>:

public class FakeRavenQueryable<T> : IRavenQueryable<T>
    {
        private IQueryable<T> source;

        public RavenQueryStatistics QueryStatistics { get; set; }

        public FakeRavenQueryable(IQueryable<T> source, RavenQueryStatistics stats = null)
        {
            this.source = source;
            QueryStatistics = stats;
        }

        public IRavenQueryable<T> Customize(Action<Raven.Client.IDocumentQueryCustomization> action)
        {
            return this;
        }

        public IRavenQueryable<T> Statistics(out RavenQueryStatistics stats)
        {
            stats = QueryStatistics;
            return this;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return source.GetEnumerator();
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return source.GetEnumerator();
        }

        public Type ElementType
        {
            get { return typeof(T); }
        }

        public System.Linq.Expressions.Expression Expression
        {
            get { return source.Expression; }
        }

        public IQueryProvider Provider
        {
            get { return new FakeRavenQueryProvider(source, QueryStatistics); }
        }
    }

    public class FakeRavenQueryProvider : IQueryProvider
    {
        private IQueryable source;
        private RavenQueryStatistics stats;

        public FakeRavenQueryProvider(IQueryable source, RavenQueryStatistics stats = null)
        {
            this.source = source;
            this.stats = stats;
        }

        public IQueryable<TElement> CreateQuery<TElement>(System.Linq.Expressions.Expression expression)
        {
            return new FakeRavenQueryable<TElement>(source.Provider.CreateQuery<TElement>(expression), stats);
        }

        public IQueryable CreateQuery(System.Linq.Expressions.Expression expression)
        {

            var type = typeof(FakeRavenQueryable<>).MakeGenericType(expression.Type);
            return (IQueryable)Activator.CreateInstance(type, source.Provider.CreateQuery(expression), stats);
        }

        public TResult Execute<TResult>(System.Linq.Expressions.Expression expression)
        {
            return source.Provider.Execute<TResult>(expression);
        }

        public object Execute(System.Linq.Expressions.Expression expression)
        {
            return source.Provider.Execute(expression);
        }
    }

It can be returned from a mocked IDocumentSession using code like the following (using Moq in this case):

    Mock<IDocumentSession> session = new Mock<IDocumentSession>();
    session.Setup(s => s.Query<Product>()).Returns(
        new FakeRavenQueryable<Product>(
            productList.AsQueryable(), 
            new RavenQueryStatistics { TotalResults = 100 }
        )
    );
    
    // inject your session & run your tests here
    RavenQueryStatistics stats;
    var results = session.Object.Query<Product>().Where(p => p.Id == "Products/1")
        .Statistics(out stats)
        .Take(1)
        .ToList();

It won’t make you breakfast, give you full access to the advanced session methods, or tell you if you’re passing unsupported expressions, but it will allow you to mock out simple queries in your application, and the Linq methods all work over your test list as you’d expect (even In!)