DocumentCloud's project, Underscorejs, interested me for a while now since I've seen its beautiful documentation produced through Docco.

Additionally, while playing with higher-order functions on Haskell, I wanted to see how they may simplify a standard set of computational tasks that I would normall write in C. Fortunately, JavaScript allows me to do this in a multi-paradigm fashion.

Wait! What Are These Magical, Higher-Order Functions?

Well, I'll turn to the pure, functional language, Haskell, to describe it correctly:

A higher-order function is a function that takes other functions as arguments or returns a function as result.

Standard Computational Task

The situation: a professor has just finished grading the final exams out of a hundred for his eight-student classroom. He would like to know the top scores (above 80), the highest score (maximum) and the average of the scores.

Class Scores: [98, 76, 45, 84, 32, 55, 88, 76]

From the problem description, we have the following set of tasks with constraints:

  1. Retrieve all scores above eighty.
  2. Retrieve the maximum score from the list.
  3. Retrieve the running sum of the scores and divide it by the number of scores (mean).

Now that we have described the problem, let's attempt a standard solution with C.

C-Style Solution

The following solution does not have the immutable array data structure; however, I ensured that there were no side-effects which may mutate the original array after calculations. Instead, all solutions are written to a new result variable as indicated by Calculation results section.

Furthermore, I fissioned the calculation loops to represent how Underscorejs implements their higher-order functions. Note that it would have been optimal if these operations were fused instead.

Loop fission (or loop distribution) is a possible compiler optimization which utilizes the locality of reference.

#include <stdio.h>
#define NUM_SCORES 8

int main(int argc, char* argv[]) {
  // Calculation results
  int top_scores[NUM_SCORES];
  int highest_score = 0;
  int mean = 0;

  // Input data
  int scores[NUM_SCORES];
  scores[0] = 98;
  scores[1] = 76;
  scores[2] = 45;
  scores[3] = 84;
  scores[4] = 32;
  scores[5] = 55;
  scores[6] = 88;
  scores[7] = 76;

  // Iterators
  int i;
  int top_scores_counter = 0;
  int running_sum = 0; // Memo

  // Calculate the top scores above 80
  for (i = 0; i < NUM_SCORES; ++i)
    if (scores[i] > 80)
      top_scores[top_scores_counter++] = scores[i];

  // Calculate the highest score
  for (i = 0; i < NUM_SCORES; ++i)
    if (scores[i] > highest_score)
      highest_score = scores[i];

  // Calculate the running sum
  for (i = 0; i < NUM_SCORES; ++i)
    running_sum += scores[i];
  // Calculate the mean after the finding the running sum
  mean = running_sum / NUM_SCORES;

  // Print the data
  printf("Top Scores: ");
  for (i = 0; i < top_scores_counter; ++i)
    printf("%d ", top_scores[i]);
  printf("\n");

  printf("Highest Score: %d \n", highest_score);
  printf("Average Score: %d \n", mean);

  return 0;
}

When executed, the above code should print the following results to the standard output:

Top Scores: 98 84 88 
Highest Score: 98 
Average Score: 69

Disregarding the scaffolding here, the the C-style solution is verbose simply because of the majority of operations in the for-loops. Sure, we can coalesce the loops through the loop fusion optimization but why aren't these operations provided by the standard library by default?

This problem can naturally be solved through higher-order functions which apply functions to each iteration through a loop.

C++ STL provides a notion of higher-order functions through their concept of function objects.

Underscorejs Solution

The C-style solution is applicable to almost all programming languages that offer the standard set of control structures with for-loop and if-else though I do not apply it as frequently as is applicable. It's verbose. Without higher-order functions, I would implement it in JavaScript as I have above.

Fortunately, Underscorejs provides a standard set of higher-functions for reducing the verbosity of our code. We could simply use _.filter to retrieve the top scores, _.max to find the highest scores and _.reduce to find the running sum for the mean.

// Initial data
var scores = [98, 76, 45, 84, 32, 55, 88, 76];

// Calculates the top scores
var top_scores = _.filter(scores, function(score) {
  return score > 80;
});

// Calculate the highest score
var highest_score = _.max(scores);

// Calculate the running sum
var running_sum = _.reduce(scores, function(memo, score) {
  return memo + score;
}, 0);
var mean = running_sum / scores.length;

// Print the data
console.log("Top Scores: " + top_scores);
console.log("Highest Score: " + highest_score;
console.log("Average Score: " + mean);

It's easy to see how our code verbosity has been significantly reduced through the use of the higher-order functions provided by Underscorejs. There are native implementations of the functions used above, namely, Array.prototype.filter and Array.prototype.reduce; nonetheless, I use Underscorejs for cross-browser compatibility.

How Underscorejs implements Higher-Order Functions

It may be obvious that these higher-order functions parallel to the C-style code above by applying a function to each iteration of a loop; still, I will make explicit its implementation. Moreover, I will add my personal annotations to the source code to help guide you through it.

Consider the implementation of _.filter which returns a new array of filtered values given a list (obj) and a boolean function (iterator).

Recall that a higher-order function either takes a function as an argument or returns a function; subsequently, _.filter is a higher-order function by definition of taking a function as an argument.

_.filter = _.select = function(obj, iterator, context) {
  // The new results array to attempt immutability
  var results = [];

  // Edge case if the list is empty; simply return nothing
  if (obj == null) return results;

  // Use the native implementation if it's available.
  if (nativeFilter && obj.filter === nativeFilter) return obj.filter(iterator, context);

  // Real work is delegated?
  each(obj, function(value, index, list) {
    // Returns the value if the boolean function evaluates to true
    if (iterator.call(context, value, index, list)) results[results.length] = value;
  });
  return results;
};

First, let's consider all of the function cases:

Empty List

Returns a new, empty list. Trivial.

Native Implementation Available

It's worth noting here that _.filter utilizes the native JavaScript implementation as denoted by the nativeFilter case. If the native implementation is unavailable, we move on to support backwards-compatibility and cross-browser compatibility through this library.

Default Case

Delegates the action to another function, _.each. Well, we haven't seen a loop yet but we're about to. Let's consider the delegated action then.

var each = _.each = _.forEach = function(obj, iterator, context) {
  // Base case if the list is empty
  if (obj == null) return;

  // Use the native implementation if it's available
  if (nativeForEach && obj.forEach === nativeForEach) {
    obj.forEach(iterator, context);

  // JS Hacking here to test whether obj can be accesed by a numerical index
  } else if (obj.length === +obj.length) {
    for (var i = 0, l = obj.length; i < l; i++) {
      if (i in obj && iterator.call(context, obj[i], i, obj) === breaker) return;
    }

  // Default case for objects with generic keys
  } else {
    for (var key in obj) {
      if (_.has(obj, key)) {
        if (iterator.call(context, obj[key], key, obj) === breaker) return;
      }
    }
  }
};

There's quite a few more base cases here so I'll truncate by skipping the base case and native implementation as I've already explained before.

Since the last two cases are similar, I'll aggregate them into the same case:

Index Accessible & Default Case

This is where the real-action is. Simply, this case applies an arbitrary function to the list at every index.

This arbitrary function application allows us significantly reduce the amount of code we must write. The only caveat here is that the _.each function can mutate the original data structure which defies the immutability concept of functional programming.

Simplifying the Data Flow through the Framework

To best see the flow of operations through the framework, I've simplified the framework to remove unnecessary code for the purpose of this article.

var each = _.each = _.forEach = function(obj, iterator, context) {
  for (var i = 0, l = obj.length; i < l; i++) {
    if (i in obj && iterator.call(context, obj[i], i, obj) === breaker) return;
};

_.filter = _.select = function(obj, iterator, context) {
  // The new results array to attempt immutability
  var results = [];

  // Real work is delegated?
  each(obj, function(value, index, list) {
    // Returns the value if the boolean function evaluates to true
    if (iterator.call(context, value, index, list)) results[results.length] = value;
  });
  return results;
};

Writing your own set of higher-order functions should be just as simple.

Conclusion

As a prelude, there will be more to come on Underscorejs.

I've shown the prowess of higher-order functions which may apply to any language that implements them and it's always worth considering to frame these common operations under a common library to reduce verbosity in future code. Now that higher-order functions have been explained, I will demonstrate practical applications of the library.