Tuesday, December 27, 2016

Stacks and Queues

This post shows you how to implement a Stack and a Queue in Java. It might be handy in interviews ;)

Implementing a Stack:

A stack uses LIFO (last-in first-out) ordering. Think of a stack of dinner plates. It provides constant time adds and removes. Sample code:

public class Stack<E> {
  private static class StackItem<E> {
    final E data;
    StackItem<E> next;

    public StackItem(final E data) {
      this.data = data;
    }
  }

  private StackItem<E> top;

  public boolean isEmpty() {
    return top == null;
  }

  public void push(final E e) {
    final StackItem<E> toAdd = new StackItem<>(e);
    toAdd.next = top;
    top = toAdd;
  }

  public E pop() {
    if (isEmpty()) {
      throw new NoSuchElementException("Stack is empty");
    }
    final E data = top.data;
    top = top.next;
    return data;
  }

  public E peek() {
    if (isEmpty()) {
      throw new NoSuchElementException("Stack is empty");
    }
    return top.data;
  }
}
Implementing a Queue:

A queue uses FIFO (first-in first-out) ordering. Sample code:

public class Queue<E> {
  private static class QueueItem<E> {
    final E data;
    QueueItem<E> next;

    public QueueItem(final E data) {
      this.data = data;
    }
  }

  private QueueItem<E> first;
  private QueueItem<E> last;

  public boolean isEmpty() {
    return first == null;
  }

  public void add(final E e) {
    final QueueItem<E> toAdd = new QueueItem<>(e);
    if (last != null) {
      last.next = toAdd;
    }
    last = toAdd;
    if (first == null) {
      first = last;
    }
  }

  public E remove() {
    if (isEmpty()) {
      throw new NoSuchElementException("Queue is empty");
    }
    final E data = first.data;
    first = first.next;
    if (first == null) {
      last = null;
    }
    return data;
  }

  public E peek() {
    if (isEmpty()) {
      throw new NoSuchElementException("Queue is empty");
    }
    return first.data;
  }
}

Saturday, December 24, 2016

Shell Scripting: Parsing options using getopt and getopts

This post shows how you can parse shell options using getopts and getopt.

Using getopts:

getopts is a bash built-in command. I find it a lot easier to use than getopt.

Here is an example of using getopts:

# options a and b are followed by a colon because they require arguments
while getopts "ha:b:" opt; do
  case "$opt" in
    h)
      echo "help"
     ;;
    a)
      option_a=$OPTARG
      ;;
    b)
      option_b=$OPTARG
      ;;
  esac
done
shift $((OPTIND-1))

echo "option_a: $option_a"
echo "option_b: $option_b"

# read positional parameters
echo "Param 1: $1"
echo "Param 2: $2"

Running it:

$ myscript.sh -a foo -b bar hello world
option_a: foo
option_b: bar
Param 1: hello
Param 2: world
Using getopt:

getopt supports long options and that's the only time I use it.

Here is an example of using getopt:

options=$(getopt -n "$0" -o ha:b: -l "help,alpha:,bravo:"  -- "$@")
(( $? != 0 )) && echo "Incorrect options provided" >&2 && exit 1
eval set -- "$options"

while true; do
  case "$1" in
    -h|--help)
        echo "help"
        ;;
    -a|--alpha)
        shift
        option_a="$1"
        ;;
    -b|--bravo)
        shift
        option_b="$1"
        ;;
    --)
        shift
        break
        ;;
  esac
  shift
done

echo "option_a: $option_a"
echo "option_b: $option_b"

# read positional parameters
echo "Param 1: $1"
echo "Param 2: $2"

Running it:

$ myscript.sh --alpha foo --bravo bar hello world
option_a: foo
option_b: bar
Param 1: hello
Param 2: world

Sunday, November 06, 2016

Detecting a Loop in a Linked List

In a previous post, I wrote about how you can detect a loop in a linked list using Floyd's Hare and Tortoise algorithm. You have two iterators - the hare and the tortoise - one of them runs faster than the other and, if the list has a loop, they will eventually collide. Here is a recap of the algorithm:

public static boolean hasLoop(final Node head) {
  Node tortoise = head;
  Node hare = head;
  while (hare != null && hare.next != null) {
    tortoise = tortoise.next;
    hare = hare.next.next;
    if (hare == tortoise) { // collision
      return true;
    }
  }
  return false;
}
How do you find the start of the loop?

Now, let's make this problem more interesting by finding the node at the beginning of the loop.

Consider the following illustration of a linked list with a loop. We need to find the start of the loop, which is 3.

0 - 1 - 2 - 3
           / \
         10   4
          |   |
          9   5
          |   |
          8   6
           \ /
            7

We know that the hare runs twice as fast as the tortoise. Let's say that the tortoise enters the loop after k steps. Then the hare has taken 2k steps in total and is, therefore, k steps ahead of the tortoise within the loop. This means that hare and tortoise are LOOP_SIZE - k nodes away from each other. Since hare moves two nodes for each node that tortoise moves, they move one node closer to each on each turn and will meet after LOOP_SIZE - k turns, and both will be k nodes from the start of the loop at that point. The start of the loop is also k nodes from the beginning of the list.

In my example, the tortoise enters the loop (node 3) after 3 steps. At that stage, the hare has taken 6 steps in total and is at node 6. They collide at node 8 after 5 steps (LOOP_SIZE = 8, k = 3, LOOP_SIZE - k = 5). The start of the loop is 3 nodes away from the point of collision and also 3 nodes away from the head of the list.

So, if we keep the hare where it is (at the point of collision) and move the tortoise back to the beginning of the list, and then move each of them one step at a time, they will collide at the start of the loop!

Here is the algorithm:

public static Node findLoopStart(final Node head) {
  Node tortoise = head;
  Node hare = head;

  boolean hasLoop = false;
  while (hare != null && hare.next != null) {
    tortoise = tortoise.next;
    hare = hare.next.next;
    if (hare == tortoise) {
      hasLoop = true;
      break;
    }
  }

  if (hasLoop) {
    tortoise = head;
    while (tortoise != hare) {
      tortoise = tortoise.next;
      hare = hare.next;
    }
    return tortoise;
  }

  return null;
}

Sunday, October 30, 2016

Java 8: Updates to Map

In Java 8, the Map interface has been updated to include several new, convenient methods. Some of the important changes are outlined below:

1. computeIfAbsent

computeIfAbsent is particularly useful when implementing the caching pattern. For example, previously, you would do this:

private final Map<String, Data> map = new HashMap<>();

public Data getData(final String key) {
  Data data = map.get(key); // look in cache
  if (data == null) { // data is not in cache
    data = computeData(key); // perform an (expensive) operation
    map.put(key, data); // put data in cache
  }
  return data; // return data
}

You can now write this method more concisely as follows:

public Data getData(final String key) {
  return map.computeIfAbsent(key, this::computeData);
}

You can also use computeIfAbsent when implementing a "multi-map" (Map<K, Collection<V>>):

 map.computeIfAbsent(key, k -> new ArrayList<V>()).add(v);
2. putIfAbsent

putIfAbsent puts a value in the map, if the key is not already present. For example, instead of this:

final Map<String, String> map = new HashMap<>();
final String v = map.get(k);
if (v == null) {
  map.put(k, newValue);
}

you can write:

final Map<String, String> map = new HashMap<>();
map.putIfAbsent(k, newValue);
3. getOrDefault

getOrDefault returns a default value, if a specified key is not found in the map. Previously, you would do this:

final Map<String, String> map = new HashMap<>();
return map.containsKey(key) ? map.get(key) : "foo";

Now you can write:

final Map<String, String> map = new HashMap<>();
return map.getOrDefault(key, "foo");
4. Performance

The implementation of HashMap has been updated so that when buckets become too big (see the "TREEIFY_THRESHOLD" setting) they are transformed from lists (with O(n) retrieval) into trees (with faster O(log(n)) retrieval). This improves performance when hashCode() methods return values that are poorly distributed, or when many keys share the same hashCode, so long as they are also Comparable.

Sunday, September 25, 2016

Java 8 Concurrency: LongAdder and LongAccumulator

Java 8 introduces LongAdder, LongAccumulator, DoubleAdder and DoubleAccumulator, which are recommended instead of the Atomic classes when multiple threads update frequently but read less frequently (for example, in the context of gathering statistics). Under high thread contention, these new classes are designed to grow dynamically and, therefore, there is higher throughput at the expense of higher space consumption.

The "Adder" classes support operations for additions, whereas the "Accumulator" clases are given a function to combine values.

To understand how these classes work, you need to take a look at the implementation of the base class, Striped64. It has a "base" field, plus a table of "cells", each of which is a padded variant of AtomicLong to reduce cache contention. It first tries to update the base field using the usual AtomicLong CAS operation. If there is no contention, CAS will succeed. However, if there is contention (i.e. the CAS update of "base" failed), it will use the "probe" value (hash value) of the thread to map to a cell in the table. If a cell does not exist, it will be created. New cells are created upon contention until the number of cells reaches the nearest power of two greater than or equal to the number of CPUs. If the cell exists, CAS will be tried to update the value of the cell. If CAS fails, the probe value will be updated using a secondary hash until an uncontended cell is found. Finally, the sum method simply adds the base field and the elements in the cells array.

The code below shows how you can use LongAdder to calculate the sum of several values:

LongAdder adder = new LongAdder();

// do some addition in different threads
adder.add(10);
adder.increment();
// ...

// get the current sum
long sum = adder.sum();

Or you can use LongAccumulator as follows:

LongAccumulator acc = new LongAccumulator(Long::sum, 0);

acc.accumulate(10);
adder.increment();
// ...

// get the current sum
long sum = acc.get();

You can also use LongAdder in conjuction with a ConcurrentHashMap to maintain a frequency map:

ConcurrentMap<String, LongAdder> freqs = new ConcurrentHashMap<>();
freqs.computeIfAbsent("foo", k -> new LongAdder()).increment();

Sunday, September 18, 2016

Power Set Algorithm

The power set of a set S is the set of all subsets of S including the empty set and S itself. For example, the power set of {1, 2, 3} is {{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.

The problem can be solved using recursion. We know that if the initial set is empty, then its power set is {{}} and if it has only one element {a} then its power set is {{}, {a}}.

Here is the algorithm:

public static <E extends Comparable<E>> List<List<E>> subsets(final List<E> list) {
  final List<List<E>> result = new ArrayList<>();

  // if the list is empty there is only one subset, which is the empty set
  if (list.isEmpty()) {
    result.add(Collections.emptyList());
    return result;
  }

  // otherwise, get the first element
  final E first = list.get(0);

  // find the subsets of the rest of the list
  final List<E> rest = list.subList(1, list.size());
  final List<List<E>> subsets = subsets(rest);
  result.addAll(subsets);

  // create new subsets by inserting the first element into each subset
  for (final List<E> subset : subsets) {
    final List<E> newSubset = new ArrayList<>(subset);
    newSubset.add(0, first);
    result.add(newSubset);
  }

  return result;
}

Sunday, September 11, 2016

Factorial Using Iteration, Recursion and Java 8 Streams

In this post, we will solve the classic school problem of calculating the factorial of a positive integer, using different Java algorithms.

Note: In all the algorithms below, we assume that the input is > 1.

1. Iterative Factorial

In this algorithm, we use a standard for-loop and increment the variables i and result at each iteration:

static int iterativeFactorial(final int n) {
  int result = 1;
  for (int i = 1; i <= n; i++) {
    result *= i;
  }
  return result;
}
2. Recursive Factorial

Here is the recursive version (the function calls itself):

static int recursiveFactorial(final int n) {
  return n == 1 ? 1 : n * recursiveFactorial(n - 1);
}

Recursion lets you get rid of variables that are updated at each iteration. Typically, making a recursive function call is more expensive than iteration because each time the function is called, a new stack frame has to be created on the call stack to hold the state of each function call. Therefore, the memory consumed by the recursive function is proportional to the size of its input. You may even get a StackOverflowError with a large input.

3. Tail-recursive Factorial

Here's a tail-recursive definition of factorial. The recursive call is the last thing that happens in the function. In contrast, in our previous definition of factorial, the last thing was n multiplied by the result of the recursive call.

static int tailRecursiveFactorial(final int n) {
  return tailRecursiveFactorialHelper(n, 1);
}
private static int tailRecursiveFactorialHelper(final int n, final int acc) {
  return n == 1 ? acc : tailRecursiveFactorialHelper(n - 1, n * acc);
}

Tail-recursive functions can be optimised by the compiler in some programming languages. Instead of storing each intermediate result of the recursion onto different stack frames, a single stack frame can be reused instead, because there's no need to keep track of intermediate results. Unfortunately, Java does not support this kind of optimisation, but you might want to take this approach anyway, in case the compiler is optimised in the future. Other JVM languages (e.g. Scala) can optimise tail-recursion.

4. Stream Factorial

Finally, Java 8 streams provide a simple, declarative way of defining factorial:

static int streamFactorial(final int n) {
  return IntStream.rangeClosed(1, n).reduce(1, (a, b) -> a * b);
}

Sunday, July 31, 2016

Java 8: Implementing a custom TemporalAdjuster

TemporalAdjusters, introduced in Java 8's new Date and Time API, allow you to perform complex date manipulations. For example, you can adjust a date to the next Friday, or to the last day of the month. There are already several pre-defined TemporalAdjusters, which can be accessed using the static factory methods in the TemporalAdjusters class, as shown below:

import java.time.DayOfWeek;
import java.time.LocalDate;
import static java.time.temporal.TemporalAdjusters.*;

LocalDate date = LocalDate.of(2016, 7, 30);
LocalDate nextFriday = date.with(nextOrSame(DayOfWeek.FRIDAY)); // 2016-08-05
LocalDate monthEnd = date.with(lastDayOfMonth()); // 2016-07-31

If you can't find a suitable TemporalAdjuster, it's quite easy to create your own, by implementing the TemporalAdjuster interface.

Here is an example of a custom TemporalAdjuster, which moves the date forward to the next working day:

public class NextWorkingDay implements TemporalAdjuster {
  @Override
  public Temporal adjustInto(Temporal temporal) {
    DayOfWeek dayOfWeek = DayOfWeek.of(temporal.get(ChronoField.DAY_OF_WEEK));
    int daysToAdd = dayOfWeek == DayOfWeek.FRIDAY ? 3 :
                   (dayOfWeek == DayOfWeek.SATURDAY ? 2 : 1);
    return temporal.plus(daysToAdd, ChronoUnit.DAYS);
  }
}

Since TemporalAdjuster is a functional interface, you could use a lambda expression, but it is very likely that you will want to use this adjuster in other parts of your code, so it is better to encapsulate the logic in a proper class, which can then be re-used.

To define a TemporalAdjuster with a lambda expression, it is more convenient to use the ofDateAdjuster static factory method because it allows you work with a LocalDate object instead of a low level Temporal.

static TemporalAdjuster NEXT_WORKING_DAY = TemporalAdjusters.ofDateAdjuster(
  date -> {
   DayOfWeek dayOfWeek = date.getDayOfWeek();
   int daysToAdd = dayOfWeek == DayOfWeek.FRIDAY ? 3 :
                  (dayOfWeek == DayOfWeek.SATURDAY ? 2 : 1);
   return date.plusDays(daysToAdd);
});

Sunday, June 26, 2016

Java 8: CompletableFuture vs Parallel Stream

This post shows how Java 8's CompletableFuture compares with parallel streams when peforming asynchronous computations.

We will use the following class to model a long-running task:

class MyTask {
  private final int duration;
  public MyTask(int duration) {
    this.duration = duration;
  }
  public int calculate() {
    System.out.println(Thread.currentThread().getName());
    try {
      Thread.sleep(duration * 1000);
    } catch (final InterruptedException e) {
      throw new RuntimeException(e);
    }
    return duration;
  }
}

Let's create ten tasks, each with a duration of 1 second:

List<MyTask> tasks = IntStream.range(0, 10)
                                    .mapToObj(i -> new MyTask(1))
                                    .collect(toList());

How can we calculate the list of tasks efficiently?

Approach 1: Sequentially

Your first thought might be to calculate the tasks sequentially, as follows:

public static void runSequentially(List<MyTask> tasks) {
  long start = System.nanoTime();
  List<Integer> result = tasks.stream()
                              .map(MyTask::calculate)
                              .collect(toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
}

As you might expect, this takes 10 seconds to run, because each task is run one after the other on the main thread.

Approach 2: Using a parallel stream

A quick improvement is to convert your code to use a parallel stream, as shown below:

public static void useParallelStream(List<MyTask> tasks) {
  long start = System.nanoTime();
  List<Integer> result = tasks.parallelStream()
                              .map(MyTask::calculate)
                              .collect(toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
}

The output is

main
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-2
main
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-1
main
Processed 10 tasks in 3043 millis

This time it took 3 seconds because 4 tasks were run in parallel (using three threads from the ForkJoinPool, plus the main thread).

Approach 3: Using CompletableFutures

Let's see if CompletableFutures perform any better:

public static void useCompletableFuture(List<MyTask> tasks) {
  long start = System.nanoTime();
  List<CompletableFuture<Integer>> futures =
      tasks.stream()
           .map(t -> CompletableFuture.supplyAsync(() -> t.calculate()))
           .collect(Collectors.toList());

  List<Integer> result =
      futures.stream()
             .map(CompletableFuture::join)
             .collect(Collectors.toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
}

In the code above, we first obtain a list of CompletableFutures and then invoke the join method on each future to wait for them to complete one by one. Note that join is the same as get, with the only difference being that the former doesn't throw any checked exception, so it's more convenient in a lambda expression.

Also, you must use two separate stream pipelines, as opposed to putting the two map operations after each other, because intermediate stream operations are lazy and you would have ended up processing your tasks sequentially! That's why you first need to collect your CompletableFutures in a list to allow them to start before waiting for their completion.

The output is

ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-1
ForkJoinPool.commonPool-worker-2
ForkJoinPool.commonPool-worker-3
ForkJoinPool.commonPool-worker-1
Processed 10 tasks in 4010 millis

It took 4 seconds to process 10 tasks. You will notice that only 3 ForkJoinPool threads were used and that, unlike the parallel stream, the main thread was not used.

Approach 4: Using CompletableFutures with a custom Executor

One of the advantages of CompletableFutures over parallel streams is that they allow you to specify a different Executor to submit their tasks to. This means that you can choose a more suitable number of threads based on your application. Since my example is not very CPU-intensive, I can choose to increase the number of threads to be greater than Runtime.getRuntime().getAvailableProcessors(), as shown below:

public static void useCompletableFutureWithExecutor(List<MyTask> tasks) {
  long start = System.nanoTime();
  ExecutorService executor = Executors.newFixedThreadPool(Math.min(tasks.size(), 10));
  List<CompletableFuture<Integer>> futures =
      tasks.stream()
           .map(t -> CompletableFuture.supplyAsync(() -> t.calculate(), executor))
           .collect(Collectors.toList());

  List<Integer> result =
      futures.stream()
             .map(CompletableFuture::join)
             .collect(Collectors.toList());
  long duration = (System.nanoTime() - start) / 1_000_000;
  System.out.printf("Processed %d tasks in %d millis\n", tasks.size(), duration);
  System.out.println(result);
  executor.shutdown();
}

The output is

pool-1-thread-2
pool-1-thread-4
pool-1-thread-3
pool-1-thread-1
pool-1-thread-5
pool-1-thread-6
pool-1-thread-7
pool-1-thread-8
pool-1-thread-9
pool-1-thread-10
Processed 10 tasks in 1009 millis

After this improvement, it now takes only 1 second to process 10 tasks.

As you can see, CompletableFutures provide more control over the size of the thread pool and should be used if your tasks involve I/O. However, if you're doing CPU-intensive operations, there's no point in having more threads than processors, so go for a parallel stream, as it is easier to use.

Sunday, June 19, 2016

Java 8: Default Method Resolution Rules

With the introduction of default methods in Java 8, it is now possible for a class to inherit the same method from multiple places (such as another class or interface). The following rules can be used to determine which method is selected in such cases:

  1. A class or superclass method declaration always takes priority over a default method
  2. Otherwise, the method with the most specific default-providing interface is used
  3. Finally, if the methods are equally specific, there will be a compiler error and you will be forced to explicitly override the method and specify which one your class should call

Let's look at a few examples and apply these rules.

Example 1:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B {
  default void name() {
    System.out.println("B");
  }
}

public class C implements A {
  @Override
  public void name() {
    System.out.println("C");
  }
}

public class D extends C implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: C

This is because, as stated in Rule 1, the method declaration of name() from the superclass C takes priority over the default methods declarations in A and B.

Example 2:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B extends A {
  @Override
  default void name() {
    System.out.println("B");
  }
}

public class C implements A {}

public class D extends C implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: B

Unlike the previous example, C does not override name(), but since it implements A, it has a default method from A. According to Rule 2, if there are no methods in the class or superclass, the most specific default-providing interface is selected. Since B extends A, it is more specific and, as a result, "B" is printed.

Example 3:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B {
  default void name() {
    System.out.println("B");
  }
}

public class D implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: Compiler error! Duplicate default methods named name with the parameters () and () are inherited from the types B and A

In this example, there's no more-specific default-providing interface to select, so the compiler throws an error. To resolve the error, you need to explicitly override the method in D and specify which method declaration you want D to use. For example, if you want to use B's:

class D implements A, B {
  @Override
  public void name() {
    B.super.name();
  }
}

Example 4:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B extends A {}

public interface C extends A {}

public class D implements B, C {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: A

The sub-interfaces B and C haven't overridden the method, so there is actually only the method from A to choose from. As a side note, if either B or C (but not both) had overridden the method, then Rule 2 would have applied. By the way, this is the diamond problem.

Saturday, June 18, 2016

Java 8: Debugging Stream Pipelines

I've found that stream pipelines can be difficult to debug because stack traces involving lambda expressions are quite cryptic. Consider the following contrived example:

import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class Test {
  public static void main(final String[] args) {
    final List<String> list = Arrays.asList("foo", null, "bar");
    list.stream()
        .map(Function.identity())
        .filter(x -> true)
        .map(String::length)
        .forEach(System.out::println);
  }
}

You may have already guessed that the code above will throw a NullPointerException when String.length is called on the null element in the list. I've added extra map and filter operations, which do nothing, just to make the example a bit more interesting. In the real world, you will probably have a number of different operations in your stream pipeline.

Running the code, produces the following stack trace:

Exception in thread "main" java.lang.NullPointerException
  at Test$$Lambda$3/455659002.apply(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source)
  at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
  at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
  at java.util.stream.ReferencePipeline.forEach(Unknown Source)
  at Test.main(Test.java:12)

The stack trace shows that a NullPointerException occurred but it doesn't tell you which operation in the pipeline failed. What does Test$$Lambda$3/455659002.apply(Unknown Source) mean and why is there no line number?! Since lambda expressions don't have a name, the compiler makes one up (similar to anonymous classes). In this case, it is Test$$Lambda$3 but that doesn't help us track the bug in our code.

So, what can we do? Let's go old-school and add some logging to our code! We can use peek to print out each element before it is consumed by the next operation in the pipeline.

import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class Test {
  public static void main(final String[] args) {
    final List<String> list = Arrays.asList("foo", null, "bar");
    list.stream()
        .peek(x -> System.out.println("Running identity on: " + x))
        .map(Function.identity())
        .peek(x -> System.out.println("Running filter on: " + x))
        .filter(x -> true)
        .peek(x -> System.out.println("Running string length on: " + x))
        .map(String::length)
        .peek(x -> System.out.println("Running print on: " + x))
        .forEach(System.out::println);
  }
}

Running it produces the following output:

Running identity map on: foo
Running filter on: foo
Running string length on: foo
Running print on: 3
3
Running identity map on: null
Running filter on: null
Running string length on: null
Exception in thread "main" java.lang.NullPointerException
  at Test$$Lambda$6/295530567.apply(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source)
  at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
  at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
  at java.util.stream.ReferencePipeline.forEach(Unknown Source)
  at Test.main(Test.java:16)

Great! Now we know that the NullPointerException was thrown by the string length lambda!

In general, I think stack traces involving lambdas could be improved in future versions of Java.

Sunday, June 12, 2016

Java 8: Converting Anonymous Classes to Lambda Expressions

Refactoring anonymous classes (that implement a single method) to lambda expressions, makes your code more succint and readable. For example, here's an anonymous class for a Runnable and its lambda equivalent:

// using an anonymous class
Runnable r = new Runnable() {
  @Override
  public void run() {
    System.out.println("Hello");
  }
};

// using a lambda expression
Runnable r2 = () -> System.out.println("Hello");

However, it's not always that simple!

Here are a couple of gotchas:

1. Different scoping rules

There are different scoping rules between anonymous classes and lambda expressions. For example, in lambda expressions, this and super are lexically scoped, meaning they are relative to the enclosing class, but in an anonymous class, they are relative to the anonymous class itself. Similarly, local variables declared in lambda expressions will conflict with variables declared in the enclosing class, but in anonymous classes, they are allowed to shadow variables in the enclosing class. Here is an example:

int foo = 1;
Runnable r = new Runnable() {
  @Override
  public void run() {
    // this is ok!
    int foo = 2;
  }
};

Runnable r2 = () -> {
  // compile error: Lambda expression's local variable foo cannot
  // redeclare another local variable defined in an enclosing scope.
  int foo = 2;
};

2. Overloaded methods

If you have an overloaded method, using lambda expressions can result in an ambiguous method call and will require explicit casting. Here is an example:

// Functional interface
interface Task {
  public void execute();
}

// Overloaded methods
public static void go(final Runnable r) {
  r.run();
}
public static void go(final Task t) {
  t.execute();
}

// Calling the overloaded method:

// When using an anonymous class, there is no ambiguity because
// the type of the class is explicit at instantiation
go(new Task() {
  @Override
  public void execute() {
     System.out.println("Hello");
  }
});

// When using a lambda expression, there is a compile error!
// The method go(Runnable) is ambiguous
go(() -> {
  System.out.println("Hello");
});

// This ambiguity can be solved with an explicit cast
go((Task)() -> {
  System.out.println("Hello");
});

Saturday, May 07, 2016

kdb+/q - Running a function multiple times and concatenating results

Let's say that you have a q function that you want to run for multiple inputs (for example, a list of dates) and you want to concatenate the outputs of each function call into a single table. This is demonstrated below:

// A function that returns some data for a single date.
getData:{[dt]
    t:select from data where date=dt;
    // do some more stuff
    t}

// list of dates
dates:2016.01.01+til 10

// call the function for each date and join results
result:(uj/) getData each dates;

// or, if you want to do some processing after each function call:
result:(uj/) {[dt;param]
    t:getData[dt];
    // do some stuff with param
    t}[;`foo] each dates;

Saturday, April 23, 2016

kdb+/q - Joins

You're probably familiar with SQL joins, which are used to combine data from more than one table. This post shows how you can do the same in kdb.

I'll use the following two tables to demonstrate joins in kdb:

q) show age:([] name:`alice`charlie`dave`frank; age:25 26 31 33)
name    age
-----------
alice   25
charlie 26
dave    31
frank   33

q) show work:([] name:`alice`bob`charlie`eve; dept:`ops`dev`hr`eng)
name    dept
------------
alice   ops
bob     dev
charlie hr
eve     dev

Now let's join these tables on the name field using the different kinds of kdb joins:

Left Join (lj):

In a left join, all the rows of the left table are preserved, and those from the right table are only combined if the values in the key columns match those in the left table. If there are values in the left table that don't exist in the right, the joined columns will contain nulls.

q) age lj `name xkey work
name    age dept
----------------
alice   25  ops
charlie 26  hr
dave    31
frank   33

Union Join (uj):

A union join (referred to as a full outer join in SQL) returns rows from both tables, regardless of whether there is a match. If there is a match, it returns combined rows, and if there isn't, the missing side will contain nulls.

q) (`name xkey age) uj `name xkey work
name   | age dept
-------| --------
alice  | 25  ops
charlie| 26  hr
dave   | 31
frank  | 33
bob    |     dev
eve    |     eng

Inner Join (ij):

An inner join returns only those rows where there is a match in both tables.

q) age ij `name xkey work
name    age dept
----------------
alice   25  ops
charlie 26  hr

Equi Join (ej):

This is an inner join in which you can specify the list of columns to join on.

q) ej[`name;age;work]
name    age dept
----------------
alice   25  ops
charlie 26  hr

Plus Join (pj):

This is a left join, which also sums the values of common columns (other than key columns). This type of join doesn't exist in SQL.

q) t1:([] name:`alice`charlie`dave`frank; x:1 2 3 4)
q) t2:([] name:`alice`bob`charlie`eve; x:10 10 10 10)
q) t1 pj `name xkey t2
name    x
----------
alice   11
charlie 12
dave    3
frank   4

Cross Join (cross):

This is a "cartesian product". It joins everything to everything and isn't normally the kind of join people are looking for. Each row is a combination of the rows of the first and second table, so can be a dangerous join to run against large tables.

q) `name xasc work cross age
name    dept age
----------------
alice   ops  25
alice   dev  25
alice   hr   25
alice   eng  25
charlie ops  26
charlie dev  26
charlie hr   26
charlie eng  26
dave    ops  31
dave    dev  31
dave    hr   31
dave    eng  31
frank   ops  33
frank   dev  33
frank   hr   33
frank   eng  33

Sunday, April 17, 2016

Java 8 Streams API: Immutable List Collector

Instead of first collecting a stream into a list and then making it unmodifiable, as shown below:

mutableList = list.stream()
                  // perform some stream operations...
                  .filter(myFilterPredicate)
                  .map(myMapperFunction)
                  // collect into a list
                  .collect(toList());

// now make the list unmodifiable
return Collections.unmodifiableList(list);

you can using collectingAndThen as follows:

return list.stream()
           // perform some stream operations...
           .filter(myFilterPredicate)
           .map(myMapperFunction)
           // collect into an unmodifiable list
           .collect(collectingAndThen(toList(), Collections::unmodifiableList));

Saturday, April 16, 2016

kdb+/q - Concatenating columns

This post shows how you can concatenate values in two or more columns of a table in kdb, into a single value. Consider the following table:

q) person:([] firstName:`Alice`Bob`Charles;lastName:`Smith`Jones`Brown)
q) person
firstName lastName
------------------
Alice     Smith
Bob       Jones
Charles   Brown

In SQL, it's quite easy to concatenate columns, like this:

SQL> select firstName || ' ' || lastName as fullName from person;
fullName
--------
Alice Smith
Bob Jones
Charles Brown

In q, the same thing can be achieved by flipping the firstName and lastName columns and then calling sv to convert the resulting vector into a string, using a space separator. This is shown below:

q) select fullName:`$" "sv'string flip(firstName;lastName) from person

// in functional form:
q) colsToJoin:`firstName`lastName;
q) ?[person;();0b;enlist[`fullName]!enlist(`$sv';" ";(string;(flip;(enlist,colsToJoin))))]

// if there are many repeated names, you can use .Q.fu to improve performance:
q) select fullName:.Q.fu[{`$" "sv'string x};flip(firstName;lastName)] from person

// in functional form, with .Q.fu:
q) ?[person;();0b;enlist[`fullName]!enlist(.Q.fu;{`$" "sv'string x};(flip;(enlist,colsToJoin)))]

Saturday, April 02, 2016

kdb+/q - Reading and Writing a CSV File

This post shows how you can load a CSV file into kdb and write a table out from kdb to a CSV file.

Reading a CSV file:

Let's say that you'd like to load a file containing comma-separated values into an in-memory table in kdb. For example, here's my CSV file, which contains country populations (source: Worldometers):

$ head -5 worldPopulation.csv
region,country,population
Africa,Algeria,40375954
Africa,Angola,25830958
Africa,Benin,11166658
Africa,Botswana,2303820

My file has got two string columns and one integer column.

I can load it into kdb using the Zero Colon function: (types; delimiter) 0: filehandle

q) data:("SSI";enlist",") 0: `$"/path/to/worldPopulation.csv"
// "SSI" creates 2 symbol columns and one integer column

// let's look at the column types
q) meta data
c         | t f a
----------| -----
region    | s
country   | s
population| i

// check the data
q) 5#data
region country                  population
------------------------------------------
Africa Algeria                  40375954
Africa Angola                   25830958
Africa Benin                    11166658
Africa Botswana                 2303820
Africa Burkina Faso             18633725

// you can use * if you want string columns, instead of symbol ones
q) data:("**I";enlist",") 0: `$"/path/to/worldPopulation.csv"

q) meta data
c         | t f a
----------| -----
region    | C
country   | C
population| i

q) 5#data
region   country        population
----------------------------------
"Africa" "Algeria"      40375954
"Africa" "Angola"       25830958
"Africa" "Benin"        11166658
"Africa" "Botswana"     2303820
"Africa" "Burkina Faso" 18633725

Writing a CSV file:

To save a table to a CSV file, you can use the command: filehandle 0: delimiter 0: table

// first, let's try printing out the csv data to console
q) "," 0: data
"region,country,population"
"Africa,Algeria,40375954"
"Africa,Angola,25830958"
"Africa,Benin,11166658"
"Africa,Botswana,2303820"

// save it
q)(`$"/tmp/out.csv") 0: "," 0: data
`/tmp/out.csv

Sunday, March 20, 2016

Java 8: Joining a Stream

In a previous post, I wrote about how you can use Guava's Joiner to easily convert an Iterable into a String, so you no longer have to iterate over it and build the String manually.

Java 8 introduces the StringJoiner, which can be used to join a Stream using Collectors.joining. For example:

Arrays.asList("apple", "banana", "cherry")
      .stream()
      .collect(Collectors.joining(", ");

To skip nulls and empty strings, you can filter the stream first:

Arrays.asList("apple", "banana", null, "", "cherry")
      .stream()
      .filter(s -> s != null && !s.isEmpty())
      .collect(Collectors.joining(", ");

I'd still use Guava's Joiner for lists though.

Saturday, January 23, 2016

Ext JS - Changing the "Today" button in a date field

The datefield in Ext JS has a date picker dropdown containing a handy "Today" button, which allows you to quickly select today's date. But what if you're not interested in today's date? This post shows how you can change the text and the handler of the "Today" button in order to select a different date e.g. yesterday.

My jsfiddle for changing the "Today" button is embedded below. After the date field has been rendered, it gets a reference to the Today button via the date field's picker object and then changes its handler to select yesterday's date instead.

Friday, January 01, 2016

fahd.blog in 2015

Happy 2016, everyone!

I'd like to wish everyone a great start to an even greater new year!

In keeping with tradition, here's one last look back at fahd.blog in 2015.

During 2015, I posted 13 new entries on fahd.blog. I am also thrilled that I have more readers from all over the world! Thanks for reading and especially for giving feedback.

Top 5 posts of 2015:

I'm going to be writing a lot more this year, so stay tuned for more great techie tips, tricks and hacks! :)

Related posts: