Saturday, October 20, 2012

Joining Two Files with the Unix join Command

The join command joins two files on a common field, much as you would join two tables in a SQL database.

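The following example illustrates the power of the join command. You have two files: one lists employees with their department ids, and the other lists departments with their ids. You want to find the department name for each employee. join expects both inputs to be sorted on the join field, so you must first sort each file on the department id column (using the sort command) and then join them on that column. In the command below, -t, sets the field separator to a comma, -1 2 and -2 1 select the join field in each file, -o 1.1 2.2 prints the employee name and the department name, and -a 1 also prints employees that have no matching department (like a left outer join in SQL).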

$ cat employees.txt
Jones,33
Steinberg,33
Robinson,34
Smith,34
Rafferty,31
John,

$ cat departments.txt
31,Sales
33,Engineering
34,Clerical
35,Marketing

$ join -a 1 -t, -1 2 -2 1 -o 1.1 2.2 <(sort -t, -k2,2 employees.txt) <(sort -t, -k1,1 departments.txt)
John,
Rafferty,Sales
Jones,Engineering
Steinberg,Engineering
Robinson,Clerical
Smith,Clerical

Joining on multiple columns

The join command joins on a single field. To join on multiple fields, create a composite field by concatenating them, and join on that instead. This can be done using awk. For example:

$ cat employees2.txt
Jones,33,50
Steinberg,33,51
Robinson,34,50
Smith,34,50
Rafferty,31,51

$ awk -F, '{print $2"_"$3","$0}' employees2.txt
33_50,Jones,33,50
33_51,Steinberg,33,51
34_50,Robinson,34,50
34_50,Smith,34,50
31_51,Rafferty,31,51

As you can see, the awk command prepends a composite field built by concatenating the second and third fields of each line. Build the same composite field in the other file, sort both outputs on it, and then join the two files on that field.

(File data courtesy of Wikipedia.)

Sunday, October 07, 2012

Java: Find an Available Port Number

In some cases, such as in unit tests, you might need to start up a server or an rmiregistry. What port number do you use? You cannot hardcode one, because when your unit test runs on a continuous build server or on a colleague's machine, that port might already be in use. Instead, you need a way to find an available port on the current machine.

IANA (Internet Assigned Numbers Authority) divides port numbers into three ranges. The utility below searches the User Ports range, 1024-49151, which avoids both the System Ports and the Dynamic and/or Private Ports:

Port numbers are assigned in various ways, based on three ranges: System Ports (0-1023), User Ports (1024-49151), and the Dynamic and/or Private Ports (49152-65535)

The following utility class can help find an available port on your local machine:

import java.io.IOException;
import java.net.DatagramSocket;
import java.net.ServerSocket;
 
/**
 * Finds an available port on localhost.
 */
public class PortFinder {
 
  // the ports below 1024 are system ports
  private static final int MIN_PORT_NUMBER = 1024;
 
  // the ports above 49151 are dynamic and/or private
  private static final int MAX_PORT_NUMBER = 49151;
 
  /**
   * Finds a free port between 
   * {@link #MIN_PORT_NUMBER} and {@link #MAX_PORT_NUMBER}.
   *
   * @return a free port
   * @throws RuntimeException if a port could not be found
   */
  public static int findFreePort() {
    for (int i = MIN_PORT_NUMBER; i <= MAX_PORT_NUMBER; i++) {
      if (available(i)) {
        return i;
      }
    }
    throw new RuntimeException("Could not find an available port between " + 
                               MIN_PORT_NUMBER + " and " + MAX_PORT_NUMBER);
  }
 
  /**
   * Returns true if the specified port is available on this host.
   *
   * @param port the port to check
   * @return true if the port is available, false otherwise
   */
  private static boolean available(final int port) {
    // Create both sockets unbound so that SO_REUSEADDR is set before binding;
    // setting it on an already-bound socket has undefined behaviour.
    // try-with-resources closes both sockets, which releases the port again.
    try (ServerSocket serverSocket = new ServerSocket();
         DatagramSocket dataSocket = new DatagramSocket(null)) {
      serverSocket.setReuseAddress(true);
      serverSocket.bind(new java.net.InetSocketAddress(port));
      dataSocket.setReuseAddress(true);
      dataSocket.bind(new java.net.InetSocketAddress(port));
      return true;
    } catch (final IOException e) {
      // a socket could not be created or bound, so treat the port as unavailable
      return false;
    }
  }
}
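
If any free port will do, a common alternative is to let the operating system choose one by binding to port 0. The following is a minimal sketch of that idea, not a replacement for the class above: the class name EphemeralPortFinder and its method are made up for illustration, it checks TCP only, and the port it returns will typically come from the system's ephemeral range rather than from 1024-49151.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/**
 * Sketch: asks the operating system for a free TCP port by binding to port 0.
 * The class and method names are illustrative only.
 */
final class EphemeralPortFinder {

  private EphemeralPortFinder() {
    // static utility, not meant to be instantiated
  }

  /**
   * Binds a ServerSocket to port 0 so that the OS allocates an unused port,
   * reads the allocated port number, and closes the socket again.
   *
   * @return a TCP port that was free at the time of the call
   */
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException("Could not obtain a free port", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Free port: " + findFreePort());
  }
}
```

As with the scanning approach, the port is only known to be free at the moment of the check. Another process could claim it before your server binds to it, so where possible it is safer to start the server itself on port 0 and ask it which port it received.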