This post is about how to impose order on instances of Java classes. The context of the post will be centred around reading in some text and then counting the number of times each word appears in the text. The output will then be sorted either alphabetically or in order of the number of times each word appears etc. Links to the source code are provided if you want to take a look.
Some instances of classes have an implicit order – a natural ordering. For example, in Java, String
objects are ordered lexicographically; Integer
objects are ordered numerically. If you find you are writing a value class whose instances have an obvious natural order, you should consider implementing the Comparable
interface:
public interface Comparable {
int compareTo(T t);
}
The String
class is an example of a class that implements Comparable
; instances of it are ordered lexicographically. If you choose to implement Comparable
, your class will be able to take advantage of many of the generic algorithms and collections available in the Java APIs, assuming you need to of course! The following example reads in some text and counts the instances of each word:
public class WordCountTest {
public static void main(String[] args) throws Exception {
WordReader reader = new WordReader(new InputStreamReader(System.in));
Map wordMap = new TreeMap();
String word;
while ((word=reader.readWord()) != null) {
Integer count = wordMap.containsKey(word) ? wordMap.get(word) : 0;
wordMap.put(word, ++count);
}
// Print out using the natural order of the key
for (Map.Entry entry : wordMap.entrySet()) {
System.out.println(entry.getKey()+" : "+entry.getValue());
}
}
}
A TreeMap
sorts entries based on the natural order of the key, in this case, a string, but what if you want to order your instances in an unnatural order? e.g. ordering integers in decreasing order from largest to smallest. Well, in Java, you would use a Comparator
:
public interface Comparator {
int compare(T o1, T o2);
boolean equals(Object obj);
}
Comparators also allow you to provide an ordering for objects that don’t have a natural ordering, e.g. classes that don’t implement Comparable
. Let’s take a look at an example that uses a comparator – as defined in the Order class – that sorts the words based on the number of times they appear:
public class WordCountTest2 {
public static void main(String[] args) throws Exception {
WordReader reader = new WordReader(new InputStreamReader(System.in));
Map wordMap = new HashMap();
String word;
while ((word=reader.readWord()) != null) {
Integer count = wordMap.containsKey(word) ? wordMap.get(word) : 0;
wordMap.put(word, ++count);
}
// Sort by greatest occurrence first
Set entries = new TreeSet(Order.INCREASING_COUNT_COMPARATOR);
for (Map.Entry entry : wordMap.entrySet()) {
entries.add(new WordCount(entry.getKey(), entry.getValue()));
}
// Print
Iterator result = entries.iterator();
while (result.hasNext()) {
WordCount e = result.next();
System.out.println(e.getWord()+" "+e.getCount());
}
}
}
It’s also quite common to instantiate a Comparator
as an anonymous class and pass it in to the constructor of a sorted collection.
It’s worth noting that in the compare method in the Comparator
used in the example, we first compare the word counts and if they are equal, we then return the result of comparing the value of the words. This is essential because sorted collections in Java use the compareTo
method – or in this case, the compare
method of the Comparator
– in place of equals
. What this means is that if we were to compare just the counts and return zero because they are equal – even though the words may be different – the WordCount instance would NOT get added to the (sorted) collection if an instance already exists in the collection with the same count. There’s a whole section in Joshua Bloch’s book – Effective Java (2nd edition) – that discusses this in greater detail (Item 12) for those that are interested.
That’s it really. There’s not much to it but it is worth taking the time to understand the difference(s) between implementing Comparable
and creating a custom Comparator
. Happy sorting!
By the way, if anybody has any better/alternative ideas for how to implement this, let me know. For example, my initial implementation used a TreeMap
but that only allows you to sort on the key and not on the value(s) so wanting to sort in order of greatest number of occurrences won’t work with this particular data structure. In the second example, after constructing the hash table, I then add each entry to a (sorted) set then print out each value. Is there a way of doing this and avoiding the second step? I can’t think of one but maybe I’m missing something 🙂