Play Time 3

February 15, 2017

The third major version of my music statistics app Play Time is now available on the App Store. Play Time allows you to find out which songs you’ve spent the most time listening to, and allows you to analyse your music listening habits over time. The new version of the app allows you to do more specific breakdowns for your music library, find out which songs you’ve spent the most time listening to this week, and view more statistics about your library as a whole.

Previous versions of the app only showed songs sorted by the time you’d spent listening to them, i.e. the play time of the song, but it is now possible to sort by other statistics, such as the play count or duration of the song. Like the last version of the app this extends to artists and albums, and Play Time 3 adds the option to analyse genres too.

Since version 2 Play Time has supported viewing when you’ve spent time listening to your music, but it was harder to view what you were listening to. Play Time 3 adds a “top this week” feature and can optionally remind you to check out what you’ve been listening to once a week.

One of the more whimsical features of the app is the set of bonus facts and statistics it can provide about your music library. The most recent addition is comparing the height of your music library as CDs with other objects and buildings.

You can download Play Time on the App Store for $0.99.

♦

Printable diffs

February 3, 2017

At Oxford we submit almost all of our practical assignments on paper, which usually means printing out the diff of the work that we’ve done. Generally most people just generate the diff with git and send it to the printer with lpr, but I’m yet to succeed at getting output that I like with this approach.

Therefore I am now generating the diff, saving it to a file, and then using the LaTeX listings package to produce line-wrapped, coloured output.

Firstly, I generate the diff file itself:

git diff ORIGINAL_COMMIT_HASH -- > all.diff

Then I generate a PDF via xelatex diff.tex from the following LaTeX file:

\documentclass[11pt,]{article}
\usepackage[margin=1in]{geometry}
\usepackage{listings}
\usepackage{parskip}
\usepackage[rgb,dvipsnames]{xcolor}

\definecolor{mygreen}{rgb}{0.1,0.25,0.01}
\definecolor{myred}{rgb}{0.25,0.1,0.01}
\definecolor{mymagenta}{rgb}{0.5,0.05,0.25}

\lstdefinelanguage{diff}{
  morecomment=[f][\color{blue}]{@@},       % group identifier
  morecomment=[f][\color{myred}]-,         % deleted lines
  morecomment=[f][\color{mygreen}]+,       % added lines
  morecomment=[f][\color{mymagenta}]{---}, % Diff header lines (must appear after +,-)
  morecomment=[f][\color{mymagenta}]{+++},
}

\lstset{
    basicstyle=\ttfamily,
    numberstyle=\footnotesize,
    stepnumber=1,
    numbersep=5pt,
    backgroundcolor=\color[RGB]{255,255,255},
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2,
    captionpos=b,
    breaklines=true,
    breakatwhitespace=true,
    breakautoindent=true,
    escapeinside={\%*}{*)},
    linewidth=\textwidth,
    basewidth=0.5em,
}

\begin{document}

\lstinputlisting{all.diff}

\end{document}

♦

Typing

January 7, 2017

This is the second of a two part series on writing. The first examined how and what I write by hand, and in this second article I discuss the software and hardware that I use when typing.

For several years I used Apple desktop keyboards, and these are what I learned to touch type on. With time, however, I began to find the flat surface and shallow keys less comfortable, so I began to look for something more ergonomic. I settled on the Microsoft Sculpt keyboard, which is the successor to Natural Ergonomic 4000. This keyboard is a much more comfortable shape for resting your hands, although it takes some getting used to, has slightly deeper keys, and is split down the middle, forcing you to use the correct fingers when touch typing.

I did consider using a mechanical keyboard, however I have never particularly liked the sound that they make, and I am yet to invest the time in determining which key switches I prefer. I have no doubt that in the future I’ll go down this path, but for now I’m happy with my Microsoft keyboard.

The Apple Smart Keyboard deserves an honourable mention here. I don’t use it all the time, but it’s surprisingly good when I do. I expected that I would dislike the small, shallow keys but I’ve found that it is possible to touch type on at a comfortable pace close to what I could do on a full sized desktop keyboard.

Since 2015 my primary text editor has been Vim (although I use NeoVim these days). Before that I’d used TextMate for years, and despite my determination not to catch the ‘Vim bug’ I went ahead, dived in, and now I’m trapped! It now seems slightly odd to use a text editor that doesn’t have separate normal, visual, and insert modes, and I’ve found that Vim’s shortcuts significantly improve my productivity without being awkward or uncomfortable. My use of Vim has increased pretty much linearly with my use of terminals, and these days you can pretty much guarantee that I’ve got a tmux session with Vim running somewhere.

Aside from code, when I edit documents in Vim they tend to be either Markdown or LaTeX, with the vast majority in Markdown. Both are great formats for my typeset lecture notes or problem sheets, because I can use pandoc and xelatex to generate PDFs that contain plain text, syntax highlighted code, equations, and diagrams (which pretty much covers everything I need to do). I haven’t used a regular document editor (like Word or Pages) for several years because I find Markdown a great deal more convenient.

When I’m editing a document or note that will not need to be typeset, but I still want to use Markdown, I tend to use Ulysses as it syncs documents across all my devices, looks great, and is fast even with large libraries. I’ve replaced my use of Evernote entirely with Ulysses, after a long series of strange UI and policy decisions at Evernote. My most recent blog posts were all written in Ulysses, for example. The Apple Notes app, after adding support for rich text, lists, and images certainly deserves a mention, but I don’t use it as much for longer notes.

I am also totally dependent on TextExpander for text expansion on my computer. I’m currently using an older version of the app that predates their move to subscription pricing, and I expect I’ll continue using it until a future macOS update breaks it, but hopefully that will be in the distant future.

On my phone I use SwiftKey rather than the standard iOS keyboard. This comes with the occasional disadvantage that it is slightly slower to show up, but the benefit of swiping to type is a big win. I’ve experimented with other similar keyboards by Google and Microsoft, but I have found that SwiftKey is the most reliable over time.

♦

Writing

January 7, 2017

Surprisingly, for a computer science student, I evenly split my time between typing and writing by hand. I generally begin by writing out my notes and problem sheets by hand before later preparing a typeset version of them, depending on what is required. In this first of a two part series I discuss the pens, inks, paper, and notebooks that I use, and why I often choose to write by hand.

For most of the last decade I’ve exclusively used fountain pens, with occasional forays into ball points, roller balls, and fine liners. Fountain pens have the significant advantage that ink will flow out of them with very little pressure, whereas a ball point requires you to apply pressure directly to the page, which is significantly less comfortable. The section (the part of the pen you grip) is also far nicer to hold on most fountain pens, firstly because it is often far wider than that of a ball point, thus keeping reasonable separation between your fingers, but also because many pens will also be shaped for holding (see the pictures of Lamy pens below) comfortably.

Unusually for a fountain pen user, I rarely write in cursive and do not italicise my letters. The main reason for this is that I have to write a lot of mathematical notation, and I’ve found it helps to be consistent between text and equations. I do, however, enjoy calligraphy but don’t do it for regular note taking.

Parker Sonnet, Lamy Safari, and Pilot Metropolitan

At the moment I am currently switching between Parker, Lamy, and Pilot pens. Parker pens are widely available in the UK, and I’ve found them to write very smoothly. My Parker pens all have medium nibs, which forces me to write slightly larger to ensure proper separation between characters. More recently I’ve come to enjoy using Lamy pens, which are often inexpensive, with fine nibs. I particularly like my Lamy demonstrator pen, which has a clear case, allowing you to see the internal working of the pen.

In nearly all my pens I use converters, rather than ink cartridges, as this allows me to use a far broader variety of inks. Until very recently I exclusively used Parker Blue-Black, but I’ve come to find the colour a little too grey. Instead, I have started using J. Herbin 1670 inks, with Emerald of Chivor and Ocean Blue pictured below:

Emerald of Chivor with fine nibbed Lamy Safari, Ocean Blue with medium nibbed Parker Sonnet

To say these inks are fantastic is an understatement. The ink flows perfectly in all my fountain pens, and both have a deep, rich colour. The emerald ink contains small flecks of gold, so the writing will sparkle in some lighting conditions. These inks are very expensive compared to most, but they are by far the best I’ve ever used.

From personal experience I must also recommend Amodex Ink Stain Remover, especially if you use thicker, oil based inks. Aside from a dip pen that I use infrequently for calligraphy most of my pens do not have a tendency to leak, but when they do this stuff is a life saver.

Whilst pens and ink are important, the quality of the paper I write on is equally important to me. When I was at school I tended to use Ryman Superior paper, which is thick enough to avoid bleeding with most inks. The paper is, however, quite coarse so I’ve therefore switched to using Rhodia pads instead. I started with a dot pad, which is great for diagrams and testing pens or inks, but I then began using the larger lined pads. I recommend Rhodia Nº 19 in white with lined pages. The pages of this pad do not come hole punched, and have a margin that is slightly too wide for my liking (although they’re good for taking notes) but the paper has a much smoother finish than the Ryman paper. I’ve found that when writing on the back of sheets in Rhodia pads (which fold at the top, with a perforated edge for each page) it is generally best to either only fold over the top few sheets onto the back of the pad, or tear it off immediately.

As well as paper pads, I also write a great deal in notebooks. I currently use Moleskine and Leuchtturm notebooks, and I far prefer the latter. My Moleskine notebooks are smaller, so I tend to use them for todo lists and for jotting ideas down, whereas I tend to take longer notes in the larger (A4) Leuchtturm notebook. The paper is also more translucent in Moleskine notebooks, meaning that you get slightly more bleeding. The Leuchtturm notebook also has page numbers and a table of contents, which make it ideal for university notes (although, sadly, they don’t come cheap). As all these notebooks are narrow ruled I’ve preferred to write with a fine nib and also use a darker ink colour, as the pages are not pure white. Fine liners also work well in these notebooks, but with ball points you generally have to apply so much pressure that you’ll mark the page below. Furthermore, I wouldn’t recommend using Lamy roller balls or the Moleskine pen in these notebooks, as the former is much to thick and the latter smudges too easily.

Whilst much slower than typing, I enjoy writing by hand because it gives me a lot more creative freedom over the appearance of my writing, is more ergonomic than typing, and the slower pace tends to force me to think a great deal more about what I’m writing.

Overall, I am very happy with my current writing setup, and I look forward to evolving it in the future. In the second of this two part series I will discuss the software and hardware that I use for typing.

♦

Oxford term dates

January 7, 2017

When I come home for vacations from Oxford I suddenly find myself using real dates again, rather than thinking in terms of “Tuesday of 5th week” or “Sunday of 3rd week”. However, during term time there is the occasional need to map between ‘Oxford dates’ and real dates.

Wolfson College provide a calendar that you can subscribe to that provides each week of each term, which is invaluable when scheduling lectures and tutorials into my calendar. Sometimes, however, I just need to very quickly find out what date an ‘Oxford date’ falls on, or what ‘Oxford date’ a regular date falls on.

I have therefore created a very simple Python script that provides this functionality, allowing you to run ./oxtermdate.py 2017-02-14 or ./oxtermdate.py Tu5 to map between the two. Currently you have to set an environment variable for the start of term (which you can find here), but seen as this only changes three times a year I am currently not bothered about automating it.

♦

Comparing the implementation of generics

November 13, 2016

Introduction

Programmers typically learn to abstract over values with looping, recursion, and function parameters, before moving onto abstracting over functions and types [1]. The facility to abstract over types in a programming language significantly reduces the need to rewrite code for data structures and algorithms that deal with multiple types. In statically typed programming languages all types must be concrete at the time of execution, whereas generics represent abstract types that can be compiled to concrete types by substituting “type parameters” with concrete types.

C++, Java, and C# (.NET) are the most commonly used object-oriented languages in industry at the time of writing [2]. In most cases, C++ is compiled to native machine code before execution, Java is compiled to a bytecode which is then compiled to native machine code by a virtual machine, and .NET commonly uses both strategies [3–5]. Each of the languages uses static typing, which therefore requires all types to be concrete and known at compile time.

For the remainder of this discussion, types will be used to collectively refer to both value types, which describe the bit layout of data structures, and reference types, which are pointers to values [4]. In C++, Java, and .NET a reference is word-sized pointer, so a generic type or function can be compiled to be compatible with any reference type by ignoring the type, and just treating it as a raw pointer. This strategy, called type erasure, is adopted by both .NET and Java [6].

Value types, however, can vary in size and it is therefore not possible to use a single implementation of a generic class or function efficiently (in a statically typed language). At compilation time (whether ahead-of-time or just-in-time) a common approach is to instantiate non-generic classes with their generic parameters replaced with the identities of concrete types. Both C++ and .NET allow user-defined value types, but the JVM currently only supports user-defined reference types. Java therefore uses boxing to wrap value types in lightweight objects in order to use them with generics [8].

C++

C++ implements generic classes, methods, and functions through templates, which cause the compiler to compile separate copies of the code for each unique set of type arguments [9]. Consider a simple generic class Box:

template<typename T>
class Box {
public:
    Box(T value) { _value = value; }
private:
    T _value;
};

Later, when a function declares an instance of Box it will be required to supply the name of a type for the type parameter T:

Box<int> intBox(42);
Box<std::string> stringBox("Hello, world!");

For each unique set of type arguments used per compilation unit, which is generally a single C++ file, instances are created by replacing T with the appropriate type and compiling the generated code. In this case, copies of Box will be generated with T replaced by int and std::string. The generated code is then type checked, to ensure that all used methods, fields, and operators are accessible. For example, the Standard Template Library function std::sort is generic across all types, but will only compile when used with a type that overloads the < operator [10]. As described later, C++ doesn’t enforce requirements such as these in the signatures of generic methods (but still enforces them at compile time).

A key advantage of C++’s approach is that there is no runtime performance cost compared to handwritten code, although it does come at the cost of increased code size and compilation time [11–12]. As each template is instantiated for each code unit that uses it, there can be a multiple copies of the same instantiation per executable. For example, Box<int> could be used by a.cpp and b.cpp, and therefore the constructor for Box<int> would appear in the object files for both a.cpp and b.cpp. If the two object files are linked into the same executable this is unnecessary, and one copy can be eliminated. It is the responsibility of the linker to eliminate the equal instantiations from the final executable [13]. An alternative technique, which doesn’t rely on linker optimisations, is to explicitly instantiate certain instances to ensure that equal instances are only compiled once.

C++ further allows programmers to provide alternative specializations of generic classes, for types where it may be more efficient to do so. For example, the Standard Template Library specializes std::vector<bool> to represent boolean values with a single bit, rather than as a single byte [10–14]. C++ templates also allow specialization based on values, so the type std::array<int, 7> is distinct from the type std::array<int, 42> [15].

Java

The initial release of Java in 1995 did not include generics. The Generic Java (GJ) project [16], started in 1998, was integrated into the language in 2004 [17]. As a result of the relatively late inclusion of generics, the authors of Generic Java were conscious that it would be necessary to maintain compatibility for existing code that used non-generic interfaces for classes, such as common collections, that could now be implemented using generics.

Prior to the introduction of generics, collections APIs used Object, Java’s root reference type, as their element type [18]. The Generic Java project therefore chose to ‘erase’ the generic type parameters by replacing them with Object in order to compile the generic class, and introduce the appropriate casts in client code. The following generic code:

public class Box<T> {
    public T value;
    public Box(T value) {
        this.value = value;
    }
}
//Client code
Box<Integer> intBox = new Box<Integer>(42);
Integer intFromBox = intBox.value;

Is therefore compiled to:

public class Box {
    public Object value;
    public Box(Object value) {
        this.value = value;
    }
}
//Client code
Box intBox = new Box(new Integer(42));
Integer intFromBox = (Integer)intBox.value;

A consequence of type erasure is that it restricts the type parameters to reference types, and as such it is not possible to create an instance of Box<int>, for example [6]. In order to get around this, Java provides boxed classes which contain a field for Java’s eight primitive value types [8]. In Oracle’s HotSpot JVM, this comes with the disadvantage that a Java object contains a header of one or two words, and to access any field at least one pointer traversal is required [19]. There is therefore the initial cost of allocating memory on the heap for boxed instance, the memory cost of the object’s header, the cost of traversing at least one pointer to get the boxed value, and the cost of garbage collecting the instance. In many cases it is difficult to efficiently use generics with primitive types.

Project Valhalla [20], a recent effort to “incubate advanced Java VM and language features”, supports the introduction of value types and generic specialisation to the language in order to make these cases more efficient.

Furthermore, without reflection it is not possible to create new instances of the generic argument either, i.e. object creation via new T() is not possible.

.NET

Most C++ compilers compile to native code “ahead of time” (AOT), whereas Java byte-code is compiled “just in time” (JIT) by a virtual machine, but .NET software can be compiled using either method. Typically, however, execution is via the JIT-compiled Common Language Runtime. At runtime, generic types are instantiated as they are required, so generics can be used with value and reference types. A single instance is shared for all reference types - effectively using type erasure like Java - but separate instances are created for value types. Rather than completely erasing types at compile time, the metadata section of .NET “portable executables” encodes the generic parameters of a type or method, allowing these to be replaced, and the method can then be JIT compiled [4]. After an instance has been compiled it will be reused, and the generated native code remains in memory whilst the application is executed.

For most types found in .NET applications (references), this approach will offer similar performance to Java, as appropriate casts between the base reference type and the expected reference type must be inserted. Meanwhile, the .NET approach offers a significant advantage for value types as a separate instantiation is natively compiled for each value type used by the program, which avoids the need for boxing or unboxing, hence offering better performance [21].

Despite using type erasure for reference types, .NET allows new T() in generic methods [4] by automatically using reflection methods, provided that a given type has the default constructor available [23].

As well as providing a JIT-environment for .NET applications, Microsoft provides a tool that compiles .NET bytecode assemblies to native code [24]. This improves performance of .NET applications by removing the need for JIT compilation, and also reduces memory requirements.

Unlike native C++, .NET executables typically consist of multiple assemblies that are dynamically linked at runtime (either by the JIT or OS services). Suppose an assembly A.dll exposes the generic type List<T>, which is instantiated as List<string> and List<int> in both B.dll and C.dll, and that D.exe links to both B.dll and C.dll.

In the JIT environment the bytecode for List<T> will only be included in A.dll, but the native environment presents a greater challenge. The first option would be to include native code for List<string> and List<int> in the natively compiled B.dll and C.dll, but this violates .NET’s principal that equal types should be identical [4].

In order to counteract this problem, Kennedy and Syme [21–25] introduce further rules for the “home” of a particular instantiation:

The assembly that defines a generic type includes the instantiation for all reference types, like Java. In the case described above A.dll would include List<ref>
If an assembly needs a value type instantiation, then include it in the assembly. B.dll and C.dll would therefore require their own implementations of List<int>, but they could both use the implementation of List<ref> in A.dll for List<string>, as string is a reference type in .NET
At link time do not link to a generic type if an equal instantiation has already been linked to. The implementation of List<int> in C.dll would therefore be ignored if B.dll had already been linked in D.exe, thus resolving the identity issue

Type system features

Some generic algorithms or data structures have to be restricted to certain concrete types. For example, a sorting algorithm only works with types that are comparable so that a partial order can be formed. Both .NET and Java allow programmers to restrict generic types to concrete types that implement certain interfaces, similar to type classes in Haskell [26–28]. This restriction is applied in the type signature of the generic class or method.

C++ doesn’t have interfaces, and therefore the generic code is only type-checked when it is instantiated with a concrete type. Bjarne Strousoup, the creator of C++, considers this a disadvantage of C++’s template system, as it requires programmers to program against the implementation of a generic class, rather than the interface [29]. As such, there is an effort to introduce “concepts” to C++ to add these restrictions before the template is instantiated with a concrete type [30].

All of the languages described support subtype polymorphism, however their treatment of subtyping of generic types varies. For example, in OCaml if Cat were a subtype of Animal, then “list of Cat” is a subtype of “List of Animal” [31]. There are three possible behaviours for the subtype relation:

Invariant: There is no subtype relationship between different instantiations of a generic class, irrespective of the subtype relationship between the type arguments. This is the default approach used by all three programming languages [4–32–33]
Covariant: “list of Cat” is a subtype of “list of Animal”, so the order of types is preserved. This approach can be used in .NET [4]
Contravariant: The subtyping order is reversed. Again, .NET supports additional annotations to generics to support contravariance

Conclusion

Generics are a useful addition to many programming languages, and there are further efforts to add higher-order “kinds” (a generalisation of parametrized types) to functional programming languages such as Haskell [34]. The compilation techniques described above are used by main other programming languages, although recent languages such as Go remain resistant to their addition, citing increased compilation time and executable size as drawbacks [35].

The native code generated by compilers for statically typed languages is unlikely to change due to the stability of computer architectures, OS calling conventions, and memory layouts. Recent research and development into generics instead focusses on adding restrictions to the types that can be used with generics to increase type safety and programmer productivity whilst reducing the complexity of the compiler.

The appendix contains implementations of a simple generic Box class, with examples of the code generated by common compilers for each of the three platforms.

Appendix

C++

template<typename T>
class Box
{
    public:
        Box(T value) { _value = value; }
        T _value;
};

int main() {
    Box<int> box(42);
    return 0;
}

Before compiling to native code, the LLVM compiler generates an intermediate representation (similar to the byte code representation used by Java and .NET). By compiling similar C and referring to the documentation [36], the output has been labelled:

; Metadata for the size and layout of the type
%class.Box = type { i32 }

define i32 @main() #0 {
  ; alloca ensures there is enough stack space
  ; and returns the address of the instance
  %1 = alloca %class.Box, align 4
  ; Call the constructor
  call void @_ZN3BoxIiEC2Ei(%class.Box* %1,
    i32 42)
  ret i32 0
}

; Constructor of Box<int>
; linkonce_odr allows inlining and prevents
; duplication of this generic instance
define linkonce_odr void @_ZN3BoxIiEC2Ei
  (%class.Box*,  i32) unnamed_addr #2 align 2 {
  ; getelementptr returns the address of a field
  %3 = getelementptr inbounds %class.Box,
    %class.Box* %0, i32 0, i32 0
  ; Copies the int into the class' _value field
  store i32 %1, i32* %3, align 4
  ret void
}

The above shows that a single instance of Box is instantiated with T replaced by int, and no other instances are compiled.

Java

//Box.java
public class Box<T> {
    public T mValue;
    public Box(T value) {
        this.mValue = value;
    }
}
//App.java
public class App {
    public static void main(String[] args) {
        Box<Integer> boxInteger = new Box<>(42);
    }
}

Java is compiled to a bytecode, but it is possible to use the javap tool to output a textual representation of the bytecode [37]. Some irrelevant information has been removed from the output for brevity’s sake.

//Box.class disassembly
public class Box<T extends java.lang.Object>
  extends java.lang.Object
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#17
   #2 = Fieldref           #3.#18
   #3 = Class              #19
   #4 = Class              #20
   #5 = Utf8               mValue
   #6 = Utf8               Ljava/lang/Object;
   ...
   #8 = Utf8               TT;
   #9 = Utf8               <init>
   ...
   #13 = Utf8              (TT;)V
   #14 = Utf8
     <T:Ljava/lang/Object;>Ljava/lang/Object;
   ...
   #17 = NameAndType       #9:#21
   #18 = NameAndType       #5:#6
   #19 = Utf8              Box
   #20 = Utf8              java/lang/Object
   #21 = Utf8              ()V
{
  //Note that the type descriptor is just
  //Object as the type is erased
  public T mValue;
    descriptor: Ljava/lang/Object;
    flags: ACC_PUBLIC
    Signature: #8

  public Box(T);
    descriptor: (Ljava/lang/Object;)V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=2, args_size=2
         //Object init()
         0: aload_0
         1: invokespecial #1
         //Sets mValue
         4: aload_0
         5: aload_1
         6: putfield      #2
         9: return
    Signature: #13
}
Signature: #14

//App.class disassembly
public class App
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #6.#15
   #2 = Class              #16
   #3 = Methodref          #17.#18
   #4 = Methodref          #2.#19
   #5 = Class              #20
   #6 = Class              #21
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   ...
   #15 = NameAndType       #7:#8
   #16 = Utf8              Box
   #17 = Class             #22
   #18 = NameAndType       #23:#24
   #19 = NameAndType       #7:#25
   #20 = Utf8              App
   #21 = Utf8              java/lang/Object
   #22 = Utf8              java/lang/Integer
   #23 = Utf8              valueOf
   #24 = Utf8              (I)Ljava/lang/Integer;
   #25 = Utf8              (Ljava/lang/Object;)V
{
  public App();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1
         4: return

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=3, locals=2, args_size=1
         0: new           #2
         3: dup
         4: bipush        42
         6: invokestatic  #3
         9: invokespecial #4
        12: astore_1
        13: return
}

.NET

namespace GenericApp
{
    public class Box<T>
    {
        public T Value;

        public Box(T val)
        {
            this.Value = val;
        }
    }

    public class Program
    {
        public static void Main
            (string[] args)
        {
            var box = new Box<int>(42);
        }
    }
}

The above C# compiles to the Common Intermediate Language, which is typically represented in a byte-code format. The below is an alternative text representation, created by the Mono Project’s disassembler [38] and labelled based on the runtime specification [4]. Note that some irrelevant sections of the output have been removed.

.namespace GenericApp
{
  //`1 indicates that the class has only 1
  //generic parameter
  .class public auto ansi beforefieldinit
    Box`1<T>
    extends [mscorlib]System.Object
  {
    //!0 refers to the first parameter
    .field  public  !0 Value

    .method public hidebysig specialname
           instance default void '.ctor' (!T val)
           cil managed
    {
      .maxstack 8
      //Constructs the instance and sets
      //this.Value (field 0)
      ldarg.0
      call instance void object::'.ctor'()
      ldarg.0
      ldarg.1
      stfld !0 class GenericApp.Box`1<!0>::Value
      ret
    }
  }
}

.namespace GenericApp
{
  .class public auto ansi beforefieldinit
    Program extends [mscorlib]System.Object
  {
    .method public static hidebysig
           default void Main (string[] args)
           cil managed
    {
      .entrypoint
      .maxstack 1
      //Encodes the local type
      .locals init (
        class GenericApp.Box`1<int32> V_0)
      //HEX 2a = DEC 42
      ldc.i4.s 0x2a
      newobj instance void class
        GenericApp.Box`1<int32>::'.ctor'(!0)
      stloc.0
      ret
    }
  }
}

References

1. J. Gibbons, Datatype-Generic Programmming. (2007).

2. TIOBE, TIOBE Index. (2016).

3. International Organization for Standardization (ISO), International Standard ISO/IEC 14882:2014(E) - Programming Language C++. (2014).

4. Microsoft, ECMA-335: Common Language Infrastructure. (2012).

5. T. Lindholm, F. Yellin, G. Bracha, & A. Buckley, The Java Virtual Machine Specification, Java SE 8 Edition. (2015).

6. Oracle, Type Erasure. (2015).

7. Microsoft, Generics in the Run TIme (C# Programming Guide). (2015).

8. Oracle, Autoboxing and Unboxing. (2015).

9. B. Strousoup, The C++ Programming Language 4th Edition. (2013).

10. A. Stepanov, C++ Standard Template Library. (1994).

11. J. Coffin, Do C++ templates make programs slow? (2010).

12. J. Matthews, What’s Wrong with C++ Templates? (2003).

13. GCC Manual, Template Instantiation. (2016).

14. H. Hinnant, On vector. (2012).

15. cplusplus.com, array - C++ Reference. (2016).

16. G. Bracha, M. Odersky, D. Stoutamire, & P. Wadler, Making the future safe for the past: Adding Genericity to the Java Programming Language. (1998).

17. A. Buckley, S. Marx, & O. Martin, JSR 14: Add Generic Types To The Java Programming Language. (2004).

18. Sun Microsystems, Inc, Class Vector (JDK 1.0). (1996).

19. J. Rose, B. Goetz, & G. Steele, State of the Values. (2014).

20. Oracle, OpenJDK: Valhalla. (2016).

21. A. Kennedy & D. Syme, Pre-compilation for .NET Generics. (2005).

22. B. Venners, B. Eckel, & A. Hejlsberg, A Conversation with Anders Hejlsberg: Generics in C#, Java, and C++. (2004).

23. Microsoft, new Constraint (C# Reference). (2015).

24. Microsoft,.Ngen.exe (Native Image Generator). (2015).

25. A. Kennedy & D. Syme, Combining Generics, Pre-compilation and Sharing Between Software-Based Processs. (2005).

26. P. Hudak, J. Peterson, & J. Fasel, A Gentle Introduction to Haskell, Version 98. (2000).

27. J. Lowy, An Introduction to C# Generics. (2005).

28. Oracle, Bounded Type Parameters. (2015).

29. B. Strousoup, A bit of background for concepts and C++17. (2016).

30. International Organization for Standardization (ISO), International Standard ISO/IEC TS 19217:2015 - C++ Extensions for concepts. (2015).

31. Y. Minsky, A. Madhavapeddy, & J. Hickey, Real World OCaml: Functional programming for the masses. (2013).

32. S. Tambe, Covariance and Contravariance in C++ Standard Library. (2015).

33. Oracle, Wildcards and Subtyping. (2015).

34. Glasgow Haskell Compiler, Kinds. (2014).

35. A. Gerrand, golang proposal: generic programming facilities. (2016).

36. LLVM, LLVM Language Reference Manual. (2016).

37. Oracle, javap. (2016).

38. Mono Project, Dis/Assembling CIL Code. (2016).

♦

Automating the Wikipedia game

October 20, 2016

In the Wikipedia game, you typically click the random article button and aim for another Wikipedia article by clicking on the fewest links possible. At a recent Oxford CompSoc event we wrote bots to automate this process.

Before the event I wrote up a few Python scripts to play the game by using the MediaWiki API. There is an API call to fetch a list of links for a given page, but it unfortunately lists them alphabetically rather than in the order they appear. I therefore used the general API for fetching the MediaWiki markup for the page and scraped the links instead. This meant I could use a simple strategy of clicking the first link to get to Philosophy, and then using a known route from there to our goal page. Effectively this just performed a depth first search on the graph of links.

Most people chose to use and adapt this strategy, often by prioritising certain frequently linked to articles and using known routes from those articles, although at least a couple of people experimented with backlinks.

Whilst my “philosophy” strategy tended to reach the goal article within around forty links, the best strategy used a breadth-first search (capped at the first five links of an article) with certain well known routes used to reach the goal at the end. This was still a largely uninformed search, but by “knowing” the category of a Wikipedia article you could almost certainly reduce the number of clicks that you need.

Everyone stuck to search strategies and the Wikipedia API, rather than using the brute force approach of downloading a dump of all Wikipedia’s text, and running Dijkstra’s on it locally.

♦

Minor site update

August 31, 2015

Over the past few days I’ve been migrating from Squarespace to a self-hosted site. If I’ve done this correctly, you should only notice a few minor visual changes.

The server now runs on DigitalOcean, with a NodeJS backend written in TypeScript. I’d hoped to avoid writing any server side code, either by generating a static site with Jekyll or by using a CMS, but Squarespace has irregular URLs. For example, dates can be written with any number of digits. I didn’t fancy learning whichever CMS happens to be cool today, so writing a little NodeJS wasn’t onerous.

Writing in TypeScript and using Visual Studio Code made my life a lot easier. They work great together, especially with IntelliSense and debugging. I would recommend doing the following:

If you’re using any NPM modules, make sure you have DefinitelyTyped definitions
If there isn’t a DefinitelyTyped definition, make sure you import via let module = require('module') rather than import module = require('module') because otherwise TypeScript will throw an error
Use tsc --watch to auto-recompile files

If you notice any broken URLs, please contact me so that I can fix them.

♦

Keep Calm 4 for Android

March 3, 2015

I’ve just shipped a major update to Keep Calm for Android.

Photo backgrounds
Change the foreground colour
Material Design

Aside from new features, I’m now also including ads again - photo backgrounds and text colours had previously been ‘pro’ features before I removed Keep Calm Pro from Google Play.

♦

Play Time 2

February 17, 2015

Play Time 2, my music statistics app for iOS, has just received a major update. You can now view how you listen to your music over time, view interactive graphs for your listening habits of individual songs, albums, and artists, and view more statistics than ever before. Play Time 2 is available for $0.99 from the App Store today.

Play Time, Over Time

Each time you launch Play Time it records the total play count of every song in your music library. As these change you’ll be able to see how the total play time of any song, album, or artist, or your whole library, change over time. The graph itself is interactive and you can view the total play time at any point in time.

New statistics

Previously you could only view a handful of statistics about your music library, however you can now view average play time, duration, play count, title length, and rating as well as word counts in song titles. Songs, albums, and artists also show fun facts related to their play time. For example, I’ve spent about the same amount of time listening to Daft Punk as the gestation period of a mouse.

Interactive charts

There are now interactive charts that allow you to view:

Rating vs play time (if you rate your music you’ll probably find that yours looks very similar to everyone elses!)
Year vs play time
Genre vs play time

Other new features

Support for new iPhones and the iPad with settings that sync via iCloud
Share any chart or statistic by tapping on it
CSV data export (works with Numbers/Excel/Google Drive on your iOS device or on your computer) in case you want to do even more analysis
Gravitational ratings (you’ll get it when you see it)
Searching
Song/album/artist rankings
Completely redesigned UI with a new icon
The app is so much faster, its not even funny

You can download Play Time from the App Store for $0.99.

♦

← Newer

Older →