Exciting news from go-linq!

Ahmet Alp Balkan

I have some exciting news about go-linq, my first Go project ever: we have two new and excited maintainers. They added a number of really cool features to the project over the past few months, and we finally have some stable releases. I can't wait any longer to talk about all of this.

What is go-linq?

go-linq is an anti-pattern library to address a lack of language-integrated collection querying and comprehension functionality in Go. I initially wrote it to see whether it would actually be nice to have.

Roughly, you give it an input slice, map, chan, or another source, and then chain a bunch of methods after it.

                                   +----------+
                                   |          |
                            +------v-------+  |
     +---------------+      | .Where(...)  |  |  +--------------------+
     | slice         |      |              +--+  |.ToSlice(&v)        |
     |               |      | .Select(...) |     |                    |
     | map           |      |              +----->.ToMap(&v)          |
From(|               |)+----> .OrderBy(...)|     |                    |
     | chan      NEW!|      |              |     |.ToChannel(c)   NEW!|
     |               |      | .GroupBy(...)|     +--------------------+
     | custom    NEW!|      |              |
     +---------------+      | .Join(...)   |
                            |              |
                            | ...          |
                            +--------------+

Basically, if you are tired of writing a for-loop in every project to test whether a slice contains a particular value, you can simply type:

From(slice).Contains(v)

Here is a more complicated example:

func ownersOfNewCars(cars []Car) []string {
    var owners []string

    From(cars).Where(func(c interface{}) bool {
        return c.(Car).year >= 2015
    }).Select(func(c interface{}) interface{} {
        return c.(Car).owner
    }).ToSlice(&owners)

    return owners
}

Looks a bit ugly, as I said, almost an anti-pattern, but we will make it prettier in a minute.

What was wrong with go-linq?

go-linq did what was written on the box, but with certain gotchas:

Improvements

1. Memory & Performance

On one of those warm Seattle mornings, I got this email:

Somebody I had never talked to before told me that he had rewritten go-linq and was seeing crazy performance improvements. And by improvement, I mean 100,000 times faster in certain cases, with a memory footprint and allocation count a few orders of magnitude smaller. Crazy!

It turns out that by implementing the iterator pattern from .NET's LINQ, we now pass the data down as it is computed, so only a limited amount of it is held in memory at any time while the computation pipeline executes. This dramatically improved the performance and changed the project architecture, which eventually enabled other features (more about this in a minute).
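To give a feel for the idea, here is a minimal sketch of a pull-based iterator pipeline. It is an illustration under my own simplifications, not the actual go-linq source: the names FromInts, Where, and First exist only for this example. Each stage wraps the previous one in a closure, so nothing is computed until somebody asks for the next element.

```go
package main

import "fmt"

// Iterator yields the next element; ok is false once the source is exhausted.
type Iterator func() (item interface{}, ok bool)

// Query wraps an iterator factory so every traversal starts fresh.
type Query struct {
	Iterate func() Iterator
}

// FromInts (hypothetical helper) builds a Query over a slice without copying it.
func FromInts(xs []int) Query {
	return Query{Iterate: func() Iterator {
		i := 0
		return func() (interface{}, bool) {
			if i >= len(xs) {
				return nil, false
			}
			v := xs[i]
			i++
			return v, true
		}
	}}
}

// Where returns a new Query; no element is inspected until someone pulls.
func (q Query) Where(pred func(interface{}) bool) Query {
	return Query{Iterate: func() Iterator {
		next := q.Iterate()
		return func() (interface{}, bool) {
			for item, ok := next(); ok; item, ok = next() {
				if pred(item) {
					return item, true
				}
			}
			return nil, false
		}
	}}
}

// First pulls from the pipeline only until one element arrives.
func (q Query) First() (interface{}, bool) {
	return q.Iterate()()
}

func main() {
	visited := 0
	q := FromInts([]int{1, 2, 3, 4, 5, 6}).Where(func(v interface{}) bool {
		visited++ // count how many elements the predicate actually saw
		return v.(int)%2 == 0
	})
	first, _ := q.First()
	fmt.Println(first, visited) // prints "2 2": the other four elements were never touched
}
```

Because the stages pull from each other one element at a time, no intermediate slice is ever built between them.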

We worked together on his huge pull request that made go-linq v2.0 happen! So, I am happy to welcome Alexander Kalankhodzhaev (@kalaninja) as a new maintainer to go-linq.

2. Parametric Functions Support

If you look at the example above, you will see a bunch of empty interface{}s and type assertions (c.(Car).year). Clearly, we could do better than that.

A while ago, somebody suggested implementing support for parametric functions. I couldn’t really understand how they would even work and we closed the proposal since implementing them requires reflection and therefore the resulting code would be slow.

A few months later, another fellow programmer showed up with yet another huge pull request. He had the same proposal, but he showed up with the code (lots of it; it took us 2 months to discuss the design and review).

With some reflection tricks, go-linq now lets you write code that is free of the empty interface{} and type assertions. We can accept predicate functions that take your own types directly.

This is extremely cool; however, it does not come for free: if you use parametric functions, your code runs 5-10x slower. We think that this is useful for the many scenarios that do not involve huge datasets or performance concerns, that is, wherever go-linq is not sitting in a hot code path.
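If you are wondering how a function can accept "any func(T) bool" in a language without generics, here is a minimal sketch of the reflection trick. The helper whereT is hypothetical and written just for this post, not go-linq's API or implementation; the Car fields mirror the earlier example.

```go
package main

import (
	"fmt"
	"reflect"
)

type Car struct {
	owner string
	year  int
}

// whereT (hypothetical) accepts any function shaped like func(T) bool and
// invokes it through reflection, so callers need no type assertions.
func whereT(items []interface{}, predicate interface{}) []interface{} {
	fn := reflect.ValueOf(predicate)
	if fn.Kind() != reflect.Func {
		panic("whereT: predicate must be a function")
	}
	ft := fn.Type()
	if ft.NumIn() != 1 || ft.NumOut() != 1 || ft.Out(0).Kind() != reflect.Bool {
		panic("whereT: predicate must have the form func(T) bool")
	}

	var out []interface{}
	for _, item := range items {
		// Each reflective call boxes the argument and allocates slices for
		// arguments and results; this is the kind of overhead that makes
		// the approach slower than a direct call. Call panics if item is
		// not assignable to the predicate's parameter type.
		res := fn.Call([]reflect.Value{reflect.ValueOf(item)})
		if res[0].Bool() {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	cars := []interface{}{
		Car{owner: "Ann", year: 2016},
		Car{owner: "Bob", year: 2009},
	}
	// The predicate takes a Car directly: no interface{}, no c.(Car).
	newer := whereT(cars, func(c Car) bool { return c.year >= 2015 })
	fmt.Println(len(newer)) // prints 1
}
```

The signature check happens once per call to whereT, but the reflective invocation happens once per element, which is where the cost adds up.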

Eventually, we landed this change in go-linq v3.0 and I am glad to announce that Cleiton Marques (@cleitonmarx) is a new maintainer on go-linq.

3. Channels Support

Prior to go-linq v2.0, if you were to process a large amount of data coming from a stream, your best option was to buffer it all into memory first, then pass it to go-linq for processing.

Similarly, you had no option to take data out of your go-linq processing chain in chunks. The only option was to wait for the whole calculation to complete and then get all the results at once.

Applying the iterator pattern and lazy evaluation enabled us to add methods such as FromChannel(ch) (From(ch) also works) and ToChannel(ch) to consume and produce data as they become available. You can now process data coming from a stream using go-linq with minimal memory footprint.
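Here is a rough sketch of why the iterator design makes channels fit so naturally. The lowercase helpers fromChannel, where, and toChannel are stand-ins I wrote for this post (they are not the library's code), and closing the output channel inside toChannel is a choice made for this sketch.

```go
package main

import "fmt"

// Iterator yields the next element; ok is false once the source is exhausted.
type Iterator func() (item interface{}, ok bool)

// fromChannel turns a channel into a pull iterator: each pull blocks until
// a value arrives, and reports ok == false once the channel is closed.
func fromChannel(ch <-chan interface{}) Iterator {
	return func() (interface{}, bool) {
		item, ok := <-ch
		return item, ok
	}
}

// where filters lazily, pulling from upstream only when asked.
func where(next Iterator, pred func(interface{}) bool) Iterator {
	return func() (interface{}, bool) {
		for item, ok := next(); ok; item, ok = next() {
			if pred(item) {
				return item, true
			}
		}
		return nil, false
	}
}

// toChannel pushes every result downstream as soon as it is produced,
// then closes the channel to signal completion.
func toChannel(next Iterator, out chan<- interface{}) {
	for item, ok := next(); ok; item, ok = next() {
		out <- item
	}
	close(out)
}

func main() {
	in := make(chan interface{})
	out := make(chan interface{})

	// Producer: simulates a stream that is never fully held in memory.
	go func() {
		for i := 1; i <= 5; i++ {
			in <- i
		}
		close(in)
	}()

	isOdd := func(v interface{}) bool { return v.(int)%2 == 1 }
	go toChannel(where(fromChannel(in), isOdd), out)

	for v := range out {
		fmt.Println(v) // prints 1, 3, 5 on separate lines
	}
}
```

With unbuffered channels on both ends, at most one element is in flight through the pipeline at any moment.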

4. Custom Collections/Types Support

Prior to go-linq v2.0, we only supported Go builtin types for collections, namely slices and maps. Today, however, anything that implements the Iterable interface can be passed to the From(...) function.
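As an illustration, here is what plugging a custom collection into such an interface could look like. This is a self-contained sketch: the Iterable and Iterator definitions below are written for this example (the single-method shape is my assumption for the sketch, so check the package documentation for the exact contract), and List and count are hypothetical.

```go
package main

import "fmt"

// Iterator yields the next element; ok is false once the source is exhausted.
type Iterator func() (item interface{}, ok bool)

// Iterable, as assumed in this sketch: one method returning a fresh iterator.
type Iterable interface {
	Iterate() Iterator
}

// List is a hypothetical singly linked list, i.e. not a slice or a map.
type List struct {
	head *node
}

type node struct {
	value interface{}
	next  *node
}

// Push prepends a value to the list.
func (l *List) Push(v interface{}) {
	l.head = &node{value: v, next: l.head}
}

// Iterate walks the list from head to tail, one node per pull.
func (l *List) Iterate() Iterator {
	cur := l.head
	return func() (interface{}, bool) {
		if cur == nil {
			return nil, false
		}
		v := cur.value
		cur = cur.next
		return v, true
	}
}

// count works on anything Iterable, without knowing the concrete type.
func count(src Iterable) int {
	n := 0
	next := src.Iterate()
	for _, ok := next(); ok; _, ok = next() {
		n++
	}
	return n
}

func main() {
	l := &List{}
	l.Push("c")
	l.Push("b")
	l.Push("a")
	fmt.Println(count(l)) // prints 3
}
```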

Similarly, prior to go-linq v2.0, we had no way of supporting custom types in methods that compare and sort elements (such as OrderBy). With v2.0, any type that implements the Comparable interface can be sorted.
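To make the sorting side concrete, here is a small sketch of a user-defined type that knows how to compare itself. The Comparable interface is declared locally for this example; its CompareTo method name and the negative/zero/positive convention are assumptions of the sketch, and Version and sortComparables are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Comparable, as assumed in this sketch: CompareTo returns a negative number,
// zero, or a positive number when the receiver sorts before, equal to, or
// after other.
type Comparable interface {
	CompareTo(other Comparable) int
}

// Version is a hypothetical custom type with a natural ordering.
type Version struct {
	Major, Minor int
}

// CompareTo orders by Major first, then by Minor.
// It panics if other is not a Version.
func (v Version) CompareTo(other Comparable) int {
	o := other.(Version)
	if v.Major != o.Major {
		return v.Major - o.Major
	}
	return v.Minor - o.Minor
}

// sortComparables sorts in place using only the Comparable contract.
func sortComparables(items []Comparable) {
	sort.SliceStable(items, func(i, j int) bool {
		return items[i].CompareTo(items[j]) < 0
	})
}

func main() {
	items := []Comparable{Version{2, 1}, Version{1, 9}, Version{1, 2}}
	sortComparables(items)
	fmt.Println(items) // prints [{1 2} {1 9} {2 1}]
}
```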

5. Better Documentation

It is now much easier to learn go-linq thanks to all the examples we added to the package documentation. We also had to rethink how a newcomer would decide whether to use parametric functions or not, purely by following the documentation. Let us know if you hit any walls while trying it out.

