Cilk Syntax Study

Note: Previous post on this topic:

I’m thinking more about how one can use TBB to write task code that looks similar to existing C code. Of course, people have tried to do this before, and made languages that integrate task parallelism naturally. This article takes a look at these existing solutions, looking for inspiration.


Probably the most well-known task-based programming language is Cilk

Here’s an example Cilk procedure (from the paper above):

thread fib (cont int k, int n)
  if (n < 2)
    cont int x, y;
    spawn_next sum(k, ?x, ?y);
    spawn fib (x, n-1);
    spawn fib (y, n-2);

thread sum (cont int k, int x, int y)
  send_argument (k, x+y);

There’s a certain amount of syntactical sugar here:

  • functions that act as tasks have a “thread” qualifier
  • a “spawn” keyword differentiates spawning child tasks from calling functions
  • a “spawn_next” keyword spawns a continuation task (to spawn more tasks until results arrive)
  • “cont”-qualified variables allow passing data from predecessor to successor task.
  • a built-in “send_argument” sets “cont” variables, and spawns tasks with fully satisfied arguments.
  • a built-in “?” operator allows declaring the dependency of a successor on a predecessor.

This is some pretty cute syntax. My main worry is that there might be overhead in the automation of passing around continuation variables. In contrast, TBB also allows creating continuation tasks, but it requires you to pass continuation arguments by reference manually. For example, TBB users can create a continuation task with inputs as member variables, and the address of these member variables are used as destination addresses for the childrens’ computation. See Continuation Passing. Still, the TBB continuation syntax is pretty tedious to type (and probably error-prone), and I wonder if we can do some C++ magic to simplify it.

The “spawn” and “spawn_next” syntax makes spawning tasks look a lot like calling functions, which is consistent with the goals I described in the previous post. The “cont” variables might be possible to implement by wrapping them in a C++ type, which could implement operator= (or a similar function) for the purpose of implementing an equivalent to “send_argument”. Cilk allows cont variables to be passed as arguments that are declared non-cont (such as sum’s x/y above), and automatically unwraps them when the time comes to actually call the function. In a C++ implementation, this automatic unwrapping might be possible to implement with a variadic function template that unwraps its arguments before passing them to the desired function. If that’s too difficult, we can fall back to defining the continuation function with explicit “std::future”-like arguments, requiring using a special function to unwrap them at the usage site.

I think one of the best things about Cilk is implicitly generating dependencies between tasks by passing arguments. This is much less work and is more maintainable than explicitly declaring dependencies. It does not deal with running two tasks in an order based on side-effects, like if you want printf() calls in two tasks to always happen in the same order. This might be possible to mitigate your in design by factoring out side-effects. Alternatively, we could create a useless empty struct and use that to indicate dependencies while reusing the syntax used to pass meaningful data. This is very similar to the tbb::flow::continue_msg object used in TBB flow graph.

By the way, Cilk’s dependencies are implemented by keeping a counter of missing arguments for tasks. When the counter reaches 0, the task can be executed. This is very similar to how TBB tasks implement child-parent dependencies. The awkwardness is that TBB normally only supports a single parent task, so a task with multiple parents need to be handled specially. See General Acyclic Graphs of Tasks.

Cilk Plus

Cilk Plus is an extension of C/C++ available in some compilers. It enables features similar to Cilk in a way that interops with C/C++. However, instead of any continuation passing, it defines a keyword “cilk_sync”, which waits for all child tasks to finish executing before proceeding. This is probably perfect for fork-join parallelism (a tree of tasks), but I’m not sure if it’s possible to implement a general directed acyclic graph with these features.


The ISPC language is mainly useful for writing high-performance SIMD code, but it also defines some built-in syntax for task parallelism. Namely, it supports built-in “task”, “launch”, and “sync” keywords. Again, this seems limited only to fork-join parallelism.


I’ve seen a few other languages with task-parallelism, but they usually seem to stop at fork/join parallelism, without talking about how continuations or DAGs might be implemented. If you know about a programming language interface that improves on the syntax of Cilk for creating continuations, please tell me about it.


I like Cilk’s ideas for passing data from child task to parent task. Implementing an interface similar to it in C++ using TBB might allow a pretty natural way of implementing task parallelism both for fork/join task trees or more general task DAGs. My main concern is making an interface that makes it easy to do common tasks.

I think that continuation passing might be an elegant way to implement sequential-looking code that actually executes in a DAG-like fashion, which would make it easy to reuse the average programmer’s intuition of single-threaded programming. I want the DAG dependencies to be built naturally and implicitly, similar to how Cilk implements “cont” variables. I want to make it easy to create placeholder “cont” variables that are used only to build dependencies between tasks with side-effects that need to operate in a specific order, similarly to tbb::flow::continue_msg. I also want a way to have a node with multiple parents (to implement general DAGs), and I’d like to minimize the overhead of doing that.

One of my main concerns is how to encapsulate the reference count of tasks. TBB sample programs (in the TBB documentation) all work with reference counts in a pretty low-level way, which may be suitable for when you want to carefully accelerate a specific algorithm, but seems error-prone for code that evolves over time. I hope that this logic can be encapsulated in objects similar to Cilk’s “closure” objects. I think these closure objects could be implemented by creating a derived class from tbb::task, and some C++ syntactical sugar (and maybe macros) could be used to simplify common operations of the task system. From there, I’m worried about the potential overhead of these closures. How can they be allocated efficiently? Will they have a lot of atomic counter overhead? Will their syntax be weird? I’ll have to do some experimentation.


One comment

  1. Pingback: TBB Task DAG with Deferred Successors | nlguillemot

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s