Hooray!! This is wonderful! We really need something like this, even just for improving responsiveness in business applications. I also think this will do great things for APL as a language, including helping it compete better against other languages that can easily make use of multiple cores! Adding support for more primitive functions would be welcome as well.
However, I would also like to see this applied to operators -- especially Each. (Outer and Inner Products could be handled similarly to Each.) Sure, I sometimes have large vectors that would benefit nicely from multi-core processing. But WAY more often I need to call a small subroutine (self-contained, with no side effects) on each element of a large array. Multi-threading Each would improve processing speed in my applications, where I use these operators extensively, more than all the individual primitive threadings put together.
Granted, multi-threading user-defined functions with side effects would be a significant problem, but I can think of a couple of potential ways around that. One way is to detect whether the function can possibly produce any side effects (writing to globals, using Execute, or calling unsafe subroutines/APIs would all fall in that category) and only multi-thread it if it clearly has no way to produce them. Another possibility is to implement an operator such as APL2's "PEACH" (parallel-each) operator, to be used explicitly (in place of Each) to tell the interpreter that you WANT the work split up onto multiple cores (and that you've determined that it's acceptable to do so).
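To make the second option concrete, here is a rough sketch of what an explicit parallel-each might look like. It is written in Python rather than APL purely for illustration, and the names each, peach, and square_plus_one are my own, not anything from an actual interpreter. It uses a thread pool only to show the semantics (same results, same order, caller opts in); a real implementation would need to spread the calls across cores.

```python
from concurrent.futures import ThreadPoolExecutor


def each(fn, items):
    """Sequential Each: apply fn to every element, in order."""
    return [fn(x) for x in items]


def peach(fn, items, workers=4):
    """Hypothetical parallel Each (illustrative name and signature).

    By calling this instead of each(), the caller asserts that fn has no
    side effects, so the individual calls may safely run concurrently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of the
        # order in which the individual calls finish.
        return list(pool.map(fn, items))


def square_plus_one(x):
    # A small, self-contained subroutine with no side effects.
    return x * x + 1


data = list(range(10))
# The parallel version must give exactly the same result as plain Each.
assert peach(square_plus_one, data) == each(square_plus_one, data)
```

The point of the explicit operator is that the interpreter doesn't have to prove anything: the programmer takes responsibility by choosing peach over each, and everything else about the call stays the same.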
In addition, if we get in-line expression functions or composed functions, how about being able to multi-thread those with Each (and similar operators) as well? If limited user-defined functions can already be handled, these shouldn't be any problem, and they wouldn't have many of the complications found with user-defined functions.
So what might be the possibilities of multi-threading some operators here?