I know that adding explicit threading capabilities to APL+Win poses some significant challenges. Some such capabilities can now be handled by the APLNext Supervisor, if the application's needs fit well in that environment. Another possibility is moving to Visual APL, where .Net provides multi-threading options.
But putting aside explicit threading for the moment, what are the possibilities for increasing APL+Win performance on multi-core machines through implicit threading done internally by the interpreter? In other words, when faced with a long, divisible task, could APL actually spin up threads on multiple cores and split the work among them to get the job done in a fraction of the time? This might be likened to what IBM did many years ago with their "Vector Facility" system, but with multiple cores in place of vector hardware.
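The pattern I have in mind is the classic scatter/compute/gather scheme. Here is a minimal sketch in Python (purely illustrative; the function name and chunking scheme are my own, and a real interpreter would use native OS threads over shared workspace memory rather than Python's pool, since CPython's GIL prevents true parallel execution of interpreted code):

```python
import concurrent.futures

def parallel_sum(data, workers=4):
    """Illustrative sketch: split `data` into roughly equal chunks,
    hand each chunk to a worker, then combine the partial results.
    This is the scatter/compute/gather pattern an interpreter could
    apply internally to a long, divisible primitive without any
    change to the user's code."""
    # Scatter: carve the argument into one slice per worker.
    n = max(1, len(data) // workers)
    chunks = [data[i:i + n] for i in range(0, len(data), n)]
    # Compute: each worker reduces its own slice independently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    # Gather: combine the per-worker partial results.
    return sum(partials)
```

The same shape applies to any associative reduction or element-wise primitive; the gather step is where the per-core results are recombined, and its cost is part of the overhead that the task has to be large enough to amortize.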
I know that there is significant overhead in thread management and inter-thread communications, but for largish tasks, wouldn't it be beneficial to pay those costs for an automatic performance gain that requires no APL re-coding? For instance, a large-array inner product could be divided fairly easily into smaller pieces, though the workspace memory would probably need to be shared to avoid data-transfer bottlenecks. Or if a user function (or primitive) were applied with "Each" to a large array, and the function has no side effects, wouldn't there be benefit in running each invocation on a separate CPU?
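The "Each" case is essentially a parallel map. A minimal sketch of that analogue, again in Python and again only illustrative (the function name `parallel_each` is my own invention, not an APL+Win facility):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_each(fn, items, workers=4):
    """Illustrative parallel analogue of APL's Each: apply a
    side-effect-free `fn` to every item, dispatching invocations
    across a pool of workers.  pool.map() preserves argument order,
    so results come back in the same order Each would produce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

The side-effect-free restriction matters: because the invocations run concurrently in no guaranteed order, any function that assigns globals or does I/O could behave differently under this scheme, which is presumably why an interpreter would have to detect (or be told) that a function is pure before parallelizing it.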
Anyhow, I'm just wondering how much this concept has been explored, and whether it would make a valuable addition to our favorite programming environment, giving our users a free performance boost on today's common multi-core hardware. It would probably apply only in limited cases (where the benefits exceed the overhead), but I should think it wouldn't be prohibitively difficult to add such support here and there so we can get the most bang for our buck.
What do you think (or what has been determined) about such possibilities?