APL64 Missing Value

This topic is for discussion of the new APL64 Project, currently in development. It is open for all to browse. However, posting requires a registered account on the APLDN forum, which is available only to APL+Win licensees with a current APL+Win Subscription.

Moderators: Tech Support, phpbb_admin

Postby crc » September 24th, 2018, 2:26 pm

Hello.

It would be nice to have a new variable type 'Missing Value' (NA) as it exists in R. Very useful in statistics and machine learning. Could be useful in APL 32-bit too.

Christian.
crc
 
Posts: 2
Joined: October 15th, 2010, 3:56 am
Location: France

Re: APL64 Missing Value

Postby Davin Church » September 25th, 2018, 2:49 am

I've used missing values in other languages and appreciate their usefulness, even multiple flavors of missing values that are distinct. However, if I've got old code lying around (especially utilities) then I'd have to adjust every function to be able to expect a new kind of "missing value" data type that might get injected into my arguments from outside. I don't know if that's a good trade-off or not, but it's worth consideration.

J (IIRC) also has a data type of "infinity" which could be conceptually useful, but as a new datatype suffers from some of the same potential drawbacks as missing values.
Davin Church
 
Posts: 651
Joined: February 24th, 2007, 1:46 am

Re: APL64 Missing Value

Postby Ajay Askoolum » September 25th, 2018, 1:21 pm

Quick question: If you have two variables each being Missing Value, and compared them: would the result be true or a Missing Value?
Ajay Askoolum
 
Posts: 884
Joined: February 22nd, 2007, 2:16 am
Location: United Kingdom

Re: APL64 Missing Value

Postby Davin Church » September 25th, 2018, 2:01 pm

Ajay Askoolum wrote:Quick question: If you have two variables each being Missing Value, and compared them: would the result be true or a Missing Value?

In some languages that I've encountered before, a missing value IS equal to another missing value and NOT equal to any regular value. Of course, if you go by Excel's way of doing things, any comparison with #NA is also #NA.
Davin Church
 
Posts: 651
Joined: February 24th, 2007, 1:46 am

Re: APL64 Missing Value

Postby Ajay Askoolum » September 25th, 2018, 2:07 pm

I think it is logical that a Missing Value is NOT equal to any other value, not even another Missing Value.

However, this raises the following question: how would you remove Missing Value(s) from a variable?

Code:
      myVar~#Missing Value#
Of necessity, this implies that all Missing Values are equal: that does not make sense.
Ajay Askoolum
 
Posts: 884
Joined: February 22nd, 2007, 2:16 am
Location: United Kingdom

Re: APL64 Missing Value

Postby jbrobston » September 25th, 2018, 4:05 pm

One way to deal with this might be to provide a system variable that matches NA, call it, say, ⎕ISNA , so if X is 1 2 3 NA 4 5 then X=⎕ISNA would give 0 0 0 1 0 0. From there, normal APL operations can be used to remove the NA or do whatever else one wants to with it.
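
The masking idea above can be sketched outside APL. The following Python is only an illustration, not a proposal for APL64 syntax: `NA`, `_Missing`, and `is_na` are hypothetical names, and `is_na` plays the role that `X=⎕ISNA` plays in the post.

```python
class _Missing:
    """Hypothetical stand-in for the proposed NA value (not an APL64 feature)."""

    def __repr__(self):
        return "NA"


NA = _Missing()  # single shared marker, always tested by identity


def is_na(values):
    """Return a 0/1 mask with 1 wherever the marker appears."""
    return [1 if v is NA else 0 for v in values]


x = [1, 2, 3, NA, 4, 5]
mask = is_na(x)

# Compress: keep only the items whose mask bit is 0.
kept = [v for v, m in zip(x, mask) if not m]

print(mask)
print(kept)
```

Because the mask comes from a dedicated test rather than from ordinary equality, this works whether or not two missing values are considered equal to each other.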
jbrobston
 
Posts: 26
Joined: August 11th, 2015, 9:48 am

Re: APL64 Missing Value

Postby Davin Church » September 25th, 2018, 7:24 pm

If you're treating a missing value in the Excel (#NA) way, then in most circumstances you can embed a ⍬ in your data and treat it the same way. Only special cases where ⍬ is valid data would be a problem in that case.

Personally, I'd vote for it being treated as a special "value", not as a "non-value". I recall that the SAS language even has 27 missing values, labelled "_A" through "_Z" and "__", or something like that. Each missing value was equal to itself and not equal to any other value (including other types of missing). However, if you do a calculation (not a comparison) with one you always get back a missing value as a result, but that's not the same as comparing them with equals/notequals.
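
A rough sketch of the "special value" behaviour described above, written in Python purely for illustration. The `Missing` class and its tags are hypothetical, and the rule that the left-hand missing wins when two different missings are added is an assumption made for this sketch, not something taken from SAS.

```python
class Missing:
    """Hypothetical tagged missing value: equal only to a missing with the same tag."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        # Comparison yields an ordinary boolean, never a missing value.
        return isinstance(other, Missing) and self.tag == other.tag

    def __hash__(self):
        return hash(("Missing", self.tag))

    def __add__(self, other):
        # Calculation (as opposed to comparison) always yields a missing value.
        # Assumption: the left operand's tag is kept.
        return self

    def __radd__(self, other):
        return self

    def __repr__(self):
        return f"Missing({self.tag!r})"


A = Missing("A")
B = Missing("B")

print(A == Missing("A"))  # same flavour of missing
print(A == B)             # different flavours
print(A == 5)             # missing versus a regular value
print(A + 1, 1 + A)       # arithmetic propagates the missing value
```

With this design, removing missings by ordinary equality is well defined, while any arithmetic still signals that data was absent.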
Davin Church
 
Posts: 651
Joined: February 24th, 2007, 1:46 am

Re: APL64 Missing Value

Postby jbrobston » September 25th, 2018, 10:06 pm

I was thinking more of "in an R way" where R has a function for assigning a value of "NA" and other functions for handling them.

0 is not adequate: take the mean of a vector with zeros and the zeros get counted into the average, whereas NAs don't have to be. If you look for the max and min of a set of values that are all greater than zero, with NAs you get the true max and min, but with zeros you get zero as the min, and so on.
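
The difference can be checked with a small Python sketch. The data values are made up, and `NA`, `present`, and `mean` are hypothetical helper names used only for this illustration.

```python
class _Missing:
    """Hypothetical marker for an absent observation."""

    def __repr__(self):
        return "NA"


NA = _Missing()


def present(values):
    """Return only the observations that are actually there."""
    return [v for v in values if v is not NA]


def mean(values):
    return sum(values) / len(values)


with_zeros = [4, 0, 6, 0, 2]   # absent readings recorded as 0
with_nas = [4, NA, 6, NA, 2]   # absent readings recorded as NA

zero_stats = (mean(with_zeros), min(with_zeros), max(with_zeros))

observed = present(with_nas)
na_stats = (mean(observed), min(observed), max(observed))

print(zero_stats)  # zeros drag the mean down and become the minimum
print(na_stats)    # statistics over the three real observations only
```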
jbrobston
 
Posts: 26
Joined: August 11th, 2015, 9:48 am

Re: APL64 Missing Value

Postby Davin Church » September 25th, 2018, 10:41 pm

jbrobston wrote:0 is not adequate

Notice I said {zilde}, not {zero}.

R is really designed to do ONLY statistical work, whereas APL needs to have all-purpose functionality. Telling APL that missings have to be ignored when summing (for instance) will keep many other good things from being able to happen.
Davin Church
 
Posts: 651
Joined: February 24th, 2007, 1:46 am

Re: APL64 Missing Value

Postby jbrobston » September 26th, 2018, 10:13 am

If ignoring them would "keep many other good things from being able to happen" I presume you envision many other good things that could be done if missings were added to the language. What do you see as being those good things? I don't see how adding a feature and being able to ignore it if present detracts anything from the language in its current form.

And how is zilde usable as a substitute for the NA in R? Summing a vector containing a zilde results in zilde, not a sum. The rank of a vector containing a zilde includes the zilde.

And you might want to read "Programming with Data" before you jump to the conclusion that R was purpose made for statistics.
jbrobston
 
Posts: 26
Joined: August 11th, 2015, 9:48 am

Re: APL64 Missing Value

Postby Davin Church » September 26th, 2018, 11:27 am

jbrobston wrote:If ignoring them would "keep many other good things from being able to happen" I presume you envision many other good things that could be done if missings were added to the language. What do you see as being those good things? I don't see how adding a feature and being able to ignore it if present detracts anything from the language in its current form.

There is a concept that says that "any error behavior can be replaced with functionality without changing the base language", so in that respect adding a missing value can't hurt. But if you're going to support missings you don't want to support one thing well without doing a good job supporting everything. For instance, what would happen if the designers of APL decided that dyadic ⍴ was useful for reshaping character arrays but didn't make it also work just as well on numeric values? Where would we be now?

I think that adding a missing value is fine, in general. I'm just trying to suggest that we think through every possible design situation and make it as flexible and useful as possible. For instance, if we created a missing value I wouldn't want it to JUST be ignored during summing. Maybe I might also want it to be ignored if I catenated it into a series of character vectors. Or if it gets ignored when taking the ⍴ of a vector, what happens if you take the ⍴ of a matrix containing scattered missings? Or what happens if you use ⍴ to reshape a vector or matrix containing missings. You don't want to add a language feature that leaves functional "holes" when you try to extend its behavior.

jbrobston wrote:And how is zilde usable as a substitute for the NA in R? Summing a vector containing a zilde results in zilde, not a sum. The rank of a vector containing a zilde includes the zilde.

I'm not suggesting a direct replacement for it, but a way to simulate the need for it in specific circumstances. For instance, summing a vector with ⍬ could be done by simply removing them with ~⊂⍬ first.
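
For readers less familiar with that idiom, here is a loose Python analogue. It is a sketch under assumptions: the empty tuple `ZILDE` stands in for ⍬, and the hypothetical `without` function imitates dyadic `~` with an enclosed right argument, so the placeholder is matched as one item rather than as a list of items to remove.

```python
ZILDE = ()  # stand-in for the empty numeric vector used as a placeholder


def without(values, item):
    """Drop every element that matches item as a whole."""
    return [v for v in values if v != item]


data = [10, ZILDE, 20, 0, 30, ZILDE]

cleaned = without(data, ZILDE)
total = sum(cleaned)

print(cleaned)  # the genuine zero survives; only the placeholders are dropped
print(total)
```

As noted earlier in the thread, this only works when an empty vector can never be legitimate data.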

jbrobston wrote:And you might want to read "Programming with Data" before you jump to the conclusion that R was purpose made for statistics.
Very well - I accept the reprimand as I don't know much about R. (After all, APL is often considered to be a math-only language.) But isn't that how it's marketed? And isn't that what prompted its creation (though I may be off-base here)? And isn't that what you've all been talking about here? And isn't that what it's mostly used for? I'm thinking about it in a practical way rather than trying to describe its functional limitations.
Davin Church
 
Posts: 651
Joined: February 24th, 2007, 1:46 am

Re: APL64 Missing Value

Postby jbrobston » September 26th, 2018, 3:16 pm

Davin Church wrote:
jbrobston wrote:If ignoring them would "keep many other good things from being able to happen" I presume you envision many other good things that could be done if missings were added to the language. What do you see as being those good things? I don't see how adding a feature and being able to ignore it if present detracts anything from the language in its current form.

There is a concept that says that "any error behavior can be replaced with functionality without changing the base language", so in that respect adding a missing value can't hurt. But if you're going to support missings you don't want to support one thing well without doing a good job supporting everything. For instance, what would happen if the designers of APL decided that dyadic ⍴ was useful for reshaping character arrays but didn't make it also work just as well on numeric values? Where would we be now?

I think that adding a missing value is fine, in general. I'm just trying to suggest that we think through every possible design situation and make it as flexible and useful as possible. For instance, if we created a missing value I wouldn't want it to JUST be ignored during summing. Maybe I might also want it to be ignored if I catenated it into a series of character vectors. Or if it gets ignored when taking the ⍴ of a vector, what happens if you take the ⍴ of a matrix containing scattered missings? Or what happens if you use ⍴ to reshape a vector or matrix containing missings. You don't want to add a language feature that leaves functional "holes" when you try to extend its behavior.


I agree that any implementation of missing should be carefully thought out and the way that such are implemented in other languages should be studied.

Davin Church wrote:
jbrobston wrote:And how is zilde usable as a substitute for the NA in R? Summing a vector containing a zilde results in zilde, not a sum. The rank of a vector containing a zilde includes the zilde.

I'm not suggesting a direct replacement for it, but a way to simulate the need for it in specific circumstances. For instance, summing a vector with ⍬ could be done by simply removing them with ~⊂⍬ first.


That's useful syntax and by itself might do the job. Not sure I understand how it works though (I don't mean the effect but how one would think to do that in the first place).

Davin Church wrote:
jbrobston wrote:And you might want to read "Programming with Data" before you jump to the conclusion that R was purpose made for statistics.
Very well - I accept the reprimand as I don't know much about R. (After all, APL is often considered to be a math-only language.) But isn't that how it's marketed? And isn't that what prompted its creation (though I may be off-base here)? And isn't that what you've all been talking about here? And isn't that what it's mostly used for? I'm thinking about it in a practical way rather than trying to describe its functional limitations.


My impression is that R (when it was designed it was called "S"; "R" is the open source version of S, which is still commercially available) was what Bell Labs came up with when they started thinking about developing an array processing language. It was originally intended to "work with data" and has become popular with data scientists as well as statisticians, so it has been considerably extended in those directions. (It's hard to find an R course that actually teaches the language; most of those you encounter teach how to use it for data science, which means that the base language gets short shrift.) Being open source, it's not really "marketed" in any conventional sense.

And I don't think it's necessary to just arbitrarily force the language to ignore NAs. R has functions to return an object including NAs, report an error if the object contains NAs, return it with NAs removed, and return it padded with NAs, and I'm sure I've left some out. So there's a lot of versatility there. It's conceivable that this need could be met in APL without adding language features by providing a suitable library.
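
To make the library idea concrete, here is a minimal Python sketch of the four policies listed above. The function names `na_pass`, `na_fail`, `na_omit`, and `na_pad` are hypothetical and chosen for this illustration; they are not R's API or an existing APL library, and padding to a requested length is an assumption about what "padded" should mean.

```python
class _Missing:
    """Hypothetical marker for a missing value."""

    def __repr__(self):
        return "NA"


NA = _Missing()


def na_pass(values):
    """Return the data unchanged, missing values included."""
    return list(values)


def na_fail(values):
    """Return the data, or raise an error if any value is missing."""
    if any(v is NA for v in values):
        raise ValueError("object contains missing values")
    return list(values)


def na_omit(values):
    """Return the data with missing values removed."""
    return [v for v in values if v is not NA]


def na_pad(values, length):
    """Return the data extended with NA up to the requested length."""
    values = list(values)
    return values + [NA] * max(0, length - len(values))


data = [1, NA, 3]
print(na_pass(data))
print(na_omit(data))
print(na_pad([1, 2], 4))
```

The caller picks the policy explicitly, so no primitive has to silently ignore missing values.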
jbrobston
 
Posts: 26
Joined: August 11th, 2015, 9:48 am

R is not elusive

Postby Ajay Askoolum » September 26th, 2018, 7:51 pm

1. Ways of doing things in R are better documented on the internet than in APL.
2. R annual conferences attract delegates with more letters after their names than you would care to count; typically, delegates number around 500 (more than APL attracted in its heyday).
3. Like APL, R is interactive and has many similarities with APL.
4. R derives from S but R is also supported commercially.
5. Unlike APL, R is extendable by its user base; there are many packages which overlap in functionality.
6. There have been several initiatives to re-vamp the language but the momentum of the language is such that these have all faltered rather quickly.

The attached is a little dated but it does provide an insight if you want to get up to speed with R as a personal computing tool.

PS: There are at least two packages that implement standard APL functionality such as inner/outer product, n-wise reduction etc.
Attachments
APL+Win with R - ClientServer Integration.pdf
(1.71 MiB) Downloaded 331 times
Ajay Askoolum
 
Posts: 884
Joined: February 22nd, 2007, 2:16 am
Location: United Kingdom

Re: APL64 Missing Value

Postby crc » October 11th, 2018, 4:36 pm

Davin Church wrote:
Ajay Askoolum wrote:Quick question: If you have two variables each being Missing Value, and compared them: would the result be true or a Missing Value?

In some languages that I've encountered before, a missing value IS equal to another missing value and NOT equal to any regular value. Of course, if you go by Excel's way of doing things, any comparison with #NA is also #NA.


As the missing value is an absorbing element, the result should be, I think, missing.
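
A small Python sketch of what that absorbing rule implies, offered as an illustration only. `NA` and `eq3` are hypothetical names; `eq3` is a three-valued equality that returns the missing marker whenever either argument is missing.

```python
class _Missing:
    """Hypothetical marker for a missing value."""

    def __repr__(self):
        return "NA"


NA = _Missing()


def eq3(a, b):
    """Three-valued equality: missing absorbs the comparison."""
    if a is NA or b is NA:
        return NA
    return a == b


x = [1, NA, 3]

# Comparing every item with NA gives NA everywhere, so equality alone
# cannot locate the missing items.
compared = [eq3(v, NA) for v in x]

# A dedicated identity test is still needed to find or remove them.
located = [v is NA for v in x]

print(compared)
print(located)
```

Under this rule the removal question raised earlier in the thread cannot be answered with equality, which is why a separate "is missing" test would be required.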
crc
 
Posts: 2
Joined: October 15th, 2010, 3:56 am
Location: France

