Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

This package doesn't use the total order predicates #40

Open
andyferris opened this issue Sep 6, 2018 · 14 comments
Open

This package doesn't use the total order predicates #40

andyferris opened this issue Sep 6, 2018 · 14 comments

Comments

@andyferris
Copy link

In AcceleratedArrays.jl I've been playing with "search intervals" for querying data, for example finding all the dates within a given range with the help of a sort-based acceleration index.

I've shied away from using this package because the predicates for total order used by sorting algorithms in Julia are isequal and isless and the accelerations I rely on (e.g. this) may not be valid for < and == comparisons.

I was wondering if there was a reason for chosing these operators, and whether or not we might consider it advantageous to use isless and isequal instead?

(One difficulty I have is dealing with data where some values are missing, where < and == aren't even predicates that return Bool. In this case I still expect findall(in(0..10), array::Array{Union{Int, Missing}}) to work correctly and not crash!)

@timholy
Copy link
Member

timholy commented Sep 15, 2019

You're talking about the implementations of in, presumably?

What you're asking makes quite a bit of sense, but I find myself uneasy about examples like these:

julia> isless(3, NaN)
true

julia> x = 1..NaN
1.0..NaN

And now we vote: what should the answer for the following be?

julia> 2 in x

(Currently it gives false.)

@andyferris
Copy link
Author

Similarly, what is 2 in 1..missing?

@dlfivefifty
Copy link
Member

Shouldn’t that just be missing ?

@andyferris
Copy link
Author

andyferris commented Sep 17, 2019

I kind of think of intervals as non-iterable sets, where in would be a Boolean-valued function and would rely on isequal semantics for implementation like in Set.

@andyferris
Copy link
Author

(If in wasn’t Boolean... then we can’t use if a in b; ...; end and the pun of for a in b; ...; end is a little confusing).

@dlfivefifty
Copy link
Member

I can’t imagine a case where you want that to work when there’s missings flying around....

@andyferris
Copy link
Author

andyferris commented Sep 17, 2019

I can’t imagine a case where you want that to work

Sorry, which to work, exactly?

I think my point of view could be expressed like this: I don't see why the function below should ever throw an exception for iterable iter (that would be terrible for readability and user-friendliness).

function f(iter)
    for x in iter
        if !(x in iter)
            error("WTF!?")
        end
    end
end

f(Set(1,2,3,NaN,missing))
f([1,2,3,NaN,missing])
...

Making the above be sane constrains both the behavior and the return type of in. We really need more Bool predicate functions than ===, <: and isa in the langauge (or else these become the only things that are safe to put into an if predicate in generic code).

@andyferris
Copy link
Author

I find myself uneasy about examples like these

@timholy I also find the example you gave a little unsettling, but it bothers me less than my example function above.

@andyferris
Copy link
Author

I can’t imagine a case where you want that to work when there’s missings flying around....

To give a concrete example, I would really like to do filter(in(1..10), data) where e.g. data is an arary like [0, 2, 5, 11, NaN, missing] and get [2, 5] as the answer. In this case 2 and 5 are the only things that are between 1 and 10 (inclusively) so that's the values filter preserves. (IMO it's quite useless for filter to throw an exception here).

@andyferris
Copy link
Author

andyferris commented Sep 17, 2019

Similarly, what is 2 in 1..missing?

I should have said earlier, the cases that interest me more are

NaN in 1.0..10.0
missing in 1.0..10.0

@dlfivefifty
Copy link
Member

I'm convinced, I agree with your argument (though your original examples 1.0..missing and 1.0..NaN should just error out).

@nalimilan
Copy link

I think the problem is that in has no consistent definition in Base, as it is consistent with ==/> for AbstractArray, but with isequal/isless for Set. It's hard to choose which definition is best for intervals. It would be nice to introduce new functions/methods so that the caller can specify which semantics are desired (waiting for deeper changes in Julia 2.0). I've filed JuliaLang/julia#37157 about this.

For intervals, one interesting case is -0.0. Should -0.0 in 0..1 return true or false?

@hyrodium
Copy link
Collaborator

hyrodium commented Apr 2, 2022

For intervals, one interesting case is -0.0. Should -0.0 in 0..1 return true or false?

It should be true because -0.0 == 0 is true.

@dlfivefifty
Copy link
Member

Also:

julia> -0.0 in 0.0
true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants