Data Types, Methods, and Introspection¶


Data Types¶

Every data type in Julia is a first-class citizen. Types live in a tree, which can be interrogated using the subtypes function.

Abstract types have subtypes

In [62]:
subtypes(Number)
Out[62]:
2-element Vector{Any}:
 Complex
 Real
In [63]:
subtypes(Real)
Out[63]:
4-element Vector{Any}:
 AbstractFloat
 AbstractIrrational
 Integer
 Rational

Concrete data types don't have subtypes

In [55]:
subtypes(Int64)
Out[55]:
Type[]
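Conversely, we can walk the tree upward with supertype, and test relationships with the subtype operator `<:`. A minimal sketch:

```julia
# Walk up the type tree from a concrete type toward the root
supertype(Int64)    # Signed
supertype(Signed)   # Integer
supertype(Integer)  # Real

# The subtype operator <: tests membership in the tree
Int64 <: Integer         # true
Int64 <: AbstractFloat   # false

# Distinguish abstract nodes from concrete leaves
isabstracttype(Number)   # true
isconcretetype(Int64)    # true
```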

E.g. all numeric data types in Julia form this tree: (figure: data-type tree for the Julia Number abstract type)

While type annotations are not strictly necessary, they are helpful in two ways:

  1. they help the compiler optimize code,
  2. they produce meaningful error messages.

Let's call fib_1 on a string:

In [65]:
fib_1("32.")
MethodError: no method matching isless(::String, ::Int64)
Closest candidates are:
  isless(::AbstractFloat, ::Real) at /home/linuxbrew/.linuxbrew/Cellar/julia/1.7.2/share/julia/base/operators.jl:186
  isless(::AbstractString, ::AbstractString) at /home/linuxbrew/.linuxbrew/Cellar/julia/1.7.2/share/julia/base/strings/basic.jl:344
  isless(::Real, ::Real) at /home/linuxbrew/.linuxbrew/Cellar/julia/1.7.2/share/julia/base/operators.jl:430
  ...

Stacktrace:
 [1] <(x::String, y::Int64)
   @ Base ./operators.jl:352
 [2] top-level scope
   @ In[65]:1
 [3] eval
   @ ./boot.jl:373 [inlined]
 [4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196
In [66]:
function fib_2(n::Number)
    n <= 2 && return 1
    fib_2(n - 1) + fib_2(n - 2)
end
Out[66]:
fib_2 (generic function with 1 method)

This limits the inputs to numeric types (both Int64 and Float64 are subtypes of the abstract type Number).

In [67]:
fib_2("32.")
MethodError: no method matching fib_2(::String)
Closest candidates are:
  fib_2(::Number) at In[66]:1

Stacktrace:
 [1] top-level scope
   @ In[67]:1
 [2] eval
   @ ./boot.jl:373 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196
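Because the annotation is the abstract type Number, this one method accepts any numeric subtype. A quick sketch (redefining fib_2 as above) showing it works for integers, floats, big integers, and rationals:

```julia
function fib_2(n::Number)
    n <= 2 && return 1
    fib_2(n - 1) + fib_2(n - 2)
end

fib_2(10)        # 55
fib_2(10.0)      # 55  (the base case returns the Int 1)
fib_2(big(10))   # 55
fib_2(10//1)     # 55
```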

Methods¶

You should think of a function as an idea; the concrete ways it is implemented are the function's methods.

E.g.: "something that doubles just the part of the number in front of the decimal point". So double_int(10) = 20, and double_int(10.1) = 20.1. We can implement this with two methods:

  1. if the input is an integer, double it;
  2. if the input is a floating-point value, split off the integer part, double it, and add back the fractional remainder:
In [40]:
function double_int(x::Int)
    return 2*x
end

function double_int(x::AbstractFloat)
    y = floor(Int, x)
    r = x - y
    return 2*y + r
end
Out[40]:
double_int (generic function with 2 methods)
In [30]:
double_int(10)
Out[30]:
20
In [31]:
double_int(10.1)
Out[31]:
20.1

We can list the methods for a function using the methods function:

In [32]:
methods(double_int)
Out[32]:
# 2 methods for generic function double_int:
  • double_int(x::AbstractFloat) in Main at In[29]:5
  • double_int(x::Int64) in Main at In[29]:1
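Besides listing all methods, we can ask which method a particular call dispatches to using the @which macro (available by default in the REPL and notebooks via InteractiveUtils). A small sketch, redefining double_int locally:

```julia
using InteractiveUtils  # provides @which (loaded automatically in the REPL)

function double_int(x::Int)
    return 2*x
end

function double_int(x::AbstractFloat)
    y = floor(Int, x)
    r = x - y
    return 2*y + r
end

# @which reports the method a given call dispatches to
@which double_int(10)    # double_int(x::Int64)
@which double_int(10.1)  # double_int(x::AbstractFloat)

# methods(...) can also be counted like a collection
length(methods(double_int))  # 2
```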

Introspection¶

We may also inspect the details of the generated code using code introspection: https://docs.julialang.org/en/v1/devdocs/reflection/#Reflection-and-introspection

The @code_lowered macro gives us a (still somewhat abstract) idea of what Julia actually does.

In [33]:
@code_lowered double_int(2)
Out[33]:
CodeInfo(
1 ─ %1 = 2 * x
└──      return %1
)

This picks up the method for x as an integer, and similarly we can see what Julia does when x is a float:

In [34]:
@code_lowered double_int(2.1)
Out[34]:
CodeInfo(
1 ─      y = Main.floor(Main.Int, x)
│        r = x - y
│   %3 = 2 * y
│   %4 = %3 + r
└──      return %4
)
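Between the lowered form and the LLVM IR sits @code_typed, which shows the code after type inference: every expression is annotated with its inferred type. A sketch (redefining the integer method locally):

```julia
using InteractiveUtils  # provides @code_typed and code_typed

function double_int(x::Int)
    return 2*x
end

# @code_typed prints the lowered code with inferred type annotations;
# the body reduces to a single integer multiply, and the trailing
# type after the CodeInfo is the inferred return type (Int64 here)
@code_typed double_int(2)
```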

And @code_llvm shows the LLVM IR:

In [41]:
@code_llvm double_int(2)
;  @ In[40]:1 within `double_int`
define i64 @julia_double_int_2028(i64 signext %0) #0 {
top:
;  @ In[40]:2 within `double_int`
; ┌ @ int.jl:88 within `*`
   %1 = shl i64 %0, 1
; └
  ret i64 %1
}

We can see that Julia generates different LLVM IR depending on the input data types.

In [42]:
@code_llvm double_int(2.1)
;  @ In[40]:5 within `double_int`
define double @julia_double_int_2030(double %0) #0 {
top:
  %1 = alloca [3 x {}*], align 8
  %gcframe4 = alloca [3 x {}*], align 16
  %gcframe4.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe4, i64 0, i64 0
  %.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 0
  %2 = bitcast [3 x {}*]* %gcframe4 to i8*
  call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(24) %2, i8 0, i32 24, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #6
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
;  @ In[40]:6 within `double_int`
; ┌ @ float.jl:367 within `floor`
; │┌ @ float.jl:374 within `round`
    %3 = bitcast [3 x {}*]* %gcframe4 to i64*
    store i64 4, i64* %3, align 16
    %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe4, i64 0, i64 1
    %5 = bitcast {}** %4 to {}***
    %6 = load {}**, {}*** %pgcstack, align 8
    store {}** %6, {}*** %5, align 8
    %7 = bitcast {}*** %pgcstack to {}***
    store {}** %gcframe4.sub, {}*** %7, align 8
    %8 = call double @llvm.floor.f64(double %0)
; │└
; │┌ @ float.jl:802 within `trunc`
; ││┌ @ float.jl:447 within `<=`
     %9 = fcmp ult double %8, 0xC3E0000000000000
; ││└
    %10 = fcmp uge double %8, 0x43E0000000000000
    %11 = or i1 %9, %10
    br i1 %11, label %L11, label %L9

L9:                                               ; preds = %top
; │└
; │┌ @ float.jl:803 within `trunc`
; ││┌ @ float.jl:312 within `unsafe_trunc`
     %12 = fptosi double %8 to i64
     %13 = freeze i64 %12
; └└└
;  @ In[40]:7 within `double_int`
; ┌ @ promotion.jl:381 within `-`
; │┌ @ promotion.jl:350 within `promote`
; ││┌ @ promotion.jl:327 within `_promote`
; │││┌ @ number.jl:7 within `convert`
; ││││┌ @ float.jl:146 within `Float64`
       %14 = sitofp i64 %13 to double
; │└└└└
; │ @ promotion.jl:381 within `-` @ float.jl:402
   %15 = fsub double %0, %14
; └
;  @ In[40]:8 within `double_int`
; ┌ @ int.jl:88 within `*`
   %16 = shl i64 %13, 1
; └
; ┌ @ promotion.jl:379 within `+`
; │┌ @ promotion.jl:350 within `promote`
; ││┌ @ promotion.jl:327 within `_promote`
; │││┌ @ number.jl:7 within `convert`
; ││││┌ @ float.jl:146 within `Float64`
       %17 = sitofp i64 %16 to double
; │└└└└
; │ @ promotion.jl:379 within `+` @ float.jl:399
   %18 = fadd double %15, %17
   %19 = load {}*, {}** %4, align 8
   %20 = bitcast {}*** %pgcstack to {}**
   store {}* %19, {}** %20, align 8
; └
  ret double %18

L11:                                              ; preds = %top
;  @ In[40]:6 within `double_int`
; ┌ @ float.jl:367 within `floor`
; │┌ @ float.jl:805 within `trunc`
    %ptls_field5 = getelementptr inbounds {}**, {}*** %pgcstack, i64 2305843009213693954
    %21 = bitcast {}*** %ptls_field5 to i8**
    %ptls_load67 = load i8*, i8** %21, align 8
    %22 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load67, i32 1392, i32 16) #7
    %23 = bitcast {}* %22 to i64*
    %24 = getelementptr inbounds i64, i64* %23, i64 -1
    store atomic i64 140257867580688, i64* %24 unordered, align 8
    %25 = bitcast {}* %22 to double*
    store double %8, double* %25, align 8
    %26 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe4, i64 0, i64 2
    store {}* %22, {}** %26, align 16
    store {}* inttoptr (i64 140258087882008 to {}*), {}** %.sub, align 8
    %27 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 1
    store {}* inttoptr (i64 140257865666368 to {}*), {}** %27, align 8
    %28 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 2
    store {}* %22, {}** %28, align 8
    %29 = call nonnull {}* @jl_apply_generic({}* inttoptr (i64 140257915457760 to {}*), {}** nonnull %.sub, i32 3)
    call void @jl_throw({}* %29)
    unreachable
; └└
}

Julia compiles different machine code for different input types. For more information, see https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Integers-and-Floating-Point-Numbers and https://docs.julialang.org/en/v1/manual/types/

Performance Benchmarking and Type Stability¶

Here is the reason why it's always good to keep data types stable: whenever a variable's data type "morphs" into another (for example, an integer turning into a float through division), the generated code has to do a lot of extra work to accommodate the type instability. It boils down to having to treat otherwise simple variables as more complex objects.

For example:

In [44]:
function t1(n)
    s = 1
    for i in 1:n
        s /= rand()  ## WARNING: unstable type!
    end
    s
end
Out[44]:
t1 (generic function with 1 method)
In [45]:
function t2(n)
    s = 1.      ## Stable type
    for i in 1:n
        s /= rand()
    end
    s
end
Out[45]:
t2 (generic function with 1 method)

In t1, the compiler can't decide ahead of time whether s will remain an integer!

Let's see how this can affect the runtime:

In [46]:
using BenchmarkTools
In [47]:
@benchmark t1(10)
Out[47]:
BenchmarkTools.Trial: 10000 samples with 989 evaluations.
 Range (min … max):  45.681 ns … 98.027 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.075 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.836 ns ±  3.737 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█   ▂                                                      ▁
  ██▄▄▇█▇▆▅▅▅▄▅▆▃▅▄▅▄▆▄▄▄▃▄▃▄▄▅▄▅▃▄▃▄▄▃▁▄▃▅▄▅▄▅▄▅▅▆▆▆▇██▇▆▇▆▆ █
  45.7 ns      Histogram: log(frequency) by time      67.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
In [48]:
@benchmark t2(10)
Out[48]:
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  22.531 ns … 41.823 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.626 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.768 ns ±  1.208 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆██▇▆              ▂▁▁                                      ▂
  █████▇▃▁▁▁▁▁▁▁▁▁▁▁████▇▃▁▁▁▁▁▁▅▁▄▅▄▅▄▄▃▄▃▁▄▄▅▄▁▄▃▃▃▁▄▁▄▁▁▁▃ █
  22.5 ns      Histogram: log(frequency) by time        25 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

The @code_warntype macro shows us whether the data types in a function are stable:

In [ ]:
@code_warntype t1(10)
MethodInstance for t1(::Int64)
  from t1(n) in Main at In[44]:1
Arguments
  #self#::Core.Const(t1)
  n::Int64
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  s::Union{Float64, Int64}
  i::Int64
Body::Union{Float64, Int64}
1 ─       (s = 1)
│   %2  = (1:n)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = s::Union{Float64, Int64}
│   %11 = Main.rand()::Float64
│         (s = %10 / %11)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return s

The Union{Float64, Int64} data type is a red flag: at this point in the code, we might need to convert between Float64 and Int64.

In [ ]:
@code_warntype t2(10)
MethodInstance for t2(::Int64)
  from t2(n) in Main at In[45]:1
Arguments
  #self#::Core.Const(t2)
  n::Int64
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  s::Float64
  i::Int64
Body::Float64
1 ─       (s = 1.0)
│   %2  = (1:n)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = s::Float64
│   %11 = Main.rand()::Float64
│         (s = %10 / %11)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return s

The function t2 is type stable: no variable changes its data type as the function runs.
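Type stability can also be checked programmatically with @inferred from the standard-library Test package, which errors whenever the runtime return type differs from the inferred one. A minimal sketch, redefining t1 and t2 as above:

```julia
using Test  # standard library; provides @inferred

function t1(n)
    s = 1          # Int: type-unstable, s becomes Float64 after a division
    for i in 1:n
        s /= rand()
    end
    s
end

function t2(n)
    s = 1.0        # Float64 from the start: type stable
    for i in 1:n
        s /= rand()
    end
    s
end

# Passes: inferred and actual return types are both Float64
@inferred t2(10)

# Throws: inferred Union{Float64, Int64} differs from the actual Float64
try
    @inferred t1(10)
catch err
    println("t1 is type-unstable: ", err)
end
```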