Every data type in Julia is a first-class citizen. Types live in a tree, which can be interrogated using the subtypes function.
Abstract types have subtypes
subtypes(Number)
2-element Vector{Any}:
 Complex
 Real
subtypes(Real)
4-element Vector{Any}:
 AbstractFloat
 AbstractIrrational
 Integer
 Rational
Concrete data types don't have subtypes
subtypes(Int64)
Type[]
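The supertype function walks up the tree in the other direction. The sketch below collects every ancestor of a type. The helper name supertype_chain is hypothetical and not part of Base. The sketch also uses the Base predicates isabstracttype and isconcretetype to show the abstract/concrete split described above.

```julia
# Hypothetical helper (not in Base): collect T and all of its ancestors up to Any.
function supertype_chain(T::DataType)
    chain = DataType[T]
    while T !== Any
        T = supertype(T)          # one step up the type tree
        push!(chain, T)
    end
    return chain
end

println(join(supertype_chain(Int64), " <: "))  # Int64 <: Signed <: Integer <: Real <: Number <: Any
println(isabstracttype(Number))  # true: Number only groups other types
println(isconcretetype(Int64))   # true: values of type Int64 can exist
```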
For example, all numeric data types in Julia form a tree rooted at the abstract type Number.
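The tree can be printed by applying subtypes recursively. Here is a minimal sketch. The name print_type_tree is a hypothetical helper. The exact output depends on which packages are loaded, since packages may define their own subtypes of Number.

```julia
using InteractiveUtils  # provides `subtypes` outside the REPL/Jupyter

# Hypothetical helper: print T, then all of its subtypes, indenting one level per depth.
function print_type_tree(io::IO, T::Type, depth::Int = 0)
    println(io, "  "^depth, T)
    for S in subtypes(T)
        print_type_tree(io, S, depth + 1)
    end
end
print_type_tree(T::Type) = print_type_tree(stdout, T)

print_type_tree(Number)
```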
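While type annotations are not strictly necessary, they are helpful: for instance, to restrict the inputs a function accepts, to choose between methods, and for performance.
Let's call fib_1 on a string:
fib_1("32.")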
MethodError: no method matching isless(::String, ::Int64)
Closest candidates are:
isless(::AbstractFloat, ::Real) at /home/linuxbrew/.linuxbrew/Cellar/julia/1.7.2/share/julia/base/operators.jl:186
isless(::AbstractString, ::AbstractString) at /home/linuxbrew/.linuxbrew/Cellar/julia/1.7.2/share/julia/base/strings/basic.jl:344
isless(::Real, ::Real) at /home/linuxbrew/.linuxbrew/Cellar/julia/1.7.2/share/julia/base/operators.jl:430
...
Stacktrace:
[1] <(x::String, y::Int64)
@ Base ./operators.jl:352
[2] top-level scope
@ In[65]:1
[3] eval
@ ./boot.jl:373 [inlined]
[4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
function fib_2(n::Number)
n <= 2 && return 1
fib_2(n - 1) + fib_2(n - 2)
end
fib_2 (generic function with 1 method)
This limits the inputs to numeric types (both Int and Float64 are subtypes of the abstract type Number):
fib_2("32.")
MethodError: no method matching fib_2(::String)
Closest candidates are:
fib_2(::Number) at In[66]:1
Stacktrace:
[1] top-level scope
@ In[67]:1
[2] eval
@ ./boot.jl:373 [inlined]
[3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
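Dispatch on fib_2(n::Number) is decided by the subtype relation <:. The sketch below repeats the definition of fib_2 so it runs on its own. It uses Base's applicable to check which arguments have a matching method, without triggering the MethodError.

```julia
# Same definition as fib_2 above: accepts any subtype of Number.
function fib_2(n::Number)
    n <= 2 && return 1
    fib_2(n - 1) + fib_2(n - 2)
end

println(Int <: Number, " ", Float64 <: Number, " ", String <: Number)  # true true false
println(fib_2(10), " ", fib_2(10.0))  # 55 55: both calls dispatch to fib_2(::Number)
println(applicable(fib_2, "32."))     # false: no method accepts a String
```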
You should think of functions as ideas; the ways an idea is implemented for different argument types are the function's methods.
For example, take "something that doubles just the part of the number in front of the decimal point", so that double_int(10) = 20 and double_int(10.1) = 20.1. We can implement this in several ways, e.g. with one method per argument type:
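function double_int(x::Int)
    # an integer has no fractional part, so simply double it
    return 2*x
end
function double_int(x::AbstractFloat)
    y = floor(Int, x)   # integer part (rounded towards -Inf)
    r = x - y           # fractional part
    return 2*y + r      # double only the integer part
end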
double_int (generic function with 2 methods)
double_int(10)
20
double_int(10.1)
20.1
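Int is an alias for the platform's native integer type (Int64 on a 64-bit machine). The x::Int method above therefore does not cover other integer types such as Int32. The sketch below uses Base's applicable to check which argument types have a matching method.

```julia
function double_int(x::Int)
    return 2*x
end
function double_int(x::AbstractFloat)
    y = floor(Int, x)
    r = x - y
    return 2*y + r
end

println(applicable(double_int, 10))         # true: matches double_int(::Int)
println(applicable(double_int, 10.1f0))     # true: Float32 <: AbstractFloat
println(applicable(double_int, Int32(10)))  # false on 64-bit, where Int === Int64
```

Annotating the first method with the abstract type Integer instead of Int would make it cover every integer type.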
We can list the methods for a function using the methods function:
methods(double_int)
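InteractiveUtils provides the @which macro, which shows which method a particular call dispatches to. It is loaded automatically in the REPL and in Jupyter. A small sketch, reusing the two methods from above:

```julia
using InteractiveUtils  # provides @which outside the REPL/Jupyter

function double_int(x::Int)
    return 2*x
end
function double_int(x::AbstractFloat)
    y = floor(Int, x)
    r = x - y
    return 2*y + r
end

println(length(methods(double_int)))  # 2
m = @which double_int(10.1)           # the Method selected for a Float64 argument
println(m.sig)                        # signature of the AbstractFloat method
```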
We can also inspect the details of the code using code introspection: https://docs.julialang.org/en/v1/devdocs/reflection/#Reflection-and-introspection
The @code_lowered macro gives us a (still somewhat abstract) idea of what Julia actually does:
@code_lowered double_int(2)
CodeInfo(
1 ─ %1 = 2 * x
└──      return %1
)
This picks up the method for x as an integer; similarly, we can see what Julia does when x is a float:
@code_lowered double_int(2.1)
CodeInfo(
1 ─      y = Main.floor(Main.Int, x)
│        r = x - y
│   %3 = 2 * y
│   %4 = %3 + r
└──      return %4
)
And @code_llvm shows the LLVM IR:
@code_llvm double_int(2)
;  @ In[40]:1 within `double_int`
define i64 @julia_double_int_2028(i64 signext %0) #0 {
top:
;  @ In[40]:2 within `double_int`
; ┌ @ int.jl:88 within `*`
   %1 = shl i64 %0, 1
; └
  ret i64 %1
}
We can see that Julia generates different LLVM IR depending on the argument types:
@code_llvm double_int(2.1)
;  @ In[40]:5 within `double_int`
define double @julia_double_int_2030(double %0) #0 {
top:
  %1 = alloca [3 x {}*], align 8
  %gcframe4 = alloca [3 x {}*], align 16
  %gcframe4.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe4, i64 0, i64 0
  %.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 0
  %2 = bitcast [3 x {}*]* %gcframe4 to i8*
  call void @llvm.memset.p0i8.i32(i8* noundef nonnull align 16 dereferenceable(24) %2, i8 0, i32 24, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #6
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
;  @ In[40]:6 within `double_int`
; ┌ @ float.jl:367 within `floor`
; │┌ @ float.jl:374 within `round`
    %3 = bitcast [3 x {}*]* %gcframe4 to i64*
    store i64 4, i64* %3, align 16
    %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe4, i64 0, i64 1
    %5 = bitcast {}** %4 to {}***
    %6 = load {}**, {}*** %pgcstack, align 8
    store {}** %6, {}*** %5, align 8
    %7 = bitcast {}*** %pgcstack to {}***
    store {}** %gcframe4.sub, {}*** %7, align 8
    %8 = call double @llvm.floor.f64(double %0)
; │└
; │┌ @ float.jl:802 within `trunc`
; ││┌ @ float.jl:447 within `<=`
     %9 = fcmp ult double %8, 0xC3E0000000000000
; ││└
    %10 = fcmp uge double %8, 0x43E0000000000000
    %11 = or i1 %9, %10
    br i1 %11, label %L11, label %L9
L9:                                               ; preds = %top
; │└
; │┌ @ float.jl:803 within `trunc`
; ││┌ @ float.jl:312 within `unsafe_trunc`
     %12 = fptosi double %8 to i64
     %13 = freeze i64 %12
; └└└
;  @ In[40]:7 within `double_int`
; ┌ @ promotion.jl:381 within `-`
; │┌ @ promotion.jl:350 within `promote`
; ││┌ @ promotion.jl:327 within `_promote`
; │││┌ @ number.jl:7 within `convert`
; ││││┌ @ float.jl:146 within `Float64`
       %14 = sitofp i64 %13 to double
; │└└└└
; │ @ promotion.jl:381 within `-` @ float.jl:402
   %15 = fsub double %0, %14
; └
;  @ In[40]:8 within `double_int`
; ┌ @ int.jl:88 within `*`
   %16 = shl i64 %13, 1
; └
; ┌ @ promotion.jl:379 within `+`
; │┌ @ promotion.jl:350 within `promote`
; ││┌ @ promotion.jl:327 within `_promote`
; │││┌ @ number.jl:7 within `convert`
; ││││┌ @ float.jl:146 within `Float64`
       %17 = sitofp i64 %16 to double
; │└└└└
; │ @ promotion.jl:379 within `+` @ float.jl:399
   %18 = fadd double %15, %17
   %19 = load {}*, {}** %4, align 8
   %20 = bitcast {}*** %pgcstack to {}**
   store {}* %19, {}** %20, align 8
; └
  ret double %18
L11:                                              ; preds = %top
;  @ In[40]:6 within `double_int`
; ┌ @ float.jl:367 within `floor`
; │┌ @ float.jl:805 within `trunc`
    %ptls_field5 = getelementptr inbounds {}**, {}*** %pgcstack, i64 2305843009213693954
    %21 = bitcast {}*** %ptls_field5 to i8**
    %ptls_load67 = load i8*, i8** %21, align 8
    %22 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load67, i32 1392, i32 16) #7
    %23 = bitcast {}* %22 to i64*
    %24 = getelementptr inbounds i64, i64* %23, i64 -1
    store atomic i64 140257867580688, i64* %24 unordered, align 8
    %25 = bitcast {}* %22 to double*
    store double %8, double* %25, align 8
    %26 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe4, i64 0, i64 2
    store {}* %22, {}** %26, align 16
    store {}* inttoptr (i64 140258087882008 to {}*), {}** %.sub, align 8
    %27 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 1
    store {}* inttoptr (i64 140257865666368 to {}*), {}** %27, align 8
    %28 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 2
    store {}* %22, {}** %28, align 8
    %29 = call nonnull {}* @jl_apply_generic({}* inttoptr (i64 140257915457760 to {}*), {}** nonnull %.sub, i32 3)
    call void @jl_throw({}* %29)
    unreachable
; └└
}
Julia compiles different machine code for different input types. For more information, see https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Integers-and-Floating-Point-Numbers and https://docs.julialang.org/en/v1/manual/types/
This is why it's good to be explicit about data types. Whenever a variable's type can "morph" into another (for example, dividing integers with / produces a Float64), the compiler has to do extra work to accommodate this type instability. It boils down to treating otherwise simple variables as more complex objects.
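The per-type specialization can also be checked without reading IR. Base.return_types reports the return type inferred for a given tuple of argument types. It is an internal, unexported reflection function, so treat its output format as version-dependent. A sketch:

```julia
function double_int(x::Int)
    return 2*x
end
function double_int(x::AbstractFloat)
    y = floor(Int, x)
    r = x - y
    return 2*y + r
end

# Each concrete argument type gets its own specialization with its own inferred return type.
println(Base.return_types(double_int, (Int,)))      # inferred: Int64 (on a 64-bit machine)
println(Base.return_types(double_int, (Float64,)))  # inferred: Float64
println(Base.return_types(double_int, (Float32,)))  # inferred: Float32 (Int + Float32 promotes to Float32)
```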
For example:
function t1(n)
s = 1
for i in 1:n
s /= rand() ## WARNING: unstable type!
end
s
end
t1 (generic function with 1 method)
function t2(n)
s = 1. ## Stable type
for i in 1:n
s /= rand()
end
s
end
t2 (generic function with 1 method)
In t1 the compiler can't decide ahead of time whether s remains an integer. s starts as the Int 1 and only becomes a Float64 once the loop body has run.
Let's see how this can affect runtime:
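The instability is visible directly in the return values of t1. In Julia, / always returns a float, even for two integers, while ÷ (div) stays an integer. So the type of the result depends on whether the loop runs at all. A quick check, repeating t1 so the block runs on its own:

```julia
function t1(n)
    s = 1
    for i in 1:n
        s /= rand()   # Int / Float64 gives a Float64
    end
    s
end

println(7 / 2, " ", 7 ÷ 2)  # 3.5 3
println(typeof(t1(0)))      # Int64 on a 64-bit machine: the loop body never runs
println(typeof(t1(5)))      # Float64
```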
using BenchmarkTools
@benchmark t1(10)
BenchmarkTools.Trial: 10000 samples with 989 evaluations.
 Range (min … max):  45.681 ns … 98.027 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.075 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.836 ns ±  3.737 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Histogram: log(frequency) by time (45.7 ns … 67.1 ns)
 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark t2(10)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  22.531 ns … 41.823 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.626 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.768 ns ±  1.208 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Histogram: log(frequency) by time (22.5 ns … 25 ns)
 Memory estimate: 0 bytes, allocs estimate: 0.
In these runs the median time of t2 is roughly half that of t1 (22.6 ns vs 46.1 ns).
The @code_warntype macro shows us how stable the data types in a function are:
@code_warntype t1(10)
MethodInstance for t1(::Int64)
  from t1(n) in Main at In[44]:1
Arguments
  #self#::Core.Const(t1)
  n::Int64
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  s::Union{Float64, Int64}
  i::Int64
Body::Union{Float64, Int64}
1 ─       (s = 1)
│   %2  = (1:n)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = s::Union{Float64, Int64}
│   %11 = Main.rand()::Float64
│         (s = %10 / %11)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return s
The Union{Float64, Int64} data type is a red flag. At this point in the code, the compiler cannot know whether s is a Float64 or an Int64, so it has to handle both cases at run time.
@code_warntype t2(10)
MethodInstance for t2(::Int64)
  from t2(n) in Main at In[45]:1
Arguments
  #self#::Core.Const(t2)
  n::Int64
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  s::Float64
  i::Int64
Body::Float64
1 ─       (s = 1.0)
│   %2  = (1:n)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = s::Float64
│   %11 = Main.rand()::Float64
│         (s = %10 / %11)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return s
The function t2 is type stable: no variable changes its data type while the function runs.
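Type stability can also be checked programmatically, for example in a test suite. The sketch below uses the @inferred macro from the Test standard library. @inferred throws an error when the type of the value actually returned does not match the return type the compiler inferred.

```julia
using Test  # standard library; provides @inferred

function t1(n)
    s = 1
    for i in 1:n
        s /= rand()   # unstable: s is Int or Float64
    end
    s
end

function t2(n)
    s = 1.0           # stable: s is always Float64
    for i in 1:n
        s /= rand()
    end
    s
end

@inferred t2(10)  # passes: inferred Float64, returned Float64

try
    @inferred t1(10)  # inferred Union{Float64, Int64}, but returned Float64
catch err
    println("t1 is not type stable: ", err isa ErrorException)
end
```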