Array-Based Data Types¶


Arrays and Matrices¶

Physicists like lists and arrays of numbers:

1-D list of numbers:

In [1]:
v = Vector{Float64}(undef, 3)
Out[1]:
3-element Vector{Float64}:
 0.0
 0.0
 2.6769032496e-314
In [2]:
fill!(v, 0)
Out[2]:
3-element Vector{Float64}:
 0.0
 0.0
 0.0

Note the convention here: functions that mutate their arguments should end in a ! (this is to warn the user, and "only" a convention ... the compiler won't complain, but the user of your library might).
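
As a sketch of this convention, here is a hypothetical user-defined mutating function (the name double! is made up for illustration):

# hypothetical example: mutates its argument in place, so the name ends in !
function double!(v)
    for i in eachindex(v)
        v[i] *= 2
    end
    return v
end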

Arrays can be multi-dimensional:

In [3]:
m = Matrix{Float64}(undef, 2, 2)
Out[3]:
2×2 Matrix{Float64}:
 0.0           2.31605e-314
 2.31609e-314  0.0
In [4]:
fill!(m, 1)
Out[4]:
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0

The Matrix type is a 2D "array", but we can specify any number of dimensions -- 3D for example:

In [5]:
a = Array{Float64, 3}(undef, 2, 5, 2)
Out[5]:
2×5×2 Array{Float64, 3}:
[:, :, 1] =
 2.317e-314  2.317e-314  2.23241e-314  2.23241e-314  2.23241e-314
 2.317e-314  2.317e-314  2.317e-314    2.317e-314    2.317e-314

[:, :, 2] =
 2.23241e-314  2.23241e-314  2.23241e-314  2.23241e-314  2.23241e-314
 2.317e-314    2.317e-314    2.317e-314    2.23241e-314  2.23241e-314

One-liner to get a matrix of zeros (use ones for ones)

In [7]:
zeros((2,3))
Out[7]:
2×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
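
Along the same lines, fill creates an array from an arbitrary value (a quick sketch):

ones((2, 3))      # 2×3 Matrix{Float64} of ones
fill(3.14, 2, 3)  # 2×3 Matrix{Float64} with every entry set to 3.14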

Indices start at 1 (And why shouldn't they?!)¶

In [8]:
v = [1, 2, 3]
Out[8]:
3-element Vector{Int64}:
 1
 2
 3
In [9]:
v[1]
Out[9]:
1

So doing things C-style will get you an error:

In [10]:
v[0]
BoundsError: attempt to access 3-element Vector{Int64} at index [0]

Stacktrace:
 [1] getindex(A::Vector{Int64}, i1::Int64)
   @ Base ./array.jl:861
 [2] top-level scope
   @ In[10]:1
 [3] eval
   @ ./boot.jl:373 [inlined]
 [4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196

The end shortcut gives you the last index of an array

In [11]:
v[end]
Out[11]:
3
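
The end keyword can also appear in arithmetic and in ranges, for example:

v[end-1]   # second-to-last element: 2
v[2:end]   # slice from the second element to the last: [2, 3]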

Working with ND Arrays¶

The size method returns an ordered tuple with each dimension's size

In [12]:
size(a)
Out[12]:
(2, 5, 2)
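
A few related introspection functions (shown here for the 2×5×2 array a from above):

ndims(a)    # number of dimensions: 3
length(a)   # total number of elements: 20
axes(a)     # valid index ranges: (Base.OneTo(2), Base.OneTo(5), Base.OneTo(2))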

Loops in brackets [ ... ] (array comprehensions) get stored as N-D arrays

In [13]:
m = [(i, j) for i=1:3, j=10:14]
Out[13]:
3×5 Matrix{Tuple{Int64, Int64}}:
 (1, 10)  (1, 11)  (1, 12)  (1, 13)  (1, 14)
 (2, 10)  (2, 11)  (2, 12)  (2, 13)  (2, 14)
 (3, 10)  (3, 11)  (3, 12)  (3, 13)  (3, 14)
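
The element type is inferred from the loop body; it can also be forced by prefixing the brackets with a type, for example:

Float64[i + j for i = 1:2, j = 1:2]   # 2×2 Matrix{Float64}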

A word of caution: arrays are stored in column-major format in memory. That means that the first index runs over contiguous blocks in memory (just like Fortran, and unlike C/Python). If you iterate over an array in Julia, make sure that the inner loop iterates over the first index. This is something to keep in mind when iterating over several variables in a loop.

In [14]:
for i in CartesianIndices(m)
    println(i)
end
CartesianIndex(1, 1)
CartesianIndex(2, 1)
CartesianIndex(3, 1)
CartesianIndex(1, 2)
CartesianIndex(2, 2)
CartesianIndex(3, 2)
CartesianIndex(1, 3)
CartesianIndex(2, 3)
CartesianIndex(3, 3)
CartesianIndex(1, 4)
CartesianIndex(2, 4)
CartesianIndex(3, 4)
CartesianIndex(1, 5)
CartesianIndex(2, 5)
CartesianIndex(3, 5)
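
If the indices themselves are not needed, eachindex(m) visits every element in an efficient order (for a plain Array, this is linear memory order):

for i in eachindex(m)
    println(m[i])   # walks the elements in memory order
end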

We can interrogate the memory layout of a multi-dimensional array by using the vec method (a.k.a. [:]) to flatten it to 1-D:

In [15]:
m[:]
Out[15]:
15-element Vector{Tuple{Int64, Int64}}:
 (1, 10)
 (2, 10)
 (3, 10)
 (1, 11)
 (2, 11)
 (3, 11)
 (1, 12)
 (2, 12)
 (3, 12)
 (1, 13)
 (2, 13)
 (3, 13)
 (1, 14)
 (2, 14)
 (3, 14)

Notice that the "double for loop" notation uses the second index as the "inner" index -- this would be inefficient memory access!

In [16]:
for i=1:3, j=10:14
    println((i, j))
end
(1, 10)
(1, 11)
(1, 12)
(1, 13)
(1, 14)
(2, 10)
(2, 11)
(2, 12)
(2, 13)
(2, 14)
(3, 10)
(3, 11)
(3, 12)
(3, 13)
(3, 14)
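
To traverse in memory order with this notation, put the first array index in the rightmost (innermost) position, e.g.:

for j = 10:14, i = 1:3   # i now varies fastest, matching column-major layout
    println((i, j))
end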

We can demonstrate the effect of this on performance by iterating over rows or columns in the inner loop. The @benchmark macro comes from the BenchmarkTools package; note that m here must be a large matrix (the 7.63 MiB memory estimates below suggest on the order of 1000×1000 Float64 entries), not the small 3×5 example from above.

In [66]:
using BenchmarkTools   # provides the @benchmark macro used below

function iterate_rows(m)
    x, y = size(m)
    z = zeros(x, y)
    for i=1:x
        for j=1:y
            z[i, j] = m[i, j]
        end
    end
    z
end
Out[66]:
iterate_rows (generic function with 1 method)
In [67]:
@benchmark iterate_rows(m)
Out[67]:
BenchmarkTools.Trial: 601 samples with 1 evaluation.
 Range (min … max):  6.875 ms … 15.983 ms  ┊ GC (min … max): 0.00% … 31.34%
 Time  (median):     7.569 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.305 ms ±  1.598 ms  ┊ GC (mean ± σ):  7.21% ± 11.83%

    ▇██▄▃▂                                                    
  ▃▆██████▇▅▅▄▃▃▃▃▃▃▃▂▃▃▂▂▃▂▂▃▃▃▃▄▃▃▂▃▃▃▃▃▃▃▃▂▃▃▂▃▃▁▂▂▂▂▂▃▃▂ ▃
  6.88 ms        Histogram: frequency by time        12.9 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.
In [68]:
function iterate_cols(m)
    x, y = size(m)
    z = zeros(x, y)
    for j=1:y
        for i=1:x
            z[i, j] = m[i, j]
        end
    end
    z
end
Out[68]:
iterate_cols (generic function with 1 method)
In [69]:
@benchmark iterate_cols(m)
Out[69]:
BenchmarkTools.Trial: 1838 samples with 1 evaluation.
 Range (min … max):  1.426 ms … 9.704 ms  ┊ GC (min … max):  0.00% … 61.99%
 Time  (median):     2.030 ms             ┊ GC (median):     0.00%
 Time  (mean ± σ):   2.712 ms ± 1.622 ms  ┊ GC (mean ± σ):  24.87% ± 24.52%

   ▅█▄▃▁                                                     
  ▂███████▆▅▄▄▄▄▄▃▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▄▄▃▃▃▂▂▃▂▂▂▂▃▃▃▃▂▂▂ ▃
  1.43 ms        Histogram: frequency by time       7.36 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.
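
Keeping the first index in the inner loop (iterate_cols) is roughly 3-4× faster here (median ~2.0 ms vs. ~7.6 ms), because the inner loop then walks through contiguous memory.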

Broadcast Operations¶

Literal arrays are declared as follows

In [17]:
m_0 = [1 0; 1 0]
Out[17]:
2×2 Matrix{Int64}:
 1  0
 1  0

Alternatively:

In [18]:
m_1 = [ [1 0]
        [1 0] ]
Out[18]:
2×2 Matrix{Int64}:
 1  0
 1  0
In [19]:
m_2 = [ [0 1]
        [0 0] ]
Out[19]:
2×2 Matrix{Int64}:
 0  1
 0  0

We can perform basic arithmetic operations on array data as expected:

In [20]:
m_3 = 2 * m_1 + m_2
Out[20]:
2×2 Matrix{Int64}:
 2  1
 2  0

Important there is a difference between the matrix multiplication operator (i.e. the *(A,B) method defined for Array) and the element-wise multiplication operation (i.e. *(A[i], B[i]) for all i). The . syntax is shorthand for "broadcasting" the multiplication operation over the entire array.

In [21]:
m_1 * m_2
Out[21]:
2×2 Matrix{Int64}:
 0  1
 0  1
In [22]:
m_1 .* m_2
Out[22]:
2×2 Matrix{Int64}:
 0  0
 0  0

=> The "dot" operator performs a broadcast. But we can also call the broadcast function explicitly:

In [23]:
f(x::Number) = x^2 + 1
Out[23]:
f (generic function with 1 method)
In [24]:
broadcast(f, m_3)
Out[24]:
2×2 Matrix{Int64}:
 5  2
 5  1
In [25]:
f.(m_3)
Out[25]:
2×2 Matrix{Int64}:
 5  2
 5  1
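
The @. macro broadcasts every function call and operator in an expression at once, fusing everything into a single loop over the array:

@. m_3^2 + 1   # equivalent to f.(m_3), in one fused pass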

Beware of Abstract Containers¶

Every concrete data type is a first-class citizen $\Rightarrow$ there is no penalty for using a data type that is not double, int, etc. (provided it is well designed) ... But abstract types are a different story:

It is possible to define a container without specifying the element type (it is then actually of type Any)

In [26]:
v_a = Vector(undef, 3)
Out[26]:
3-element Vector{Any}:
 #undef
 #undef
 #undef

This can hold any sort of data (which can also change during execution)

In [27]:
fill!(v_a, 1.1)
Out[27]:
3-element Vector{Any}:
 1.1
 1.1
 1.1
In [28]:
v_a[2]="asdf"
v_a
Out[28]:
3-element Vector{Any}:
 1.1
  "asdf"
 1.1

This can result in a performance hit... Any arrays are containers with an abstract element type $\Rightarrow$ they contain pointers to the data instead of the data itself and should be avoided where performance is important

In [32]:
v_abstract = Vector(undef, 10000)
v_concrete = Vector{Float64}(undef, 10000)

fill!(v_abstract, 10)
fill!(v_concrete, 10);
In [33]:
@benchmark sqrt.(v_abstract)
Out[33]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  288.257 μs …   3.851 ms  ┊ GC (min … max): 0.00% … 89.99%
 Time  (median):     370.265 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   396.427 μs ± 221.855 μs  ┊ GC (mean ± σ):  5.03% ±  7.89%

    ▁▄▅▅▄▅▅▆██▇▅▄▃▃▃▂▂▁▁▁ ▁▁                                    ▂
  ▃▆████████████████████████████▇█▇▇▆▇▇▆▇▆▇▇▇▇▅▆▅▅▅▆▄▆▅▄▅▁▄▄▄▅▄ █
  288 μs        Histogram: log(frequency) by time        745 μs <

 Memory estimate: 383.08 KiB, allocs estimate: 19502.
In [34]:
@benchmark sqrt.(v_concrete)
Out[34]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  14.923 μs …   5.462 ms  ┊ GC (min … max):  0.00% … 99.02%
 Time  (median):     45.358 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   54.975 μs ± 210.334 μs  ┊ GC (mean ± σ):  15.75% ±  4.08%

                                 ▁▅█▇▄                          
  ▂▃▃▃▂▂▂▁▂▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▂▂▂▂▃▅██████▅▅▅▅▅▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂ ▃
  14.9 μs         Histogram: frequency by time         67.6 μs <

 Memory estimate: 78.23 KiB, allocs estimate: 5.
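
If an Any array has crept into your code, converting it to a concrete element type recovers the fast path (a small sketch; the name v_fixed is made up):

v_fixed = Float64.(v_abstract)   # 10000-element Vector{Float64}
sqrt.(v_fixed)                   # should now benchmark like the concrete case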