Array-Based Data Types


Arrays and Matrices

Physicists like lists and arrays of numbers:

1-D list of numbers:

In [1]:
v = Vector{Float64}(undef, 3)
Out[1]:
3-element Vector{Float64}:
 5.0e-324
 0.0
 2.354168005e-314
In [2]:
fill!(v, 0)
Out[2]:
3-element Vector{Float64}:
 0.0
 0.0
 0.0

Note the convention here: the names of functions that mutate their arguments end in a ! (this is to warn the user, and "only" a convention ... the compiler won't complain, but the users of your library might).
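For example, Base provides both a non-mutating sort and a mutating sort! (a minimal sketch):

```julia
v = [3, 1, 2]

w = sort(v)   # returns a new sorted array; v is unchanged
sort!(v)      # sorts v in place

# both w and v are now [1, 2, 3]
```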

Arrays can be multi-dimensional:

In [3]:
m = Matrix{Float64}(undef, 2, 2)
Out[3]:
2×2 Matrix{Float64}:
 8.48798e-314  2.122e-314
 8.48798e-314  2.122e-314
In [4]:
fill!(m, 1)
Out[4]:
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0

The Matrix type is a 2D "array", but we can specify any number of dimensions -- 3D for example:

In [5]:
a = Array{Float64, 3}(undef, 2, 5, 2)
Out[5]:
2×5×2 Array{Float64, 3}:
[:, :, 1] =
 2.28465e-314  2.39452e-314  2.39452e-314  2.39452e-314  2.39452e-314
 2.39452e-314  2.39452e-314  2.39452e-314  2.35415e-314  2.39452e-314

[:, :, 2] =
 2.39452e-314  2.38239e-314  2.39452e-314  2.39452e-314  2.38239e-314
 2.39452e-314  2.38239e-314  2.39452e-314  2.39452e-314  2.28465e-314

One-liner to get a matrix of zeros (use ones to get a matrix of ones):

In [7]:
zeros((2,3))
Out[7]:
2×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
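Along the same lines, ones builds a matrix of ones, and fill builds an array from an arbitrary fill value (a short sketch using Base functions):

```julia
ones((2, 3))     # 2×3 Matrix{Float64} of ones
fill(7.0, 2, 3)  # 2×3 Matrix{Float64} filled with 7.0
```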

Indices start at 1 (And why shouldn't they?!)

In [8]:
v = [1, 2, 3]
Out[8]:
3-element Vector{Int64}:
 1
 2
 3
In [9]:
v[1]
Out[9]:
1

So doing things C-style will get you an error:

In [10]:
v[0]
BoundsError: attempt to access 3-element Vector{Int64} at index [0]

Stacktrace:
 [1] getindex(A::Vector{Int64}, i1::Int64)
   @ Base ./array.jl:801
 [2] top-level scope
   @ In[10]:1
 [3] eval
   @ ./boot.jl:360 [inlined]
 [4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094

The end shortcut gives you the last index of an array:

In [11]:
v[end]
Out[11]:
3
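end can also be used in index arithmetic and ranges, for example:

```julia
v = [1, 2, 3]

v[end-1]   # second-to-last element: 2
v[2:end]   # slice from the second element to the last: [2, 3]
```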

Working with ND Arrays

The size method returns an ordered tuple with the size along each dimension:

In [12]:
size(a)
Out[12]:
(2, 5, 2)

Comprehensions in brackets [ ... ] with multiple loop variables produce ND arrays:

In [13]:
m = [(i, j) for i=1:3, j=10:14]
Out[13]:
3×5 Matrix{Tuple{Int64, Int64}}:
 (1, 10)  (1, 11)  (1, 12)  (1, 13)  (1, 14)
 (2, 10)  (2, 11)  (2, 12)  (2, 13)  (2, 14)
 (3, 10)  (3, 11)  (3, 12)  (3, 13)  (3, 14)

A word of caution: arrays are stored in column-major order in memory. That means that consecutive values of the first index are contiguous in memory (just like Fortran, and unlike C/Python). If you iterate over an array in Julia, make sure that the inner loop iterates over the first index. This is something to keep in mind when iterating over several variables in a loop.

In [14]:
for i in CartesianIndices(m)
    println(i)
end
CartesianIndex(1, 1)
CartesianIndex(2, 1)
CartesianIndex(3, 1)
CartesianIndex(1, 2)
CartesianIndex(2, 2)
CartesianIndex(3, 2)
CartesianIndex(1, 3)
CartesianIndex(2, 3)
CartesianIndex(3, 3)
CartesianIndex(1, 4)
CartesianIndex(2, 4)
CartesianIndex(3, 4)
CartesianIndex(1, 5)
CartesianIndex(2, 5)
CartesianIndex(3, 5)
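To loop in memory order without writing the index logic by hand, eachindex picks an efficient iteration order automatically, and explicit loops over axes can put the first index innermost -- a sketch:

```julia
m = [(i, j) for i=1:3, j=10:14]

# explicit nested loops: second index outer, first index inner,
# matching the column-major storage order
for j in axes(m, 2), i in axes(m, 1)
    m[i, j]  # contiguous memory access
end

# eachindex visits every element in an efficient order
collected = [m[idx] for idx in eachindex(m)]
```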

We can interrogate the memory layout of a multi-dimensional array by using the vec method (a.k.a. [:]) to flatten it to 1-D:

In [15]:
m[:]
Out[15]:
15-element Vector{Tuple{Int64, Int64}}:
 (1, 10)
 (2, 10)
 (3, 10)
 (1, 11)
 (2, 11)
 (3, 11)
 (1, 12)
 (2, 12)
 (3, 12)
 (1, 13)
 (2, 13)
 (3, 13)
 (1, 14)
 (2, 14)
 (3, 14)

Notice that the "double for loop" notation uses the second index as the "inner" index -- for array access, this would be an inefficient memory-access pattern!

In [16]:
for i=1:3, j=10:14
    println((i, j))
end
(1, 10)
(1, 11)
(1, 12)
(1, 13)
(1, 14)
(2, 10)
(2, 11)
(2, 12)
(2, 13)
(2, 14)
(3, 10)
(3, 11)
(3, 12)
(3, 13)
(3, 14)
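Swapping the loop variables makes the first index the inner (fastest-varying) one, matching the memory order seen in m[:]:

```julia
for j=10:14, i=1:3
    println((i, j))
end
# prints (1, 10), (2, 10), (3, 10), (1, 11), ... -- column-major order
```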

Broadcast Operations

Literal arrays are declared as follows:

In [17]:
m_0 = [1 0; 1 0]
Out[17]:
2×2 Matrix{Int64}:
 1  0
 1  0

Alternatively:

In [18]:
m_1 = [ [1 0]
        [1 0] ]
Out[18]:
2×2 Matrix{Int64}:
 1  0
 1  0
In [19]:
m_2 = [ [0 1]
        [0 0] ]
Out[19]:
2×2 Matrix{Int64}:
 0  1
 0  0

We can perform basic arithmetic operations on array data as expected:

In [20]:
m_3 = 2 * m_1 + m_2
Out[20]:
2×2 Matrix{Int64}:
 2  1
 2  0

Important: there is a difference between the matrix multiplication operator (i.e. the *(A, B) method defined for Array) and the element-wise multiplication operation (i.e. *(A[i], B[i]) for all i). The . syntax is shorthand for "broadcasting" the multiplication operation over the entire array.

In [21]:
m_1 * m_2
Out[21]:
2×2 Matrix{Int64}:
 0  1
 0  1
In [22]:
m_1 .* m_2
Out[22]:
2×2 Matrix{Int64}:
 0  0
 0  0

=> The "dot" syntax performs a broadcast. But we can also call broadcast explicitly:

In [23]:
f(x::Number) = x^2 + 1
Out[23]:
f (generic function with 1 method)
In [24]:
broadcast(f, m_3)
Out[24]:
2×2 Matrix{Int64}:
 5  2
 5  1
In [25]:
f.(m_3)
Out[25]:
2×2 Matrix{Int64}:
 5  2
 5  1
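Dotted operations over a whole expression fuse into a single loop; the @. macro turns every call in an expression into its dotted form, so we don't have to dot each operator by hand:

```julia
m_3 = [2 1; 2 0]

y = @. m_3^2 + 1    # same as m_3.^2 .+ 1, fused into one pass
# y == [5 2; 5 1]
```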

Beware of Abstract Containers

Every concrete data type is a first-class citizen $\Rightarrow$ there is no penalty for using a data type that is not double, int, etc (provided it is well designed) ... But abstract types are a different story:

It is possible to define a container without specifying the element type (it then holds elements of type Any):

In [26]:
v_a = Vector(undef, 3)
Out[26]:
3-element Vector{Any}:
 #undef
 #undef
 #undef

This can hold any sort of data (and the types can even change during execution):

In [27]:
fill!(v_a, 1.1)
Out[27]:
3-element Vector{Any}:
 1.1
 1.1
 1.1
In [28]:
v_a[2]="asdf"
v_a
Out[28]:
3-element Vector{Any}:
 1.1
  "asdf"
 1.1

This can result in a performance hit: Any arrays are containers with an abstract element type $\Rightarrow$ they store pointers to data instead of the data itself, and should be avoided where performance is important.

In [35]:
v_abstract = Vector(undef, 10000)
v_concrete = Vector{Float64}(undef, 10000)

fill!(v_abstract, 10)
fill!(v_concrete, 10);
In [34]:
@benchmark sqrt.(v_abstract)
Out[34]:
BenchmarkTools.Trial: 
  memory estimate:  383.12 KiB
  allocs estimate:  19502
  --------------
  minimum time:     336.913 μs (0.00% GC)
  median time:      410.423 μs (0.00% GC)
  mean time:        444.316 μs (3.99% GC)
  maximum time:     4.061 ms (83.78% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [36]:
@benchmark sqrt.(v_concrete)
Out[36]:
BenchmarkTools.Trial: 
  memory estimate:  78.27 KiB
  allocs estimate:  5
  --------------
  minimum time:     15.029 μs (0.00% GC)
  median time:      44.257 μs (0.00% GC)
  mean time:        54.373 μs (16.29% GC)
  maximum time:     6.107 ms (99.11% GC)
  --------------
  samples:          10000
  evals/sample:     1