Julia can call any function in a shared C library. This requires the library to be built with the -shared and -fPIC flags. The library has to be either in the current working directory or somewhere the dynamic linker can find it (the exact search path depends on your OS).
For example, the C standard library libc provides the clock function. Note: the colon in :clock is needed so that Julia passes the name of the function rather than trying to evaluate a Julia variable called clock (and failing if no such variable is defined). The colon makes it a symbol:
typeof(:clock)
Symbol
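A quick sketch of what the colon buys us (the second line is left commented out on purpose):
typeof(:clock) === Symbol   # the colon quotes the name instead of evaluating it
# clock                     # without the colon, Julia would look up a variable `clock` and throw an UndefVarError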
We call the clock function using the ccall command:
ccall((:clock, "libc"), Int32, ())
12204446
This calls the clock function in libc, expecting an Int32 return type and passing zero arguments (hence the empty tuple as the last argument). We can wrap this in a function:
f() = ccall((:clock, "libc"), Int32, ())
f (generic function with 1 method)
f()
12896611
Its LLVM intermediate representation and its assembly code show us what's going on:
@code_llvm(f())
;  @ In[3]:1 within `f`
define i32 @julia_f_1439() #0 {
top:
  %0 = call i32 inttoptr (i64 140703431076467 to i32 ()*)()
  ret i32 %0
}
@code_native(f())
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 12, 0
        .globl  _julia_f_1481                   ## -- Begin function julia_f_1481
        .p2align        4, 0x90
_julia_f_1481:                          ## @julia_f_1481
; ┌ @ In[3]:1 within `f`
        .cfi_startproc
## %bb.0:                               ## %top
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movabsq $140703431076467, %rax          ## imm = 0x7FF812072A73
        callq   *%rax
        popq    %rcx
        retq
        .cfi_endproc
; └
                                        ## -- End function
.subsections_via_symbols
Here we see that we simply call the function at the (fixed) address where clock lives (the dynamic linker resolved that address for us) and return the result.
More complex C data types, such as C strings, have no direct equivalent in Julia. For strings, Julia provides the Cstring type, which is binary-compatible with a C character pointer. Here we also see what happens when we want to pass an input argument: the argument-type tuple (here the second-to-last argument) needs to match the signature of the function being called, and the inputs themselves are appended as varargs:
ret = ccall((:getenv, "libc"), Cstring, (Cstring,), "SHELL")
Cstring(0x00007ff7bafc14b9)
This can be converted back into a Julia string using unsafe_string.
unsafe_string(ret)
"/usr/local/bin/fish"
Note that if the function returns NULL (which maps to C_NULL in Julia-land):
ret = ccall((:getenv, "libc"), Cstring, (Cstring,), "asdf")
Cstring(0x0000000000000000)
... then unsafe_string throws an error:
unsafe_string(ret)
ArgumentError: cannot convert NULL to string

Stacktrace:
 [1] unsafe_string
   @ ./strings/string.jl:72 [inlined]
 [2] unsafe_string(s::Cstring)
   @ Base ./c.jl:193
 [3] top-level scope
   @ In[10]:1
 [4] eval
   @ ./boot.jl:368 [inlined]
 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1428
This means that we need to sanitize the result of our ccall whenever this can happen:
function getenv(var::AbstractString)
val = ccall((:getenv, "libc"),
Cstring, (Cstring,), var)
if val == C_NULL
error("getenv: undefined variable: ", var)
end
unsafe_string(val)
end
getenv (generic function with 1 method)
getenv("SHELL")
"/usr/local/bin/fish"
getenv("asdf")
getenv: undefined variable: asdf

Stacktrace:
 [1] error(::String, ::String)
   @ Base ./error.jl:44
 [2] getenv(var::String)
   @ Main ./In[11]:5
 [3] top-level scope
   @ In[13]:1
 [4] eval
   @ ./boot.jl:368 [inlined]
 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1428
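For a call with more than one argument, each entry of the argument-type tuple lines up with one trailing argument. A small sketch using libc's strcmp, which takes two C strings and returns an int (as with clock above, this assumes "libc" resolves on your system):
ccall((:strcmp, "libc"), Cint, (Cstring, Cstring), "apple", "apple")    # 0: the strings are equal
ccall((:strcmp, "libc"), Cint, (Cstring, Cstring), "apple", "banana")   # nonzero: they differ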
Let's look at the "Hello World" of shared libraries: adding 1 to an integer. This is accomplished by the asdf_int function in asdf.c. After compiling it in place with gcc -shared -fPIC -o asdf_lib.dylib asdf.c, we can call it:
ccall((:asdf_int, "asdf_lib"), Int32, (Int32,), -1)
0
Note that in the example above, I had to use the .dylib extension because I'm using a Mac: https://stackoverflow.com/questions/2339679/what-are-the-differences-between-so-and-dylib-on-osx
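Since asdf.c itself isn't shown in this notebook, here is a minimal, self-contained sketch of what it might contain and how to build and call it from Julia; the C body is an assumed reconstruction, and Libdl.dlext keeps the build step from being tied to macOS:
using Libdl   # dlext is the platform's shared-library extension ("dylib", "so", ...)

# Assumed contents of asdf.c: add 1 to a 32-bit integer
asdf_code = """
int asdf_int(int x) { return x + 1; }
"""
write("asdf.c", asdf_code)

# Build the shared library with the platform-appropriate extension
run(`gcc -shared -fPIC -o asdf_lib.$(Libdl.dlext) asdf.c`)

# Use an absolute path so dlopen finds the library regardless of OS search rules
const asdf_lib = joinpath(pwd(), "asdf_lib")
ccall((:asdf_int, asdf_lib), Int32, (Int32,), -1)   # returns 0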
But this example is rather boring. What about passing arrays? Consider the sum function sum(a), which computes
$$
\mathrm{sum}(a) = \sum_{i=1}^n a_i,
$$
where $n$ is the length of a.
a = rand(10^7) # 1D vector of random numbers, uniform on [0,1)
10000000-element Vector{Float64}:
0.8613616678836064
0.6788806822948108
0.47735441120249267
0.9872453487938009
0.0939569660165096
0.9975542610320133
0.9359560690972771
0.13179595010979905
0.3870351677953733
0.30137106295064764
0.0827855699267499
0.05531853697162825
0.9348339001953763
⋮
0.7703630608529817
0.785851212129005
0.9956362623582224
0.4591075308839966
0.8398996934654701
0.9995945836053647
0.12824102300712326
0.9404965731289516
0.6715917272434537
0.0025913133378624442
0.5697798774608598
0.9956946163870105
sum(a)
4.998520918420762e6
Let's do this in C (I'm a bit lazy ... so I'll define the C code here rather than in its own file). Note that the Julia function tempname returns a path to a temporary file (where exactly depends on your OS). We can pipe our C code into gcc and write the compiled output to that temporary library; that's what
open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
print(f, C_code)
end
does. Note also that we're optimising the code, and that we get the platform's dynamic-library extension from Libdl.dlext. As a final note: in Julia, string concatenation is done with the * operator, i.e. *(::AbstractString, ::AbstractString).
C_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
double s = 0.0;
for (size_t i = 0; i < n; ++i) {
s += X[i];
}
return s;
}
"""
const Clib = tempname() # make a temporary file
# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):
open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
print(f, C_code)
end
# define a Julia function that calls the C function:
c_sum(X::Array{Float64}) = ccall(
("c_sum", Clib), Float64,
(Csize_t, Ptr{Float64}), length(X), X
)
c_sum (generic function with 1 method)
Note the one-line function c_sum defined above, which acts as a wrapper around ccall:
ccall(
("c_sum", Clib), Float64,
(Csize_t, Ptr{Float64}), length(X), X
)
Here we do several things:
- the c_sum function is looked up in the temporary library whose path is stored in Clib;
- length returns an integer giving the 1D size of the array X; this integer is mapped to the C size type size_t (called Csize_t in Julia);
- the array X itself is passed as a pointer to its data, Ptr{Float64}.
@code_llvm(c_sum(a))
;  @ In[18]:23 within `c_sum`
define double @julia_c_sum_2217({}* nonnull align 16 dereferenceable(40) %0) #0 {
top:
; ┌ @ array.jl:215 within `length`
   %1 = bitcast {}* %0 to { i8*, i64, i16, i16, i32 }*
   %2 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %1, i64 0, i32 1
   %3 = load i64, i64* %2, align 8
; └
; ┌ @ pointer.jl:65 within `unsafe_convert`
   %4 = bitcast {}* %0 to i8**
   %5 = load i8*, i8** %4, align 8
   %6 = ptrtoint i8* %5 to i64
; └
  %7 = call double inttoptr (i64 4382146336 to double (i64, i64)*)(i64 %3, i64 %6)
  ret double %7
}
@code_native(c_sum(a))
        .section        __TEXT,__text,regular,pure_instructions
        .build_version macos, 12, 0
        .globl  _julia_c_sum_2220               ## -- Begin function julia_c_sum_2220
        .p2align        4, 0x90
_julia_c_sum_2220:                      ## @julia_c_sum_2220
; ┌ @ In[18]:23 within `c_sum`
        .cfi_startproc
## %bb.0:                               ## %top
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
; │┌ @ pointer.jl:65 within `unsafe_convert`
        movq    (%rdi), %rsi
; │└
; │┌ @ array.jl:215 within `length`
        movq    8(%rdi), %rdi
        movabsq $4382146336, %rax               ## imm = 0x105323F20
; │└
        callq   *%rax
        popq    %rax
        retq
        .cfi_endproc
; └
                                        ## -- End function
.subsections_via_symbols
Let's compare the result with Julia's internal sum function:
c_sum(a) ≈ sum(a) # type \approx and then <TAB> to get the ≈ symbol
true
Let's look at some benchmarks comparing c_sum with Julia's inbuilt sum:
c_bench = @benchmark c_sum(a)
BenchmarkTools.Trial: 312 samples with 1 evaluation.
 Range (min … max):  15.628 ms … 19.708 ms   ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.808 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.053 ms ± 649.531 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▇█▆▅▄▃                                                      ▂
  ███████▇█▇▆█▁▇▄▆▆▄▄▆█▆▄▄▆▁▇▄▁▄▁▆▄▄▄▄▁▄▁▁▁▄▄▄▁▄▄▄▁▁▄▄▁▁▄▁▁▄▄▄ ▇
  15.6 ms       Histogram: log(frequency) by time       18.7 ms <

 Memory estimate: 48 bytes, allocs estimate: 1.
j_bench = @benchmark sum(a)
BenchmarkTools.Trial: 974 samples with 1 evaluation.
 Range (min … max):  4.411 ms … 10.278 ms   ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.885 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.122 ms ± 748.226 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▅█▅▆▂▅▂▄▁▁
  ▆██████████▇█▅▆▆▅▄▄▄▅▃▄▄▃▃▃▃▃▃▃▃▂▃▃▂▂▃▃▂▁▂▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂ ▄
  4.41 ms        Histogram: frequency by time        7.81 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.
Oh boy... it seems that C is slower than Julia?! Well, not quite, as we'll see later. The reason the c_sum function is a bit slower than Julia's internal sum is most likely that c_sum relies on dynamically linked code: when a compiler optimizes, it needs to know specifics about how the called code is implemented, and a dynamically linked call is an opaque boundary. Furthermore, ccall itself adds a little overhead for type interoperability (converting arguments to C-compatible types).
This is not a Julia-specific overhead... any dynamically linked library call experiences it (so don't go blaming Julia). In the future, inlining of such calls may be supported:
https://docs.julialang.org/en/stable/manual/calling-c-and-fortran-code/index.html
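To get a rough feel for the fixed per-call cost, one can benchmark a foreign call that does almost no work, for example the clock wrapper f from earlier. This gives an upper bound on the overhead, since clock itself also does a little work (a sketch; the numbers depend on your machine, so none are shown here):
using BenchmarkTools
@benchmark f()   # times a single tiny ccall into libc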
Let's now write a sum_f function which evaluates a Julia function $f$ as it goes:
$$
\mathrm{sum\_f}(a) = \sum_{i=1}^n f(a_i),
$$
where $n$ is the length of a and $f$ is a function defined in Julia.
In order to pass a function pointer correctly, we need to make sure that the Julia function takes and returns C-compatible data types. This is shown in the following example:
function j_sqrt(x::Cdouble)
convert(Cdouble, sqrt(x))::Cdouble
end
j_sqrt (generic function with 1 method)
... which works just like a real square root:
j_sqrt(2.) ≈ √2.
true
... but it also does a little work converting the output of Julia's built-in sqrt function to a Cdouble. In order to pass a Julia function to C, it needs to be converted into a function pointer. The pointer has the Julia type Ptr{Nothing} (i.e. a void pointer), but don't worry: when we define our C function, we declare the function-pointer argument with the correct signature.
const j_sqrt_c = @cfunction(j_sqrt, Cdouble, (Cdouble,))
Ptr{Nothing} @0x000000010793d360
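As a quick sanity check: ccall also accepts a raw function pointer in place of the (name, library) tuple, so we can call straight through the pointer we just created (a sketch):
ccall(j_sqrt_c, Cdouble, (Cdouble,), 2.0)   # ≈ 1.414..., same as j_sqrt(2.0)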
Now we can pass a function pointer to our C function by modifying our c_sum function call signature:
double c_sqrt_sum(double (* j_sqrt)(double), size_t n, double *X)
where the first argument is a function pointer to a function with call signature double j_sqrt(double). Note that a function at the "other end" of a function pointer is called with the usual j_sqrt(x) syntax.
C_sqrt_code = """
#include <stddef.h>
double c_sqrt_sum(double (* j_sqrt)(double), size_t n, double *X) {
double s = 0.0;
for (size_t i = 0; i < n; ++i) {
s += j_sqrt(X[i]);
}
return s;
}
"""
const Clib_sqrt = tempname() # make a temporary file
# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):
open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib_sqrt * "." * Libdl.dlext) -`, "w") do f
print(f, C_sqrt_code)
end
# define a Julia function that calls the C function:
c_sqrt_sum(X::Array{Float64}) = ccall(
("c_sqrt_sum", Clib_sqrt),
Float64,
(Ptr{Nothing}, Csize_t, Ptr{Float64}),
j_sqrt_c, length(X), X
)
c_sqrt_sum (generic function with 1 method)
Note above that we also modified the call signature to include the function pointer (as well as passing the Julia function pointer j_sqrt_c).
Let's compare the performance of our C function with a Julia-only implementation:
j_sqrt_sum(X::Array{Float64}) = sum( broadcast( sqrt, X ) )
j_sqrt_sum (generic function with 1 method)
c_sqrt_sum(a) ≈ j_sqrt_sum(a)
true
@benchmark(c_sqrt_sum($a))
BenchmarkTools.Trial: 64 samples with 1 evaluation.
 Range (min … max):  72.559 ms … 107.022 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     77.179 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   78.712 ms ±   6.748 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ █  ▃▃ ▁▃▃
  ▇█▆█▄▆▆██▄███▁▁▄▆▁▄▆▁▁▁▁▄▁▄▁▁▄▁▁▄▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  72.6 ms         Histogram: frequency by time          106 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark(j_sqrt_sum($a))
BenchmarkTools.Trial: 117 samples with 1 evaluation.
 Range (min … max):  34.171 ms … 56.576 ms  ┊ GC (min … max):  0.00% … 26.07%
 Time  (median):     42.539 ms              ┊ GC (median):     0.48%
 Time  (mean ± σ):   42.940 ms ±  6.606 ms  ┊ GC (mean ± σ):  13.98% ± 12.45%

  ▂▆ ▂▆▄█▂ ▂ ▂ ▂▂▄                                           ▂
  ████████▆██▄▄▆▁▁▁▄▄▄▁▁▄█▁▁▁▁▁▄▄▆▆█▄███▆▆▆█▄█▆▁█▆▆▆▆▁▆▄▁▆▁▁▄ ▄
  34.2 ms         Histogram: frequency by time        55.3 ms <

 Memory estimate: 76.29 MiB, allocs estimate: 2.
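As an aside, the 76.29 MiB memory estimate for j_sqrt_sum comes from broadcast materialising a temporary array of square roots before summing. A variant that avoids this allocation (not benchmarked here) simply passes sqrt to sum directly:
j_sqrt_sum_lazy(X::Array{Float64}) = sum(sqrt, X)   # applies sqrt element by element, no temporary array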
Hrm... so it seems that we're paying a price for repeatedly calling back through a function pointer into Julia, rather than calling a pure C function. Let's verify this theory by implementing a purely C form of sum_f (i.e. we're not going to call the Julia sqrt function, and instead we'll use the one defined in math.h).
Cc_sqrt_code = """
#include <stddef.h>
#include <math.h>
double cc_sqrt_sum(size_t n, double *X) {
double s = 0.0;
for (size_t i = 0; i < n; ++i) {
s += sqrt(X[i]);
}
return s;
}
"""
const Clib_c_sqrt = tempname() # make a temporary file
# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):
open(`gcc -fPIC -O3 -msse3 -xc -shared -lm -o $(Clib_c_sqrt * "." * Libdl.dlext) -`, "w") do f
print(f, Cc_sqrt_code)
end
# define a Julia function that calls the C function:
cc_sqrt_sum(X::Array{Float64}) = ccall(
("cc_sqrt_sum", Clib_c_sqrt),
Float64,
(Csize_t, Ptr{Float64}),
length(X), X
)
cc_sqrt_sum (generic function with 1 method)
@benchmark(c_sqrt_sum($a))
BenchmarkTools.Trial: 67 samples with 1 evaluation.
 Range (min … max):  72.338 ms … 104.730 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     73.986 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   75.805 ms ±   5.319 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁
  ██▆▅▇▅▇▄▅▇▄▁▁▁▃▁▁▃▁▃▄▁▃▃▃▁▄▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  72.3 ms         Histogram: frequency by time           95 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark(cc_sqrt_sum($a))
BenchmarkTools.Trial: 202 samples with 1 evaluation.
 Range (min … max):  24.082 ms … 28.802 ms   ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.452 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.770 ms ± 833.654 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▄▄█▄▂
  ██████▇▇▄▅▃▄▄▃▅▃▃▁▃▅▄▃▁▃▄▃▃▃▃▄▁▃▃▃▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▃ ▃
  24.1 ms         Histogram: frequency by time         28.6 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
Ha! Now we're getting a good speed-up. This shows that the performance penalty for calling back into Julia functions isn't terrible (and, until C calls can be inlined, there might not always be a speedup from going to C). Yet in some cases, simple compute kernels written entirely in C can still improve performance.