Integrating with external C libraries


Julia can call any function in a shared C library. The library must be compiled with the -shared and -fPIC flags, and it must be discoverable: either in the current working directory or on the library search path (the dynamic linker's path on Linux/macOS, or PATH on Windows).

For example, the standard C library libc provides the clock function. Note the colon in :clock: it makes the name a Symbol, so Julia passes the function's name for lookup rather than trying to evaluate clock itself (which would fail if no Julia function of that name is defined):

In [1]:
typeof(:clock)
Out[1]:
Symbol

We call the clock function using the ccall command:

In [2]:
ccall((:clock, "libc"), Int32, ())
Out[2]:
12204446

This calls the clock function in libc, expecting an Int32 return type and passing zero arguments (hence the empty tuple as the last argument). We can wrap this in a function:

In [3]:
f() = ccall((:clock, "libc"), Int32, ())
Out[3]:
f (generic function with 1 method)
In [4]:
f()
Out[4]:
12896611

The LLVM intermediate representation and the native assembly show us what's going on:

In [5]:
@code_llvm(f())
;  @ In[3]:1 within `f`
define i32 @julia_f_1439() #0 {
top:
  %0 = call i32 inttoptr (i64 140703431076467 to i32 ()*)()
  ret i32 %0
}
In [6]:
@code_native(f())
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 12, 0
	.globl	_julia_f_1481                   ## -- Begin function julia_f_1481
	.p2align	4, 0x90
_julia_f_1481:                          ## @julia_f_1481
; ┌ @ In[3]:1 within `f`
	.cfi_startproc
## %bb.0:                               ## %top
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movabsq	$140703431076467, %rax          ## imm = 0x7FF812072A73
	callq	*%rax
	popq	%rcx
	retq
	.cfi_endproc
; └
                                        ## -- End function
.subsections_via_symbols

Here we see that we are calling a function at a fixed address, namely the address of clock as resolved by the dynamic linker, and returning the result.

C-compatible strings

More complex data types, such as strings, are represented differently in Julia and in C. For strings, Julia provides a Cstring data type, which is binary-compatible with C's NUL-terminated strings. Here we also see what happens when we pass an input argument: the argument-type tuple (second-to-last argument) needs to match the signature of the function being called, and the actual arguments are appended after it:

In [7]:
ret = ccall((:getenv, "libc"), Cstring, (Cstring,), "SHELL")
Out[7]:
Cstring(0x00007ff7bafc14b9)

This can be converted back into a Julia string using unsafe_string.

In [8]:
unsafe_string(ret)
Out[8]:
"/usr/local/bin/fish"

Note that if the function returns NULL (which corresponds to C_NULL in Julia-land):

In [9]:
ret = ccall((:getenv, "libc"), Cstring, (Cstring,), "asdf")
Out[9]:
Cstring(0x0000000000000000)

... then unsafe_string throws an error:

In [10]:
unsafe_string(ret)
ArgumentError: cannot convert NULL to string

Stacktrace:
 [1] unsafe_string
   @ ./strings/string.jl:72 [inlined]
 [2] unsafe_string(s::Cstring)
   @ Base ./c.jl:193
 [3] top-level scope
   @ In[10]:1
 [4] eval
   @ ./boot.jl:368 [inlined]
 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1428

This means that we need to check the return value of our ccall whenever this can happen:

In [11]:
function getenv(var::AbstractString)
    val = ccall((:getenv, "libc"),
                Cstring, (Cstring,), var)
    if val == C_NULL
        error("getenv: undefined variable: ", var)
    end
    unsafe_string(val)
end
Out[11]:
getenv (generic function with 1 method)
In [12]:
getenv("SHELL")
Out[12]:
"/usr/local/bin/fish"
In [13]:
getenv("asdf")
getenv: undefined variable: asdf

Stacktrace:
 [1] error(::String, ::String)
   @ Base ./error.jl:44
 [2] getenv(var::String)
   @ Main ./In[11]:5
 [3] top-level scope
   @ In[13]:1
 [4] eval
   @ ./boot.jl:368 [inlined]
 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1428

Writing and calling your own libraries

Let's look at the "Hello World" of shared libraries: adding 1 to an integer. This is done by the asdf_int function in asdf.c. After compiling in place with gcc -shared -fPIC -o asdf_lib.dylib asdf.c, we can call it:
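The source of asdf.c is not shown in this notebook; a minimal version consistent with the ccall below would be:

```c
/* asdf.c -- a sketch; the original file is not shown here.
   asdf_int adds 1 to its argument, matching the (Int32,) -> Int32
   signature used in the ccall below. */
#include <stdint.h>

int32_t asdf_int(int32_t x) {
    return x + 1;
}
```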

In [14]:
ccall((:asdf_int, "asdf_lib"), Int32, (Int32,), -1)
Out[14]:
0

Note that in the example above, I had to use the .dylib extension because I'm using a mac: https://stackoverflow.com/questions/2339679/what-are-the-differences-between-so-and-dylib-on-osx

But this example is rather boring. What about sending arrays? Consider the sum function sum(a), which computes $$ \mathrm{sum}(a) = \sum_{i=1}^n a_i, $$ where $n$ is the length of a.

In [15]:
a = rand(10^7) # 1D vector of random numbers, uniform on [0,1)
Out[15]:
10000000-element Vector{Float64}:
 0.8613616678836064
 0.6788806822948108
 0.47735441120249267
 0.9872453487938009
 0.0939569660165096
 0.9975542610320133
 0.9359560690972771
 0.13179595010979905
 0.3870351677953733
 0.30137106295064764
 0.0827855699267499
 0.05531853697162825
 0.9348339001953763
 ⋮
 0.7703630608529817
 0.785851212129005
 0.9956362623582224
 0.4591075308839966
 0.8398996934654701
 0.9995945836053647
 0.12824102300712326
 0.9404965731289516
 0.6715917272434537
 0.0025913133378624442
 0.5697798774608598
 0.9956946163870105
In [16]:
sum(a)
Out[16]:
4.998520918420762e6

Let's do this in C (I'm a bit lazy, so I'll define the C code here rather than in its own file). Note that the Julia function tempname returns the path to a temporary file (its location depends on your OS). We can pipe our C code into gcc and write the output into our temporary library; that's what

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

does. Note also that we're optimising the code, and asking for the platform's dynamic-library extension via Libdl.dlext. As a final note: in Julia, string concatenation is done with the times function, *(::AbstractString, ::AbstractString).

In [18]:
C_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
"""

const Clib = tempname()   # make a temporary file


# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum(X::Array{Float64}) = ccall(
    ("c_sum", Clib), Float64, 
    (Csize_t, Ptr{Float64}), length(X), X
)
Out[18]:
c_sum (generic function with 1 method)

Note above the one-line function c_sum, which acts as a wrapper for ccall:

ccall(
    ("c_sum", Clib), Float64, 
    (Csize_t, Ptr{Float64}), length(X), X
)

Here we do several things:

  1. the c_sum function is looked up in the temporary library whose path is stored in Clib;
  2. length returns the 1D size of the array X as a Julia integer, which is mapped to the C size type size_t (labelled Csize_t in Julia);
  3. the array is passed as a pointer to doubles using the pointer type Ptr{Float64}.
In [20]:
@code_llvm(c_sum(a))
;  @ In[18]:23 within `c_sum`
define double @julia_c_sum_2217({}* nonnull align 16 dereferenceable(40) %0) #0 {
top:
; ┌ @ array.jl:215 within `length`
   %1 = bitcast {}* %0 to { i8*, i64, i16, i16, i32 }*
   %2 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %1, i64 0, i32 1
   %3 = load i64, i64* %2, align 8
; └
; ┌ @ pointer.jl:65 within `unsafe_convert`
   %4 = bitcast {}* %0 to i8**
   %5 = load i8*, i8** %4, align 8
   %6 = ptrtoint i8* %5 to i64
; └
  %7 = call double inttoptr (i64 4382146336 to double (i64, i64)*)(i64 %3, i64 %6)
  ret double %7
}
In [21]:
@code_native(c_sum(a))
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 12, 0
	.globl	_julia_c_sum_2220               ## -- Begin function julia_c_sum_2220
	.p2align	4, 0x90
_julia_c_sum_2220:                      ## @julia_c_sum_2220
; ┌ @ In[18]:23 within `c_sum`
	.cfi_startproc
## %bb.0:                               ## %top
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
; │┌ @ pointer.jl:65 within `unsafe_convert`
	movq	(%rdi), %rsi
; │└
; │┌ @ array.jl:215 within `length`
	movq	8(%rdi), %rdi
	movabsq	$4382146336, %rax               ## imm = 0x105323F20
; │└
	callq	*%rax
	popq	%rax
	retq
	.cfi_endproc
; └
                                        ## -- End function
.subsections_via_symbols

Let's compare the result with Julia's internal sum function:

In [22]:
c_sum(a) ≈ sum(a)  # type \approx and then <TAB> to get the ≈ symbol
Out[22]:
true

Benchmarking your C libraries

Let's look at some benchmarks comparing c_sum with Julia's inbuilt sum:

In [27]:
c_bench = @benchmark c_sum(a)
Out[27]:
BenchmarkTools.Trial: 312 samples with 1 evaluation.
 Range (min … max):  15.628 ms …  19.708 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.808 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.053 ms ± 649.531 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▇█▆▅▄▃ ▂                                                     
  ███████▇█▇▆█▁▇▄▆▆▄▄▆█▆▄▄▆▁▇▄▁▄▁▆▄▄▄▄▁▄▁▁▁▄▄▄▁▄▄▄▁▁▄▄▁▁▄▁▁▄▄▄ ▇
  15.6 ms       Histogram: log(frequency) by time      18.7 ms <

 Memory estimate: 48 bytes, allocs estimate: 1.
In [28]:
j_bench = @benchmark sum(a)
Out[28]:
BenchmarkTools.Trial: 974 samples with 1 evaluation.
 Range (min … max):  4.411 ms …  10.278 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.885 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.122 ms ± 748.226 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▅█▅▆▂▅▂▄▁▁                                                  
  ▆██████████▇█▅▆▆▅▄▄▄▅▃▄▄▃▃▃▃▃▃▃▃▂▃▃▂▂▃▃▂▁▂▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂ ▄
  4.41 ms         Histogram: frequency by time        7.81 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

Oh boy... it seems that C is slower than Julia?! Well, not quite, as we'll see later. The most likely reason c_sum is a bit slower than Julia's built-in sum is that c_sum lives in dynamically linked code: to optimize across a call, the compiler needs to see how the callee is implemented, and here it can't. Furthermore, the ccall itself has a little overhead due to type conversion at the boundary.

This is not a Julia-specific overhead: any dynamically linked library call experiences it (so don't go blaming Julia). In the future, inlining may be supported:

https://docs.julialang.org/en/stable/manual/calling-c-and-fortran-code/index.html

Knitting code closer together

Let's write a sum_f function which evaluates a Julia function $f$ as it goes: $$ \mathrm{sum\_f}(a) = \sum_{i=1}^n f(a_i), $$ where $n$ is the length of a and $f$ is a function defined in Julia.

In order to pass a function pointer correctly, we need to make sure that the Julia function takes and returns C-compatible data types. This is shown in the following example:

In [29]:
function j_sqrt(x::Cdouble)
    convert(Cdouble, sqrt(x))::Cdouble
end
Out[29]:
j_sqrt (generic function with 1 method)

... which works just like a real square root:

In [30]:
j_sqrt(2.) ≈ √2.
Out[30]:
true

... but it also does a little work converting the output of Julia's built-in sqrt function to a Cdouble. In order to pass a Julia function to C, it needs to be converted into a function pointer. Its Julia type is Ptr{Nothing} (a raw void pointer), but don't worry: when we define our C function, we declare the function-pointer argument with the right signature.

In [32]:
const j_sqrt_c = @cfunction(j_sqrt, Cdouble, (Cdouble,))
Out[32]:
Ptr{Nothing} @0x000000010793d360

Now we can pass a function pointer to our C function by modifying our c_sum function call signature:

double c_sqrt_sum(double (* j_sqrt)(double), size_t n, double *X)

where the first argument is a function pointer to a function with call signature double j_sqrt(double). Note that a function on the "other end" of a function pointer is called using the usual function j_sqrt(x) syntax.

In [33]:
C_sqrt_code = """
#include <stddef.h>

double c_sqrt_sum(double (* j_sqrt)(double), size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += j_sqrt(X[i]);
    }
    return s;
}
"""

const Clib_sqrt = tempname()   # make a temporary file


# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib_sqrt * "." * Libdl.dlext) -`, "w") do f
    print(f, C_sqrt_code) 
end

# define a Julia function that calls the C function:
c_sqrt_sum(X::Array{Float64}) = ccall(
    ("c_sqrt_sum", Clib_sqrt), 
    Float64, 
    (Ptr{Nothing}, Csize_t, Ptr{Float64}), 
    j_sqrt_c, length(X), X
)
Out[33]:
c_sqrt_sum (generic function with 1 method)

Note above that we also modified the call signature to include the function pointer (as well as passing the Julia function pointer j_sqrt_c).

Let's compare the performance of our C function with a Julia-only implementation:

In [35]:
j_sqrt_sum(X::Array{Float64}) = sum( broadcast( sqrt, X ) )
Out[35]:
j_sqrt_sum (generic function with 1 method)
In [36]:
c_sqrt_sum(a) ≈ j_sqrt_sum(a)
Out[36]:
true
In [37]:
@benchmark(c_sqrt_sum($a))
Out[37]:
BenchmarkTools.Trial: 64 samples with 1 evaluation.
 Range (min … max):  72.559 ms … 107.022 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     77.179 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   78.712 ms ±   6.748 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █ █   ▃▃ ▁▃▃                                                 
  ▇█▆█▄▆▆██▄███▁▁▄▆▁▄▆▁▁▁▁▄▁▄▁▁▄▁▁▄▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  72.6 ms         Histogram: frequency by time          106 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
In [38]:
@benchmark(j_sqrt_sum($a))
Out[38]:
BenchmarkTools.Trial: 117 samples with 1 evaluation.
 Range (min … max):  34.171 ms … 56.576 ms  ┊ GC (min … max):  0.00% … 26.07%
 Time  (median):     42.539 ms              ┊ GC (median):     0.48%
 Time  (mean ± σ):   42.940 ms ±  6.606 ms  ┊ GC (mean ± σ):  13.98% ± 12.45%

  ▂▆ ▂▆▄█▂ ▂                       ▂ ▂▂▄   ▂                   
  ████████▆██▄▄▆▁▁▁▄▄▄▁▁▄█▁▁▁▁▁▄▄▆▆█▄███▆▆▆█▄█▆▁█▆▆▆▆▁▆▄▁▆▁▁▄ ▄
  34.2 ms         Histogram: frequency by time        55.3 ms <

 Memory estimate: 76.29 MiB, allocs estimate: 2.

Hrm... so it seems that we're paying a price for repeatedly calling through a function pointer, rather than a plain C function. Let's verify this theory by implementing a purely C version of sum_f (i.e. not calling the Julia sqrt function; instead using the one defined in math.h).

In [39]:
Cc_sqrt_code = """
#include <stddef.h>
#include <math.h>

double cc_sqrt_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += sqrt(X[i]);
    }
    return s;
}
"""

const Clib_c_sqrt = tempname()   # make a temporary file


# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -lm -o $(Clib_c_sqrt * "." * Libdl.dlext) -`, "w") do f
    print(f, Cc_sqrt_code) 
end

# define a Julia function that calls the C function:
cc_sqrt_sum(X::Array{Float64}) = ccall(
    ("cc_sqrt_sum", Clib_c_sqrt), 
    Float64, 
    (Csize_t, Ptr{Float64}), 
    length(X), X
)
Out[39]:
cc_sqrt_sum (generic function with 1 method)
In [42]:
@benchmark(c_sqrt_sum($a))
Out[42]:
BenchmarkTools.Trial: 67 samples with 1 evaluation.
 Range (min … max):  72.338 ms … 104.730 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     73.986 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   75.805 ms ±   5.319 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁                                                            
  ██▆▅▇▅▇▄▅▇▄▁▁▁▃▁▁▃▁▃▄▁▃▃▃▁▄▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  72.3 ms         Histogram: frequency by time           95 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
In [43]:
@benchmark(cc_sqrt_sum($a))
Out[43]:
BenchmarkTools.Trial: 202 samples with 1 evaluation.
 Range (min … max):  24.082 ms …  28.802 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.452 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.770 ms ± 833.654 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▄▄█▄▂                                                        
  ██████▇▇▄▅▃▄▄▃▅▃▃▁▃▅▄▃▁▃▄▃▃▃▃▄▁▃▃▃▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▃ ▃
  24.1 ms         Histogram: frequency by time         28.6 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Ha! Now we're getting a good speedup. This shows that the performance penalty for Julia functions isn't bad (and until C calls can be inlined, going to C might not always be a speedup). Still, in some cases simple compute kernels written entirely in C can improve performance.