How Numpy calls C functions from Python

Since Numpy is implemented in the C language for efficiency reasons, I would like to find out how exactly Numpy calls a C function, like np.array, from Python, as in which part of the Numpy source code is responsible for the calling?

I have tried to follow the source code, starting with the C implementation in multiarraymodule.c in the numpy/_core/src/multiarray directory. I focused on PyMethodDef array_module_methods[], to get an overview of the functionality the module provides when imported into Python, and on:

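static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "_multiarray_umath",   /* m_name: the module's name */
    NULL,                  /* m_doc */
    -1,                    /* m_size: -1 means state is kept in globals */
    array_module_methods,  /* m_methods: the method table */
    NULL,
    NULL,
    NULL,
    NULL
};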

which gives the name the module is initialised with. I have also observed that _multiarray_umath is imported into numpy/_core/multiarray.py, where the Python module defines functions such as:

@array_function_from_c_func_and_dispatcher(_multiarray_umath.empty_like)
def empty_like(
    prototype, dtype=None, order=None, subok=None, shape=None, *, device=None
):

followed by a lot of documentation and examples, then a return of:

return (prototype,)

I thought this must be where the C implementation of a function such as empty_like() is called from. Is that right?

  • You’d have a much easier time understanding this stuff if you wrote a very simple C library yourself (one that just adds two integers or something) and created the plumbing to get that to work. There are many tutorials out there explaining how to do that. Trying to understand how it works from a mature library without even looking at the full source code is going to be very hard – what exactly are you trying to achieve?


  • I do not have a problem with using the Python/C API. I have walked through an example from the Python documentation, written simple scripts, and fired up gdb and pdb to step through them. But this time it seems that the simple details get twisted around in numpy. The idea is to contribute to an open source project, preferably a scientific one since I’m into the ML/Data Science field, even if only to its documentation, and I was trying to understand how the Numpy codebase flows first.


how exactly Numpy calls a C function, like np.array, from Python, as in which part of the Numpy source code is responsible for the calling?

They are called from your code, when you write np.array(). If you try to disassemble a np.array call:

import numpy as np
import dis

dis.dis("np.array()")

You get

  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (np)
              4 LOAD_METHOD              1 (array)
             26 PRECALL                  0
             30 CALL                     0
             40 RETURN_VALUE

Just a nice function call, that’s all. Now if we try to disassemble the function itself, if it was Python, we would get its bytecode, as above; but we don’t:

dis.dis(np.array)
# => TypeError: don't know how to disassemble builtin_function_or_method objects

So Python treats np.array just like it treats, say, math.sin: it is some binary code in a library, somewhere, that Python loads. Specifically, on my system it is in

site-packages/numpy/core/_multiarray_umath.cpython-311-darwin.so

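(On a Windows system, it would be a .pyd file, which is a DLL under a different extension.) This file is the meat of the module numpy.core._multiarray_umath (numpy._core._multiarray_umath in NumPy 2.x, the layout the question's paths refer to), and it has the function array defined: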

from numpy.core._multiarray_umath import array
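
To check where a compiled module lives on another system, the import machinery can be asked directly. This is a minimal sketch using only the standard library; describe_module is a hypothetical helper name, and math stands in for the NumPy module so that the snippet runs without NumPy installed:

```python
import importlib.machinery
import importlib.util


def describe_module(name):
    """Return (origin, loader name) for an importable module.

    describe_module is a hypothetical helper, not part of any library.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"no module named {name!r}")
    loader = spec.loader
    # Built-in modules use the BuiltinImporter class itself as their loader;
    # file-based modules use a loader instance such as ExtensionFileLoader.
    loader_name = getattr(loader, "__name__", type(loader).__name__)
    return spec.origin, loader_name


# File name endings this interpreter accepts for compiled extension modules.
print(importlib.machinery.EXTENSION_SUFFIXES)

# math is compiled C code: either built into the interpreter or a shared library.
print(describe_module("math"))
```

With the NumPy version used in this answer, describe_module("numpy.core._multiarray_umath") should report the path of the shared library; the exact file name depends on the platform and Python version.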

How does Python know to associate the function array there with the corresponding C code? The _multiarray_umath module is created by the PyModule_Create function from the module definition quoted in the question, which lists the module's methods in array_module_methods, and among them:

{"array",
    (PyCFunction)array_array,
    METH_FASTCALL | METH_KEYWORDS, NULL},

referencing the array_array function defined in the same C source. When the module is imported, the shared library with the appropriate name is found on the module search path (sys.path) and dynamically loaded, and its functions become accessible to the program. No other Python code is involved in the call itself: numpy.core._multiarray_umath.array() directly calls the binary implementation, just like math.sin(0) does.
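
The comparison with math.sin can be checked without NumPy. This is a small sketch using only the standard library (is_compiled and py_sin are names made up for illustration): functions exported through a C method table have the type builtin_function_or_method and carry no bytecode, while functions defined in Python have a __code__ attribute.

```python
import math
import types


def is_compiled(func):
    """True if func is a builtin_function_or_method, the type of
    functions exported through a C extension's method table."""
    return isinstance(func, types.BuiltinFunctionType)


def py_sin(x):
    """A pure-Python wrapper, for contrast."""
    return math.sin(x)


print(type(math.sin).__name__)        # builtin_function_or_method
print(is_compiled(math.sin))          # True
print(is_compiled(py_sin))            # False
print(hasattr(math.sin, "__code__"))  # False: nothing for dis to disassemble
```

The same checks applied to np.array should give the same answers as for math.sin, which is why dis refuses to disassemble it.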

Tl;dr: the module numpy.core._multiarray_umath is defined with its array attribute bound to the C function array_array, by a shared library named numpy/core/_multiarray_umath... found on the module search path.
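
As a rough mental model only, the effect of the method table can be imitated in pure Python: a module object is created, and each (name, implementation) pair from a table becomes an attribute of it. The names create_module, toy_umath and toy_methods are invented for this sketch, and math.sqrt and math.sin stand in for C functions such as array_array; in the real extension this work is done in C by PyModule_Create.

```python
import math
import types


def create_module(name, method_table):
    """Build a module whose attributes come from (name, callable) pairs.

    A toy imitation of a C method table; not how CPython implements it.
    """
    module = types.ModuleType(name)
    for attr_name, implementation in method_table:
        setattr(module, attr_name, implementation)
    return module


# Each entry pairs the Python-visible name with the compiled function behind it.
toy_methods = [
    ("root", math.sqrt),
    ("sine", math.sin),
]

toy = create_module("toy_umath", toy_methods)

print(toy.root(16.0))         # 4.0
print(toy.root is math.sqrt)  # True: the attribute is the C function itself
```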

All of this is described in detail in Extending Python with C or C++ and Building C and C++ Extensions.

We can verify that this is indeed the numpy.array we know and love:

import numpy
import numpy.core._multiarray_umath

numpy.core._multiarray_umath.array is numpy.array
# => True

Now, how exactly it ends up being also assigned to numpy.array is a series of convoluted imports, getattrs, globals assignments and whatnot, which I don’t want to try to trace at the moment, but it has nothing to do with how it is called, nor does it have anything to do with specifically C functions.
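
The empty_like snippet in the question is one piece of that wiring. The decorated Python function is only a dispatcher: its return value, such as (prototype,), names the arguments to inspect for overrides, while the work is done by the C function passed to the decorator. The following is a heavily simplified sketch of that pattern, not NumPy's implementation; from_c_func_and_dispatcher and __toy_override__ are invented names, and math.sqrt stands in for the C function.

```python
import functools
import math


def from_c_func_and_dispatcher(implementation):
    """Hypothetical, simplified stand-in for NumPy's decorator."""
    def decorator(dispatcher):
        @functools.wraps(dispatcher)  # keep the Python def's name and docstring
        def public_api(*args, **kwargs):
            # The dispatcher only reports which arguments might override the call.
            relevant_args = dispatcher(*args, **kwargs)
            for arg in relevant_args:
                override = getattr(type(arg), "__toy_override__", None)
                if override is not None:
                    return override(arg, implementation, args, kwargs)
            # No override found: call the compiled implementation directly.
            return implementation(*args, **kwargs)
        return public_api
    return decorator


@from_c_func_and_dispatcher(math.sqrt)
def sqrt(x):
    """The documentation lives here; the body only names relevant arguments."""
    return (x,)


print(sqrt(9.0))     # 3.0, computed by math.sqrt
print(sqrt.__doc__)  # docstring taken from the Python def
```

In this model the Python def supplies the signature and documentation, which matches the long docstring followed by a short return in the question. np.array itself has no such wrapper, as the identity check above shows.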
