Since Numpy is implemented in the C language for efficiency reasons, I would like to find out how exactly Numpy calls a C function, like np.array
, from Python, as in which part of the Numpy source code is responsible for the calling?
I have tried to follow the source code, in particular the C implementation in multiarraymodule.c
in the numpy/_core/src/multiarray
directory. I focused specifically on PyMethodDef array_module_methods[]
to get an overview of the functionality the module provides when imported into Python, and also on:
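PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "_multiarray_umath",   /* m_name */
    NULL,                  /* m_doc */
    -1,                    /* m_size */
    array_module_methods,  /* m_methods */
    NULL,                  /* m_slots */
    NULL,                  /* m_traverse */
    NULL,                  /* m_clear */
    NULL                   /* m_free */
};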
which sets the module's name when it is initialised. I have also observed that _multiarray_umath
is imported into numpy/_core/multiarray.py
, a Python module that defines functions such as:
@array_function_from_c_func_and_dispatcher(_multiarray_umath.empty_like)
def empty_like(
        prototype, dtype=None, order=None, subok=None, shape=None, *, device=None):
followed by a lot of documentation and examples, then a return of:
return (prototype,)
I assumed this is where the C implementation of a function is called from, for example for the empty_like()
function. Is that right?
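The decorator pattern described above can be imitated in a few lines of pure Python. This is only a sketch of the general idea, not NumPy's actual code: from_impl_and_dispatcher is a hypothetical stand-in for array_function_from_c_func_and_dispatcher, and the built-in len plays the role of the C function. The point it illustrates is that the decorated Python function only reports which arguments are relevant, while the result comes from the implementation handed to the decorator.

```python
import functools


def from_impl_and_dispatcher(implementation):
    """Hypothetical stand-in for a decorator such as
    array_function_from_c_func_and_dispatcher; not NumPy's actual code."""
    def decorator(dispatcher):
        @functools.wraps(dispatcher)  # reuse the stub's name and docstring
        def public_api(*args, **kwargs):
            # The stub only reports which arguments matter for dispatch.
            relevant_args = dispatcher(*args, **kwargs)
            # A real dispatcher would inspect relevant_args for overrides here.
            # The actual work is done by the implementation given to the decorator.
            return implementation(*args, **kwargs)
        return public_api
    return decorator


@from_impl_and_dispatcher(len)  # len stands in for a C function
def length(obj):
    """Return the number of items (documentation lives on the stub)."""
    return (obj,)


print(length([1, 2, 3]))
```

In this sketch the body of length never computes a length; it only returns a tuple, much like the return (prototype,) seen in the stub quoted above.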
how exactly Numpy calls a C function, like
np.array
, from Python, as in which part of the Numpy source code is responsible for the calling?
They are called from your code, when you write np.array()
. If you try to disassemble a np.array
call:
import numpy as np
import dis
dis.dis("np.array()")
You get
0 0 RESUME 0
1 2 LOAD_NAME 0 (np)
4 LOAD_METHOD 1 (array)
26 PRECALL 0
30 CALL 0
40 RETURN_VALUE
Just a nice function call, that’s all. Now if we try to disassemble the function itself, if it was Python, we would get its bytecode, as above; but we don’t:
dis.dis(np.array)
# => TypeError: don't know how to disassemble builtin_function_or_method objects
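The same behaviour can be reproduced without NumPy. The sketch below compares a plain Python function with math.sin, a standard-library function implemented in C; python_level is a made-up name used only for the comparison.

```python
import dis
import math
import types


def python_level(x):
    return x + 1


# A function written in Python carries bytecode that dis can inspect.
instructions = list(dis.get_instructions(python_level))
has_bytecode = len(instructions) > 0

# math.sin is implemented in C, so there is no bytecode to show.
try:
    dis.dis(math.sin)
    c_disassembled = True
except TypeError as exc:
    c_disassembled = False
    error_message = str(exc)

# Both math.sin and len are instances of the same built-in function type.
is_builtin = isinstance(math.sin, types.BuiltinFunctionType)

print(has_bytecode, c_disassembled, is_builtin)
```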
So Python treats np.array
just like it treats, say, math.sin
: it is some binary code in a library, somewhere, that Python loads. Specifically, on my system it is in
site-packages/numpy/core/_multiarray_umath.cpython-311-darwin.so
(On a Windows system, it would be a .pyd
file, which is a DLL under a different extension.) This file is the meat of the module numpy.core._multiarray_umath
, and it has the function array
defined:
from numpy.core._multiarray_umath import array
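To see which file names the interpreter accepts for such binary modules on a given system, the standard importlib machinery can be queried. This is a small sketch using only the standard library; is_extension_file is a hypothetical helper, and math is used as an example of a C-implemented module because it ships with Python.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from importlib.util import find_spec


def is_extension_file(path):
    """Return True if path ends in one of this interpreter's extension-module suffixes."""
    return any(path.endswith(suffix) for suffix in EXTENSION_SUFFIXES)


# Where does the import system find the C-implemented math module?
# This is a file path, or "built-in" if math is compiled into the interpreter.
math_origin = find_spec("math").origin

print(EXTENSION_SUFFIXES)  # platform-specific list of suffixes
print(math_origin)
print(is_extension_file("_multiarray_umath" + EXTENSION_SUFFIXES[0]))
```

The printed suffixes differ between platforms and Python versions, which is why the file name quoted above contains cpython-311-darwin.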
How does Python know to associate the function array
there with the corresponding C code? The _multiarray_umath
module was created with the PyModule_Create
function from the module definition in multiarraymodule.c,
which lists the module's methods in array_module_methods
, and among them:
{"array",
(PyCFunction)array_array,
METH_FASTCALL | METH_KEYWORDS, NULL},
referencing the array_array
function defined in multiarraymodule.c. When the module is imported, the shared library with the appropriate name at the appropriate place on the module search path, sys.path
, is found and dynamically loaded, and its functions become accessible to the program. No other Python code is involved in the call itself: numpy.core._multiarray_umath.array()
directly calls the binary implementation, just like math.sin(0)
does.
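The effect of such a method table can be observed from the Python side. The sketch below uses math.sin rather than NumPy so that it runs with the standard library alone; the assumption is simply that math is a C-implemented module, as it is in CPython.

```python
import math

# The name from the method table ends up as an ordinary module attribute.
bound_in_namespace = vars(math)["sin"] is math.sin

# For a module-level C function, __self__ is the module that owns it.
owner_is_module = math.sin.__self__ is math

# There is no Python code object behind it, only the C function pointer.
has_python_code = hasattr(math.sin, "__code__")

# Calling it goes straight into the C implementation.
result = math.sin(0)

print(bound_in_namespace, owner_is_module, has_python_code, result)
```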
TL;DR: the module numpy.core._multiarray_umath
is defined with its array
attribute bound to the C function array_array
by a shared library named numpy/core/_multiarray_umath...
on the module search path, sys.path
.
All of this is described in detail in Extending Python with C or C++ and Building C and C++ Extensions.
We can verify that this is indeed the numpy.array
we know and love:
import numpy
import numpy.core._multiarray_umath
numpy.core._multiarray_umath.array is numpy.array
# => True
Now, how exactly it ends up being also assigned to numpy.array
is a series of convoluted imports, getattrs, globals assignments and whatnot, which I don’t want to try to trace at the moment, but it has nothing to do with how it is called, nor does it have anything to do with specifically C functions.
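The re-export itself is plain name binding, which a toy example can show. This is not NumPy's actual import chain: the module names pkg and pkg._impl are hypothetical, and math.sin stands in for a C function living in an extension module.

```python
import math
import types

# Hypothetical private module that holds the C function.
private = types.ModuleType("pkg._impl")
private.sine = math.sin  # stand-in for a function defined in C

# Hypothetical public package that re-exports it, which is
# what `from ._impl import sine` would do inside pkg/__init__.py.
public = types.ModuleType("pkg")
public.sine = private.sine

# Both names refer to one and the same function object.
same_object = public.sine is private.sine

print(same_object, public.sine(0))
```

However many modules a name passes through, each step only copies a reference, so the call still lands directly in the C function.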
You’d have a much easier time understanding this stuff if you wrote a very simple C library yourself (one that just adds two integers or something) and created the plumbing to get that to work. There are many tutorials out there explaining how to do that. Trying to understand how it works from a mature library without even looking at the full source code is going to be very hard – what exactly are you trying to achieve?
I do not have a problem with using the Python/C API. I have walked through an example provided in the Python documentation, written simple scripts, and fired up gdb and pdb to step through them. But this time it seems that the simple details get twisted around in NumPy. The idea is to contribute to an open-source project, preferably a scientific one since I'm in the ML/Data Science field, even if only to the documentation, and I was trying to understand how the NumPy codebase flows first.