I tried to understand the concept of `axis` in python libraries `numpy` and `pandas` better, because I often mix them up with similar concepts in R. After trying a few things out and reading around, I think I understand both worlds better now.

During this process, a post on StackOverflow was particularly helpful.

## Axis in Python

Consider the following code snippet

``````import numpy as np
import pandas as pd

ar = np.array([[3,4,5], [4,5,6]])
df = pd.DataFrame({'A':[3,4,5], 'B':[4,5,6]})

## in numpy, if axis is `None`, the mean of the flattened array is reported
ar.mean() # 4.5
## axis=0 means that the operation acts on all *rows* in each column
ar.mean(axis=0) ## array([3.5, 4.5, 5.5])
## axis=1 means that the operation acts on the all *columns* in each row
ar.mean(axis=1) ## array([4., 5.])

## in pandas, if axis is not given, the mean of the columns (axis=0) is reported
## output
##> A    4.0
##> B    5.0
##> dtype: float64
df.mean()
## axis=0 means that the operation acts on all *rows* in each column
## equivalently, one can use `df.mean(axis='rows')` or `df.mean(axis='index')`.
df.mean(axis=0)
## axis=1 means that the operation acts on all *columns* in each row
## output
##> 0    3.5
##> 1    4.5
##> 2    5.5
## dtype: float64
df.mean(axis=1)
``````

In the documentation of `numpy`, it is stated that the axis parameter specifies Axis or axes along which the means are computed. Unfortunately, I find the concept of ‘along which’ particularly confusing.

### The Python behaviour can be better understood with a three-dimensional array

It turns out that the concept of `axis` is easier to understand if we use an example of a three-dimensional array.

``````>>> ar2 = np.array([[[3,4],[5,6]],[[7,8],[9,10]]])
>>> ar2
array([[[ 3,  4],
[ 5,  6]],

[[ 7,  8],
[ 9, 10]]])
>>> ar2.mean()
6.5
>>> ar2.mean(axis=0) # mean of 3 and 7, 4 and 8, 5 and 9, and 6 and 10
array([[5., 6.],
[7., 8.]])
>>> ar2.mean(axis=1) # mean of 3 and 5, 4 and 6, 7 and 9, and 8 and 10
array([[4., 5.],
[8., 9.]])
>>> ar2.mean(axis=2) # mean of 3 and 4, 5 and 6, 7 and 8, and 9 and 10
array([[3.5, 5.5],
[7.5, 9.5]])
``````

In essence, when we run `ar2.mean(axis=0)`, we ask numpy to go through ```ar2[i, 0, 0]``` where `i` can take values between 0 and the first element of `ar2.shape`, and calculate the mean value of the values that numpy sees during the iteration. Next, numpy goes through `ar2[i, 0, 1]` and does the same calculation. Next, it goes through `ar2[i, 1, 0]`. And finally, it goes though `ar2[i, 1, 1]`.

The same logic applies to other values of the parameter `axis`. The only change we shall make then is to change the position of `i`: it will be put in the `axis`th position in the index list used to fetch an element in the n-dimensional array. If you have doubt about that, you can verify the results above with the logic that we have just described. Sure enough, the logic also applies to arrays of higher (or lower) dimensions.

In summary, in `numpy` and `pandas`, the `axis` parameter in `sum` actually specifies `numpy` to calculate the mean of all values that can be fetched in the form of `array[0, 0, ..., i, ..., 0]` where `i` iterates through all possible values. The process is repeated with the position of `i` fixed and the indices of other dimensions vary one after the other (from the most far-right element). The result is a n-1-dimensional array.

## MARGINS in R

My confusion at the beginning may come from similar operations in R with `apply`, where the parameter `MARGIN` is a vector giving the subscripts which the function will be applied over. Compare the results below with the ones above.

``````mymat <- matrix(c(3,4,5,4,5,6), byrow=TRUE, nrow=2)
apply(mymat, 1, mean) ## identical to `rowMeans(myMat)`, reporting c(4, 5)
apply(mymat, 2, mean) ## identical to `colMeans(myMat)`, c(3.5, 4.5, 5.5)
``````

As you see, the behaviour of setting `MARGINS` to `1` and `2` is actually the opposite of that in Python.

### Apply `apply` to a three-dimensional array in R

Let us give it a try.

``````> (d3array <- array(3:10, c(2,2,2)))
, , 1

[,1] [,2]
[1,]    3    5
[2,]    4    6

, , 2

[,1] [,2]
[1,]    7    9
[2,]    8   10
> d3array[1,,,] # this may help us understand the first result better
[,1] [,2]
[1,]    3    7
[2,]    5    9
> mean(d3array[1,,])
6
> apply(d3array, 1, mean)
 6 7
> apply(d3array, 2, mean)
 5.5 7.5
> apply(d3array, 3, mean)
 4.5 8.5
``````

It turns out the logic can be understood easily. `apply(d3array, 1, mean)` will calculate the mean values of `d3array[i,,]` where `i` takes all possible values, and return the results in a vector. Similarly, `apply(d3array, 2, mean)` will calculate the mean values of `d3array[,i,]`, etc.

In summary, in R, the `MARGINS` parameter let the `apply` function calculate the mean of all values that can be fetched in the form of `array[, ... , i, ... ,]` where `i` iterates through all possible values. The process is not repeated when all `i` values have been iterated. The result is therefore a simple vector.

## Conclusions

While I can understand the logic of either convention, I found it is easy to mix the two. I am not sure whether I am the only one who easily mixes up `axis` in Python and `MARGIN` in R. Therefore, I document the differences here, with the hope that at least I can remind myself when I am confused again.

In `panda`, one can use `axis="rows"` or `axis="index"` to calculate mean values of each column, equal to `colMeans` in R. We say that we get `mean along rows` or `mean along index` in Python, and `mean of columns` in R.

Alternately, one uses `axis="columns"` to calculate mean values of each row, equal to `rowMeans` in R. We say that we get `mean along columns` in Python, and `mean of rows` in R.

I thank Iakov Davydov for pointing out the advantage of using `rows` and `columns`.