# š¶ idxhoundĀ¶

`numpy`

provides outstanding indexing through its advanced indexing capabilities 1. `idxhound`

tracks indices across one or more selections to make sure you always know where your data (in the form of array elements) came from.

Alternatives include `pandas.Index`

and `xarray.DataArray`

which allow for indices other than monotonic integers. But sometimes one just wants to deal with the raw data arrays, e.g. to avoid any impact on performance or integrate with third-party libraries that expect raw numpy arrays. Thatās where `idxhound`

can help.

## Getting startedĀ¶

Obtaining a `idxhound.Selection`

object is straightforward: simply pass a selection object (such as a boolean filter or array of integer indices) as an argument to the constructor. For example, letās create an array and filter it using a boolean selection.

```
>>> x = np.asarray(list('abcdef'))
>>> obj = idxhound.Selection(x > 'c')
>>> y = x[obj]
>>> y
array(['d', 'e', 'f'], dtype='<U1')
```

The indexing behaviour is exactly the same as if weād used `y = x[x > 'c']`

. But `obj`

allows us to track where the elements in `x`

ended up in `y`

. The example below illustrates how to find the index of `x[3]`

in `y`

.

```
>>> i = obj[3]
>>> i, y[i]
(0, 'd')
```

But indexing by an element that has been eliminated by the selection raises an error as one might expect.

```
>>> obj[2]
Traceback (most recent call last):
...
KeyError: 2
```

Using the inverse of `i`

allows us to retrieve the index of an element in `x`

given its index in `y`

.

```
>>> j = obj.inverse[1]
>>> j, x[j], y[1]
(4, 'e', 'e')
```

### Convenience functionsĀ¶

The functions `dict_to_array()`

and `array_to_dict()`

facilitate conversion from dictionaries to arrays and vice versa. This functionality is convenient for loading or saving data with non-integer keys. Suppose we are presented with a dictionary of city populations which we want to convert to an array for manipulation.

```
>>> cities = ['Rome', 'Berlin', 'Paris', 'London']
>>> population = {'Rome': 2.873, 'Berlin': 3.769, 'London': 8.982}
>>> arr = idxhound.dict_to_array(population, idxhound.Selection(cities))
>>> arr
array([2.873, 3.769, nan, 8.982])
```

Converting back to an array yields the following.

```
>>> idxhound.array_to_dict(arr, idxhound.Selection(cities))
{'Rome': 2.873, 'Berlin': 3.769, 'Paris': nan, 'London': 8.982}
```

The two convienence functions are applicable to arrays with an arbitrary number of dimensions.

## Advanced useĀ¶

While the above examples illustrate that `idxhound`

can deliver what was promised, more advanced use cases is where it shines.

### CompositionĀ¶

Suppose we want to reorder and further filter the character sequence `y`

but still keep track of indices across multiple selections. Easy!

```
>>> obj2 = idxhound.Selection([2, 0])
>>> y[obj2]
array(['f', 'd'], dtype='<U1')
```

Letās construct a composite index that has the same effect as the sequential application of selections.

```
>>> composite = obj @ obj2 # use the `compose` method for python < 3.5
>>> z = x[composite]
>>> z
array(['f', 'd'], dtype='<U1')
```

So where did the first element of `z`

occur in `x`

and `y`

, respectively?

```
>>> composite.inverse[0], obj2.inverse[0]
(5, 2)
```

### Non-integer indicesĀ¶

Real data often use labels rather than integer indices (they might even be readable by humans if weāre lucky). Suppose we have a simple dataset of populations of some European cities and we intend to order them.

```
>>> cities = ['Rome', 'Berlin', 'Paris', 'London']
>>> population = [2.873, 3.769, 2.148, 8.982]
>>> mapping = idxhound.Selection(cities)
>>> obj = (mapping @ np.argsort(population))
>>> obj[['London', 'Berlin']]
[3, 2]
```

London and Berlin would end up in last and second to last position in the ordered array, respectively. Indeed, they are the two largest cities. We can also easily retrieve the smallest city.

```
>>> obj.inverse[0]
'Paris'
```

### Named columnsĀ¶

Because `idxhound.Selection`

is agnostic to the dimensions of the tensor being indexed, it can also be used to select named columns.

```
>>> latitude = [41.9028, 52.5200, 48.8566, 51.5074]
>>> longitude = [12.4964, 13.4050, 2.3522, 0.1278]
>>> data = np.transpose([population, latitude, longitude])
>>> columns = idxhound.Selection(['population', 'latitude', 'longitude'])
>>> data[mapping['Berlin'], columns[['latitude', 'longitude']]]
array([52.52 , 13.405])
```

## Properties satisfied by `Selection`

Ā¶

More formally, an `idxhound.Selection`

satisfies the following properties. Let `x`

be a one-dimensional array, `idx`

be a selection that can be applied to `x`

, `y = x[idx]`

, and `obj = idxhound.Selection(idx)`

. Then

indexing by

`obj`

is equivalent to indexing by`idx`

, i.e. all elements of`y`

and`x[obj]`

are equal,`obj[i]`

retrieves the index of the element in`y`

given its index`i`

in`x`

, i.e.`x[i] == y[obj[i]]`

,and, conversely,

`obj.inverse[j]`

retrieves the index of the element in`x`

given its index`j`

in`y`

, i.e.`x[obj.inverse[j]] == y[j]`

.