data manipulation pandas

Data selection and manipulation

`loc` and `iloc`

The most useful was is to loc by callable to select rows/columns. As long as the callable returns a list of true/false, it works.

`df.reindex` returns data when index and/or columns presents but returns NaN or custom fillin when not present.

s = pd.Series([1,2,3])
s.reindex([1,2,3]) #this returns [2,3,NaN]

use `index.intersection(other_df.index)` to find common columns with other df

use `iat`/`at` for selecting and setting

`isin` takes both list and dict as arguments to specify columns for df

`where` and `mask`. `mask` is the inverse selection of `where`

df.where(df>5, '?') # return value if > 5, otherwise fill with '?'. fill with NaN if second argument not given

`query`

df[(df['a'] < df['b']) & (df['b'] < df['c'])] is same as df.query('(a < b) & (b < c)')
df.query('index < 2') selects data from row 3. ‘index’ is a special name
df.query('a in b') same as df.loc[df.a.isin(df.b),:]. df.quer('a not in b') is also valid
use case of query. when operations are the same but datasets are different
```
expr = '0.0 <= a <= c <= 0.5'
map(lambda frame: frame.query(expr), [df, df2])
```
lookup query by a pair of rows and columns list. return a list of these values. raise error when one of the labels not found.

Data transformation

Table transpose

of course we are not talking about normal mx transpose….

pivot vs melt
pivot transforms data in different columns into one volumns of label and one column of value. melt is the reverse operation of pivot
pivot_table allows us to have a aggregate function when there are multiple values of that label.
unstack vs stack. which level of index can be selected

stack is a special case of pivot, put column names not in column but into another level of index, forming Multi-index table.
unstack otherwise, is the reverse of stack puting one level of index into column name

group by mapping

groupby({'a': 1, 'b': 2, 'c': 1}) so that ‘a’ and ‘c’ are considered as the same group

Iterate through groups to have some operations…

`groupby().aggregate()`

apply different oeprations on all columns. ```groupby().aggregate([np.mean, ‘min’, max])
apply different operation on different columns by mapping. ```groupby().aggregate({‘col1’: func1, ‘col2: func2})
groupby().transform()

must return a same dimensioned data as the original

groupby().apply()
axis argument indicates how function goes forward. At each step, the function applies to the other axis.
apply method can return ‘aggregate’ like or ‘transform’ like results. It automatically interprets the results but we can also specify the format in result_type arguent
groupby().filtering()
```
def func():
 return boolean #this has to be a boolean value
groupby().filter(func)
```
slicing data

for multi-index

selection
1.1. normal/simple selection

 # df is 3x3 and multi indexed with 3 levels in index and column
 df.loc[(slice(None),['a', 'b'],), (,,,)]

1.2. pd.IndexSlice for loc like selection. can use range and no more slice()

 idx = pd.IndexSlice()
 df.loc[idx['a':'c', :, :], idx[:,:,:]]

Multi-index

axis means both column and index are possible

DF methods to set;reset/drop/rename axis.

set;reset
1.1. reindex(length of axis and data can differ. if axis longer, add nan value, otherwise truncate data like loc)
1.2. set_index <-> reset_index(consumes a column)
1.3. set_axis: length of axis must be same as data dimension
rename
droplevel
index manipulation

Series has the most of the methods like axis. If axis needs manipulation, firstly extract the axis, then transform and finally set them as new axis.
to_flat_index; transform axis to list of tuples
to_frame; transform axis to dataframe
reorder_levels

data manipulation pandas

Data selection and manipulation

`loc` and `iloc`

`df.reindex` returns data when index and/or columns presents but returns NaN or custom fillin when not present.

use `index.intersection(other_df.index)` to find common columns with other df

use `iat`/`at` for selecting and setting

`isin` takes both list and dict as arguments to specify columns for df

`where` and `mask`. `mask` is the inverse selection of `where`

`query`

`lookup` query by a pair of rows and columns list. return a list of these values. raise error when one of the labels not found.

Data transformation

Table transpose

`pivot` vs `melt`

`unstack` vs `stack`. which level of index can be selected

group by mapping

Iterate through groups to have some operations…

`groupby().aggregate()`

`groupby().transform()`

`groupby().apply()`

`groupby().filtering()`

slicing data

for multi-index

Multi-index

DF methods to set;reset/drop/rename axis.

index manipulation

Series has the most of the methods like axis. If axis needs manipulation, firstly extract the axis, then transform and finally set them as new axis.

Data selection and manipulation

loc and iloc

df.reindex returns data when index and/or columns presents but returns NaN or custom fillin when not present.

use index.intersection(other_df.index) to find common columns with other df

use iat/at for selecting and setting

isin takes both list and dict as arguments to specify columns for df

where and mask. mask is the inverse selection of where

query

lookup query by a pair of rows and columns list. return a list of these values. raise error when one of the labels not found.

Data transformation

Table transpose

pivot vs melt

unstack vs stack. which level of index can be selected

group by mapping

Iterate through groups to have some operations…

groupby().aggregate()

groupby().transform()

groupby().apply()

groupby().filtering()

slicing data

for multi-index

Multi-index

DF methods to set;reset/drop/rename axis.

index manipulation

Series has the most of the methods like axis. If axis needs manipulation, firstly extract the axis, then transform and finally set them as new axis.

`loc` and `iloc`

`df.reindex` returns data when index and/or columns presents but returns NaN or custom fillin when not present.

use `index.intersection(other_df.index)` to find common columns with other df

use `iat`/`at` for selecting and setting

`isin` takes both list and dict as arguments to specify columns for df

`where` and `mask`. `mask` is the inverse selection of `where`

`query`

`lookup` query by a pair of rows and columns list. return a list of these values. raise error when one of the labels not found.

`pivot` vs `melt`

`unstack` vs `stack`. which level of index can be selected

`groupby().aggregate()`

`groupby().transform()`

`groupby().apply()`

`groupby().filtering()`