Yes, we're zero indexing!

Git is a version control system that runs on your computer. You can `commit` changes to a repository and it will save a history of these commits, allowing you to reference them or go back to them at any time. GitHub is a hosting service that essentially backs up your Git repositories (when you `push` to them or `pull` from them) so that you can access them from anywhere and there is a reduced chance of data loss. You can also check out other people's GitHub repositories by `clone`-ing them.

There are basically two ways to install Python. One is the "native" way using whatever facilities exist with your operating system (for macOS this would be Homebrew), and then using `pip` to install packages. The other way, which is generally more user-friendly, especially on macOS and Windows, is to use the Anaconda distribution and `conda` to install packages.

The most bare-bones way to run Python is to just execute `python` in the terminal. Depending on your setup, this might actually run Python 2, which is quite old at this point, so it's often safer to run `python3` explicitly. Anyway, you'll almost never need to do this. At the very least you'll want to run in a more user-friendly environment like `IPython` by executing `ipython3`, which is sort of a wrapper around `python3`.

But even then, most of you will prefer to use JupyterLab. This is a web-based graphical interface that is primarily centered on "notebooks", which is basically what you're looking at now. A notebook is just a series of cells (code or markdown) that, when run, produce some kind of output. Notebooks are a way to store the code you've written and the resulting output in one place.

The other place where Python code might "live" is in Python files, which are just text files with the extension `.py`. If you have a file named `model.py`, you can run its contents directly with `python3 model.py`. If you're in an IPython terminal and you want to run its contents interactively, you can run `%run -i model.py`. Finally, you can use it as a module. In this case you can run the Python command `import model` and there will be a new variable `model` that contains any variables defined therein. So if you'd defined `abc = 5` in `model.py` then `model.abc` will be `5`.
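
To make the module idea concrete without leaving this notebook, here is a minimal sketch that writes a tiny module to a temporary directory and then imports it. The module name `model_demo` and the temporary directory are stand-ins I made up for illustration; in practice you'd just have `model.py` sitting next to your notebook and write `import model`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a one-line module to a temporary directory (a stand-in for model.py)
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "model_demo.py").write_text("abc = 5\n")

# Make that directory importable, then import the module by name
sys.path.insert(0, tmp_dir)
model_demo = importlib.import_module("model_demo")

# Variables defined in the file are now attributes of the module
print(model_demo.abc)
```

The `sys.path` step is only needed here because the file lives in an unusual place. Files in the current working directory are importable without it.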

Nowadays, you can do almost everything in JupyterLab. That can be useful, especially if you're working on a remote machine like the Pitt cluster. However, I must emphasize that you shouldn't be doing everything in notebooks. For sufficiently complex code, you'll want to put some portions of it into proper Python modules (`.py` files, basically) and import them for usage in a notebook. Thus your notebook will contain mostly high-level commands and their resulting outputs.

You can create a new notebook (`.ipynb` file) by clicking on the blue "+" on the left and choosing a Python version (something like 3.9 or higher is recommended). You can also create other file types like Python (`.py`) or markdown (`.md`) or open a system terminal. Finally, you can edit any of these files by double-clicking on them in the filesystem pane on the left.

You'll want to stick mostly to the keyboard. To run a cell, press `Shift+Enter`. To run a bunch of cells in a row, just hold down `Shift` and keep pressing `Enter`. To enter edit mode on the selected cell, press `Enter`. To exit edit mode on a cell, just press `Esc`. The single-key shortcuts only work when you're not in edit mode. To interrupt ongoing execution, press `i` twice. To completely restart a notebook, press `0` twice. Create new cells above or below with `a` and `b`.

You can make a cell into a markdown cell by pressing `m`. Press `y` to turn it back into a code cell. In markdown mode, you can make headings with one or more `#`s, amongst other markdown features such as pairs of `**` for **bold** text. You can also do inline $\LaTeX$-style math with pairs of `$`, as in $x^2$, or display style math with pairs of `$$`, as in
$$ \int_{-\infty}^{\infty} \exp(-x^2) dx = \sqrt{\pi} $$

There are a small number of core data types that are quite powerful. First there's the `tuple`, which is basically a fixed, ordered collection of objects

In [26]:

```
a = (1, 2, 'abc')
a
```

Out[26]:

When the grouping is not ambiguous, you can omit the parentheses

In [6]:

```
a = 1, 2, 'abc'
a
```

Out[6]:

In the other direction, you can unpack tuples and assign their members to separate variables

In [10]:

```
b, c, d = a
c
```

Out[10]:

You can select subsets of a tuple by slicing them

In [20]:

```
a[1:]
```

Out[20]:
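
Slicing has a few more forms worth knowing. Here's a small sketch using the same tuple as above, showing slices from the start, negative indices, and a reversing step.

```python
a = (1, 2, 'abc')

print(a[:2])    # everything before index 2
print(a[-1])    # negative indices count from the end
print(a[::-1])  # a step of -1 walks the tuple in reverse
print(len(a))   # number of elements
```

The same syntax works on lists and strings.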

Tuples aren't super flexible. Once you've created them, you can't reassign their elements or append new ones in place (the best you can do is build a new tuple, for example by concatenating with `+`). For more interactive use cases, you'll want to use a `list`. They look and act a lot like tuples, but you can modify them

In [32]:

```
a = [1, 2, 'abc']
print(a)
a.append(5)
print(a)
a[1] = 10
print(a)
```
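
For contrast, here's roughly what happens if you try the same kind of modification on a tuple. This is just an illustrative sketch; the variable names `t` and `t2` are mine.

```python
t = (1, 2, 'abc')

try:
    t[1] = 10  # not allowed: tuples are immutable
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Concatenation builds a brand new tuple and leaves the original untouched
t2 = t + (5,)
print(t)
print(t2)
```

Note the trailing comma in `(5,)`, which is what makes it a one-element tuple rather than just the number 5 in parentheses.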

There are a couple of fancy operations you can do with lists that use overloaded algebraic operators

In [36]:

```
a = [1, 2, 3]
b = [4, *a, 10]
print(b)
c = a + b
print(c)
d = 3*a
print(d)
```

Here you can see that using `*` in front of a list variable acts as if you had typed out the contents.
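
The same `*` trick shows up in a couple of other places. As a quick sketch, it can spread a list across the arguments of a function call, and it can collect "everything else" when unpacking.

```python
a = [1, 2, 3]

# Unpacking into function arguments: same as calling max(1, 2, 3)
largest = max(*a)

# Starred assignment: grab the first element and collect the rest in a list
first, *rest = a

print(largest, first, rest)
```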

I would say that the `dict` is *the* quintessential type in Python. They are extremely useful and many things use them. A `dict` is just a mapping between different objects, from `keys` to `values`. The values can be of any type, while the keys are restricted to being "hashable", which includes things like numbers, strings, and tuples of these (but not lists).

In [37]:

```
d = {1: 2, 'abc': 10, 12: 'foo'}
d
```

Out[37]:

You can access the elements of dictionaries with square brackets

In [42]:

```
d[12]
```

Out[42]:
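
Beyond square brackets, a handful of other operations cover most day-to-day dictionary use. Here's a minimal sketch on the same dictionary; the key names `'missing'` and `'new'` are just placeholders.

```python
d = {1: 2, 'abc': 10, 12: 'foo'}

print('abc' in d)             # membership tests look at the keys
print(d.get('missing', 0))    # get() returns a default instead of raising KeyError

d['new'] = 3.14               # assigning to a new key adds an entry
for key, value in d.items():  # iterate over (key, value) pairs
    print(key, value)

try:
    d[[1, 2]] = 'oops'        # lists are not hashable, so they can't be keys
except TypeError as err:
    print('TypeError:', err)
```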

You can combine dicts as we saw with lists but using `**` instead.

In [43]:

```
e = {**d, 15: 1}
e
```

Out[43]:
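
One thing worth knowing when merging is what happens if the same key appears twice. A small sketch, assuming you're on Python 3.9 or later for the `|` operator:

```python
d = {1: 2, 'abc': 10, 12: 'foo'}

# When keys collide, the entry that appears later wins
e = {**d, 'abc': 99, 15: 1}
print(e)

# Python 3.9+ also offers the | operator for the same kind of merge
f = d | {'abc': 99, 15: 1}
print(e == f)

# Either way, the original dictionary is left alone
print(d['abc'])
```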

You can loop over iterables like tuples, lists, and other things using for loops.

In [44]:

```
for i in [1, 2, 3, 4]:
    print(2*i)
```
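
Two helpers come up constantly alongside for loops: `enumerate`, for when you also want the position, and `range`, for looping a fixed number of times. A quick sketch (the list `names` is just made-up data):

```python
names = ['a', 'b', 'c']

# enumerate yields (position, item) pairs, counting from zero
for i, name in enumerate(names):
    print(i, name)

# range(n) is the usual way to loop a fixed number of times
total = 0
for i in range(4):
    total += i
print(total)
```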

There's also something known as a list comprehension that lets you do this in more compact form

In [46]:

```
a = range(5) # lazily yields the integers 0 to 4, inclusive
b = [2*i for i in a]
print(b)
```
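
Comprehensions can also filter as they go by adding an `if` clause at the end. Here's a small sketch, which also shows how to turn a `range` into an actual list if you want to look at it.

```python
a = range(10)

# Keep only the even numbers, then square them
even_squares = [i**2 for i in a if i % 2 == 0]
print(even_squares)

# range is lazy; wrap it in list() to see the actual values
print(list(range(5)))
```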

We can also do comprehensions on dictionaries

In [65]:

```
a = [1, 2, 3, 4]
{i: 2*i for i in a}
```

Out[65]:

Functions are similar to those in other programming languages. But they can also be assigned and passed around like variables

In [48]:

```
def add(x, y):
    return x + y

add(1, 5)
```

Out[48]:
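
To illustrate the "passed around like variables" part, here's a minimal sketch. The helper `apply_twice` is a name I made up for this example, not anything built in.

```python
def add(x, y):
    return x + y

def apply_twice(func, x, y):
    # call func on (x, y), then feed the result back in along with y
    return func(func(x, y), y)

plus = add  # a function can be bound to another name
print(plus(1, 5))

# ...or handed to another function as an argument
print(apply_twice(add, 1, 5))
```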

For smaller functions, you can also use the lambda function notation

In [49]:

```
add = lambda x, y: x + y
add(1, 5)
```

Out[49]:
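
Where lambdas really earn their keep is as throwaway arguments to other functions. A common case is the `key` argument of `sorted`, sketched here on some made-up pairs.

```python
pairs = [(3, 'c'), (1, 'a'), (2, 'b')]

# Sort by the second element of each pair
by_letter = sorted(pairs, key=lambda p: p[1])

# Sort by the first element, largest first
by_number_desc = sorted(pairs, key=lambda p: p[0], reverse=True)

print(by_letter)
print(by_number_desc)
```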

You can combine multiple iterables together using `zip`. This turns out to be pretty useful

In [51]:

```
a = [1, 2, 3, 4, 5]
b = [10, 11, 12, 13, 14]
zip(a, b)
```

Out[51]:

Ok, that seems less useful. It turns out `zip` returns an iterator object instead of the real thing. There are good efficiency reasons for this, but to get it to give you the real values, you need to do

In [52]:

```
list(zip(a, b))
```

Out[52]:
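
In practice you rarely need `list` here, since `zip` is mostly used inside loops or to build dictionaries. A short sketch with made-up lists `names` and `values`, which also shows the one gotcha with iterators:

```python
names = ['x', 'y', 'z']
values = [10, 11, 12]

# Loop over two lists in lockstep
for name, value in zip(names, values):
    print(name, value)

# Build a dictionary from parallel lists of keys and values
lookup = dict(zip(names, values))
print(lookup)

# An iterator is used up after one pass
pairs = zip(names, values)
first_pass = list(pairs)
second_pass = list(pairs)
print(len(first_pass), len(second_pass))
```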

There are quite a few built in modules that have useful functions. There are also many third-party modules that we'll use extensively.

In [54]:

```
import re # regular expressions
re.sub(r'\d', 'x', 'My phone number is 123-4567')
```

Out[54]:
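
Besides substitution, the other two `re` functions you'll probably reach for are `findall` and `search`. A small sketch on the same string:

```python
import re

text = 'My phone number is 123-4567'

# All runs of consecutive digits
digit_runs = re.findall(r'\d+', text)

# Parentheses capture the two halves of the number separately
match = re.search(r'(\d{3})-(\d{4})', text)

print(digit_runs)
print(match.group(1), match.group(2))
```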

Here's an example using `itertools.chain`, which is often useful for chaining iterators together. In addition to `itertools`, other all-star built-in modules include `functools` and `operator`.

In [47]:

```
from itertools import chain
a = [range(i) for i in range(5)]
print(a)
b = chain(*a)
list(b)
```

Out[47]:
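
Since `functools` and `operator` got a mention, here's a quick taste of how they tend to be used together. This is only a sketch; `product` and `double` are names I picked for the example.

```python
from functools import partial, reduce
import operator

# reduce folds a sequence down to a single value: ((1*2)*3)*4
product = reduce(operator.mul, [1, 2, 3, 4])

# partial pins down some arguments of an existing function
double = partial(operator.mul, 2)

print(product, double(21))
```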

`numpy`

In [51]:

```
# it's pronounced num-pie :)
import numpy as np
```

The central object in `numpy` is an N-dimensional array type `np.ndarray`. Lots of stuff here is going to be similar to MATLAB arrays.

In [48]:

```
np.ones(10)
```

Out[48]:

You can create an array from a list with `np.array`, but the inputs should be numerical

In [49]:

```
np.array([1, 2, 3, 4])
```

Out[49]:

Note that when generating ranges, the left limit is inclusive while the right limit is non-inclusive

In [50]:

```
a = np.arange(10)
print(a)
```

There are a bunch of different ways to slice arrays, much like lists but more powerful.

In [52]:

```
a = np.arange(10)
print(a)
print(a[0]) # zero indexed!
print(a[3:]) # no 'end' needed
print(a[:-1]) # negatives index from end
print(a[3:5]) # second index is non-inclusive
print(a[[4,1,9,2]]) # index with a list
```
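
Two more slicing patterns that get a lot of use are step slices and boolean masks. Here's a minimal sketch on the same kind of array; `b` is just a scratch copy so the original isn't modified.

```python
import numpy as np

a = np.arange(10)

print(a[::2])      # every second element
print(a[a > 6])    # boolean mask: keep entries where the condition holds

b = a.copy()
b[b % 2 == 1] = 0  # masks also work on the left-hand side of an assignment
print(b)
```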

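We can add new dimensions at will by indexing with `None` (an alias for `np.newaxis`), which is what sets things up for "broadcasting". Here we make a column vector. Note that the row dimension is the first index (and storage is row-major by default)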

In [54]:

```
a[:, None]
```

Out[54]:

Here's the same thing but for a row vector

In [56]:

```
a[None, :]
```

Out[56]:

We can construct more elaborate matrices using indexing and broadcasting. Do this instead of `repmat`!

In [58]:

```
a[:, None] + a[None, :]
```

Out[58]:
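
If it's not obvious what just happened, it can help to look at the shapes on a smaller example. In this sketch (with names `col`, `row`, and `table` of my choosing), a `(3, 1)` array plus a `(1, 3)` array gets stretched to `(3, 3)`.

```python
import numpy as np

a = np.arange(3)

col = a[:, None]   # shape (3, 1)
row = a[None, :]   # shape (1, 3)
table = col + row  # both are stretched to shape (3, 3)

print(col.shape, row.shape, table.shape)
print(table)

# Entry (i, j) of the result is a[i] + a[j]
print(np.array_equal(table, np.add.outer(a, a)))
```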

Multiplication is element-wise by default (like `.*` in MATLAB)

In [59]:

```
np.arange(10) * np.arange(10)
```

Out[59]:

Broadcasting works for a variety of operators, not just addition.

In [60]:

```
np.arange(10, 20)[None, :]*np.arange(5,15)[:, None]
```

Out[60]:

You can always get shape/size information about an array.

In [61]:

```
a = np.ones((3, 5))
print(a.shape, a.size)
a
```

Out[61]:

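Reshaping is a whole thing. The total number of elements has to stay the same, so a `(4, 5)` array can be reshaped to `(10, 2)`, transposed to `(5, 4)`, or flattened to a 1-D array of length 20.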

In [77]:

```
a = np.ones((4, 5))
print(a.reshape((10, 2)).shape)
print(a.T.shape)
print(a.flatten().shape)
```
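
With an array of ones you can't see where the elements end up, so here's the same idea sketched with distinct values. It also uses `-1`, which tells `reshape` to work out that dimension on its own.

```python
import numpy as np

a = np.arange(20).reshape((4, 5))  # rows are filled one at a time
print(a)

b = a.reshape((-1, 2))             # -1 means "infer this dimension"
print(b.shape)

print(a.T[0])                      # first row of the transpose = first column of a
print(a.flatten()[:6])             # flattening reads the rows back out in order
```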

There is basic linear algebra in `numpy`, but you'll want to see `scipy` for more advanced operations and for statistical distributions.

In [62]:

```
m = np.random.rand(5, 5)
mi = np.linalg.inv(m)
print(mi)
print(np.abs(m @ mi - np.eye(5)).max()) # largest absolute deviation from the identity
```
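
If what you actually want is to solve a linear system, you usually don't need the inverse at all. Here's a sketch comparing the two routes. I've added `5*np.eye(5)` to the random matrix purely so that it's guaranteed to be well-behaved for the example.

```python
import numpy as np

m = np.random.rand(5, 5) + 5*np.eye(5)  # diagonally dominant, so safely invertible
b = np.random.rand(5)

x_inv = np.linalg.inv(m) @ b     # via the explicit inverse
x_solve = np.linalg.solve(m, b)  # solves m @ x = b directly

print(np.allclose(x_inv, x_solve))
print(np.allclose(m @ x_solve, b))
```

Both give the same answer here, but `solve` skips forming the inverse and is generally the better habit.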

That said, `numpy` has many different routines for random number generation.

In [63]:

```
np.random.randint(5, size=10)
```

Out[63]:
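
Newer versions of `numpy` (1.17 and up) also offer a generator object that bundles these routines together and makes seeding straightforward. A minimal sketch, where the seed value `0` is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeding makes the draws reproducible

ints = rng.integers(5, size=10)      # integers from 0 to 4, like randint(5, size=10)
normals = rng.normal(size=(3, 2))    # standard normal draws in a 3x2 array
uniforms = rng.random(4)             # uniform draws on [0, 1)

print(ints)
print(normals.shape, uniforms.shape)
```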

`matplotlib`

First I'm going to do some non-required stuff to configure graph appearance to my liking

In [65]:

```
import matplotlib as mpl
mpl.style.use('./config/clean.mplstyle') # this loads my personal plotting settings
%config InlineBackend.figure_format = 'retina' # if you have an HD display
```

For most use cases, this is the only import you need. Note that it is a little non-standard, in that we import the `pyplot` submodule under an alias rather than the top-level package.

In [64]:

```
import matplotlib.pyplot as plt
```

First let's do a simple line plot example. You'll usually be passing `numpy` arrays to `matplotlib`, but it also accepts lists.

In [67]:

```
plt.plot(np.arange(10, 20), np.arange(10));
```

Another useful plot is the histogram for a given array.

In [68]:

```
plt.hist(np.random.randn(1000));
```

Passing a 2d array will treat each column as a separate series.

In [70]:

```
plt.plot(np.cumsum(np.random.randn(1000, 2), axis=0));
```
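
Once plots get more involved, it's often easier to grab the figure and axes objects explicitly and call methods on them. Here's a sketch of that style. The data (sine and cosine curves) and all the label text are just placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 100)

# subplots() hands back the figure and a single axes object to draw on
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), label='sin')
ax.plot(x, np.cos(x), label='cos')
ax.set_xlabel('x')
ax.set_ylabel('value')
ax.set_title('Two labeled series')
ax.legend();
```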

`pandas`

In [71]:

```
import pandas as pd
```

A `Series` is a 1-D array with an attached index, which defaults to `range(n)`.

In [74]:

```
s = pd.Series(np.random.rand(10), index=np.arange(10, 20))
s
```

Out[74]:

Let's look at the underlying data

In [75]:

```
print(s.index)
print(s.values)
```
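
Because the index here doesn't start at zero, it matters whether you look things up by label or by position. A small sketch with hand-picked values so the results are easy to check:

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3], index=[10, 11, 12])

print(s.loc[11])    # label-based: the entry whose index label is 11
print(s.iloc[0])    # position-based: the first entry, whatever its label
print(s[s > 0.15])  # boolean selection keeps the index labels
```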

Or get a quick summary of a numeric series

In [76]:

```
s.describe()
```

Out[76]:

A `DataFrame` is like a dictionary of `Series` with a common index

In [77]:

```
df = pd.DataFrame({'ser1': s, 'ser2': np.random.randn(10)})
df.head()
```

Out[77]:

We can get summary stats for each column.

In [78]:

```
df.describe()
```

Out[78]:

This makes plotting much more convenient and powerful.

In [82]:

```
df.plot(title='Random Stuff');
```

Accessing individual columns yields a Series

In [83]:

```
df['ser1']
```

Out[83]:

We can perform vector operations on these

In [85]:

```
df['ser1'] > 0.5
```

Out[85]:

We can select particular rows in this way.

In [87]:

```
df1 = df[(df['ser1']>0.5) & (df['ser2']<1.0)]
df1
```

Out[87]:
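
There are a couple of other ways to write the same selection that you'll run into. Here's a sketch using a small hand-made frame (the values are made up so that exactly one row qualifies), comparing the mask approach with `.loc` and `.query`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ser1': [0.2, 0.7, 0.9, 0.4],
    'ser2': [0.5, 1.5, -0.3, 0.1],
}, index=np.arange(10, 14))

mask = (df['ser1'] > 0.5) & (df['ser2'] < 1.0)

by_mask = df[mask]                                # boolean mask in square brackets
by_loc = df.loc[mask, 'ser1']                     # rows by mask, one column by name
by_query = df.query('ser1 > 0.5 and ser2 < 1.0')  # the condition as a string

print(by_mask)
print(by_loc)
print(by_mask.equals(by_query))
```

The parentheses around each comparison in `mask` are required, since `&` binds more tightly than `>` and `<`.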

`statsmodels`

This import is also non-standard. We're going to use the formula-based API.

In [89]:

```
import statsmodels.formula.api as smf
```

Generate some random data with a known causal structure.

In [91]:

```
N = 100
x = np.random.randn(N)
y = 3*np.random.randn(N)
z = 1 + 2*x + 3*y + 4*x*y + np.random.randn(N)
df0 = pd.DataFrame({'x': x, 'y': y, 'z': z})
```

Run an OLS regression with a properly specified model.

In [105]:

```
ret = smf.ols('z ~ 1 + x + y + x:y', data=df0).fit()
ret.summary()
```

Out[105]:

You can access the parameter estimates and their covariance matrix directly as a Series and DataFrame (the standard errors themselves are in `ret.bse`)

In [106]:

```
print(ret.params)
print(ret.cov_params())
```
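
Two small conveniences are worth pointing out. In the formula language, `x*y` is shorthand for `x + y + x:y` (and the intercept is included by default), and the standard errors are just the square roots of the diagonal of the covariance matrix. Here's a self-contained sketch that regenerates similar data with a seeded generator; the seed `0` is arbitrary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
N = 100
x = rng.normal(size=N)
y = 3*rng.normal(size=N)
z = 1 + 2*x + 3*y + 4*x*y + rng.normal(size=N)
df0 = pd.DataFrame({'x': x, 'y': y, 'z': z})

# x*y expands to x + y + x:y, with an intercept added automatically
ret = smf.ols('z ~ x*y', data=df0).fit()

print(ret.params)  # point estimates, indexed by term name
print(ret.bse)     # standard errors

# The standard errors are the square roots of the covariance diagonal
print(np.allclose(ret.bse, np.sqrt(np.diag(ret.cov_params()))))
```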
