One of the benefits of spending large amounts of money on a Fujifilm camera is that there are enough images to create a travel photobook. We thought it would be fun to create a themed cover for the photobook, so that we can distinguish between all the locations and years on our bookshelf.

Here is our effort for the Lake District trip we enjoyed in December 2023, where we travelled from Penrith to Windermere, Ullswater and Blea Tarn.

I have created an interactive visualisation below to explain the underlying concepts and make tools like prophet seem less like magic… For example, try adjusting the coefficients *below* to model weekly seasonality with one of three linear models (dummy, radial basis function or Fourier). For further explanation, see the seasonality section.

*This is actually running Python in your browser! It should take about 30 seconds to load; see this previous post or Shinylive.*

Before researching this blog post, I naively assumed that linear regression was restricted to a straight line fit, of the form:

\[y = m \times x + c\]
where $y$ is our target variable, $x$ is the feature, $c$ is the intercept and $m$ is the coefficient for $x$.

However, it turns out that the “linear” in linear regression actually refers to the relationship between the target and feature variables, i.e. each feature variable has a single constant coefficient to describe its relationship to $y$. This means that we can rewrite our model in terms of vectors and matrices, allowing us to extend the straight line to many dimensions:

\[\vec{y} = \mathbf{X} \vec{\beta}\]
where $\vec{\beta}$ is a vector of $n$ coefficients ($n \times 1$):

\[\vec{\beta} = \begin{pmatrix} \beta_{0} \\ \vdots \\ \beta_{n-1} \end{pmatrix}\]
and $\mathbf{X}$ is an ($m \times n$) matrix containing the feature variables, with one row per observation:

\[\mathbf{X} = \begin{pmatrix} 1 & x^{(1)}_{1} & \cdots & x^{(1)}_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x^{(m)}_{1} & \cdots & x^{(m)}_{n-1} \end{pmatrix}\]
When we “fit” our model to the data, we are changing the values of the coefficients ($\vec{\beta}$) with the aim of minimising the error between the observed data and our predictions. This is normally represented by the mean squared error (MSE).

With a bit of linear algebra, we can solve this equation analytically, giving the $\hat{\beta}$ that minimises the mean squared error:

\[\hat{\beta} = (\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{X}^{T} \vec{y}\]
Later on, we will see how we can avoid overfitting by modifying the loss function to include some form of regularisation.
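As a quick sanity check, here is a minimal numpy sketch of this analytical solution (the straight-line data is purely illustrative):

```
import numpy as np

# Illustrative straight-line data: y = 2 + 3x plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
X = np.column_stack([np.ones_like(x), x])  # column of 1s gives the intercept
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=100)

# Solve (X^T X) beta = X^T y, rather than inverting the matrix explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately [2, 3]
```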

After fitting our model, the coefficients can instantly tell us the effect of each of our feature variables. For example, we could say that for every $1^{\circ}$ temperature increase we expect the chance of rain to decrease by X amount. Such a simple statement is notoriously hard to make when using more complicated models, like neural networks.

But what if our data shows some structure?

For example, I have created some fake data that has a clear weekly seasonality. How can we possibly model this with a straight line? Well, with some clever feature engineering tricks, we can create a whole host of new features to model complex situations like this.

*Demonstration of dummy variables with the figure at the top of the page*

Using our intuition, we might think it is sensible to try and calculate the contribution from each day of the week. This can be represented by creating a dummy variable for each day, where $x_{\textrm{Monday}}$ is only equal to $1$ on a Monday and $0$ everywhere else. We can then scale this new feature with a coefficient, $\beta_{\textrm{Monday}}$, that represents **the average $y$ on a Monday**.

This new feature can be created with the following code:

```
def dummy(x, start, width=1):
    # repeat every 7 days
    x_mod = x % 7
    # Create a boolean array where True is set for elements within the specified range
    condition = (x_mod >= start) & (x_mod < start + width)
    # Convert the boolean array to an integer array (True becomes 1, False becomes 0)
    return condition.astype(int)
```
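To see the “coefficient = average height” interpretation in action, here is a minimal sketch, assuming arrays `x` (day numbers) and `y` (observations) from the fake data above:

```
import numpy as np
from sklearn.linear_model import LinearRegression

# One dummy column per day of the week
X = np.column_stack([dummy(x, start) for start in range(7)])
# With no intercept, each coefficient is simply the average y for that day
model = LinearRegression(fit_intercept=False).fit(X, y)
print(model.coef_)  # one "height" per day
```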

While these dummy variables are useful for demonstrating that coefficients effectively represent the height of each feature, the final result does not look natural. The step-like shape means that we are expecting a significant change as soon as the clock ticks one minute past midnight!

To create a smoother seasonality pattern, we can replace the step function with a repeating Gaussian distribution centered around each day of the week. This is referred to as a radial basis function:

\[y = e^{- (x - \textrm{center})^{2}/(2 \times \textrm{width})}\]
This now lets the influence of a single day seep into adjacent days, leading to a much more pleasing final fit.

We can create the new features with the code below:

```
import numpy as np

def rbf(x, width, center):
    # repeat every 7 days
    x_mod = x % 7
    center_mod = center % 7
    # Original Gaussian
    gauss = np.exp(-((x_mod - center_mod)**2) / (2 * width))
    # Gaussian shifted by +7
    gauss_plus = np.exp(-((x_mod - (center_mod + 7))**2) / (2 * width))
    # Gaussian shifted by -7
    gauss_minus = np.exp(-((x_mod - (center_mod - 7))**2) / (2 * width))
    # Sum the contributions so the tails wrap around the week boundary
    return gauss + gauss_plus + gauss_minus
```

You may have noticed that we now have an extra variable, `width`, which controls the width of our individual Gaussian functions. This variable is not actually part of the linear regression, but is used to generate the features that the model learns from. It is therefore a hyperparameter that we can tune.

For example, try changing the width parameter in the top figure to see how it affects the MSE.
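Outside of the interactive figure, a simple grid search works too; a minimal sketch, again assuming the `rbf()` function and the fake `x`, `y` arrays from above:

```
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# One radial basis function centred on each day of the week
def rbf_features(x, width):
    return np.column_stack([rbf(x, width, center) for center in range(7)])

for width in [0.1, 0.5, 1.0, 2.0]:
    X = rbf_features(x, width)
    model = LinearRegression().fit(X, y)
    print(f"width={width}: MSE={mean_squared_error(y, model.predict(X)):.3f}")
```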

We can also model seasonality with Fourier components, which are less prone to overfitting than the radial basis functions above.

This trick works because any periodic function can be represented as a sum of infinitely many sine and cosine waves, known as a Fourier series, as demonstrated below.

Therefore, we can create a new feature for each of these sine and cosine waves, with each coefficient representing the amplitude of that individual wave. These are equivalent to the coefficients found by performing a Fourier transform. The only difference is that we cannot have infinitely many sine waves in our linear regression; however, this truncation is actually helpful for avoiding overfitting!

The figure below displays the first and second order Fourier components that are used in the interactive figure.
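For reference, these components are straightforward to generate; a minimal sketch (the `sin_k`/`cos_k` column names are my own, chosen to match the formulas later in this post):

```
import numpy as np
import pandas as pd

def fourier_features(x, order, period=7):
    # One sin/cos pair per order k; the fitted coefficient of each
    # column is the amplitude of that individual wave
    features = {}
    for k in range(1, order + 1):
        features[f"sin_{k}"] = np.sin(2 * np.pi * k * x / period)
        features[f"cos_{k}"] = np.cos(2 * np.pi * k * x / period)
    return pd.DataFrame(features)
```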

What if I don’t trust old data as much as recently recorded data?

Many fitting libraries, such as scikit-learn, allow you to specify a weight (or importance) for each of the data points. Therefore, we can give exponentially decaying importance to older measurements as a way of down-weighting potentially misleading historic data. The amount of “forgetfulness” then becomes another hyperparameter in our model.
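A minimal sketch of how this looks in scikit-learn (the `half_life` name and the exponential-decay form are illustrative choices):

```
import numpy as np
from sklearn.linear_model import LinearRegression

# Days since each observation (x is assumed ordered in time)
age = x.max() - x
# Half-life of the decay, in days -- the "forgetfulness" hyperparameter
half_life = 30
weights = 0.5 ** (age / half_life)

# scikit-learn accepts a weight per data point at fit time
model = LinearRegression().fit(X, y, sample_weight=weights)
```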

What if the seasonality changed over time?

In the figure below, the amplitude of the seasonality component is increasing each week. Not to worry, we can just create several extra features representing the interaction of day of the week with time. However, this kind of feature engineering can become tedious, hindering our flow. Instead, we can describe our models with a statistical formula language. For example, a straight line fit ($y = m \times x + c$) would be written as

```
y ~ 1 + x
```

with the `1` representing the intercept, and a coefficient being automatically generated for each variable, `x`. This might seem like overkill for such a simple model, but the real beauty becomes apparent with more complex problems…

Returning to the changing seasonality problem, we might start by saying there is an interaction between time, `t`, and the first Fourier cosine component, `cos_1`:

`y ~ 0 + t*cos_1`

The `*` operator will automatically generate coefficients for `t`, `cos_1` and the interaction term `t:cos_1`.

Note: In the notebook, the patsy package is used to convert these statistical formulas into a feature vector.
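A minimal sketch of that conversion (assuming `t`, `cos_1` and `y` are arrays from the earlier feature engineering):

```
import pandas as pd
from patsy import dmatrices

df = pd.DataFrame({"t": t, "cos_1": cos_1, "y": y})
# Returns the target and the design matrix described by the formula
y_mat, X_mat = dmatrices("y ~ 0 + t*cos_1", df)
print(X_mat.design_info.column_names)  # e.g. ['t', 'cos_1', 't:cos_1']
```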

After fitting this formula, we get three coefficients that we can interpret as:

- For every 1 unit of `t`, $y$ will reduce by 0.01 (a negative-slope straight line)
- The base amplitude for the `cos_1` variable is -0.41
- For every unit of `t`, this amplitude will reduce by another -0.04 (making the amplitude larger)

Including the full interaction terms is as easy as amending the formula like so:

`y ~ t * (cos_1 + sin_1 + cos_2 + sin_2)`

which will autogenerate 9 features and coefficients.

However, there is no free lunch in modelling. Although we can now create hundreds of features using this statistical language, we now need to keep control of them…

One of the hardest parts of fitting any model, either simple or complex, is making sure we don’t overfit. We want to capture the general shape of the observed data, but do not want to perfectly predict each point, as this will just be fitting the noise rather than learning the underlying pattern.

One way to combat overfitting is with a technique called regularisation, which adds a penalty term to the regression formula. Since the model will be penalised for each variable it adds, only the most useful features are included. This can be seen by comparing the figures above (regular fit) and below (with regularisation).

The exact form of the penalty can change, with the two main variants, LASSO and ridge regression, having a linear and quadratic penalty term, respectively.

An alpha parameter is often used to specify the trade-off between the model’s performance on the training set and its simplicity. So, increasing the alpha value simplifies the model by shrinking the coefficients.
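A minimal sketch in scikit-learn (the alpha values are arbitrary, and `X`, `y` are assumed from earlier):

```
from sklearn.linear_model import Lasso, Ridge

# Higher alpha = stronger penalty = smaller (simpler) coefficients
lasso = Lasso(alpha=0.1).fit(X, y)  # linear (L1) penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # quadratic (L2) penalty

# LASSO will shrink unhelpful coefficients all the way to zero
print(lasso.coef_)
print(ridge.coef_)
```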

At this point, you might be wondering: what are the uncertainties on these coefficients?

While we can solve linear regression with matrix operations (as shown earlier), this is not the best way to understand the sensitivity in our coefficients. Instead, we can fit our model within a Bayesian framework. Not only will this automatically provide an uncertainty estimate in the form of a posterior distribution, but we can actually incorporate domain knowledge into our fitting (through the prior distribution).

To explain how, we first need a quick crash-course in Bayes’ theorem (or read this introduction).

Bayes’ theorem allows us to mathematically update the probability of an event happening, $P(E)$, based on some new data, $D$.

\[P(E|D) = \frac{P(D|E) \times P(E)}{P(D)}\]
For example, we might be playing a game of heads or tails where I have to guess the coin face. I give you the benefit of the doubt and initially believe that there is a 50/50 chance of the coin being fair, $P(\textrm{Fair}) = 0.5$, which is called my `prior` assumption.

If you get three heads (3H) to start the game, do I still believe the coin is fair?

If the coin is fair, then on each coin flip the probability of getting heads is 50%. So the probability of $n$ heads in a row is equal to

\[P(H|\textrm{Fair}) = 0.5^{n}\]
We will assume that if the coin is not fair, the probability of $n$ heads is

\[P(H|\textrm{Not Fair}) = 0.75^{n}\]
On each coin flip, I can update my beliefs to get a `posterior` probability, i.e. $P(\textrm{Event}|\textrm{Data})$. Doing the maths, we can say that:
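Plugging in the numbers for three heads, with the 50/50 prior split between the two hypotheses:

\[P(\textrm{Fair}|3H) = \frac{0.5^{3} \times 0.5}{0.5^{3} \times 0.5 + 0.75^{3} \times 0.5} \approx 0.23\]
My belief that the coin is fair has dropped from 50% to roughly 23%.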

So, maybe we need to have a chat…

Using this analogy, we start off by assuming a prior distribution for each of our linear regression coefficients (usually just a Gaussian). We can then use Bayes’ theorem to update our beliefs about these coefficients, given the observed data. With each new data point, we get a better understanding of the posterior distribution for each coefficient!

This gets even more fun when you realise we don’t have to use the standard Gaussian priors. We can actually set up the priors to include knowledge we already have about the problem, i.e. we might already have some idea of what the seasonality should be, allowing a fit that would not normally be possible with limited data.

The above figure shows the final coefficients along with the 94% credible interval. This allows us to quote things like “we are 94% confident that Sunday gives an extra 0.746 to 1.071 units”. We can also make statements such as “we are the least sure about the 7th day”. Interestingly, the model is very confident about the amount of noise in the data (the `y_sigma` parameter), probably because I made this fake data with a single noise parameter…
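For reference, here is a minimal sketch of this kind of Bayesian linear regression in PyMC (assuming a feature matrix `X` and target `y` as before; the Gaussian priors are the standard choice mentioned above):

```
import pymc as pm

with pm.Model() as model:
    # Prior: a Gaussian for each coefficient (swap in domain knowledge here)
    beta = pm.Normal("beta", mu=0, sigma=1, shape=X.shape[1])
    # Prior for the observation noise
    y_sigma = pm.HalfNormal("y_sigma", sigma=1)
    # Likelihood of the observed data given the coefficients
    pm.Normal("obs", mu=pm.math.dot(X, beta), sigma=y_sigma, observed=y)
    # Sample the posterior distribution of every parameter
    idata = pm.sample()
```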

The prophet model from Facebook provides many of these functionalities straight out of the box, and does such a good job of abstracting this complexity away that it kind of seems like magic. However, it is important to realise that prophet is just using some of the tricks from earlier:

- it is a linear model
- it models weekly seasonality with radial basis function dummy variables
- it models yearly seasonality with Fourier components (whose order you can change with the `yearly_seasonality` parameter)
- it can fit in a Bayesian framework, giving uncertainties in the final predictions

To be fair, prophet also has a few tricks up its sleeve: it can model increases in sales on holidays in that region; handle logistic growth (along with changepoints); and deal with gaps in the data.

I would personally use prophet to rapidly experiment with seasonality, holidays and extra regressors. I would then re-create the model in PyMC to add more custom features; see this example of how to do this.

As a demonstration of how simple prophet is, we can condense most of the ideas in this blog post into the following code:

```
from prophet import Prophet

m = Prophet(
    weekly_seasonality=True,
    yearly_seasonality=5,
)
m.add_country_holidays(country_name='USA')
m.fit(df)
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail()
```

We can also break down each of these components to extract more information.
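prophet can plot each fitted component from the same forecast object:

```
# Plot the fitted trend, holiday and seasonality components
fig = m.plot_components(forecast)
```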

For a more detailed tutorial, see the official documentation.

This post, and associated notebook, is heavily inspired by this fantastic talk by Vincent Warmerdam, along with this notebook investigating cycling patterns in Seattle.

As this article explains, a traditional recommendation algorithm would just say “you should watch these shows”. But Netflix go one step further and ask themselves “how can we convince you that this show is worth watching?”

A simple example of this concept is showing you film posters with actors you recognise:

Instead of using a traditional A/B test to trial a new algorithm such as this, the authors describe how they use a contextual bandit algorithm to balance data exploration against providing the best experience for the user.

As they explain, there is a lot of nuance to this problem:

- you need to understand the causal effect of suggesting an artwork, i.e. would a user have clicked on that show regardless of your proposed artwork?
- maintaining a recognisable image for each show. What if a user cannot find the artwork they saw last week?
- optimising across the whole screen. A poster of the main actor will not stand out if similar artwork is shown for every show
- how to quickly learn the rules for a newly launched title?

In this 2017 article, the authors say that a team of artists and designers creates a wide range of images for each show. It would be interesting to learn how much of this has now been taken over by generative AI.

Now, I could be disciplined and make sure the workspace is kept tidy, but frankly that requires a lot of effort (that I simply don’t have the patience for). Besides, the flurry of data exploration and creative problem solving is all part of the process that we should encourage.

As you have probably guessed, some smart *cookie* has already *made* tools to solve this problem… read on to understand just how terrible those puns are.

We will be using two tools to create our reproducible workflow: cookiecutter to stamp out our folder structure and make to automate our data processing and analysis. You can use my cookiecutter template with the following command (which is based on this original):

```
cookiecutter https://github.com/rlaker/cookiecutter-data-science
```

The first tool, cookiecutter, allows us to define the folder structure and include any boilerplate code that can streamline the process of creating the project. For example, with cookiecutter we can install and set up:

- a conda environment for the project, defined in an `environment.yaml` file. Simply run `make environment` to install some common packages, e.g. pandas or matplotlib
- pre-commit hooks, installed automatically, ensuring our formatting is consistent
- a python package for each project (which lives in the `src` folder). This means you can install the custom package into any notebook in the project without having to worry about paths to a local folder (see these blog posts here and here). This also makes the project more portable when it comes to productionising the model
- custom style files, such as the one from my previous post
- a place to store the raw `data`, `notebooks`, `models`, `figures` etc.

As well as helping us remember where all the necessary files are stored, such a consistent folder structure allows us to write all the necessary boilerplate in a `makefile`. Essentially, this file holds the terminal commands needed for make to reproduce the outputs of the project. Not only are makefiles **both human and machine readable documentation**, but they can specify the dependencies for each part of the project, i.e. the LaTeX report is dependent on the code to make the figures, which depends on the data processing script. By looking at the file timestamps, make can also infer which files need to be updated, only running the necessary parts of the pipeline.
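As an illustrative sketch (the targets and file names here are invented; recall that make recipe lines must be indented with tabs):

```
# Each target lists the files it depends on; make re-runs a step
# only when a dependency is newer than its output
all: report.pdf

data/processed.csv: data/raw.csv src/process.py
	python src/process.py

figures/fit.png: data/processed.csv src/plot.py
	python src/plot.py

report.pdf: report.tex figures/fit.png
	pdflatex report.tex
```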

While storing pre-processed versions of data files can save disk space, I learnt the hard way why this is a bad idea. **Do not** manually edit or change the raw data; you will forget what you did in a few weeks/months. Instead, create a pipeline and document the steps in the `makefile`. Even if you forget any of the steps of your implementation, you only need to run `make data`.

If you follow convention, then a completely new user could recreate any of your projects by just typing `make all`.

The power of make, combined with a consistent folder structure, can also enable you to automatically create documentation for your project with a single command. For example, running `make docs_html` in the newly created project will use pdoc to automatically generate a static website from the docstrings within the code!

If you want a more in depth review of this type of workflow:

Who knew that margarine consumption is correlated with the divorce rate in Maine? There is even a *very scientific* paper on the subject.

This is just one of thousands of spurious correlations from Tyler Vigen’s hilarious demonstration of data dredging (his figures are even included on the associated Wikipedia page). This is when you take many variables, say 25,237 like on his website, and blindly accept statistically significant correlations.

Turns out this is a major problem in the more statistical sciences, so much so that they now have a pre-registration format to describe what a study will investigate before any data is examined.

This project also provides a great example of generating realistic looking content, in the form of *scientific* papers, from LLMs. Each paper shows the sequence of prompts that were used to create it.

The author does point out that:

The silliness of the papers is an artifact of me (1) having fun and (2) acknowledging that realistic-looking AI-generated noise is a real concern for academic research (peer reviews in particular).

The papers could sound more realistic than they do, but I intentionally prompted the model to write papers that look real but sound silly.

Although, I’m sure you could convince some people that Anne Hathaway films are responsible for the number of votes for Republican senators…

For example, try adjusting the coefficients *below* to model weekly seasonality. For further explanation, see the main blog post.

*This is actually running Python in your browser! Inspect the console for this page and you will see that Python packages have been installed.*

As demonstrated by Shiny’s website, Python packages could run interactive demos of each function in their documentation, without the user actually installing the package. Users could also see how they can modify the example by editing the documentation directly!

The only downside is that only a few select packages have been converted to work with Pyodide, but the data science big hitters are there: numpy, matplotlib and pandas.

After writing your shiny app, export it with the shinylive package

```
shinylive export app_folder site
```

You can then serve the site locally with

```
python -m http.server --directory site 8008
```

All we need to do now is upload this folder to our GitHub Pages site. I decided to put it in the files folder, since I do not need to show the `index.html` app file directly. Instead, I include the app in posts by using an `iframe`:

```
<iframe src="/files/shiny_linear_model/shiny_linear_model.html" width="100%" height="1000px" style="border:none;"></iframe>
```

Clearly, I want to still use Python to create my figures, but want to match the professional look of others’ plots. This also means that I won’t get any push back about my graphs being in a different format.

Matplotlib stylesheets provide a way to achieve consistent styling for your figures, e.g.:

The different elements of the stylesheet are described below, along with some helpful snippets for legends and tick formatting.
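Once saved as a `.mplstyle` file, the whole sheet can be applied with a single line (the filename here is just an example):

```
import matplotlib.pyplot as plt

# Point matplotlib at the stylesheet; affects every figure made afterwards
plt.style.use("./trainline.mplstyle")
```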

This part contains the base seaborn style:

```
# Seaborn common parameters
# .15 = dark_gray
# .8 = light_gray
figure.facecolor: white
text.color: .15
axes.labelcolor: .15
legend.frameon: False
legend.numpoints: 1
legend.scatterpoints: 1
xtick.direction: out
ytick.direction: out
xtick.color: .15
ytick.color: .15
axes.axisbelow: True
image.cmap: Greys
font.family: sans-serif
font.sans-serif: Arial, Liberation Sans, DejaVu Sans, Bitstream Vera Sans, sans-serif
grid.linestyle: -
lines.solid_capstyle: round
lines.linewidth : 2
lines.markersize : 10
# Seaborn whitegrid parameters
axes.grid: True
axes.facecolor: white
grid.color: .8
xtick.major.size: 4
ytick.major.size: 0
xtick.minor.size: 2
ytick.minor.size: 0
```

Cycle through the branded colours of my organisation (Trainline):

```
#cycler
axes.prop_cycle: cycler('color', ['00a88f','ff9da1','160078','ffc508','ff6120','004ff9','ac3200'])
```

Set the grid style and labels:

```
# grid
axes.grid.axis: y # which axis the grid should apply to
axes.grid.which: major # grid lines at {major, minor, both} ticks
#font size
font.size : 18
axes.titlesize : 24
figure.titlesize: 24
axes.labelsize : 20
xtick.labelsize : 16
ytick.labelsize : 16
legend.fontsize : 16
# label pad
axes.labelpad: 8.0 # space between label and axis
```

Following the Excel style, only show the bottom spine

```
# spines
axes.spines.left: False # display axis spines
axes.spines.bottom: True
axes.spines.top: False
axes.spines.right: False
```

Set the date format:

```
# DATES
date.autoformatter.year: %y
date.autoformatter.month: %m/%y
date.autoformatter.day: %d/%m/%y
date.autoformatter.hour: %m-%d %H
date.autoformatter.minute: %d %H:%M
date.autoformatter.second: %H:%M:%S
date.autoformatter.microsecond: %M:%S.%f
```

Format the y axis to have comma separated numbers, e.g. 100,000:

```
import matplotlib.ticker as ticker
# Just put a , between 000
axs.yaxis.set_major_formatter(ticker.StrMethodFormatter('{x:,.0f}'))
# % symbol
axs.yaxis.set_major_formatter(ticker.StrMethodFormatter('{x:.1f}%'))
# currency
axs.yaxis.set_major_formatter(ticker.StrMethodFormatter('£{x:,.2f}'))
```

or

```
from matplotlib.ticker import FuncFormatter
import matplotlib.pyplot as plt

def millions(x, pos):
    'The two args are the value and tick position'
    return f'£{x*1e-6:,.1f}m'

formatter = FuncFormatter(millions)
ax.yaxis.set_major_formatter(formatter)
```

from here

Legend on top of the plot (guide)

```
ax.legend(bbox_to_anchor=(0, 1, 1, 0), loc="lower left", mode="expand", ncol=2)
```

Legend on the right

```
ax.legend(bbox_to_anchor=(1, 1), loc="upper left")
```

*Hover over the interactive plots, made with Plotly*

It turns out the film I value more than most on Letterboxd is `22 Jump Street`, and my most disappointing was `The Terminator`. Now, comedies are subjective, so I feel no need to defend one of my favourites, but I might get some flak for my Terminator review. In my defence, this film is so important for sci-fi that I knew the whole plot and infamous moments before watching. Maybe it’s my own fault, but I wasn’t expecting it to be quite so low budget, particularly the mirror scene…

Interestingly, I watched more than twice as many films on streaming services as in the cinema, but which is superior?

From the figure below, the box plots show that I found films on Netflix the highest quality overall (excluding iPlayer, which only had 3 films). While the cinema had the largest spread in ratings, it did provide the most 5-star films: Barbie, Oppenheimer, Mission: Impossible Dead Reckoning and 2001: A Space Odyssey.

Can you guess when I finished my PhD thesis?

While Letterboxd provides some neatly packaged `.csv` files for users to explore their own data, the API for gathering information about each film is private. Therefore, to compare my ratings with those of the general public, I needed to first find the average rating of each film.

Originally, I turned the `Name` column into a hyphen-separated URL, such as https://letterboxd.com/film/no-time-to-die/. However, this referenced a little-known film about a hearse driver, rather than the James Bond one I had seen, which has the actual URL of https://letterboxd.com/film/no-time-to-die-2021/.

Within the `.csv` files, Letterboxd provide the URL of each personal film review, such as https://boxd.it/41lX6F. Thankfully, this redirected to the full address, like https://letterboxd.com/rlaker/film/no-time-to-die-2021/, which just needed my username removing with the following function:

```
import requests
import pandas as pd

watched = pd.read_csv("watched.csv")

def get_final_url(redirect_url):
    try:
        response = requests.get(redirect_url)
        if response.history:
            # Extract final URL after redirection
            final_url = response.url
            # Remove '/rlaker' part from the URL
            final_url = final_url.replace('/rlaker', '')
            return final_url
        else:
            return "No redirection occurred"
    except requests.RequestException as e:
        return str(e)

watched['Letterboxd_URL'] = watched['Letterboxd URI'].apply(get_final_url)
```

After originally trying to find the right `<span>` with BeautifulSoup, I realised the data was actually stored in a dictionary, which was then served by Javascript later (as explained in this Reddit answer).

It turned out regular expressions were enough for this task:

```
import requests
import re  # Regular expression module

def get_average_rating(url):
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
        }
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            # Use regular expression to find "ratingValue"
            match = re.search(r'"ratingValue":(\d+\.\d+)', response.text)
            if match:
                # The first group captures the rating value
                return float(match.group(1))
            else:
                return None
        else:
            return None
    except Exception as e:
        return str(e)

# Applying the function to each URL
watched['Average_Rating'] = watched['Letterboxd_URL'].apply(get_average_rating)
```

Instead, Shiny (originally written for the R language) asks the user to design the web app from the start, rather than modifying an existing script. This is done by decorating the necessary functions, so that only they will be re-run. This type of reactive programming opens up a range of technical possibilities… The Shiny gallery includes a Wordle clone and ChatGPT playing 20 questions with itself, but my mind was particularly blown by the 3D graph demo in this video:

I am aware that a blog may seem outdated in the social media age, so I first wanted to set out my aims and motivations for this new project.

Driven by the fact that I have recently transitioned from academia to data science, my primary motivation is to keep up a consistent streak as a way to learn new concepts. Knowing that these posts will be public encourages more effort on my part to distil information from many sources into one coherent post. This is something I try to do in my own personal notes, as a sort of documentation for my future (and forgetful) self. Publishing some of these notes will provide more accountability for quality and encourage deeper thinking around concepts, a benefit that has been particularly clear when jotting down my ideas for this post!

As described by Simon Willison, consistent streaks can escalate from a small (and fun) idea into a large body of work, such as a well-renowned blog or a YouTube phenomenon. With practice, I hope that the quality of my writing will increase over time, so that I can do better than reviewing a restaurant as just “quite nice”.

As I have discovered at the beginning of my data science journey, there is a lot of invaluable information written across blogs and articles. Some of my favourites so far:

- Big data of big hair
- Toy example of causal inference
- Why the super rich are inevitable
- Is cycling taking off in Seattle?
- Anything by Vincent Warmerdam

I also appreciate the enormous amount of effort that goes into open source code/projects. Making code and methods public is like working with the garage door up, allowing others to build on previous work. For example, my Python animation was only possible after seeing how 3Blue1Brown made his videos, and finding that this same code was part of a community supported Python package, manim. Therefore, I hope these posts can act as a way to pay it forward.

Obviously, there are career related benefits of writing a blog, with the website acting as a portfolio that proves I have a willingness to learn. I also think it is more fun than some of the more clichéd projects.

I really enjoy the idea of “today I learned” (TIL) posts, e.g. from Vincent Warmerdam and Simon Willison. These are a great way of having those “I never knew that was possible” moments, and they let people share ideas without needing to write out a full blog post.

Travel/photography blogs allow for a better curated set of images, without having to obey the formatting rules and noise of social media. I also want to learn how to create a more finished product with my photos, rather than letting them sit on my phone.

This is something James Popsys and Roman Fox do really well.

One blog post a month, and one TIL a week. Let’s see how well I do.
