Exploratory Visualizations part-3 - Styling matplotlib

In the first two posts of this series I compared the out-of-the-box look and feel of matplotlib with base R graphics and the ever-so-awesome ggplot2. In this post, I'll explore some options to make matplotlib plots prettier or, in other words, look more like ggplot2.

Let's get right to it by importing the required packages.

Read more…

Exploratory Visualizations part-2 - Decoding ggplot2

In the first post of this series we looked at the out-of-the-box visualizations possible with matplotlib and the base R graphics package and drew a comparison. In this post, let's take a look at the ever-so-awesome ggplot2.

ggplot2 is invaluable for its sophistication and the way it enables you to write complex plots in just a few lines of code.

Let's get right to it by importing the required packages.

Read more…

Exploratory Visualizations part-1 - matplotlib vs base R

In this series of posts, I will compare and contrast visualization tools, focusing specifically on Python and R. We will look at a wide array of tools such as matplotlib, base graphics in R, lattice and ggplot2, and visually pit them against each other by creating some simple visualizations. Later we will turn our attention to matplotlib and implement some ideas to make its visualizations more impressive. This isn't meant to be a tutorial (there are plenty out there), but hopefully there's a trick or two along the way that's helpful.

In this first post, we will begin by comparing matplotlib with the base graphics package in R.

To start with, we will use a simple dataset that provides specifications for 428 new vehicles for the year 2004. This dataset has been used in a number of statistics courses since the results are easy for everyone to relate to. We, however, will use the data to focus on the tools rather than on the data itself.

Let's go!

Read more…

Visually differentiating PCA and Linear Regression

I've always been fascinated by the concept of PCA. Considering its wide range of applications and how inherently mathematical the idea is, I feel PCA is one of the pillars of the intersection between pure mathematics and real-world analytics. Besides, the fact that you can think about real data as just raw numbers and then transform it down to something you can visualize and relate to is extremely powerful and essential in any learning process.

Just in case you're wondering, Principal Component Analysis (PCA), simply put, is a dimensionality reduction technique that finds the combinations of variables that explain the most variance. So you can transform a 1000-feature dataset into 2D to visualize it in a plot, or you can bring it down to x features, where x<<1000, while preserving most of the variance in the data. I've previously explored facial image compression and reconstruction with PCA using scikit-learn.
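To make the 1000-features-to-2D idea concrete, here is a minimal sketch using plain NumPy rather than scikit-learn. The data is synthetic and made up purely for illustration (two hidden factors plus a little noise), and the helper name `pca_project` is hypothetical, not something from the original post.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto its top principal components."""
    # PCA works on mean-centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of total variance explained by each component
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]

# Assumed toy data: 200 samples, 1000 features driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 1000))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 1000))

X_2d, ratio = pca_project(X, n_components=2)
print(X_2d.shape)   # (200, 2)
print(ratio.sum())  # fraction of variance kept by the two components
```

Because this toy data really is driven by two factors, the two components keep almost all of the variance; real datasets are rarely this tidy.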

In this post I would like to delve into the concept of linearity in Principal Component Analysis.

Read more…