W3 Lab: Perception¶
In this lab, we will learn basic usage of the pandas library and then perform a small experiment to test the perception of length and area.
import pandas as pd
import math
import matplotlib.pyplot as plt
%matplotlib inline
Vega datasets¶
Before going into the perception experiment, let's first talk about some handy datasets that you can play with.
It's nice to have clean datasets handy to practice data visualization. There is a nice small package called vega-datasets, from the altair project.
You can install the package by running
$ pip install vega-datasets
or
$ pip3 install vega-datasets
Once you install the package, you can import and see the list of datasets:
from vega_datasets import data
data.list_datasets()
['7zip', 'airports', 'anscombe', 'barley', 'birdstrikes', 'budget', 'budgets', 'burtin', 'cars', 'climate', 'co2-concentration', 'countries', 'crimea', 'disasters', 'driving', 'earthquakes', 'ffox', 'flare', 'flare-dependencies', 'flights-10k', 'flights-200k', 'flights-20k', 'flights-2k', 'flights-3m', 'flights-5k', 'flights-airport', 'gapminder', 'gapminder-health-income', 'gimp', 'github', 'graticule', 'income', 'iowa-electricity', 'iris', 'jobs', 'la-riots', 'londonBoroughs', 'londonCentroids', 'londonTubeLines', 'lookup_groups', 'lookup_people', 'miserables', 'monarchs', 'movies', 'normal-2d', 'obesity', 'points', 'population', 'population_engineers_hurricanes', 'seattle-temps', 'seattle-weather', 'sf-temps', 'sp500', 'stocks', 'udistrict', 'unemployment', 'unemployment-across-industries', 'us-10m', 'us-employment', 'us-state-capitals', 'weather', 'weball26', 'wheat', 'world-110m', 'zipcodes']
Or you can work with only the smaller datasets that are bundled locally.
from vega_datasets import local_data
local_data.list_datasets()
['airports', 'anscombe', 'barley', 'burtin', 'cars', 'crimea', 'driving', 'iowa-electricity', 'iris', 'la-riots', 'seattle-temps', 'seattle-weather', 'sf-temps', 'stocks', 'us-employment', 'wheat']
Ah, we have the anscombe data here! Let's see the description of the dataset.
local_data.anscombe.description
"Anscombe's Quartet is a famous dataset constructed by Francis Anscombe [1]_. Common summary statistics are identical for each subset of the data, despite the subsets having vastly different characteristics."
Anscombe's quartet dataset¶
What does the actual data look like? Very conveniently, calling the dataset returns a pandas DataFrame.
df = local_data.anscombe()
df.head()
| | Series | X | Y |
|---|---|---|---|
| 0 | I | 10 | 8.04 |
| 1 | I | 8 | 6.95 |
| 2 | I | 13 | 7.58 |
| 3 | I | 9 | 8.81 |
| 4 | I | 11 | 8.33 |
Q1: Can you draw a scatterplot of the dataset "I"? You can filter the dataframe based on the Series column and use the scatter function that you used for Snow's map.
# TODO: put your code here
Some histograms with pandas¶
Let's look at a slightly more complicated dataset.
car_df = local_data.cars().astype({'Year':'object'})
car_df.head()
| | Acceleration | Cylinders | Displacement | Horsepower | Miles_per_Gallon | Name | Origin | Weight_in_lbs | Year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.0 | 8 | 307.0 | 130.0 | 18.0 | chevrolet chevelle malibu | USA | 3504 | 1970-01-01 |
| 1 | 11.5 | 8 | 350.0 | 165.0 | 15.0 | buick skylark 320 | USA | 3693 | 1970-01-01 |
| 2 | 11.0 | 8 | 318.0 | 150.0 | 18.0 | plymouth satellite | USA | 3436 | 1970-01-01 |
| 3 | 12.0 | 8 | 304.0 | 150.0 | 16.0 | amc rebel sst | USA | 3433 | 1970-01-01 |
| 4 | 10.5 | 8 | 302.0 | 140.0 | 17.0 | ford torino | USA | 3449 | 1970-01-01 |
Pandas provides useful summary functions. describe() identifies the numerical columns and produces a table of summary statistics for them.
car_df.describe()
| | Acceleration | Cylinders | Displacement | Horsepower | Miles_per_Gallon | Weight_in_lbs |
|---|---|---|---|---|---|---|
| count | 406.000000 | 406.000000 | 406.000000 | 400.000000 | 398.000000 | 406.000000 |
| mean | 15.519704 | 5.475369 | 194.779557 | 105.082500 | 23.514573 | 2979.413793 |
| std | 2.803359 | 1.712160 | 104.922458 | 38.768779 | 7.815984 | 847.004328 |
| min | 8.000000 | 3.000000 | 68.000000 | 46.000000 | 9.000000 | 1613.000000 |
| 25% | 13.700000 | 4.000000 | 105.000000 | 75.750000 | 17.500000 | 2226.500000 |
| 50% | 15.500000 | 4.000000 | 151.000000 | 95.000000 | 23.000000 | 2822.500000 |
| 75% | 17.175000 | 8.000000 | 302.000000 | 130.000000 | 29.000000 | 3618.250000 |
| max | 24.800000 | 8.000000 | 455.000000 | 230.000000 | 46.600000 | 5140.000000 |
If you ask for a histogram, you get one for every numerical column. :)
car_df.hist()
Well, this is too small. You can check out the documentation and change the size of the figure.
Q2: By consulting the documentation, can you make the figure larger so that we can see all the labels clearly? Then make the layout 2 x 3 instead of 3 x 2, and change the number of bins to 20.
# TODO: put your code here
Your own psychophysics experiment!¶
Let's do an experiment! The procedure is as follows:
- Generate a random number (a float) between [1.0, 10.0];
- Use a horizontal bar to represent the number, i.e., the length of the bar is equal to the number;
- Guess the length of the bar by comparing it to two other bars with length 1 and 10 respectively;
- Store your guess (perceived length) and actual length to two separate lists;
- Repeat the above steps many times;
- Finally, ask: how does the perception of length differ from that of area?
First, let's define the length of a short and a long bar. We also create two empty lists to store perceived and actual length.
import random
import time
import numpy as np
l_short_bar = 1
l_long_bar = 10
perceived_length_list = []
actual_length_list = []
Perception of length¶
Let's run the experiment.
The random module in Python provides various random number generators, and the random.uniform(a,b) function returns a floating point number in [a,b].
We can plot horizontal bars using the pyplot.barh() function. Using this function, we can produce a bar graph that looks like this:
mystery_length = random.uniform(1, 10) # generate a float between 1.0 and 10.0. this is the *actual* length.
plt.barh(np.arange(3), [l_short_bar, mystery_length, l_long_bar], align='center')
plt.yticks(np.arange(3), ('1', '?', '10'))
plt.xticks([]) # no hint!
By the way, np.arange(3) is used to create a simple integer array [0, 1, 2].
np.arange(3)
array([0, 1, 2])
Now let's define a function to perform the experiment once. When you run this function, it picks a random number between 1.0 and 10.0 and shows the bar chart. Then it asks you to input your estimate of the length of the middle bar. It saves that number to the perceived_length_list and the actual answer to the actual_length_list.
def run_exp_once():
    # generate a float between 1.0 and 10.0. this is the *actual* length.
    mystery_length = random.uniform(1, 10)
    plt.barh(np.arange(3), [l_short_bar, mystery_length, l_long_bar], height=0.5, align='center')
    plt.yticks(np.arange(3), ('1', '?', '10'))
    plt.xticks([])  # no hint!
    plt.show()
    try:
        perceived_length_list.append(float(input()))
    except (ValueError, EOFError):
        # reading input only fails when the notebook runs non-interactively
        print("Could not read input; this should only happen in a non-interactive run.")
    actual_length_list.append(mystery_length)
run_exp_once()
2.5
Now, run the experiment many times to gather your data. Check the two lists to make sure that you have the proper dataset. The length of the two lists should be the same.
# TODO: Run your experiment many times here
plt.scatter(x=[1,5,10], y=[1,10, 5])
Q3: Now plot your result using the scatter() function. You should also use plt.title(), plt.xlabel(), and plt.ylabel() to label your axes and the plot itself.
# TODO: put your code here
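For reference, a fully labeled scatter plot looks like the sketch below. The numbers are placeholders; substitute your own two lists from the experiment.

```python
import matplotlib.pyplot as plt

# placeholder data -- substitute your own lists from the experiment
actual_length_list = [2.0, 4.5, 7.1, 9.3]
perceived_length_list = [2.2, 4.0, 6.5, 9.0]

plt.scatter(actual_length_list, perceived_length_list)
plt.title('Perception of length')
plt.xlabel('Actual length')
plt.ylabel('Perceived length')
plt.show()
```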
After plotting, let's fit the relation between actual and perceived lengths with a power function. We can do this with curve_fit(f, x, y) from SciPy, which fits the function f to the data x and y. In our case, $f(x) = a x^b + c$. To check that this works, we can create a fake dataset that follows this exact form:
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.power(x, b) + c
x = np.arange(20) # [0,1,2,3, ..., 19]
y = np.power(x, 2) # [0,1,4,9, ... ]
popt, pcov = curve_fit(func, x, y)
print('{:.2f} x^{:.2f} + {:.2f}'.format(*popt))
1.00 x^2.00 + 0.00
To plot the fitted relationship between the actual and perceived lengths, create two variables x and y, where x is a series of evenly spaced numbers covering your data range. For example, if your x axis ranges from 1 to 10, x could be np.linspace(1, 10, 50). The variable y holds the fitted equation evaluated with the parameters from popt: if you get the equation 1.00 x^2.00 + 0.00, then y would be 1.0 * x**2.0 + 0.0.
After assigning x and y, plot them together with the scatter plot of actual and perceived values to check whether the relationship is roughly linear.
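Putting the pieces together, here is a sketch of the fit-and-plot step, again with placeholder data standing in for your experimental lists:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.power(x, b) + c

# placeholder data -- substitute your own experimental lists
actual = np.array([1.5, 3.0, 4.5, 6.0, 7.5, 9.0])
perceived = np.array([1.4, 3.2, 4.1, 6.3, 7.2, 9.4])

popt, pcov = curve_fit(func, actual, perceived, p0=[1.0, 1.0, 0.0])
x = np.linspace(1, 10, 50)   # smooth x range covering the data
y = func(x, *popt)           # fitted curve a * x**b + c

plt.scatter(actual, perceived)   # raw data
plt.plot(x, y)                   # fitted power-law curve
plt.xlabel('Actual length')
plt.ylabel('Perceived length')
plt.show()
print('{:.2f} x^{:.2f} + {:.2f}'.format(*popt))
```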
Q4: Now fit your data! Do you see a roughly linear relationship between the actual and the perceived lengths? It's ok if you don't!
# TODO: your code here
Perception of area¶
Similar to the experiment above, we now represent a random number as a circle whose area is equal to the number.
First, calculate the radius of a circle from its area, then plot it using the Circle() function: plt.Circle((0,0), r) draws a circle centered at (0,0) with radius r.
n1 = 0.005
n2 = 0.05
radius1 = np.sqrt(n1/np.pi) # area = pi * r * r
radius2 = np.sqrt(n2/np.pi)
random_radius = np.sqrt(n1*random.uniform(1,10)/np.pi)
plt.axis('equal')
plt.axis('off')
circ1 = plt.Circle( (0,0), radius1, clip_on=False )
circ2 = plt.Circle( (4*radius2,0), radius2, clip_on=False )
rand_circ = plt.Circle((2*radius2,0), random_radius, clip_on=False )
plt.gca().add_artist(circ1)
plt.gca().add_artist(circ2)
plt.gca().add_artist(rand_circ)
Let's have two lists for this experiment.
perceived_area_list = []
actual_area_list = []
And define a function for the experiment.
def run_area_exp_once(n1=0.005, n2=0.05):
    radius1 = np.sqrt(n1/np.pi)  # area = pi * r * r
    radius2 = np.sqrt(n2/np.pi)
    mystery_number = random.uniform(1, 10)
    random_radius = np.sqrt(n1*mystery_number/np.pi)
    plt.axis('equal')
    plt.axis('off')
    circ1 = plt.Circle((0, 0), radius1, clip_on=False)
    circ2 = plt.Circle((4*radius2, 0), radius2, clip_on=False)
    rand_circ = plt.Circle((2*radius2, 0), random_radius, clip_on=False)
    plt.gca().add_artist(circ1)
    plt.gca().add_artist(circ2)
    plt.gca().add_artist(rand_circ)
    plt.show()
    try:
        perceived_area_list.append(float(input()))
    except (ValueError, EOFError):
        # reading input only fails when the notebook runs non-interactively
        print("Could not read input; this should only happen in a non-interactive run.")
    actual_area_list.append(mystery_number)
Q5: Now you can run the experiment many times, plot the result, and fit a power-law curve!
# TODO: put your code here. You can use multiple cells.
What is your result? How are the exponents different from each other?