Parking Utilization at CVG

I recently booked a flight out of the Cincinnati airport (CVG). To learn more about the parking options there, I visited their homepage, which turns out to be extremely well designed. It offers quick access to the four most-needed items – Flight Status, Parking Availability, Security Times, and the Weather.

They have done their homework well.


The parking availability numbers intrigued me. How easy would it be for me to understand how available parking varies over the weekdays? What about the weekends? How full does the parking really get?

I challenged myself to write up a quick piece of code to find out. And my constraint was – spend no more than 60 minutes doing so. Given the power of R, it was entirely doable. Here’s how I solved the problem:

  1. I leveraged the XML library in R to extract the data from the website. The xpathSApply() function does a wonderful job of extracting the right information. How did I generate that XPath expression? Easy! Use SelectorGadget!
    Here’s the code:

    # htmlCode is the CVG page source, parsed with XML::htmlParse()
    count <- xpathSApply(htmlCode,
                         '//*[contains(concat( " ", @class, " " ), concat( " ", "parking-number", " " ))]',
                         xmlValue)
  2. Calculate the % utilization for the Terminal Garage and CVG ValuPark options, and append these to a file stored on disk.
  3. This gives us one datapoint in time. How do we generate datapoints over time? Again, it’s super easy leveraging Rscript and crontab. Rscript lets a user run a .R script file from the command line, while crontab is a Unix utility that schedules repeated runs of tasks in the background. The code below runs in the terminal (on a Mac): crontab -l lists the currently scheduled tasks in the crontab file (run crontab -e the first time, to edit it), and */5 tells cron to run the Rscript command every 5 minutes.
    Rahuls-MacBook-Pro:~ Rahul$ crontab -l
    */5 * * * * /usr/local/bin/Rscript /Users/Rahul/Documents/Data\ Science/CVG_Parking/cvg_park.R >/dev/null 2>&1
  4. I let the script run on my laptop for a week, and collected a total of 1908 datapoints.
  5. To analyze the data, a small script utilizing the ggplot2, dplyr, tidyr and scales packages makes life easy!
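Steps 1 and 2 can be sketched as a single logging script. This is a minimal sketch, not the actual cvg_park.R: it parses an inline HTML snippet instead of the live page, the garage capacities and output column names are my assumptions, and I assume the posted number means spaces taken (whether it is spaces free or spaces taken would need checking on the real page).

```r
# Minimal sketch of steps 1-2: parse the parking numbers and append one
# timestamped utilization row to a CSV. The real script would htmlParse()
# the live CVG page instead of this inline snippet.
library(XML)

html <- '<div>
           <span class="parking-number">1250</span>
           <span class="parking-number">3400</span>
         </div>'
htmlCode <- htmlParse(html, asText = TRUE)

count <- as.numeric(xpathSApply(
  htmlCode,
  '//*[contains(concat( " ", @class, " " ), concat( " ", "parking-number", " " ))]',
  xmlValue))

# Hypothetical capacities -- the real figures would come from CVG. I also
# assume the page shows spaces taken, so utilization = taken / capacity.
capacity    <- c(TerminalGarage = 2500, ValuPark = 8000)
utilization <- round(100 * count / capacity, 1)

# Append one timestamped row per run; cron turns this into a time series.
row <- data.frame(Time = Sys.time(),
                  TerminalGarageUtilization = utilization[["TerminalGarage"]],
                  ValuParkUtilization       = utilization[["ValuPark"]])
write.table(row, 'cvg_parking.csv', append = TRUE, sep = ',',
            col.names = !file.exists('cvg_parking.csv'), row.names = FALSE)
```

Because the header is written only when the file doesn’t exist yet, every 5-minute cron run simply appends one more row.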

GitHub hosts the code for both files.

The Results

There’s a big difference in how the Terminal Garage is used compared to the ValuPark lot. ValuPark is the cheaper of the two: it costs $8/day and requires a 4-minute shuttle ride to the terminal. The Terminal Garage, on the other hand, costs $15/day and sits directly adjacent to the terminal.

As a result, you’d expect ValuPark to be used for longer-duration round trips. You’d expect less hour-over-hour variation within a day, and a fairly constant mean over time – especially in the week I collected data, since it included neither a long weekend nor a national holiday. In those cases, I’d expect ValuPark utilization to increase.

The Terminal Garage should see the highest hour-over-hour variation, since it’s likely used for shorter trips (1 day to a few days). You’d also expect utilization to be higher during the weekdays than on the weekends.

> data.tidy %>% group_by(Parameter) %>% summarise(Avg=mean(Value),StdDev=sd(Value),COV=StdDev/Avg,N=n())
Source: local data frame [2 x 5]
                  Parameter      Avg   StdDev        COV     N
                      (chr)    (dbl)    (dbl)      (dbl) (int)
1 TerminalGarageUtilization 51.78788 11.11374 0.21460123  1908
2       ValuParkUtilization 58.58957  3.04346 0.05194543  1908

The average utilization for ValuPark – 58.6% – is fairly constant, with cyclic hourly variations each day and a standard deviation of 3%. The COV for ValuPark is a small 5.2%. The Terminal Garage, on the other hand, shows a distinctly different pattern on the weekend vs. the weekdays. Large swings result in a standard deviation of 11.1% over the week, with a large COV of 21.5%. Utilization ramps up Mon–Tue, and ramps down Thu–Fri.
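As an aside, the long data.tidy shape that group_by()/summarise() operates on can be built from the wide logged file with tidyr. The toy log below, with assumed column names, stands in for the real file:

```r
library(dplyr)
library(tidyr)

# Toy wide-format log, shaped like the file the cron job appends to
# (the column names are my assumption, not necessarily the script's).
wide <- data.frame(
  Time = as.POSIXct('2016-03-07 12:00', tz = 'EST') + 300 * (0:3),
  TerminalGarageUtilization = c(51, 52, 53, 52),
  ValuParkUtilization       = c(58, 59, 58, 59))

# gather() stacks the two utilization columns into Parameter/Value pairs,
# giving one row per (timestamp, parking lot) observation.
data.tidy <- wide %>% gather(Parameter, Value, -Time)

data.tidy %>%
  group_by(Parameter) %>%
  summarise(Avg = mean(Value), StdDev = sd(Value),
            COV = StdDev / Avg, N = n())
```

With 1908 timestamps in the real file, the same gather() yields the 1908 observations per parameter summarised above.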

Graphically, this is easy to see…

 # The ggplot() and geom_line() lines are reconstructed from context; only
 # the layers below them survived in the original post.
 ggplot(data.tidy) +
  geom_line(aes(x = Time, y = Value, linetype = Parameter)) +
  labs(title = 'Solid=Terminal Garage, Dashed=ValuPark', y = '% Utilization') +
  scale_x_datetime(date_breaks = '1 day', date_minor_breaks = '6 hours',
                   labels = date_format("%a, 03/%d", tz = "EST")) +
  theme(axis.text.x = element_text(hjust = 0, size = 9))


Another interesting way to look at this data is with violin plots. Each color is a day of the week. The Terminal Garage, on the left, shows large variations over the week as well as within the weekdays. ValuPark is much tighter, and looks like small variations of an Earth Spacedock.

ggplot(data.tidy) +
  geom_violin(aes(x = Parameter, y = Value, fill = Day), scale = 'area') +
  labs(title = 'Variation in parking utilization', x = '', y = '% Utilization') +
  theme(axis.text.x = element_text(size = 10))


If we look at the data by plotting Time on the x-axis (this requires a bit of manipulation of the data… refer to the code that develops data.tidy), we’ll notice the ramp-up in parking utilization every morning starting around 4:30a–5:00a. Utilization peaks at ~3:00p for both lots, and then begins to decline.
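The time-of-day manipulation can be sketched like this. The toy data stands in for data.tidy, and the TimeOfDay trick – re-parsing just the clock time so every reading lands on the same dummy date – is one possible approach, not necessarily the one the actual script uses:

```r
library(ggplot2)
library(scales)

# Toy stand-in for data.tidy: two days of readings every 30 minutes,
# with a smooth daily cycle as the Value.
times <- seq(as.POSIXct('2016-03-07 00:00', tz = 'EST'),
             by = '30 min', length.out = 96)
toy <- data.frame(Time = times,
                  Parameter = 'TerminalGarageUtilization',
                  Value = 50 + 10 * sin(as.numeric(times) / 43200 * pi))

# Keep only the clock time, so every day overlays on one 24-hour axis;
# re-parsing "%H:%M" puts all readings on today's dummy date.
toy$TimeOfDay <- as.POSIXct(format(toy$Time, '%H:%M'), format = '%H:%M')

p <- ggplot(toy) +
  geom_point(aes(x = TimeOfDay, y = Value), alpha = 0.5) +
  scale_x_datetime(labels = date_format('%H:%M')) +
  labs(x = 'Time of day', y = '% Utilization')
```

With the real data.tidy, both lots’ readings stack on the same 24-hour axis, making the ~4:30a ramp-up and ~3:00p peak visible directly.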


How is this useful?

Well, if I were CVG, I’d monitor parking over a longer period of time and:

  1. Develop a few predictive models to anticipate parking load over weekdays and weekends, long weekends, Christmas, New Year’s Eve, etc.
  2. Use this knowledge to tune the number and frequency of the buses or golf carts that ply between the garages and the terminal.
    • Leverage this to not only reduce operating costs where possible, but also,
    • Increase customer satisfaction in time periods where utilization changes rapidly.
Skills I picked up in this project: xpathSApply(), SelectorGadget, Rscript, crontab, and further knowledge of POSIXct, scale_x_datetime(), annotate(), and format().



This entry was posted on March 10, 2016 in Analytics, DataIsBeautiful.

