– RAHUL SANGOLE
I recently booked a flight from the Cincinnati airport (CVG). To find out more about the parking options there, I visited the airport’s homepage, which I found extremely well designed. It offers quick access to the four most-needed items: Flight Status, Parking Availability, Security Times, and the Weather.
They have done their homework well.
The parking availability numbers intrigued me. How easy would it be for me to understand how available parking varies over the weekdays? What about the weekends? How full does the parking really get?
I challenged myself to write a quick piece of code to find out, with one constraint: spend no more than 60 minutes on it. Given the power of R, it was very doable. Here’s how I solved the problem:
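library(XML)  # xpathSApply() and xmlValue(); htmlCode must be a parsed HTML document, e.g. from htmlParse()
count <- xpathSApply(htmlCode,'//*[contains(concat( " ", @class, " " ), concat( " ", "parking-number", " " ))]',xmlValue)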
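The XPath query wraps each element's class attribute in spaces and then searches for " parking-number " (also space-wrapped), so it matches `parking-number` only as a whole class token, even when an element carries several classes. The base R sketch below applies the same trick to plain strings; `has_class_token` and the sample class strings are hypothetical, for illustration only.

```r
# Hypothetical helper: the same space-padding trick as the XPath query,
# applied to plain class-attribute strings instead of parsed HTML.
has_class_token <- function(class_attr, token) {
  padded_attr  <- paste0(" ", class_attr, " ")   # e.g. " big parking-number "
  padded_token <- paste0(" ", token, " ")        # " parking-number "
  # fixed = TRUE: literal substring search, no regex interpretation
  grepl(padded_token, padded_attr, fixed = TRUE)
}

classes <- c("parking-number", "big parking-number", "parking-numbers", "parking")
has_class_token(classes, "parking-number")
# TRUE TRUE FALSE FALSE
```

Without the padding, a plain `contains(@class, "parking-number")` would also match a class such as `parking-numbers`.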
To collect data over time, the script runs every five minutes via cron:
Rahuls-MacBook-Pro:~ Rahul$ crontab -l
*/5 * * * * /usr/local/bin/Rscript /Users/Rahul/Documents/Data\ Science/CVG_Parking/cvg_park.R >/dev/null 2>&1
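Because cron only launches the script, each run has to persist its own reading. The original cvg_park.R is on GitHub and may do this differently; what follows is a minimal base R sketch, assuming each run appends one timestamped row to a CSV file. The function name `append_reading`, the column names, and the file path are all hypothetical.

```r
# Hypothetical logger: append one timestamped row per cron run to a CSV file.
# Column names and file layout are assumptions, not the original script's.
append_reading <- function(path, terminal_garage, valupark, time = Sys.time()) {
  row <- data.frame(
    Date = format(time, "%Y-%m-%d %H:%M:%S"),
    TerminalGarageUtilization = terminal_garage,
    ValuParkUtilization = valupark
  )
  new_file <- !file.exists(path)
  # Write the header only when the file is first created; afterwards append rows.
  write.table(row, path, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
  invisible(row)
}

log_path <- tempfile(fileext = ".csv")
append_reading(log_path, 52, 58)
append_reading(log_path, 54, 59)
read.csv(log_path)
```

Since the crontab entry sends all output to `/dev/null`, a file like this is the only record each run leaves behind.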
The code for both files is on GitHub.
There’s a big difference in how the Terminal Garage is used compared with the ValuPark lot. ValuPark is the cheaper of the two: it costs $8/day and requires a 4-minute shuttle ride to the terminal. The Terminal Garage, on the other hand, costs $15/day and is right next to the terminal.
As a result, you’d expect ValuPark to be used for longer round-trips, with less hour-over-hour variation within a day and a fairly constant mean over time. That should hold especially for the week in which I collected data, since it included neither a long weekend nor a national holiday; around those, I’d expect ValuPark utilization to increase.
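The price gap compounds with trip length, which is the intuition behind expecting ValuPark to draw the longer trips. A quick sketch using only the two daily rates quoted above; `trip_cost` is a hypothetical helper, and real pricing may include hourly or partial-day rates that aren't covered here.

```r
# Parking cost by trip length, using only the daily rates quoted above
# (any hourly or partial-day pricing is ignored).
daily_rate <- c(ValuPark = 8, TerminalGarage = 15)  # USD per day

trip_cost <- function(days) {
  # One row per trip length, one column per lot
  outer(days, daily_rate)
}

costs <- trip_cost(c(1, 3, 5))
costs
costs[, "TerminalGarage"] - costs[, "ValuPark"]  # gap grows by $7 per day
```

A five-day trip, for example, costs $40 at ValuPark versus $75 in the Terminal Garage.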
The Terminal Garage should see more hour-over-hour variation, since it’s likely used for shorter trips (1 day to a few days…). You’d also expect its utilization to be higher on weekdays than on weekends.
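One way to check the weekday-vs-weekend hypothesis is to tag each reading by day type and compare the means. A minimal base R sketch; the `is_weekend` helper and the toy readings are hypothetical. It uses `POSIXlt$wday` (0 = Sunday, 6 = Saturday) so the result doesn't depend on the locale's day names.

```r
# Hypothetical helper: TRUE for Saturday/Sunday, via POSIXlt$wday (0 = Sunday).
is_weekend <- function(time) as.POSIXlt(time)$wday %in% c(0, 6)

# Toy readings for illustration only, not CVG data (Sat, Sun, Mon, Tue).
readings <- data.frame(
  Date = as.POSIXct(c("2016-03-05 12:00", "2016-03-06 12:00",
                      "2016-03-07 12:00", "2016-03-08 12:00"), tz = "UTC"),
  TerminalGarageUtilization = c(40, 42, 60, 64)
)
readings$DayType <- ifelse(is_weekend(readings$Date), "Weekend", "Weekday")

# Mean utilization per day type
day_type_means <- tapply(readings$TerminalGarageUtilization, readings$DayType, mean)
day_type_means
```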
> data.tidy %>% group_by(Parameter) %>% summarise(Avg=mean(Value),StdDev=sd(Value),COV=StdDev/Avg,N=n())
Source: local data frame [2 x 5]

                  Parameter      Avg   StdDev        COV     N
                      (chr)    (dbl)    (dbl)      (dbl) (int)
1 TerminalGarageUtilization 51.78788 11.11374 0.21460123  1908
2       ValuParkUtilization 58.58957  3.04346 0.05194543  1908
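For readers without dplyr, the same summary can be computed in base R. A sketch, assuming data.tidy is in long format with `Parameter` and `Value` columns, as the dplyr call implies; `summarise_utilization` is a hypothetical name, and the toy data is for illustration only.

```r
# Hypothetical base R equivalent of the dplyr group_by/summarise call.
summarise_utilization <- function(df) {
  stats <- lapply(split(df$Value, df$Parameter), function(v) {
    avg <- mean(v)
    s <- sd(v)
    c(Avg = avg, StdDev = s, COV = s / avg, N = length(v))
  })
  out <- do.call(rbind, stats)  # one row per Parameter
  res <- data.frame(Parameter = rownames(out), out)
  rownames(res) <- NULL
  res
}

# Toy long-format data for illustration only, not the CVG measurements.
toy <- data.frame(
  Parameter = rep(c("TerminalGarageUtilization", "ValuParkUtilization"), each = 3),
  Value = c(40, 50, 60, 58, 58, 58)
)
summarise_utilization(toy)
```

As a sanity check on the real output, COV is just StdDev/Avg: 11.11/51.79 ≈ 0.215 and 3.04/58.59 ≈ 0.052. With one reading every five minutes, N = 1908 corresponds to 1908 × 5 = 9,540 minutes, or 159 hours (about 6.6 days), assuming no cron runs were missed.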
The average utilization for ValuPark, 58.6%, is fairly constant, with cyclic hourly variations each day and a standard deviation of about 3 percentage points. Its coefficient of variation (COV = StdDev/Avg) is a small 5.2%. The Terminal Garage, on the other hand, shows a distinctly different pattern on weekends than on weekdays. Large swings produce a standard deviation of 11.1 percentage points over the week and a large COV of 21.5%. Utilization ramps up Mon-Tue and ramps down Thu-Fri.
Graphically, this is easy to see…
library(ggplot2)
library(scales)  # date_format()

ggplot(data.cvg) +
  geom_line(aes(x = Date, y = TerminalGarageUtilization, color = Day)) +
  geom_line(aes(x = Date, y = ValuParkUtilization, color = Day), lty = 2) +
  scale_color_manual(values = c('coral3', 'chartreuse3', 'dodgerblue3', 'deeppink3',
                                'darkorchid3', 'goldenrod3', 'limegreen')) +
  labs(title = 'Solid=Terminal Garage, Dashed=ValuPark', y = '% Utilization') +
  theme_light() +
  # %m instead of a hard-coded "03" so the labels stay correct outside March
  scale_x_datetime(date_breaks = '1 day', date_minor_breaks = '6 hours',
                   labels = date_format("%a, %m/%d", tz = "EST")) +
  theme(axis.text.x = element_text(hjust = 0, size = 9))
Another interesting way to look at this data is with violin plots. Each color is a day of the week. The Terminal Garage, on the left, shows large variation over the week as well as within the weekdays. ValuPark is much tighter; its violins look like small variations on an Earth Spacedock.
library(ggplot2)

ggplot(data.tidy) +
  geom_violin(aes(x = Parameter, y = Value, fill = Day), scale = 'area') +
  labs(title = 'Variation in parking utilization', x = '', y = '% Utilization') +
  theme(axis.text.x = element_text(size = 10))
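Each violin is a mirrored kernel density estimate of the utilization values for one day, and `scale = 'area'` gives every violin the same area (before its tails are trimmed). A minimal base R sketch of that idea with `density()`; the `violin_outline` helper and the simulated values are hypothetical, and ggplot2's own computation differs in details such as trimming the tails by default.

```r
# Hypothetical helper: one violin's outline as a kernel density estimate.
violin_outline <- function(values) {
  d <- density(values)          # Gaussian KDE evaluated on a 512-point grid
  data.frame(value = d$x,       # position along the % utilization axis
             half_width = d$y)  # drawn on both sides of the centre line
}

set.seed(1)
# Simulated utilization values, not CVG data
outline <- violin_outline(rnorm(200, mean = 58, sd = 3))

# A density integrates to ~1, so the two-sided violin area is ~2
# regardless of how many readings went into it.
step <- diff(outline$value[1:2])
violin_area <- 2 * sum(outline$half_width) * step
violin_area
```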
If we plot time of day on the x-axis (this requires a bit of data manipulation; refer to the code that builds data.tidy), we notice parking utilization ramping up every morning starting around 4:30-5:00 a.m. Utilization peaks around 3:00 p.m. for both lots and then begins to decline.
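Plotting by time of day means dropping the date and averaging all readings that share a clock time. The author's data.tidy code on GitHub may do this differently; here is a minimal base R sketch, with a hypothetical `by_time_of_day` helper and toy readings.

```r
# Hypothetical helper: average utilization by clock time, ignoring the date.
by_time_of_day <- function(time, value) {
  clock <- format(time, "%H:%M")   # e.g. "04:30"; the date part is dropped
  aggregate(value, by = list(TimeOfDay = clock), FUN = mean)
}

# Toy readings on two days, not CVG data.
times <- as.POSIXct(c("2016-03-07 04:30", "2016-03-07 15:00",
                      "2016-03-08 04:30", "2016-03-08 15:00"), tz = "UTC")
util <- c(45, 60, 47, 64)
by_time_of_day(times, util)
```

The resulting `TimeOfDay` column can then go on the x-axis, with one line per lot.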
Well, if I were CVG, I’d monitor parking over a longer period of time and: