Driven by Data

ABOUT KEVIN COULTER

Photo of Kevin Coulter
Hello, my name is Kevin Coulter. I enjoy exploring data and extracting insights. My goal is to help people achieve success by leveraging data.

I am a data analyst living in Fort Myers, Florida. I recently completed the rigorous, hands-on Google Data Analytics Certification. I earned a Bachelor of Science degree in Computer Science from Saint Xavier University.

My technical skills are:
  • Data analysis
  • Data cleaning
  • Data visualization
  • R programming
  • Data-driven decision making
  • SQL
  • Microsoft Excel
  • Effective presentations

MY PORTFOLIO

Case Study: How Does a Bike-Share Navigate Speedy Success?
This case study is an independent project I completed upon earning the Google Data Analytics certification.


Objective:
Determine how Cyclistic bike-share annual members and casual riders use Cyclistic bikes differently.


Data sources:
The datasets consist of twelve months of real-world trip data from the city of Chicago. The raw datasets contain 5,755,694 trips and are publicly available. For data-privacy reasons, the datasets contain no personally identifiable information. The datasets were provided by Motivate International Inc. under a data license agreement. Google Maps data was also used to locate points of interest surrounding the top ten bike-share starting stations.


Data cleaning and manipulation:

  • Checked for duplicate records
  • Checked started_at and ended_at fields for blanks
  • Checked started_at and ended_at fields for non-numeric values
  • Replaced blank values with “not_known” in the start_station_name and end_station_name columns

    * Home > Find & Select > Go To Special > Blanks > OK
    * Type “not_known” and press Command + Return to fill selected cells

  • Replaced blank values with 0 for starting_station_id, ending_station_id, and starting and ending latitudes and longitudes

    * Select starting_station_id, ending_station_id, start_lat, start_lng, end_lat, and end_lng columns
    * Home > Find & Select > Go To Special > Blanks > OK
    * Type “0” and press Command + Return to fill selected cells

  • Removed leading, trailing, and extra spaces, and removed non-printable characters

    * Insert Column
    * =TRIM(CLEAN(A2))
    * Copy column values, Paste Values
    * Convert to Number (where applicable)
    * Delete unclean column
    * Repeated above steps for all columns

  • Calculated day of week

    * Insert column day_of_week
    * =WEEKDAY(C2, 1)
    * Note: days of week stored as numbers, 1 = Sunday and 7 = Saturday
    * Copy column values, Paste Values

  • Calculated ride length to the nearest minute

    * Insert Column ride_length
    * =ROUND((D2-C2)*24*60,0) # subtract started_at from ended_at, convert the result from days to minutes, and round to the nearest minute
    * Format ride_length column to Number
    * Remove decimal values
    * Copy column values, Paste Values

  • Focused on ride_length >= 2 minutes and ride_length < 7 days (10,080 minutes)

    * Highlight values in ride_length column
    * Home > Conditional Formatting > Highlight Cell Rules > Greater than > cell value “greater than” 10,080, highlight cell light red
    * Filter ride_length column by cell color light red
    * Delete rows with highlighted red cells
    * Repeat above steps but choose Conditional Formatting > Highlight Cell Rules > Less Than > cell value “less than” 2
    * Remove formatting: Home > Conditional Formatting > Clear Rules > Clear Rules From Entire Sheet
    * Note: Excel had difficulty efficiently removing the rows with ride_length < 2 minutes, so R was used to remove these rows.

  • Used R to generate started_at_month and started_at_hour columns to facilitate the analysis
  • Used R to efficiently analyze the past 12 months of Cyclistic bike-share trips. The final cleaned and merged dataset consisted of 5,596,590 bike-share trips.
  • Generated data visualizations using Tableau
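
For readers who would rather script the cleaning steps than click through Excel, the list above can be sketched roughly as follows. This is a minimal illustration in Python, not the Excel and R workflow actually used for the case study. The timestamp format, the sample row, and the helper names (clean_text, excel_weekday, ride_length_minutes, clean_trip) are assumptions made for the example.

```python
import math
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed; adjust to the real export format
NAME_COLUMNS = ("start_station_name", "end_station_name")
ZERO_FILL_COLUMNS = ("start_lat", "start_lng", "end_lat", "end_lng")
MAX_RIDE_MINUTES = 7 * 24 * 60  # 7 days = 10,080 minutes


def clean_text(value):
    """Approximate Excel's TRIM(CLEAN(x)): drop control characters, collapse spaces."""
    no_control = "".join(ch for ch in value if ord(ch) >= 32)
    return " ".join(part for part in no_control.split(" ") if part)


def excel_weekday(moment):
    """Match WEEKDAY(x, 1): 1 = Sunday through 7 = Saturday."""
    return moment.isoweekday() % 7 + 1


def ride_length_minutes(started_at, ended_at):
    """Whole minutes between two datetimes, rounding half minutes up."""
    seconds = (ended_at - started_at).total_seconds()
    return math.floor(seconds / 60 + 0.5)


def clean_trip(row):
    """Return a cleaned copy of one trip (a dict of strings), or None to drop it."""
    trip = {key: clean_text(value) for key, value in row.items()}
    for column in NAME_COLUMNS:
        if column in trip and not trip[column]:
            trip[column] = "not_known"
    for column in ZERO_FILL_COLUMNS:
        if column in trip and not trip[column]:
            trip[column] = "0"
    started = datetime.strptime(trip["started_at"], TIMESTAMP_FORMAT)
    ended = datetime.strptime(trip["ended_at"], TIMESTAMP_FORMAT)
    trip["day_of_week"] = excel_weekday(started)
    trip["started_at_month"] = started.month
    trip["started_at_hour"] = started.hour
    trip["ride_length"] = ride_length_minutes(started, ended)
    # Keep rides of at least 2 minutes and shorter than 7 days.
    if not 2 <= trip["ride_length"] < MAX_RIDE_MINUTES:
        return None
    return trip


# Hypothetical row for illustration only; not taken from the real dataset.
sample = {
    "started_at": "2022-07-02 17:05:00",
    "ended_at": "2022-07-02 17:31:30",
    "start_station_name": "  Example   Station ",
    "end_station_name": "",
    "end_lat": "",
}
cleaned = clean_trip(sample)
print(cleaned["day_of_week"], cleaned["ride_length"], cleaned["end_station_name"])
```

The ride-length bounds follow the stated focus range (at least 2 minutes, under 10,080 minutes). Rows outside that range come back as None so the caller can skip them.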

Analysis Summary:

Jane is our typical casual user who enjoys exploring the city. Her ride lengths average 26.5 minutes. Like many of our casual users, Jane often chooses electric bikes over classic bikes. The city of Chicago has many tourist attractions. Jane enjoys the bike-share stations that are conveniently located near top tourist attractions such as Navy Pier, the Chicago Art Museum, the Magnificent Mile, and Lincoln Park Zoo. Ridership for casual users spikes in July. Jane often begins her rides in the late afternoon, around 5pm, on weekends.

John takes advantage of our fantastic bike-sharing membership. Many of our member riders may be students, because our most popular member bike-share stations are near schools. While John uses both electric and classic bikes from our bike-sharing service, he can be found using our classic bikes slightly more often than electric bikes. John finds the bike-sharing service perfect since the stations are adjacent to his university. He, like many of our member users, is purpose-driven and less explorative than our casual riders. It is understandable that his average ride length is less than half that of the average casual rider.


R Notebook:

View my R notebook for the case study


Data Visualizations:

View my case study data visualizations


Key Findings:
  • The average casual ride is twice as long as the average member ride.
  • Casual riders ride 12% more on weekends compared to member riders.
  • Casual riders use electric bikes more than classic bikes. There is a 33% difference between the two bike types.
  • Member riders use classic bikes slightly more than electric bikes. There is a 6% difference between the two bike types.
  • The top ten starting stations for casual riders are often near tourist attractions.
  • The top ten starting stations for member riders are often near schools.
  • Casual ridership peaks in July, while member ridership peaks in August.
  • Member riders begin their rides most frequently at 8am and again at 5pm. Member riders are more active on weekdays than weekends. These patterns resemble commuter ridership.
  • Casual riders frequently start their rides at 5pm.
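
Findings like the average ride length and the most common start hour come from simple group-by summaries. The sketch below shows one way such summaries could be computed in Python. It is not the R notebook's code. The rider_type column name, the function names, and the five sample trips are made up for illustration, and the numbers are not results from the case study.

```python
from collections import Counter, defaultdict

# Made-up trips for illustration; "rider_type" is an assumed column name.
trips = [
    {"rider_type": "casual", "ride_length": 30, "started_at_hour": 17},
    {"rider_type": "casual", "ride_length": 22, "started_at_hour": 17},
    {"rider_type": "member", "ride_length": 12, "started_at_hour": 8},
    {"rider_type": "member", "ride_length": 10, "started_at_hour": 17},
    {"rider_type": "member", "ride_length": 14, "started_at_hour": 8},
]


def average_ride_length(rows):
    """Mean ride_length in minutes for each rider type."""
    totals = defaultdict(lambda: [0, 0])  # rider type -> [sum of minutes, trip count]
    for row in rows:
        bucket = totals[row["rider_type"]]
        bucket[0] += row["ride_length"]
        bucket[1] += 1
    return {rider: total / count for rider, (total, count) in totals.items()}


def peak_start_hour(rows):
    """Most frequent started_at_hour for each rider type."""
    hours = defaultdict(Counter)
    for row in rows:
        hours[row["rider_type"]][row["started_at_hour"]] += 1
    return {rider: counts.most_common(1)[0][0] for rider, counts in hours.items()}


print(average_ride_length(trips))
print(peak_start_hour(trips))
```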

Recommendations:
  • Locate additional bikes near more universities and high schools to increase memberships.
  • Ensure adequate bike inventory is available to meet the 8am and 5pm member ridership surges.
  • Explore deploying more bike stations in densely populated residential areas to accommodate commuter demand.


CONTACT ME

Looking to hire me? Let's talk. Fill out the form and I will reply as soon as possible.