I am a data analyst living in Fort Myers, Florida. I recently completed the rigorous, hands-on Google Data Analytics Certification. I earned a Bachelor of Science degree in Computer Science from Saint Xavier University.
My technical skills are:
Objective:
Determine how Cyclistic bike-share annual members and casual riders use Cyclistic bikes differently
Data sources:
The datasets consist of twelve months of real-world trip data from the city of Chicago. The raw datasets contain 5,755,694 trips and are publicly available. Due to data-privacy concerns, all personally identifiable information is excluded from the datasets. The datasets were provided by Motivate International Inc. under a data license agreement. Google Maps data was also used to locate points of interest surrounding the top ten bike-share starting stations.
Data cleaning and manipulation:
* Home > Find & Select > Go To Special > Blanks > OK
* Type “not_known” and press Command + Return to fill selected cells
* Select starting_station_id, ending_station_id, start_lat, start_lng, end_lat, and end_lng columns
* Home > Find & Select > Go To Special > Blanks > OK
* Type “0” and press Command + Return to fill selected cells
* Insert Column
* =TRIM(CLEAN(A2))
* Copy column values, Paste Values
* Convert to Number (where applicable)
* Delete unclean column
* Repeat the above steps for all columns
* Insert column day_of_week
* =WEEKDAY(C2, 1)
* Note: days of the week are stored as numbers, from 1 = Sunday through 7 = Saturday
* Copy column values, Paste Values
* Insert Column ride_length
* =ROUND((D2-C2)*24*60,0) (subtracts started_at from ended_at, converts the difference from days to minutes, and rounds to the nearest minute)
* Format ride_length column to Number
* Remove decimal values
* Copy column values, Paste Values
* Highlight values in ride_length column
* Home > Conditional Formatting > Highlight Cell Rules > Greater Than > 10,080 (7 days in minutes), formatted with Light Red Fill
* Filter ride_length column by cell color light red
* Delete rows with highlighted red cells
* Repeat the above steps, but choose Conditional Formatting > Highlight Cell Rules > Less Than > 2 to flag rides shorter than 2 minutes
* Remove formatting: Home > Conditional Formatting > Clear Rules > Clear Rules From Entire Sheet
* Note: Excel could not efficiently remove the rows with ride_length < 2 minutes, so R was used to remove these rows.
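The Excel cleaning steps above can be sketched per trip in Python for anyone who wants to reproduce them outside a spreadsheet. This is a minimal, hypothetical sketch: the timestamp format, the `start_station_name` text column, and all function names are assumptions, not part of the original workflow. It mirrors TRIM(CLEAN()), the blank-fill rules ("0" for the numeric columns listed above, "not_known" otherwise), =WEEKDAY(x, 1), =ROUND((D2-C2)*24*60,0), and the 2 to 10,080 minute filter. It leaves values as strings rather than reproducing "Convert to Number".

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Optional

# Numeric columns named in the Excel steps; blanks in these become "0".
NUMERIC_COLUMNS = {"starting_station_id", "ending_station_id",
                   "start_lat", "start_lng", "end_lat", "end_lng"}
MIN_MINUTES = 2          # rides under 2 minutes are dropped
MAX_MINUTES = 10_080     # 7 days * 24 hours * 60 minutes
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: raw timestamp format


def trim_clean(value: str) -> str:
    """Approximate Excel's TRIM(CLEAN(x)): drop control characters (codes 0-31),
    strip leading/trailing spaces, and collapse inner runs of spaces to one."""
    printable = "".join(ch for ch in value if ord(ch) > 31)
    return " ".join(part for part in printable.split(" ") if part)


def excel_weekday(moment: datetime) -> int:
    """Match =WEEKDAY(x, 1): 1 = Sunday ... 7 = Saturday."""
    return moment.isoweekday() % 7 + 1


def ride_length_minutes(started_at: datetime, ended_at: datetime) -> int:
    """Match =ROUND((D2-C2)*24*60,0): ended_at minus started_at in minutes,
    rounding halves away from zero as Excel's ROUND does."""
    minutes = Decimal(str((ended_at - started_at).total_seconds())) / 60
    return int(minutes.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def clean_trip(row: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Apply the cleaning steps to one raw trip; return None if it is dropped."""
    cleaned: Dict[str, object] = {}
    for column, value in row.items():
        if value == "":  # Go To Special > Blanks, then fill the selection
            value = "0" if column in NUMERIC_COLUMNS else "not_known"
        cleaned[column] = trim_clean(value)
    started = datetime.strptime(str(cleaned["started_at"]), TIME_FORMAT)
    ended = datetime.strptime(str(cleaned["ended_at"]), TIME_FORMAT)
    length = ride_length_minutes(started, ended)
    if not MIN_MINUTES <= length <= MAX_MINUTES:
        return None  # same effect as deleting the highlighted rows
    cleaned["day_of_week"] = excel_weekday(started)
    cleaned["ride_length"] = length
    return cleaned


if __name__ == "__main__":
    raw = {  # hypothetical sample row, not taken from the Cyclistic data
        "ride_id": " ABC123 ",
        "started_at": "2021-07-04 17:00:00",
        "ended_at": "2021-07-04 17:26:30",
        "start_station_name": "",
        "starting_station_id": "",
    }
    print(clean_trip(raw))
```

Python's built-in `round` uses banker's rounding, so 26.5 would become 26. The sketch uses `Decimal` with `ROUND_HALF_UP` so that a 26.5-minute ride becomes 27, as Excel's ROUND produces.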
Jane is our typical casual user who enjoys exploring the city. Her rides average 26.5 minutes. Like many of our casual users, Jane often chooses electric bikes over classic bikes. The city of Chicago has many tourist attractions, and Jane enjoys the bike-share stations conveniently located near top attractions such as Navy Pier, the Art Institute of Chicago, the Magnificent Mile, and Lincoln Park Zoo. Ridership among casual users spikes in July. Jane often begins her rides in the late afternoon, around 5 p.m., on weekends.
John takes advantage of our fantastic bike-share membership. Many of our members may be students, because our most popular member bike-share stations are near schools. While John uses both electric and classic bikes, he rides classic bikes slightly more often than electric bikes. John finds the service perfect because the stations are adjacent to his university. Like many of our members, he is purpose-driven and less exploratory than our casual riders, so it is understandable that his average ride length is less than half that of the average casual rider.
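The persona figures (average ride length, preferred bike type, typical start time, weekend riding) are per-group summaries of the cleaned trips. Below is a minimal Python sketch of that aggregation. It assumes each cleaned trip carries `member_casual`, `rideable_type`, `ride_length`, and a `started_at` datetime; these field names and the `summarize`/`trip` helpers are assumptions for illustration. The six trips are made-up sample rows, not Cyclistic data.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List


def top(values: Iterable) -> object:
    """Most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def summarize(trips: List[dict]) -> Dict[str, dict]:
    """Per rider type: trip count, average ride_length in minutes, most used
    bike type, most common start hour, and share of trips starting on Sat/Sun."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for t in trips:
        groups[t["member_casual"]].append(t)
    summary = {}
    for rider_type, rows in groups.items():
        summary[rider_type] = {
            "trips": len(rows),
            "avg_ride_length": mean(r["ride_length"] for r in rows),
            "top_bike": top(r["rideable_type"] for r in rows),
            "top_start_hour": top(r["started_at"].hour for r in rows),
            # datetime.weekday(): Monday = 0 ... Saturday = 5, Sunday = 6
            "weekend_share": sum(r["started_at"].weekday() >= 5 for r in rows) / len(rows),
        }
    return summary


def trip(rider: str, bike: str, minutes: int, started: datetime) -> dict:
    """Build one hypothetical cleaned trip record."""
    return {"member_casual": rider, "rideable_type": bike,
            "ride_length": minutes, "started_at": started}


SAMPLE = [  # made-up rows for illustration only
    trip("casual", "electric_bike", 30, datetime(2021, 7, 3, 17, 5)),
    trip("casual", "electric_bike", 24, datetime(2021, 7, 4, 17, 40)),
    trip("casual", "classic_bike", 25, datetime(2021, 7, 4, 14, 10)),
    trip("member", "classic_bike", 10, datetime(2021, 7, 6, 8, 15)),
    trip("member", "classic_bike", 12, datetime(2021, 7, 7, 8, 50)),
    trip("member", "electric_bike", 11, datetime(2021, 7, 7, 17, 20)),
]

if __name__ == "__main__":
    for rider_type, stats in summarize(SAMPLE).items():
        print(rider_type, stats)
```

On the full dataset, the same grouping would run over every cleaned trip. A comparison such as "less than half of the average casual ride" is then a direct check of the two `avg_ride_length` values.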
View my R notebook for the case study
View my case study data visualizations