Analysis of 311 Service Requests using “tidyverse” in R
Scenario: The mayor is having a community meeting in Roslindale tonight. Using the last 6 months of call center data from the 311 dataset create a report that 1) details what constituents in this neighborhood call about, 2) helps the mayor understand how Roslindale differs from the rest of the city, and 3) what if any major events or recent trends have been having a measurable impact on city operations in the neighborhood.
Rmd solution file can be dowloaded here.
library(tidyverse)
df <- read_csv("311.csv")
# filter the dataset to contain only the last 6 months of call center data
df <- df %>%
  mutate(date = as.Date(open_dt)) %>%
  filter(date >= "2017-03-12")1. What constituents in Roslindale call about?
The bar plot below shows the reasons that constituents in Roslindale call about. We can see that the main issues in Roslindale are related to sanitation. Building and Code Enforcement are also the things that people often call about, at least in the last 6 months.
# number of constituent calls by reason
constituent_calls <- df %>%
  filter(neighborhood == "Roslindale", Source == "Constituent Call") %>%
  group_by(REASON) %>%
  summarise(Number_of_calls = n()) %>%
  arrange(desc(Number_of_calls))
# visualise constituent calls by reason
constituent_calls %>%
  ggplot(aes(x = reorder(REASON, Number_of_calls), y = Number_of_calls)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    xlab("") +
    ylab("Number of calls") +
    ggtitle("Count of 311 Constituent Calls in Roslindale by Reason")Below there is a table that summarises what constituents call about in more detail. That is, the table below segregates the reasons into subcategories that better describe the issues.
# percentage of constituent calls by type
calls_type <- df %>%
  filter(neighborhood == "Roslindale", Source == "Constituent Call") %>%
  group_by(TYPE) %>%
  summarise(Number_of_calls = n()) %>%
  mutate(Pct_of_all_calls = round(100 * Number_of_calls/sum(Number_of_calls), 2)) %>%
  select(TYPE, Pct_of_all_calls) %>%
  arrange(desc(Pct_of_all_calls))
names(calls_type) <- c("Type of issue", "Percentage of all calls")
calls_type## # A tibble: 115 x 2
##    `Type of issue`                             `Percentage of all calls`
##    <chr>                                                           <dbl>
##  1 Schedule a Bulk Item Pickup                                     16.1 
##  2 Missed Trash/Recycling/Yard Waste/Bulk Item                     12.5 
##  3 Animal Generic Request                                           3.88
##  4 Tree Maintenance Requests                                        3.68
##  5 Request for Recycling Cart                                       3.63
##  6 Building Inspection Request                                      3.53
##  7 Traffic Signal Inspection                                        3.33
##  8 Abandoned Vehicles                                               3.08
##  9 Requests for Street Cleaning                                     2.39
## 10 Parking Enforcement                                              2.29
## # ... with 105 more rowsWe can see that the majority of constituent calls in Roslindale are to schedule a bulk item pickup or are related to missed trash / recycling / yard waste and again, bulk items.
2. How Roslindale differs from the rest of the city?
We can see from the graph below that the main difference between Roslindale and the rest of the city is that compared to other neighborhoods, a higher proportion of cases in Roslindale are related to Sanitation. However, compared to the rest of the city, less cases in Roslindale are created about enforcement, abandoned vehicles and street cleaning.
# finding the total number of cases in Roslindale and the rest of the city
n_cases <- df %>%
  mutate(Roslindale = ifelse(neighborhood == "Roslindale", 1, 0)) %>%
  group_by(Roslindale) %>%
  summarise(n())
n_Roslindale <- as.numeric(n_cases[2, 2])
n_other <- as.numeric(n_cases[1, 2])
# select only first 16 reasons to visualise differences
differences_reason <- df %>%
  mutate(Area = ifelse(neighborhood == "Roslindale", "Roslindale", "Rest of city")) %>%
  filter(!is.na(Area)) %>%
  group_by(REASON, Area) %>%
  summarise(number_of_cases = n()) %>%
  mutate(pct_of_cases = ifelse(Area == "Roslindale",
                               100 * number_of_cases/n_Roslindale,
                               100 * number_of_cases/n_other)) %>%
  select(REASON, Area, pct_of_cases) %>%
  spread(key = Area, value = pct_of_cases) %>%
  arrange(desc(Roslindale)) %>%
  head(16)
# visualise the differences
differences_reason %>%
  gather(key = "Area", value = "pct_of_cases", -REASON) %>%
  ggplot(aes(x = reorder(REASON, desc(pct_of_cases)), y = pct_of_cases, fill = Area)) +
    geom_bar(stat = "identity", position = "dodge") +
    theme(axis.text.x = element_text(angle = 65, hjust = 1)) +
    xlab("") +
    ylab("Percentage of Cases Created") +
    ggtitle("Percentage of 311 Cases Created by Reason")Looking at the source of created cases, we see that in Roslindale, compared to the rest of the city, a larger percent of all cases are created by constituent calls and less cases are created by the Citizens Connect App.
differences_source <- df %>%
  mutate(Area = ifelse(neighborhood == "Roslindale", "Roslindale", "Rest of city")) %>%
  filter(!is.na(Area)) %>%
  group_by(Source, Area) %>%
  summarise(number_of_cases = n()) %>%
  mutate(pct_of_cases = ifelse(Area == "Roslindale",
                               100 * number_of_cases/n_Roslindale,
                               100 * number_of_cases/n_other)) %>%
  select(Source, Area, pct_of_cases)
differences_source %>%
  ggplot(aes(x = reorder(Source, desc(pct_of_cases)), y = pct_of_cases, fill = Area)) +
    geom_bar(stat = "identity", position = "dodge") +
    theme(axis.text.x = element_text(angle = 65, hjust = 1)) +
    xlab("Source") +
    ylab("Percentage of Cases Created") +
    ggtitle("Percentage of 311 Cases Created by Source")3. Major events that have been having a measurable impact on city operations in Roslindale
The graph below shows that there was a major event in March, when people in Roslindale created more than 75 cases. There are also a few dates on which number of cases reported in Roslindale exceeds 50.
# number of cases in Roslindale by date
cases_per_day <- df %>%
  filter(neighborhood == "Roslindale") %>%
  group_by(date) %>%
  summarise(count = n())
# visualise cases per day
cases_per_day %>%
  ggplot(aes(x = date, y = count)) +
    geom_line() +
    xlab("Date") +
    ylab("Number of Cases Created") +
    ggtitle("Count of 311 Cases Created in Roslindale by date")Looking at top 2 types of cases on the days when the total numer of cases is 50 or more, we get the following results:
# dates with 50 or more cases created
events_dates <- cases_per_day %>%
  filter(count >= 50)
# top 2 types of cases for each of those dates
top_cases <- df %>%
  filter(neighborhood == "Roslindale", date %in% events_dates$date) %>%
  group_by(date, TYPE) %>%
  summarise(count = n()) %>%
  arrange(date, desc(count)) %>%
  top_n(n = 2, wt = count)
names(top_cases) <- c("Date", "Type of issue", "Number of cases")
top_cases## # A tibble: 17 x 3
## # Groups:   Date [8]
##    Date       `Type of issue`                             `Number of cases`
##    <date>     <chr>                                                   <int>
##  1 2017-03-15 Unshoveled Sidewalk                                        34
##  2 2017-03-15 Request for Snow Plowing                                   18
##  3 2017-03-16 Unshoveled Sidewalk                                        51
##  4 2017-03-16 Missed Trash/Recycling/Yard Waste/Bulk Item                 7
##  5 2017-03-17 Unshoveled Sidewalk                                        23
##  6 2017-03-17 Misc. Snow Complaint                                       10
##  7 2017-05-11 Missed Trash/Recycling/Yard Waste/Bulk Item                14
##  8 2017-05-11 Tree Maintenance Requests                                   7
##  9 2017-05-17 Tree Maintenance Requests                                  21
## 10 2017-05-17 Parking Enforcement                                         7
## 11 2017-05-23 Tree Maintenance Requests                                  21
## 12 2017-05-23 Traffic Signal Inspection                                   9
## 13 2017-06-14 Parking Enforcement                                         8
## 14 2017-06-14 Missed Trash/Recycling/Yard Waste/Bulk Item                 5
## 15 2017-06-14 New Tree Requests                                           5
## 16 2017-08-10 Requests for Street Cleaning                               25
## 17 2017-08-10 Missed Trash/Recycling/Yard Waste/Bulk Item                 8We see that the major event that had the biggest impact on city operations in Roslindale was the snowstorm that happened on March 14, 2017. This is the reason why on March 15 - 17, many people reported unshoveled sidewalks and requested snow plowing.
Looking at other dates, we can see that in the month of May, there was an increase in tree maintenance requests (May 11, May 17 and May 23).
On June 14 there were more than 50 cases created in Roslindale, however there does not seem to be any trend.
On August 10, many people requested street cleaning which indicates that there might have been some kind of event that took place in Roslindale.
