130,000 Reasons Why Data Science Can Clean Up San Francisco

Cleaning up litter in San Francisco, California.

Guest post by Aleksey Bilogur, Data Scientist

Every urbanite knows how important clean streets are to the attractiveness and viability of a neighborhood. Clean streets mean walkable streets, walkable streets mean busy streets, and busy streets mean prosperous neighborhoods.

Nobody knows this better than the founders of Rubbish, Emin Israfil and Elena Guberman. Armed with trash bags and snazzy-looking smart trash grabbers, aka “rubbish beams”, the Rubbishers have picked up over 130,000 individual pieces of litter over the course of almost a year of regular street cleanings in San Francisco, collecting data all the while on what, where, and when.

This blog post is a first-ever analysis of this unique dataset. We will slice and dice the street pickups by location, type, and time, learning about how our habits as pedestrians and residents of our communities affect our urban environment in the process.

Setting the scene

Rubbish runs are conducted on four blocks of Polk Street, between Filbert and Broadway, in the Russian Hill neighborhood of San Francisco. Each run aims to clean up every piece of litter that was on the ground, including gutters and drains.

Polk Street starts out as mostly residential at its northern end, but quickly becomes more commercial the further south you go. Commercial areas are more trash-laden than residential ones, due to increased foot traffic, and Polk Street is no exception. If we plot the average number of pieces of litter dropped onto the street per block per day, we see just how strong this effect is:

Picture of Litter Per Day Per Street
Each square represents a city block. Higher numbers mean more litter.

In just a couple of blocks, litter pickups jump from just 16 items per day to an incredible 90 items per day. Put another way, the cleanest block on Polk Street is littered on 5,800 times per year; the dirtiest, 32,000 times per year.
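
The yearly figures are the daily averages scaled up to a full year. Here is a minimal Python sketch of that arithmetic, using the two rounded daily rates quoted above. The function name is ours, for illustration only. Because the daily rates are rounded, the results land close to, not exactly on, the yearly figures in the text.

```python
DAYS_PER_YEAR = 365

def annual_litter(items_per_day: float) -> float:
    """Scale an average daily litter count to a yearly total."""
    return items_per_day * DAYS_PER_YEAR

cleanest = annual_litter(16)  # rounded daily rate for the cleanest block
dirtiest = annual_litter(90)  # rounded daily rate for the dirtiest block

print(f"cleanest block: ~{cleanest:,.0f} items/year")
print(f"dirtiest block: ~{dirtiest:,.0f} items/year")
print(f"dirtiest vs. cleanest: {dirtiest / cleanest:.1f}x")
```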

The southernmost block on the left side of the street (between Broadway and Vallejo) is more than twice as dirty as the block directly across the street, even though both blocks are 100% commercial. That’s because street litter is correlated with business location and type.

Businesses that have more to-go items and foot traffic seem to contribute to more litter overall. The left side of Polk Street has all of the street’s pizza shops and bars, plus a nightclub, whilst the right side of Polk Street has pharmacies, hardware stores, and a bookstore. And while the two sides share the neighborhood’s restaurants and coffee shops, the difference between a sidewalk abutting Rouge Nightclub and one abutting Walgreens is easy to see:

Graph showing how the amount of trash varies by street.
There is more litter on the left side of Polk Street (shown on the bottom here) than on the right (shown on top).

In this visualization the left side of Polk Street is on the bottom, and the right side, on the top. Filbert Street, the northern edge of the pilot zone, is on the left end; Broadway, the southern edge, is on the right.

Another way of looking at this data is on a business-by-business basis. Doing so highlights how much litter volume changes based on where you are on the street:

A box and whiskers plot of litter
Most storefronts on Polk Street averaged 19-48 pieces of litter per foot of frontage per year.

“Mean litter” is 36 pieces per foot of frontage per year. A typical storefront is 20 to 25 feet wide, so a Polk Street business of average size in an average location will see around 800 litter pickups per year, which is a little over two pieces of litter per day. Particularly troublesome spots can be three times as dirty, and particularly clean ones, three times as clean.
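
The storefront estimate is a straightforward multiplication: frontage width times the mean litter rate per foot, then divided by 365 for a daily figure. The sketch below works through it for the 20 to 25 foot range mentioned above. The helper name and the 22.5 foot midpoint are our own assumptions for illustration.

```python
MEAN_LITTER_PER_FOOT_PER_YEAR = 36  # mean rate quoted in the text
DAYS_PER_YEAR = 365

def storefront_litter(frontage_feet, per_foot=MEAN_LITTER_PER_FOOT_PER_YEAR):
    """Return (pieces per year, pieces per day) for a storefront of a given width."""
    per_year = frontage_feet * per_foot
    return per_year, per_year / DAYS_PER_YEAR

# 22.5 ft is simply the midpoint of the 20-25 ft range (an assumption).
for width in (20, 22.5, 25):
    per_year, per_day = storefront_litter(width)
    print(f"{width:>4} ft: {per_year:.0f} per year, {per_day:.1f} per day")
```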

Now that we understand where litter ends up, a good follow-up question is what types of litter end up there.

Rubbish separates litter into six categories. The majority of street litter consists of two types: tobacco, primarily cigarette butts, and paper, e.g. coffee cups and receipts. Plastic items like chip bags are also common. A small percentage of rubbish is food or glass. Finally, there’s “other” for things that don’t fit anywhere else.

Here’s what we see if we plot litter pickups by type over time:

A line chart showing how trash composition changes over time
Tobacco accounted for the most litter and glass the least.

Litter pickups on Polk Street are 45% tobacco, 32% paper, 14% plastic, 1% food, 1% glass, and 7% other. This composition is very stable over time, but does deviate from its usual pattern from time to time. The most obvious exception occurred on April 22, 2019, and may have been due to some kind of neighborhood event (though we don’t know for sure).
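
A composition breakdown like this can be computed by counting category labels and dividing by the total. The sketch below shows one way to do it in Python. The `pickups` list is a stand-in we constructed to match the percentages above; it is not the actual Rubbish dataset, whose format we don’t know.

```python
from collections import Counter

# Stand-in data: 100 labels built to match the shares quoted in the text.
pickups = (
    ["tobacco"] * 45 + ["paper"] * 32 + ["plastic"] * 14
    + ["food"] * 1 + ["glass"] * 1 + ["other"] * 7
)

def composition(labels):
    """Return each category's share of all pickups, as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() orders categories from largest to smallest count.
    return {category: 100 * n / total for category, n in counts.most_common()}

shares = composition(pickups)
for category, pct in shares.items():
    print(f"{category:<8} {pct:.0f}%")
```

Running the same function on one day’s labels at a time would give the per-day composition, which is what you would compare against the long-run shares to spot unusual days.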

(You may be wondering what the team does when it encounters certain “special cases”. Rubbish reports discarded needles and 💩 to SF311, using a reporting tool built directly into the app. The team has also recovered and returned a stolen thing or two to their original owners, but that’s a story for another time!)

What about litter over time?

The Rubbishers love to talk about “the good neighbor effect”. Over the course of their time cleaning up Polk Street the Rubbish team has been approached and questioned by many of the local businesses and straphangers in the neighborhood (and, in typical San Francisco fashion, netted a few venture capitalists’ business cards in the process). Their visibility and approachability on the street have made locals more conscious about how their actions and business policies affect the neighborhood they serve, work, and live in.

Another equally important effect is the “clean streets” effect. The Rubbish team talks extensively about the fact that the cleaner the street, the less likely it is to be littered on. As pedestrians, we seem to psychologically reserve our littering tendencies for those streets which are already dirty. If there’s already litter on the street, will anyone even notice one more candy wrapper or receipt stub? The cleaner the street, the more slowly it accumulates new litter.

The volume of litter picked up on Polk Street has dropped precipitously over time. And while it’s impossible to know for sure what caused it, it’s easy to see how the constant street presence and cleanup work done by the Rubbish team could be a big reason why:

A line graph showing the volume of litter picked up over the course of a year, with a major spike on Halloween and major dips during the Camp Fire and a heat wave
Halloween was the most heavily littered day of the year, while events that kept people indoors, like the heat wave and the Camp Fire, reduced the amount of litter collected.

“Peak litter” occurred in late 2018, after which there was a sharp decline going into the new year. The amount of litter dropped on Polk Street has been declining slowly ever since. A typical day in November 2018 might see 1000 pieces of litter dropped onto the street; by February 2019, just a few months later, 400 was more typical.

Many of the peaks in this dataset correspond to holidays and/or events that bring more people out onto the street; Halloween, for example, shows up prominently. Not all of the peaks are easy to explain, however; we don’t know for sure what caused the spikes in late August and April, for example.

Down times are similarly usually-but-not-always explainable. Polk Street was at its cleanest immediately following a community-organized street cleanup in late May. The dip in late November happened around the time of the Camp Fire, which reduced air quality in San Francisco to dangerously low levels, basically freezing street activity in the process. And the dip in early June corresponded with the SF heat wave, a week that saw temperatures in the Bay Area hit a stifling 100 degrees.
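
One simple way to surface peaks and dips like these is to compare each day’s count against the median for the period. The sketch below is our own illustration, not the method used for the chart above. The thresholds (1.5x and 0.5x the median) and the sample counts are assumptions, chosen only to show the idea.

```python
from statistics import median

def flag_outliers(daily_counts, high=1.5, low=0.5):
    """Label each day 'peak', 'dip', or 'typical' relative to the median count."""
    typical = median(daily_counts.values())
    labels = {}
    for day, count in daily_counts.items():
        if count > high * typical:
            labels[day] = "peak"
        elif count < low * typical:
            labels[day] = "dip"
        else:
            labels[day] = "typical"
    return labels

# Made-up counts for illustration only; these are not Rubbish's data.
sample = {"day 1": 400, "day 2": 420, "day 3": 950, "day 4": 380, "day 5": 150}
for day, label in flag_outliers(sample).items():
    print(day, label)
```

The median is used rather than the mean so that a single Halloween-sized spike doesn’t shift the baseline it is being measured against.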


It’s exciting to see how much we can learn about the pattern of street litter by looking at pickup data. And although we have access to almost a year’s worth of data, most of these effects would show up with even just a few days’ worth of it. Perhaps data like this can one day become part of a community organizer’s toolkit.

Rubbish is working with cities and communities to create a smart approach to litter, using data to put cigarette disposals and trash cans where they will have the biggest impact. Understanding litter trends and how they vary from street to street can tell us a lot about our communities and can be a powerful tool in helping identify the source of litter and clean more effectively. We look forward to seeing how technology can help our cities become healthier, safer and more sustainable.