Last month, I participated in the Smart Cities Hackathon, organized by the group Women in Machine Learning and Data Science (WiMLDS). Our small group consisted of Shivani Trehan, a statistician with a background in media studies, and Ruthie Birger and me, both postdocs in computational biology and infectious diseases. We decided to look into a possible link between green spaces and respiratory health. Out of more than eight publicly available data sets that WiMLDS had made readily accessible, we selected NYC tree cover, 311 complaints, and respiratory health outcomes. Although we all expected to find a correlation between green space and the hospital admission data, I expected green space to correlate with more pollen-induced asthma hospital admissions, while Ruthie and Shivani expected green space to be protective against negative respiratory health outcomes.
Our process
The main challenge of the day, as in so many academic projects, turned out to be accessing and wrangling the data. New York State hospital admission data is downloadable through SPARCS (Statewide Planning and Research Cooperative System) in very large files at https://health.data.ny.gov/en/browse?q=SPARCS, and the data can also be accessed through the cloud using Google’s command line API (application programming interface). Ruthie helped me set up Python and Jupyter Notebook to use the Google API, and then she got to work on the 311 complaint data set. Ruthie compiled an impressive list of complaints that could be related to respiratory health, then shortened it to complaints of air quality, asbestos, general construction/plumbing, mold, rodent, smoking, and unsanitary pigeon condition.
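For readers who want to try a similar filtering step, here is a minimal sketch in Python. It is not the code we ran at the hackathon. The column name "Complaint Type" and the sample rows are assumptions made for illustration, and the category labels are simply the shortlist above, matched case-insensitively.

```python
import csv
import io

# Complaint categories from the shortlist in the post, lowercased for matching.
RESPIRATORY_COMPLAINTS = {
    "air quality",
    "asbestos",
    "general construction/plumbing",
    "mold",
    "rodent",
    "smoking",
    "unsanitary pigeon condition",
}


def filter_complaints(rows, column="Complaint Type"):
    """Keep only rows whose complaint type is on the shortlist.

    `column` is an assumed header name; adjust it to match the real export.
    """
    return [
        row
        for row in rows
        if (row.get(column) or "").strip().lower() in RESPIRATORY_COMPLAINTS
    ]


# Tiny made-up sample standing in for the real 311 export.
SAMPLE_CSV = """Complaint Type,Latitude,Longitude
Air Quality,40.71,-74.00
Noise - Residential,40.73,-73.99
Rodent,40.68,-73.94
Mold,40.80,-73.95
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_complaints(rows)
print([row["Complaint Type"] for row in kept])
```

Matching on a lowercased set keeps the filter tolerant of inconsistent capitalization, though real exports may use slightly different labels that would need to be added to the set.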
With her interest in media, Shivani wanted to search NYT articles to quantify and characterize coverage of potential respiratory hazards, such as smog or pollen. The NYT query form had restricted access, so as the day went on, Shivani focused instead on getting the tree count data. We used tree cover from the TreesCount! 2015 Street Tree Census, which was conducted by volunteers and staff, and organized by NYC Parks & Recreation and partner organizations.
I managed to create a subset of the 2013 SPARCS data that included only NYC counties and health codes related to respiratory health (e.g. asthma, bronchitis, and influenza). I wanted to build a time-series model from the 311 complaints and tree cover to predict hospital admissions, but SPARCS only provides the year of patient admission, which rules out anything finer than annual trends. Since our model would therefore be primarily spatial, I next found the latitude and longitude of the 56 hospital locations in NYC counties using the website www.latlong.net.
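A subsetting step like this one could look roughly as follows. This is a sketch under stated assumptions, not my original notebook: the column names ("Hospital County", "Facility Name", "Diagnosis"), the county labels, and the sample rows are all hypothetical, and a real SPARCS file would need its own header names and diagnosis codes checked.

```python
import csv
import io
from collections import Counter

# Assumed labels for the five NYC counties as they might appear in the file.
NYC_COUNTIES = {"bronx", "kings", "manhattan", "queens", "richmond"}

# Diagnosis keywords taken from the examples in the post.
RESPIRATORY_KEYWORDS = ("asthma", "bronchitis", "influenza")


def is_respiratory_nyc(row):
    """True if the row is an NYC admission with a respiratory diagnosis."""
    county = (row.get("Hospital County") or "").strip().lower()
    diagnosis = (row.get("Diagnosis") or "").lower()
    return county in NYC_COUNTIES and any(
        keyword in diagnosis for keyword in RESPIRATORY_KEYWORDS
    )


# Made-up rows standing in for the much larger discharge file.
SAMPLE_CSV = """Hospital County,Facility Name,Diagnosis
Kings,Hospital A,Asthma
Albany,Hospital B,Asthma
Queens,Hospital C,Acute bronchitis
Kings,Hospital A,Fracture of arm
Kings,Hospital A,Influenza
"""

subset = [
    row for row in csv.DictReader(io.StringIO(SAMPLE_CSV)) if is_respiratory_nyc(row)
]

# Admissions per hospital, ready to be joined to hospital coordinates.
admissions = Counter(row["Facility Name"] for row in subset)
print(admissions)
```

Because the reader yields one row at a time, the same filter can be applied to a very large file without loading it all into memory by iterating over an open file instead of the in-memory sample.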
Findings
After a tour through the Carto map visualization website, we simply dragged our data into the Carto interface to create some really nifty maps. First, Ruthie and Shivani overlaid the Tree Count data and the geo-located 311 sanitary complaints. The video shows that 311 complaints are clustered in areas without many trees.
The video seems to support Ruthie and Shivani’s hypothesis that green spaces foster a healthy environment! On the one hand, green spaces are likely to be more desirable, and as a result of socio-economic factors, have more resources and fewer sanitary problems. On the other hand, other WiMLDS groups looked into 311 complaints and census data and suggested that wealthier NYC residents complain more, which makes our correlation between green space and lower numbers of 311 sanitary complaints more remarkable, and less contingent on socio-economic factors. We’d hoped to test our hypothesis by adding census data variables like neighborhood income to our spatial model, but we ran out of time.
Meanwhile, I overlaid the hospital admissions data on the tree cover and 311 complaints. The map was too busy with the 311 complaints included, but there also seems to be a correlation between green space and respiratory hospital admissions. We weren’t sure if we could use Carto to compute spatial correlations among the tree count, 311 complaints, and hospital admissions data, but we would be interested in such correlations given more time.
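One simple way to approach such a spatial correlation outside of Carto, which we did not get to try, would be to count the trees within a fixed radius of each hospital and correlate that count with the hospital's admissions. The sketch below uses only the Python standard library; the 1 km radius, the coordinates, and the admission numbers are made-up toy values chosen for illustration, not our data or results.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def count_within(center, points, radius_km):
    """Number of points within radius_km of center; all are (lat, lon) pairs."""
    return sum(
        1
        for lat, lon in points
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): hospital location and respiratory admissions.
hospitals = [
    ((40.70, -74.00), 120),
    ((40.75, -73.95), 80),
    ((40.80, -73.90), 40),
]
# Toy tree locations (hypothetical).
trees = [
    (40.701, -74.001),
    (40.751, -73.951), (40.749, -73.949),
    (40.801, -73.901), (40.799, -73.899), (40.800, -73.902),
]

tree_counts = [count_within(loc, trees, radius_km=1.0) for loc, _ in hospitals]
admission_counts = [count for _, count in hospitals]
print(tree_counts, pearson(tree_counts, admission_counts))
```

A coefficient from an approach like this would still say nothing about causation, and it ignores confounders such as neighborhood income, so it is best read as a first exploratory check.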
All of us learned a lot over the day. We used Carto for the first time, I used Jupyter and Google’s API for the first time, and Shivani got very close to being able to access the NYT database! It was an intense day of concentration that was a reminder of what we can accomplish when working together without any distractions. Although I had to explain what a hackathon is to friends and family who may have worried that I was doing something illegal, I would love to participate in the next WiMLDS hackathon!