Piercing the Sky Responsibly: Predicting NYC Skyscrapers’ Carbon Compliance

Hackathon Win

🏆 This project won the 2023 Columbia University Data Science Hackathon for its innovative approach to predicting carbon compliance in NYC skyscrapers, showcasing the potential of data science to impact real-world environmental policies.

Project Overview

This data science project estimates how likely New York City skyscrapers are to meet the 2030 carbon limits set by Local Law 97. The law, passed in 2019, mandates carbon emission reductions across various building types by 2030 to curb NYC's carbon footprint. The analysis focuses on Manhattan, the borough with the highest GHG emissions intensity.

Project Workflow Animation

Data Cleaning

Data Preparation: Handled missing data with KNN imputation and selected key variables, focusing on high-rise buildings’ GHG emissions.

EDA

Exploratory Data Analysis: Initial analysis revealed Manhattan as the borough with the highest GHG emissions intensity, warranting targeted attention.

Modeling

Modeling: Using regression models (KNN, OLS, Ridge, Lasso), we identified key emission factors, including floor area and building height, achieving 68% accuracy.

Prediction

Prediction: Logistic regression revealed that few buildings meet the 2030 goal, indicating a need for significant policy adjustments or penalties.

Data Preparation & Analysis

The project began with data cleaning: replacing missing values with KNN imputation and selecting the relevant variables, a step that involved manual scraping. Manhattan, known for its high-rise density, was identified as the primary borough of interest due to its substantial GHG emissions.
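The KNN imputation idea can be illustrated with a minimal sketch. It assumes a tiny, made-up table of buildings and a simplified distance (root mean squared difference over the columns two rows both have); the function name `knn_impute` and every value are hypothetical and are not taken from the project's code.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root mean squared difference over the columns that
    both rows have observed (a simplified stand-in for library routines).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]  # copy so the input stays untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that actually observed column j.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical rows: [gross floor area (1000 sq ft), floors, GHG intensity]
buildings = [
    [100.0, 10.0, 5.0],
    [110.0, 12.0, None],   # missing GHG intensity
    [105.0, 11.0, 6.0],
    [900.0, 60.0, 9.0],
]
imputed = knn_impute(buildings, k=2)
print(imputed[1])
```

In practice a library routine such as scikit-learn's `KNNImputer` would typically handle this step, and features would usually be scaled first so that floor area does not dominate the distance.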

Regression Models and Insights

We fit several regression models (KNN, OLS, Ridge, and Lasso) to predict GHG emissions from variables such as gross floor area, building height, and bedroom density. The best result, 68% accuracy, came from the Ridge model after scaling the features and tuning the model. Key findings indicated that multifamily housing and large commercial buildings contribute significantly to GHG emissions.
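As a rough illustration of what scaling and tuning a Ridge model involves, the sketch below standardizes two features and fits closed-form ridge regression at two penalty strengths. The data, the helper names (`standardize`, `solve`, `ridge_fit`, `r_squared`), and the `alpha` values are all assumptions made for this example; the project itself presumably relied on a library implementation.

```python
import statistics

def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    scaled = []
    for col in columns:
        mean, std = statistics.fmean(col), statistics.pstdev(col)
        scaled.append([(v - mean) / std for v in col])
    return scaled

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_fit(columns, y, alpha):
    """Closed-form ridge on centered targets: w = (X^T X + alpha*I)^-1 X^T (y - mean(y))."""
    y_mean = statistics.fmean(y)
    yc = [v - y_mean for v in y]
    p = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j]))
             + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xty = [sum(a * b for a, b in zip(columns[i], yc)) for i in range(p)]
    return solve(gram, xty), y_mean

def r_squared(columns, y, weights, intercept):
    """Coefficient of determination of the fitted model on the given data."""
    preds = [intercept + sum(w * col[i] for w, col in zip(weights, columns))
             for i in range(len(y))]
    y_mean = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, preds))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Hypothetical buildings: floor area (1000 sq ft), floors, annual GHG (tCO2e)
floor_area = [50.0, 120.0, 300.0, 450.0, 800.0, 1000.0]
height = [8.0, 20.0, 25.0, 40.0, 55.0, 70.0]
ghg = [300.0, 650.0, 1400.0, 2100.0, 3900.0, 4700.0]

features = standardize([floor_area, height])
weights_low, intercept = ridge_fit(features, ghg, alpha=0.1)
weights_high, _ = ridge_fit(features, ghg, alpha=10.0)  # stronger shrinkage
score = r_squared(features, ghg, weights_low, intercept)
print(weights_low, weights_high, round(score, 3))
```

A larger `alpha` shrinks the coefficients toward zero. The training R² printed here is only one possible regression metric; the write-up does not say which metric its 68% figure refers to, and choosing `alpha` properly would rely on held-out data or cross-validation rather than the training score.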

Predictive Modeling: Logistic Regression for Compliance

Logistic regression was used to classify buildings by whether their GHG emissions fall above or below national medians. The model included variables such as floor area and energy use intensity and achieved 62% accuracy. It indicated that only a small percentage of current skyscrapers would meet 2030 compliance, underscoring the urgency of stronger carbon regulation and potential penalties.
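The classification setup can be sketched as follows, under stated assumptions: buildings are labeled 1 when their GHG intensity is at or below a median threshold (here the sample median stands in for the national median), and a logistic model is fit by plain batch gradient descent. The data, the function names, and the learning-rate settings are hypothetical and only meant to show the shape of the approach.

```python
import math
import statistics

def standardize(rows):
    """Scale each feature (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(rows, weights, bias):
    """Predicted probability of the positive class (label 1) for each row."""
    return [sigmoid(bias + sum(w * v for w, v in zip(weights, x))) for x in rows]

# Hypothetical buildings: floor area (1000 sq ft), energy use intensity, GHG intensity
floor_area = [50.0, 80.0, 400.0, 150.0, 120.0, 600.0, 300.0, 900.0]
eui = [60.0, 75.0, 90.0, 110.0, 130.0, 150.0, 170.0, 200.0]
ghg_intensity = [3.1, 4.0, 4.8, 6.5, 6.0, 8.2, 9.0, 11.5]

rows = standardize([list(pair) for pair in zip(floor_area, eui)])
threshold = statistics.median(ghg_intensity)  # stand-in for a national median
labels = [1 if g <= threshold else 0 for g in ghg_intensity]

weights, bias = fit_logistic(rows, labels)
probs = predict_proba(rows, weights, bias)
predictions = [1 if p >= 0.5 else 0 for p in probs]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(predictions, accuracy)
```

Accuracy here is measured on the same rows used for fitting, so it overstates real performance; a figure like the 62% reported above would normally come from held-out data.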

Conclusion and Future Implications

Our analysis underscores the challenge of achieving carbon compliance for NYC skyscrapers by 2030. Current energy standards may not suffice to meet these goals without drastic interventions, including penalties or incentives. Future work could incorporate additional factors like energy source type and advanced simulation techniques to refine predictions.