top of page
Example Graphic at Bottom of Page
Project Outline

• Utilizing 20+ years of historical NBA shot data to create a predictive statistic for the points expected for any shot taken in the future.
• Wrote a program within Jupyter Notebook that uses Machine Learning (Random Forrest and Logistic Regression) to analyze factors (such as shot type, distance from basket, amount of defenders nearby, time on the shot clock, game state etc.) that have influenced shot outcome and ultimately create a statistic that helps us judge the "quality" and therefore preditced outcome of a shot.
• Used Python and SQL to scrape, clean, and manipulate the data. Implemented One-Hot and Target Encoding to prepare the data for the Regression Algorithms.
• Working towards further using Machine Learning to continue adding different factors that affect the outcome of a shot as well as well as use Feature Selection to improve the model.

Pulling data

Utilized the official NBA API's "ShotChartDetail" dataset.
Had to pull each seasons data one by one due to the nature of the API

Creating Model

Utilized a logistic regression from sklearn library.
Implemented one-hot and target encoding to prepare data

Code to pull data

[as a txt file]

contd.. from previous file

Code to make model

[as a txt file]

Key Takeaways:
As tested within the code, when the code is tested long term (on historical data), the model is 99.7% accurate. I.e. the xPts vs Pts has a difference of only 0.03% when the model is tested on historical NBA shot data.

Playing with the model to test outputs [code given as txt file]

Example Ouput Graphic:
Points Added Per game.png
bottom of page