Full Data Pipeline and Visualizations
I built this dataset by writing automated Python scripts that use Selenium to scrape ESPN and several other sites for the weekly stats of every player and team defense in the NFL. The scraped data was loaded into pandas DataFrames and temporarily stored as .csv files.
Rigorous data cleaning and preparation followed. Rows with 0 or NaN in certain columns were dropped, which removed players who no longer play or who have incomplete data. Feature engineering was used to split single feature columns that originally held multiple values in the same HTML element.
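The cleaning and splitting steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the column names (`comp_att`, `yards`), the `"23/35"` combined format, and the helper names are assumptions made for the example. With pandas, the same idea would typically use `Series.str.split(..., expand=True)` and `DataFrame.dropna`.

```python
import math

# Hypothetical scraped rows: "comp_att" holds two values that shared one HTML element.
raw_rows = [
    {"player": "Player A", "comp_att": "23/35", "yards": 280.0},
    {"player": "Player B", "comp_att": "0/0", "yards": 0.0},             # did not play
    {"player": "Player C", "comp_att": "18/27", "yards": float("nan")},  # incomplete
]

def split_comp_att(row):
    """Split the combined 'comp_att' string into two integer columns."""
    completions, attempts = row["comp_att"].split("/")
    new_row = {k: v for k, v in row.items() if k != "comp_att"}
    new_row["completions"] = int(completions)
    new_row["attempts"] = int(attempts)
    return new_row

def is_complete(row, required=("attempts", "yards")):
    """Keep a row only if every required column is neither 0 nor NaN."""
    for col in required:
        value = row[col]
        if value == 0 or (isinstance(value, float) and math.isnan(value)):
            return False
    return True

clean_rows = [r for r in map(split_comp_att, raw_rows) if is_complete(r)]
print(clean_rows)
```

Only the row for the hypothetical "Player A" survives: "Player B" has 0 attempts and "Player C" has a NaN yardage value.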
The data was stored in a MySQL database on my local machine. Initially I used PyMySQL to connect to the database and load the data into its respective tables, such as players, teams, positions, and game-stats. I also built an SSIS package that runs the Python scripts and loads the data into the database. I then created a view that joins all the tables into a single result set for easy access and exploring.
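As a rough sketch of the combined view, the example below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server. The table and column names (`game_stats`, `fantasy_points`, `player_stats_view`, and so on) are assumed for illustration and are not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the local MySQL database
conn.executescript("""
CREATE TABLE teams      (team_id INTEGER PRIMARY KEY, team_name TEXT);
CREATE TABLE positions  (position_id INTEGER PRIMARY KEY, position_name TEXT);
CREATE TABLE players    (player_id INTEGER PRIMARY KEY, player_name TEXT,
                         team_id INTEGER REFERENCES teams(team_id),
                         position_id INTEGER REFERENCES positions(position_id));
CREATE TABLE game_stats (player_id INTEGER REFERENCES players(player_id),
                         week INTEGER, fantasy_points REAL);

INSERT INTO teams VALUES (1, 'Team X');
INSERT INTO positions VALUES (1, 'QB');
INSERT INTO players VALUES (1, 'Player A', 1, 1);
INSERT INTO game_stats VALUES (1, 1, 21.4), (1, 2, 17.9);

-- One flat, queryable result set joining every table.
CREATE VIEW player_stats_view AS
SELECT p.player_name, t.team_name, pos.position_name, g.week, g.fantasy_points
FROM game_stats AS g
JOIN players   AS p   ON p.player_id = g.player_id
JOIN teams     AS t   ON t.team_id = p.team_id
JOIN positions AS pos ON pos.position_id = p.position_id;
""")

view_rows = conn.execute(
    "SELECT * FROM player_stats_view ORDER BY week"
).fetchall()
print(view_rows)
```

Querying the view returns one row per player per week with the team and position names already attached, so downstream tools do not need to repeat the joins.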
With the dataset loaded, I performed exploratory data analysis using Python tools such as Plotly and seaborn. I used visualization and statistical methods to understand patterns, relationships, and anomalies in the data, aiming to derive initial insights for subsequent analysis. This identified the top-performing players and defenses and revealed stat trends.
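One simple way to surface top performers is to rank players by average points per game. The sketch below uses only the standard library and made-up numbers; the field layout and the `top_performers` helper are hypothetical and not taken from the project.

```python
from collections import defaultdict
from statistics import mean

# Made-up weekly records for illustration: (player, week, fantasy_points).
records = [
    ("Player A", 1, 21.4), ("Player A", 2, 17.9),
    ("Player B", 1, 9.2),  ("Player B", 2, 12.6),
    ("Player C", 1, 25.0), ("Player C", 2, 19.0),
]

def top_performers(rows, n=2):
    """Return the n players with the highest mean points, best first."""
    points_by_player = defaultdict(list)
    for player, _week, points in rows:
        points_by_player[player].append(points)
    averages = {p: mean(v) for p, v in points_by_player.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]

ranking = top_performers(records)
print(ranking)
```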
After exploring the data I found various insights useful to fantasy football enjoyers and built dashboards in Tableau and Power BI where these insights can be viewed and used. The dashboards give users the ability to make informed decisions about which players are most likely to succeed on a case-by-case basis. (Links to these can be found in the projects section of the home page.)
Several methods were used to share the outcomes of this project. I developed APIs to serve the predictions made by the ML model. Another, built with Dash, was purely visual and gave users an easy way to explore and filter the data. I also posted my Tableau dashboards on Tableau Public for viewing.
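A prediction endpoint of this kind could look roughly like the sketch below, written as a plain WSGI app using only the standard library. The `/predictions` route, the `PREDICTIONS` lookup, and its values are placeholders invented for the example; the source does not say which web framework the project's prediction APIs use.

```python
import json
from urllib.parse import parse_qs

# Placeholder values standing in for real model output.
PREDICTIONS = {"Player A": 18.3, "Player B": 7.1}

def app(environ, start_response):
    """WSGI app: GET /predictions?player=NAME returns a JSON prediction."""
    if environ.get("PATH_INFO") != "/predictions":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "unknown route"}).encode("utf-8")]

    query = parse_qs(environ.get("QUERY_STRING", ""))
    player = query.get("player", [None])[0]
    if player not in PREDICTIONS:
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "unknown player"}).encode("utf-8")]

    start_response("200 OK", [("Content-Type", "application/json")])
    body = {"player": player, "predicted_points": PREDICTIONS[player]}
    return [json.dumps(body).encode("utf-8")]

def call(path, query=""):
    """Invoke the app in-process and return (status, decoded JSON body)."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    chunks = app({"PATH_INFO": path, "QUERY_STRING": query}, start_response)
    return captured["status"], json.loads(b"".join(chunks))

print(call("/predictions", "player=Player+A"))
```

To serve it locally, the same `app` can be passed to `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.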
Data for most players was combined to train a single model, except for team defenses, because their stats are so different from those of individual players. I also needed a way to show the model that certain players scored 0 points because they were injured, so I added a binary column indicating whether a player was injured. For the model I used XGBoost, with mean absolute error (MAE) as the evaluation metric. After only 5 weeks of training data this season, I achieved an MAE of 4.56, which is great considering the volatility of football stats and games.
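The two data-preparation decisions above, a separate training set for team defenses and a binary injury flag, can be sketched as follows. The column names (`position`, `status`, `is_injured`), the `"DST"` label for defenses, and the status strings are assumptions for the example; how injury status was actually recorded is not stated.

```python
# Hypothetical weekly rows; "DST" marks a team defense.
rows = [
    {"name": "Player A", "position": "QB",  "status": "Active",  "points": 21.4},
    {"name": "Player B", "position": "RB",  "status": "Injured", "points": 0.0},
    {"name": "Team X",   "position": "DST", "status": "Active",  "points": 8.0},
]

def add_injury_flag(row):
    """Return a copy of the row with a binary is_injured feature (1 or 0)."""
    flagged = dict(row)
    flagged["is_injured"] = 1 if row["status"] == "Injured" else 0
    return flagged

flagged_rows = [add_injury_flag(r) for r in rows]

# Defenses go to their own training set; everyone else shares one.
player_rows = [r for r in flagged_rows if r["position"] != "DST"]
defense_rows = [r for r in flagged_rows if r["position"] == "DST"]

print(player_rows)
print(defense_rows)
```

With the flag present, a 0-point week caused by injury is distinguishable from a 0-point week by a healthy player.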