Allie Bergmann

St. Louis, MO 63119 · (314) 599-7060 · 11arbergmann@gmail.com

Portfolio of non-proprietary microprojects and code snippets, including academic, self-study, and hobby projects.


Academic Projects

Academic projects from Milestones I & II and the Capstone of the University of Michigan's Master of Applied Data Science program.


Value Stock Screener

Project Report

An early project in the program: a data pipeline that scrapes the current S&P 500 ticker list, pulls data from Quandl (now Nasdaq) and Yahoo! Finance, filters for value stocks through modular code, and returns a dataframe report of candidate stocks with their corresponding investing metrics.
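A minimal sketch of a modular value filter, assuming the scraped and downloaded metrics have already been merged into one pandas DataFrame indexed by ticker. The column names (`pe_ratio`, `pb_ratio`, `debt_to_equity`), the thresholds, and the placeholder tickers are hypothetical, not the project's actual criteria or data.

```python
import operator

import pandas as pd

# Hypothetical screening rules; the project's actual criteria and column names are not stated.
OPS = {"<": operator.lt, ">": operator.gt}
VALUE_RULES = {
    "pe_ratio": ("<", 15.0),       # price-to-earnings
    "pb_ratio": ("<", 1.5),        # price-to-book
    "debt_to_equity": ("<", 1.0),  # leverage
}


def screen_value_stocks(metrics: pd.DataFrame, rules: dict = VALUE_RULES) -> pd.DataFrame:
    """Keep only the rows of `metrics` that pass every (column, operator, threshold) rule."""
    mask = pd.Series(True, index=metrics.index)
    for column, (op, threshold) in rules.items():
        mask &= OPS[op](metrics[column], threshold)
    return metrics[mask].sort_values("pe_ratio")


# Placeholder tickers and made-up metrics, purely to exercise the filter.
toy = pd.DataFrame(
    {"pe_ratio": [12.0, 30.0, 9.5], "pb_ratio": [1.2, 4.0, 0.9], "debt_to_equity": [0.5, 0.3, 1.4]},
    index=["TKR1", "TKR2", "TKR3"],
)
report = screen_value_stocks(toy)
print(report)
```

Because each rule lives in a dictionary, criteria can be added or tightened without touching the filtering function.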

Stock Price Prediction and Clustering

Project Report

Analysis of FTSE 100 stocks using basic supervised and unsupervised methods to evaluate model performance against a benchmark and to identify trends and traits of potential trading categories.
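One way the unsupervised side could look, as a hedged sketch: k-means clustering of tickers on annualized return and volatility. The feature choice, the 252-trading-day annualization, `n_clusters=3`, and the synthetic random-walk prices standing in for FTSE 100 data are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_stocks(prices: pd.DataFrame, n_clusters: int = 3, seed: int = 0) -> pd.Series:
    """Cluster tickers (columns of `prices`) by annualized return and volatility."""
    daily_returns = prices.pct_change().dropna()
    features = pd.DataFrame({
        "ann_return": daily_returns.mean() * 252,
        "ann_volatility": daily_returns.std() * np.sqrt(252),
    })
    # Standardize so neither feature dominates the Euclidean distance used by k-means.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="cluster")


# Synthetic random-walk prices for six hypothetical tickers with three volatility levels.
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=[0.01, 0.01, 0.02, 0.02, 0.04, 0.04], size=(500, 6))
prices = pd.DataFrame(100 * np.cumprod(1 + returns, axis=0),
                      columns=[f"TKR{i}" for i in range(1, 7)])
clusters = cluster_stocks(prices, n_clusters=3)
print(clusters)
```

Standardizing first keeps return and volatility on comparable scales, since k-means assigns points by Euclidean distance.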

Survey and Backtesting of Advanced Stock Price Forecasting Models

Project Report

Building on the concepts of the previous project, this project develops an end-to-end machine learning pipeline for forecasting daily stock prices. The dataset covered a reduced set of S&P 500 tickers, and the project surveyed both linear and non-linear models. The models were trained, tested, and validated on an ensemble of fundamental, technical, and decomposed features, then run through a backtesting framework, as a form of cross-validation, to detect actual alpha generation in practical trading applications.
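A simplified backtest sketch, assuming a model has already produced a next-day return forecast for each trading day. The long/flat rule, the 5 bps transaction cost, and the regression-based alpha (strategy returns regressed on buy-and-hold returns, ignoring the risk-free rate) are illustrative assumptions, not the project's framework.

```python
import numpy as np
import pandas as pd


def backtest_long_flat(predicted: pd.Series, actual: pd.Series, cost_bps: float = 5.0) -> dict:
    """Long for a day when the forecast return is positive, otherwise in cash.

    `predicted` and `actual` share an index; each forecast must use only data
    available before that day, or the backtest suffers from look-ahead bias.
    """
    position = (predicted > 0).astype(float)
    # Charge a cost every time the position changes (entry or exit).
    trades = position.diff().abs().fillna(position.iloc[0])
    strategy = position * actual - trades * cost_bps / 10_000
    # Regress strategy returns on buy-and-hold returns: slope ~ beta, intercept ~ daily alpha.
    beta, alpha = np.polyfit(actual.to_numpy(), strategy.to_numpy(), deg=1)
    return {
        "strategy_total": float((1 + strategy).prod() - 1),
        "benchmark_total": float((1 + actual).prod() - 1),
        "daily_alpha": float(alpha),
        "beta": float(beta),
    }


# Synthetic data: the "forecast" is the true return plus noise, i.e. it deliberately peeks.
rng = np.random.default_rng(7)
actual = pd.Series(rng.normal(0.0003, 0.01, 250))
predicted = actual + pd.Series(rng.normal(0.0, 0.01, 250))
print(backtest_long_flat(predicted, actual))
```

The toy forecast includes future information, so its results would be meaningless in real use; in practice `predicted` has to come from a walk-forward or other time-ordered validation.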


Disclaimer: Any financial work presented is for educational and research purposes only and is not intended to be used as an official investment recommendation. All investment/financial opinions expressed in these documents are from the personal research and experience of the project owner(s) and are intended as educational/informational material. Although best efforts are made to ensure that all information is accurate and up-to-date, occasionally unintended errors or misprints may occur in the data. There are no guarantees of financial returns based on any of these analyses. It is important to do your own analysis before making any investment based on your own personal circumstances. All financial decisions should be made with the help of a qualified financial professional.


Proof of Concept Work

Applications built to practice and demonstrate skill sets. Data may be mock data and should not be taken as real.


Pricing and Revenue Operations Dashboard

RevOps Dashboard

A quick mock-up of a dashboard for revenue operations and pricing, built with the Dash and Plotly libraries.


Data Manipulation and Processing

Code samples and exercises in Jupyter Notebooks exploring applications of NLP, time series analysis, and data stream sampling.


N-Gram Language Models (Markov Models)

Jupyter Notebook

A collection of exercises and code exploring n-grams, Markov chains (including a Shakespearean sonnet generator), and Markov models for part-of-speech (POS) tagging.

NLTK
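A minimal sketch of an n-gram Markov chain text generator, assuming plain whitespace tokenization (so punctuation stays attached to words). The opening quatrain of Sonnet 18 stands in for a full sonnet corpus, the function names are hypothetical, and the POS-tagging models are not covered here.

```python
import random
from collections import defaultdict


def build_chain(text: str, n: int = 2) -> dict:
    """Map each (n-1)-word state to the list of words observed right after it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n + 1):
        state = tuple(words[i : i + n - 1])
        chain[state].append(words[i + n - 1])
    return dict(chain)


def generate(chain: dict, length: int = 12, seed=None) -> str:
    """Random walk over the chain; duplicates in a follower list make frequent words likelier."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    order = len(state)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state only appeared at the end of the text
            break
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output)


sonnet_18 = (
    "Shall I compare thee to a summer's day? Thou art more lovely and more temperate: "
    "Rough winds do shake the darling buds of May, And summer's lease hath all too short a date;"
)
chain = build_chain(sonnet_18, n=2)
print(generate(chain, length=12, seed=3))
```

Raising `n` makes the output follow the source more closely, at the cost of fewer possible continuations per state.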

Time Series Analysis (Basic)

Jupyter Notebook

A collection of exercises and code exploring time series analysis through COVID-19 data. Topics include seasonal decomposition, trend analysis, weighted and exponential moving averages, similarity measures (Euclidean and cosine), Dynamic Time Warping (DTW), stationarity testing, autocorrelation (AC) and partial autocorrelation (PAC), single time series forecasting (ARIMA), multiple time series forecasting (VAR), and Granger causality.

NumPy, Pandas, Statsmodels, Sklearn, matplotlib
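A small sketch of two techniques from the list: weighted and exponential moving averages with pandas, and a plain dynamic-programming DTW distance. The case counts are made-up values, and the absolute-difference cost and 3-day windows are assumptions.

```python
import numpy as np
import pandas as pd


def dtw_distance(a, b) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with an absolute-difference cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Hypothetical daily case counts; region_b is region_a delayed by two days.
region_a = pd.Series([1, 2, 5, 9, 14, 12, 8, 5, 3, 2], dtype=float)
region_b = region_a.shift(2).fillna(1.0)

# Exponential moving average (span=3) and a linearly weighted 3-day moving average.
ema = region_a.ewm(span=3, adjust=False).mean()
weights = np.array([1, 2, 3], dtype=float)  # most recent day gets the largest weight
wma = region_a.rolling(3).apply(lambda w: np.dot(w, weights) / weights.sum(), raw=True)

print("lock-step L1:", np.abs(region_a - region_b).sum())
print("Euclidean:", np.linalg.norm(region_a - region_b))
print("DTW:", dtw_distance(region_a, region_b))
```

Because the two series share a shape but are offset in time, DTW can align their peaks and reports a much smaller distance than the lock-step comparisons.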

Data Stream Sampling

Jupyter Notebook

Uses a Twitter dataset to explore stream sampling techniques, including random and reservoir sampling, along with approximate membership and counting methods such as Bloom filters and lossy counting.

NumPy, Pandas, collections, matplotlib
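A hedged, standard-library sketch of three of the techniques named above: reservoir sampling (Algorithm R), a Bloom filter, and lossy counting in the style of Manku and Motwani. The hashtag stream is a toy stand-in for the Twitter data, and the filter size, hash count, and `epsilon` are arbitrary choices.

```python
import hashlib
import math
import random
from typing import Iterable, List


def reservoir_sample(stream: Iterable, k: int, seed=None) -> List:
    """Algorithm R: a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive: item i is kept with probability k / (i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit for readability, not memory efficiency

    def _positions(self, item: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def lossy_count(stream: Iterable, epsilon: float = 0.01) -> dict:
    """Lossy counting: each kept count underestimates the true count by at most epsilon * N."""
    width = math.ceil(1 / epsilon)
    counts, deltas = {}, {}
    for n, item in enumerate(stream, start=1):
        bucket = math.ceil(n / width)
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1
        if n % width == 0:  # bucket boundary: prune entries that cannot be frequent
            for key in [k for k in counts if counts[k] + deltas[k] <= bucket]:
                del counts[key], deltas[key]
    return counts


hashtags = ["#data", "#ml", "#python", "#spark", "#nlp"] * 200  # toy stream
print(reservoir_sample(hashtags, k=5, seed=1))
seen = BloomFilter()
for tag in set(hashtags):
    seen.add(tag)
print("#python" in seen)
print(lossy_count(hashtags, epsilon=0.01))
```

Increasing `num_bits` lowers the Bloom filter's false-positive rate at the cost of memory; lowering `epsilon` tightens the lossy-counting error bound but keeps more entries.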

Big Data Processing

Code samples and exercises in Jupyter Notebooks exploring big data processing with MRJob and PySpark, leveraging RDDs, UDFs, and Spark SQL.


MRJob

Jupyter Notebook

Uses MapReduce to mine information from large Project Gutenberg text files, as an introduction to split-apply-combine techniques before moving on to faster Spark-based applications.

MRJob, re, itertools, syllapy
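The split-apply-combine pattern behind MapReduce can be sketched in plain Python, assuming a simple regex tokenizer. In MRJob, functions like `mapper` and `reducer` become methods on an `MRJob` subclass and the framework performs the shuffle step; here it is done by hand so the example runs without MRJob. The two-line input is a stand-in for a full Project Gutenberg text.

```python
import re
from collections import defaultdict
from itertools import chain

WORD_RE = re.compile(r"[\w']+")


def mapper(_, line):
    """Emit (word, 1) for every word in one line of text."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all partial counts for one word."""
    yield word, sum(counts)


def run_local(lines):
    # Map: apply the mapper to every input line.
    pairs = chain.from_iterable(mapper(None, line) for line in lines)
    # Shuffle: group intermediate values by key (MRJob/Hadoop does this between phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: combine each key's group independently.
    return dict(chain.from_iterable(reducer(k, v) for k, v in sorted(groups.items())))


text = ["It was the best of times,", "it was the worst of times,"]
result = run_local(text)
print(result)
```

Because each key is reduced independently, the reduce step can be spread across machines, which is what makes the pattern scale to large text files.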

Spark + NLTK (RDD Focus)

Jupyter Notebook

Exercises to build familiarity with PySpark, work through the basics of the Spark RDD API, and practice applications of NLTK. A basic introduction to NLP: parsing a large text corpus for part-of-speech tagging and converting a Python-based script to a PySpark-based approach.

NLTK, re, PySpark

Spark + UDF

Jupyter Notebook

Expands familiarity with PySpark and works through the basics of user-defined functions (UDFs) on the 15 GB Yelp Academic dataset. The exercises mine the dataset for sentiment and for the traits of useful and positive reviews, among other questions.

PySpark

Spark + SQL

Jupyter Notebook

Expands familiarity with PySpark and works through the basics of Spark SQL on the 15 GB Yelp Academic dataset. The analysis mines the dataset for various inclusion and usefulness statistics within the reviews.

PySpark

Visualizations

Code samples and exercises in Jupyter Notebooks, primarily using the Altair visualization library. Due to known rendering issues on GitHub and in HTML format, PDFs are also available, though they lack the formatting and interactivity of the Jupyter notebooks.


Crash Reports

Jupyter Notebook
PDF

An exploration of crash reports in Chicago along specific thoroughfares.

NumPy, Pandas, Altair

Bechdel Test

Jupyter Notebook
PDF

A look at 50 movies through new ways of measuring and visualizing gender imbalance.

NumPy, Pandas, Altair

Star Wars

Jupyter Notebook
PDF

A look at Star Wars fan preferences of movies and characters, and (obviously) who shot first (it's Han Solo and you know it).

NumPy, Pandas, Altair, math

Mayweather-McGregor

Jupyter Notebook
PDF

The Mayweather-McGregor fight as told by Twitter/Emojis.

NumPy, Pandas, Altair

Bob Ross

Jupyter Notebook
PDF

An analysis of Bob Ross's paintings, specifically which features appear in each one. (Note: Interactive notebook)

NumPy, Pandas, Altair, Sklearn, PIL

Marvel vs DC

Jupyter Notebook
PDF

A comparison of Marvel and DC characters, focusing on analytics of new character introductions. (Note: Interactive notebook)

NumPy, Pandas, Altair

The Simpsons

Jupyter Notebook
PDF

A visual look at quotable conversations in The Simpsons, using network analysis in Altair. (Note: Interactive notebook)

NumPy, Pandas, Altair, matplotlib, SciPy, sklearn, json, networkx, nx_altair, squarify
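A hedged sketch of the network-analysis step with networkx, assuming conversations have been reduced to pairs of characters who speak to each other. The exchange list is hypothetical; the resulting graph is the kind of object nx_altair can then draw.

```python
import networkx as nx

# Hypothetical (speaker, addressee) pairs standing in for parsed script lines.
exchanges = [
    ("Homer", "Marge"), ("Homer", "Bart"), ("Bart", "Lisa"),
    ("Marge", "Lisa"), ("Homer", "Moe"), ("Homer", "Marge"),
]

G = nx.Graph()
for a, b in exchanges:
    if G.has_edge(a, b):
        G[a][b]["weight"] += 1  # repeated exchanges strengthen the tie
    else:
        G.add_edge(a, b, weight=1)

# Degree centrality: the share of other characters each character talks with.
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

Edge weights are kept separately from centrality here, so they can drive line thickness in a chart without changing who counts as most connected.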

Plotly Demo Walkthrough

Jupyter Notebook
PDF

A small project writeup that required creating a tutorial for a visualization library (Plotly). (Note: Interactive notebook)

NumPy, Pandas, matplotlib, pandas_datareader, plotly