Andrew Friedman

afriedman412 [at] gmail [dot] com


Coding Experience:

Police Sourcing in American Crime Reportage
Center for Just Journalism / NYU Wagner Graduate School
Data Scientist
12.22.2022 - 5.31.2023

A trio of Wagner graduate students investigated the extent to which 10 American newspapers relied on police sources when reporting on crime, and how that reliance affected coverage of both crime and police. I handled data acquisition and processing for 100,000 articles, of which the students analyzed a representative sample of around 300. I also conducted a similar study of my own that quantified the rate of police sourcing over a much larger sample.

For my study, I developed a Python package called Sayswho, which detects quotes and identifies their speakers. It combines a quote detection algorithm with a coreferencing model to cluster words referring to the same source, then attributes quotes to those clusters. This allows programmatic resolution of vague speakers (e.g., “he said”, “the lawyer added”).

Tools used:

SpaCy, BERTopic, CoreNLP, Pandas, SQL, Jinja


Chatdesk
Data Scientist
11.2018 - 8.2022

Chatdesk is a Series A customer service company backed by leading Silicon Valley investors including Menlo Ventures, Susa Ventures and Slow Ventures. I was responsible for cleaning and standardizing over 1 million weekly incoming customer messages in 10+ languages. I developed a model that identified six different types of messages at 99.5% accuracy, and used named entity recognition to increase the capacity and flexibility of the processing pipeline.

Tools used:

SpaCy, PyTorch, Google Cloud, Twilio, Pandas, FastAPI


Google/Medill Data-Driven Reporting Project
Data Lead
6.2022 - present

We obtained 30 years of localized crime data from the Baltimore Police Department, covering 50 categories. Our goal is to tell the story of the city through the type and location of crime. We are interested in connecting patterns in the data to concurrent events, such as elections, the passing of legislation, or the closing of businesses. The project is ongoing; future plans include integration with outside data sources, statistical modeling and the development of an interactive front-end.


Rap Caviar Gender Balance
ongoing

Data collection and visualization of Spotify’s influential Rap Caviar playlist, tracking how the gender balance changes over time.

Tools used:

SQL, AWS, Elastic Beanstalk, Pandas


Brooklyn Eviction Defense
2020 - present

For Brooklyn Eviction Defense, an organization dedicated to keeping Brooklyn tenants from losing their apartments, I developed an app that scrapes housing court data to help tenants fight evictions, saving lawyers hours of work. I also created a Twilio-based hotline that provides tenants with information and resources and connects them with the organization for further aid.

Tools used:

Twilio, Selenium, SQL


Congressional Financial Disclosure Scraper
Sludge
TKTKTK

For the money-in-politics site Sludge, I developed and maintain an app that queries both the House and Senate on the hour and downloads any new financial disclosure forms.

Tools used:

Selenium, PDF TKTK, BeautifulSoup, Heroku


Data Management and Integration
Tucker and Bloom
TKTK

For the Nashville bag and leather goods company Tucker and Bloom, I am developing a sales and inventory tracking pipeline to integrate with Shopify.


Business Metrics Dashboard
New York Adventure Club
6.2018

Developed and deployed a dashboard to track ticket sales and event performance, and to predict the success of future events.