Scraping Congressional Financial Disclosure
In a perfect world, it would be easy to see where our congresspeople put their money. That way, we could tell at a glance whether an elected official was charged with regulating an industry where their own bank account could be affected.
But we do not live in a perfect world! Representatives and senators, as well as anyone running for these offices, submit annual Financial Disclosure Forms, along with supplementary disclosures for large trades and a few other events. And while the forms are publicly available, the data is tied up in unindexed PDFs (for the House) and unindexed HTML (for the Senate). The format of these forms changed from year to year, and many of the forms were filled out by hand. (In 2019!)
Sludge is a non-partisan site dedicated to investigating money in politics. They wanted the Financial Disclosure information, but they didn’t want to take the time to go over every form manually. So they asked me to scrape, clean, and parse the forms into something manageable. BeautifulSoup, Selenium, pdfplumber, and PDFQuery made it possible.
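To give a flavor of the "clean" step, here is a minimal sketch of one kind of normalization it might involve: turning a reported value range into numeric bounds. It assumes values show up as text like "$1,001 - $15,000" or "Over $50,000,000". The function name parse_value_range and the exact input strings are hypothetical, for illustration only, and this is not the code used for the project.

```python
import re
from typing import Optional, Tuple

# Matches a dollar amount such as "$1,001" or "$250.50".
_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")
# Words that (by assumption) mark an open-ended range.
_OPEN_ENDED = re.compile(r"\b(over|more than)\b", re.IGNORECASE)


def parse_value_range(text: str) -> Optional[Tuple[float, Optional[float]]]:
    """Turn a reported value range into (low, high) bounds in dollars.

    Returns None when the text contains no dollar amount. An open-ended
    range such as "Over $50,000,000" yields (low, None).
    """
    amounts = [float(raw.replace(",", "")) for raw in _AMOUNT.findall(text)]
    if not amounts:
        return None
    if len(amounts) == 1:
        if _OPEN_ENDED.search(text):
            return (amounts[0], None)
        # A single exact figure: low and high are the same.
        return (amounts[0], amounts[0])
    return (min(amounts), max(amounts))


if __name__ == "__main__":
    for sample in ["$1,001 - $15,000", "Over $50,000,000", "None"]:
        print(repr(sample), "->", parse_value_range(sample))
```

With holdings in this shape, summing the upper bounds is one way to get an "up to" total of the kind reporters quote, while the lower bounds give the conservative figure.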
Selected Articles
My work allowed reporters to run and bolster dozens of stories about congressional conflicts of interest. Links to a few are below.
- Sludge, 1/20/20: The Members of Congress Who Profit From War
- Sludge, 1/03/20: Members of Congress Own Up to $93 Million in Fossil Fuel Stocks
- Sludge, 10/22/19: Reps Who Will Question Zuckerberg Own Stock In Facebook
- The Guardian, 9/19/19: Revealed: how US senators invest in firms they are supposed to regulate
- Sludge, 6/20/19: Leader of Centrist Climate Caucus Has Millions Invested in Oil and Gas Companies
- Sludge, 6/4/19: Presidential Candidate Who Attacked Medicare for All is Invested in Health Care Companies
- Sludge, 4/10/19: Reps Questioning Megabank CEOs Own Stock in Their Companies