Scraping Congressional Financial Disclosure

In a perfect world, it would be easy to see where our congresspeople put their money. That way it would be easy to tell whether an elected official was charged with regulating an industry where their own bank account could be affected.

But we do not live in a perfect world! Representatives and senators, as well as anyone running for these offices, submit annual Financial Disclosure Forms, as well as supplementary disclosures for large trades and a few other events. And while the forms are publicly available, the data is tied up in unindexed PDFs (for the House) and unindexed HTML (for the Senate). The format of these forms changed from year to year, and many of the forms were filled out by hand. (In 2019!)

Sludge is a non-partisan site dedicated to investigating money in politics. They wanted the Financial Disclosure information, but they didn’t want to take the time to go over every form manually. So they asked me to scrape, clean, and parse the forms into something manageable. BeautifulSoup, Selenium, PDFPlumber and PDFQuery made it possible.
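
To give a rough sense of what the parsing step looks like, here is a minimal sketch that turns an HTML table into a list of records. It is not the project's actual code: it uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without any installs, and the sample markup, the column names, and the helper names (TableRowParser, parse_disclosure_table) are all made up for illustration. Real Senate pages are messier than this.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace left over from the markup.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_disclosure_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample markup; not taken from any real filing.
SAMPLE_HTML = """
<table>
  <tr><th>Filer</th><th>Asset</th><th>Value</th></tr>
  <tr><td>Example Senator</td><td>Example Corp
      Stock</td><td>$1,001 - $15,000</td></tr>
</table>
"""

records = parse_disclosure_table(SAMPLE_HTML)
```

Once each row is a plain dictionary, the records can be cleaned and written to a spreadsheet or database in whatever shape is easiest to search.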

Selected Articles

My work allowed reporters to run and bolster dozens of stories about congressional conflicts of interest. Links to a few are below.