Jessica Holliday, NTEN’s Operations Director, learned a lot at the Do Good Data Conference in Chicago last week and chose to share some highlights with us.
Be Generous with Data—Water for People
Ned Breslin from Water for People preached to the choir about the importance of using data to drive successful programs. He inspired audible gasps when he quoted a shocking statistic from the Bridgespan Group—only 6% of nonprofits use their data to drive improvements in their work. Data, he said, is being used more for PR than for driving mission outcomes. Nonprofits, he argued, must move away from using data purely for marketing and focus instead on using it for monitoring. He attributes Water for People’s success to their relentless focus on (and measurement of) outcomes. Here are a few more highlights from his talk that stuck with me:
- Water for People is generous with their data: they share their findings around the sector
- They created a mobile application (Flow – www.akvo.org) for collecting data in the field, which they were also generous with—it is now free and open source
- They closed all of their offices for a week in order to give staff time to step back and look at their data
- Because of their innovative approach and emphasis on long-term monitoring, Water for People now gets the majority of their funding not from traditional funders in the water and sanitation field, but from funders looking to disrupt and innovate
Practical Applications of Data Mining in Philanthropy—Foundation Center
Each year, Foundation Center staff process thousands of IRS filings from private foundations. The 20-person department pulls the information from scanned PDFs of the filings, enters it into their database, and classifies it according to the Foundation Center taxonomy. Foundation Center has been working on automating much of this work.
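Foundation Center’s actual pipeline isn’t described in detail here, but the core idea of mapping free text onto taxonomy categories can be sketched with a deliberately naive keyword scorer. The categories and keywords below are invented for illustration; the real system relies on trained machine learning models over far richer features:

```python
# Toy sketch: classify a grant description into a taxonomy category
# by counting keyword overlaps. Categories and keywords are invented.

TAXONOMY = {
    "education": {"school", "students", "literacy", "scholarship"},
    "health": {"clinic", "patients", "vaccination", "hospital"},
    "environment": {"watershed", "conservation", "wildlife", "river"},
}

def classify(description):
    """Return the taxonomy category whose keywords best match the text."""
    words = set(description.lower().split())
    scores = {cat: len(words & keywords) for cat, keywords in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(classify("A scholarship fund for first-generation students"))
# -> education
```

A real system would also need the OCR step in front of this, plus human review of low-confidence results, which is exactly where the machine learning gets interesting.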
One of my first jobs in tech 1,000 years ago was as “Text Production Lead” on a music encyclopedia CD-ROM (Google it, kids). We had to use optical character recognition scans on five decades of Billboard Charts, get them into a database, and categorize all the entries. It took us four months of non-stop proofreading and data entry! So I am intimately acquainted with the problem that Foundation Center programmers are trying to solve with their machine learning approach.
Personally, I have been futzing around with learning R (a statistical/data manipulation language) and have passing knowledge of a lot of the concepts that these fellows addressed—machine learning, natural language processing, etc. But it did feel like I was moved from first-year high school French straight to an immersion class. That’s OK! I have a lot of new tools with great names I’m excited to check out (Orange! Octave! Flask!). And I think I’m going to ditch R in favor of Python (email me if you want to try to talk me out of that decision; I’m very interested in what folks are using in the real world instead of Coursera).
A few other takeaways:
- Foundation Center’s new tool is amazeballs. It has great visualizations and provides a truly fantastic resource. I really recommend you check it out.
- Foundation Center uses an agile process in their data mining. Jake Garcia says that this allows them to move quickly and move on once they get to “good enough,” saving them from getting bogged down trying to get to perfect.
Data Use in Collective Impact – Strive Partnership
Strive Partnership spearheaded a cooperative program to examine collective impact on student success in the Cincinnati School District. They created a data warehouse that tracked not only student progress in school, but progress from cradle to career. They are attempting to get a big picture view of the impact of interventions children receive not only in school but through all programs: afterschool, college preparation, etc. Here were some highlighted findings:
- Success depended on being able to scale back. There were a lot of data points they could measure, but by selecting and focusing on six key levers and limiting the sources of data (for now), they have a usable program to which they can continue to add functionality and improvements.
- It does seem as much a collaboration problem as a data problem to get so many stakeholders to share their data. However, there is immediate value to the partners in the form of a dashboard and metrics for applying for grants, etc.
- Also, Strive Partnership measured the partners! They put metrics against speed and accuracy of the data their partners were sending in.
- While the data shows which programs are most effective, Geoff Zimmerman from Strive doesn’t foresee it being used punitively against less effective programs. Rather, he sees it as a way to point toward what partners need to do better—encouraging a culture of continuous improvement.
Cutting Through the Noise—Dean Karlan
Dean Karlan is a Yale professor and founder of Innovations for Poverty Action. Karlan emphasized looking at a counterfactual measurement, meaning exploring what would have happened without the program existing. He used a hypothetical microcredit example to demonstrate: xx% of folks who were offered a microloan reported an xx% increase in income the following year. Presumably, a loan application process means that loans are given to hard-working people. So how much of the increase is due to the fact that these hard-working go-getters would have increased their income on their own, and how much was directly due to the program? This is a tough thing to measure, but it is key to understanding your organization’s impact.
- Make sure you’re getting credible data: ensure that your method of data collection itself is not skewing the results.
- Collecting data may be cheap, but it’s still not free. The organizational cost to understand and process that data can be large; cutting through the noise takes time and energy.
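To make the counterfactual point concrete, here is a toy sketch (all figures invented) that compares income growth for loan recipients against a comparison group of similar applicants who didn’t receive loans. The gap between the two averages is a rough estimate of the program’s own effect:

```python
# Hypothetical numbers illustrating the counterfactual point: the raw
# income growth of loan recipients overstates impact if similar
# non-recipients also grew their income. All figures are invented.

def mean(values):
    return sum(values) / len(values)

# Year-over-year income change (%) for two groups
recipients = [12, 9, 15, 11, 13]  # offered a microloan
comparison = [7, 6, 9, 8, 5]      # similar applicants, no loan

naive_effect = mean(recipients)                 # ignores the counterfactual
counterfactual = mean(comparison)               # what would have happened anyway
program_effect = naive_effect - counterfactual  # the program's real contribution

print(f"naive: {naive_effect:.1f}%, counterfactual: {counterfactual:.1f}%, "
      f"program effect: {program_effect:.1f}%")
# -> naive: 12.0%, counterfactual: 7.0%, program effect: 5.0%
```

The hard part in practice, of course, is finding a comparison group that really is comparable, which is why Karlan’s organization leans on randomized evaluations.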
Using Machine Learning to Improve Decision-Making—Sendhil Mullainathan
Sendhil Mullainathan, Harvard economics professor and co-founder of ideas42.org, was such a great speaker that I forgot to take notes. Mullainathan described the evolution of machine learning. Fascinating. My big takeaway is this: very sophisticated predictive analytics are currently being used to recommend movies and serve up ads. The same technology has the potential to be applied in interesting ways in the social sector—the example he used was predictive analytics helping judges make decisions around setting bail and releasing people before trial. He’s an evangelist for leveraging machine learning in this context and very passionate about this approach’s ability to de-bias decision-making. It’s a big idea, that’s for sure, and very compelling. I check out the ideas42.org blog a lot. But now I guess I better buy that book of his.
Behold the Cost-to-Benefit Ratio—Robin Hood Foundation
It’s not popular to admit this in the non-profit sector: I have a finance degree. I got it so I could speak truth to power. I had a few dreams of kicking ass at the SEC; I wouldn’t have been bamboozled by shoddy mortgage-backed securities. I would have saved the world!
Ah, well, that didn’t come to pass, and my finance knowledge sits largely, sadly, unused. However, Michael Weinstein from the Robin Hood Foundation had me dusting off some of those old finance-for-good feelings as he explained the foundation’s rigorous strategy for analyzing grant proposals.
The Robin Hood Foundation concentrates on a single, driving metric in order to compare the different grants presented to it. For each grant, they methodically compute its cost:benefit ratio; a grant proposal must result in a minimum $5 benefit for each $1 granted by the foundation. This “relentless monetization” creates a single comparable metric, meaning Robin Hood is able to compare apples to oranges. They can compare early childhood education with dental care, and a program that proposes deep intervention for a small population with one that proposes lighter involvement with a larger population.
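As a rough sketch of how a single benefit-to-cost metric enables those apples-to-oranges comparisons, here is a toy example with invented program names and dollar figures, applying the $5-benefit-per-$1-granted threshold:

```python
# Toy sketch of "relentless monetization": each proposal's estimated
# dollar benefit divided by its grant cost yields one comparable number.
# Programs and figures below are invented for illustration.

proposals = {
    "early childhood education": {"benefit": 1_200_000, "cost": 150_000},
    "dental care":               {"benefit": 400_000,  "cost": 100_000},
    "job training":              {"benefit": 900_000,  "cost": 300_000},
}

MIN_RATIO = 5.0  # require at least $5 of benefit per $1 granted

def benefit_cost_ratio(proposal):
    return proposal["benefit"] / proposal["cost"]

# Rank proposals by ratio, highest first, and apply the threshold
for name, p in sorted(proposals.items(),
                      key=lambda kv: -benefit_cost_ratio(kv[1])):
    ratio = benefit_cost_ratio(p)
    verdict = "fund" if ratio >= MIN_RATIO else "pass"
    print(f"{name}: {ratio:.1f}:1 -> {verdict}")
```

The real work, as the rest of this section describes, is in the 150-plus algorithms that estimate the “benefit” numerator in the first place; the ratio itself is the easy part.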
The Robin Hood Foundation has created a number of algorithms – over 150 in fact – to try to capture the “benefit.” A further explanation of their approach and a PDF of the algorithm can be found here.
Michael Weinstein will be the first to admit that these algorithms are in no way perfect, but they provide a strong framework for decision-making.
Here are a few more takeaways:
- Michael Weinstein also emphasized the importance of understanding counterfactual estimates – so you do not overestimate the impact of your program.
- He also brought up the concept of displacement – and the difficulty in capturing that in a rigorous way. For instance, if you provide job training, how do you know the people you trained are not simply displacing other workers, sending them into poverty? Is the pie getting bigger, or is it just being distributed differently?
- Everybody on staff is tracking and using metrics in their work. Data is not a separate department; it is part of everybody’s job.
- Lastly, the number that comes out of the algorithm is not the be-all and end-all. The number is used diagnostically, as an important part of the big picture. Each equation, and each program, is continually evaluated and improved. On the plane home, I read an interesting article about the importance of marrying “small data” (qualitative insights from users and subject experts) with big data. It sounds like Robin Hood Foundation, while putting a large emphasis on numbers, does not ignore the value of small data in their work.
Did you attend the Do Good Data Conference? What were your takeaways?