June 19, 2015

Small Data & Parvenu Analysts

This article was originally published on the DataScopic blog. It is reprinted here with permission.

The most exciting Excel workshops that I’ve taught have had attendees take turns plugging their laptops into the projector and asking for help on some vexing task.

These “rodeo sessions” are opportunities for the students to focus attention on their real work instead of prepared examples. The result is an immediate improvement in efficiency, in the accuracy of their calculations, or, in some cases, in confidence. But there are 3 things that keep coming up:

  1. Parvenu analysts are the masses of smart people who are going into data-driven roles, and they don’t have a data mindset or background
  2. A lot of folks really don’t need much technical skill in order to get their work done
  3. A lot of the world still has high stakes in the accuracy of small data

Let’s explore …

The Parvenu & the Data Mindset

This is something that Keidra Chaney was the first to bring to my attention. When a person transitions from the doing of a task and starts managing that task, data management often comes with the new role. The brand new Call Center Supervisor, Fundraising Director, Social Media Strategist … they all start getting reports, and important people start requesting details; spreadsheets are emailed to them, and a common cry is:

I don’t even know where to start.

Here’s the news: if you’re one of these people, you’re an analyst even though it’s not in your title. You’re a parvenu analyst: the outsider who’s risen up to be among the business analysts and others who may have received some formal training or certification. You may not be accepted as an analyst among analysts, but if you’re responsible for data being right, you’re an analyst.

So, the starting point is to accept that you’re an analyst and data is going to be a big part of your life.

Being an analyst and having a data mindset starts by understanding what you want to ask about the data and what you want to ask of the data. A good analyst can’t be afraid of the Cheshire Cat, and that’s what a dataset can be: it’ll answer your questions with riddles and more questions.

When data shows up on your desk, don’t jump straight into making calculations and telling those important people what you’ve found. The source data could be horribly wrong. Instead, ask questions of this Cheshire cat.

Asking About The Data and Reports (Can the Data be Trusted?)

  • Is this report complete?
  • The data is up-to-date as of how long ago?
  • Is the data updated in real time, daily, weekly, bi-weekly, etc.?
  • Is the data clean, or do I first need to clean out the duplicates, parse addresses, match account numbers with the names of the account reps, and reformat dates?
  • Do the formulas in the report accurately reflect the business rules?
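A few of these trust questions can be turned into quick checks before any analysis begins. The following is a minimal sketch in Python, not a prescribed method: the sample rows, the field names (account, rep, updated) and the trust_report helper are all hypothetical, made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows; in practice these would come from the report itself.
rows = [
    {"account": "A-100", "rep": "Kim", "updated": date(2015, 6, 1)},
    {"account": "A-101", "rep": "", "updated": date(2015, 6, 15)},
    {"account": "A-100", "rep": "Kim", "updated": date(2015, 6, 1)},
]

def trust_report(rows, key_field, date_field, required_fields):
    """Summarize basic warning signs: duplicates, staleness, blank fields."""
    key_counts = Counter(row[key_field] for row in rows)
    return {
        "row_count": len(rows),
        # Keys that appear more than once may be duplicates to clean out.
        "duplicated_keys": sorted(k for k, n in key_counts.items() if n > 1),
        # The most recent date hints at how up to date the data is.
        "latest_update": max((row[date_field] for row in rows), default=None),
        # Count blank or missing values per required field.
        "missing_values": {
            field: sum(1 for row in rows if not row.get(field))
            for field in required_fields
        },
    }

report = trust_report(rows, "account", "updated", ["account", "rep"])
for name, value in report.items():
    print(f"{name}: {value}")
```

None of this answers whether the formulas reflect the business rules. That still takes a conversation with the people who own the report.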

Asking Of The Data and Reports (Does This Data Have the Answers to My Questions?)

  • What do I want to measure, summarize, or extract?
  • Why and for whom are we asking questions of the data?
  • What would be nice to know vs. what do we need to know?
  • Do we need additional reports or data to help fully answer the questions that are being asked?

That’s data analysis! We can teach someone an Excel IF statement and how to write VBA code. However, someone who’s not used to working with data needs to start developing, right away, the reflex of asking the right questions.

Someone who asks the right questions and copy-pastes their way through the analysis is much more valuable than someone with the technical skill and minimal curiosity.

In this blogpost, Inside the Mind of an Analyst, I describe how Kevin Lehrbass of mySpreadsheetLab brilliantly takes us through a non-technical tutorial where we get to listen to how he thinks through, “What are the different ways that close win can be defined and measured?”

Similar to Kevin, a parvenu analyst might ask, “What do we call a refund?” Maybe you don’t want to count refunded shipping charges. You might create several summaries: every type of refund no matter what, including shipping and taxes; refunds only on items; or refunds of just shipping. The answers to these questions are guided by the why, for what, and for whom.
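To make the refund question concrete, here is a small sketch in Python showing how the same records give three different totals depending on the definition. The records, the field names, and the refund_totals helper are hypothetical, and amounts are kept in whole cents so the arithmetic stays exact.

```python
# Hypothetical refund records; every amount is in cents.
refunds = [
    {"order": 1, "items": 4000, "shipping": 500, "tax": 320},
    {"order": 2, "items": 0, "shipping": 750, "tax": 0},
    {"order": 3, "items": 1500, "shipping": 0, "tax": 120},
]

# Each definition of "refund" names the parts of a record that it counts.
DEFINITIONS = {
    "everything": ("items", "shipping", "tax"),
    "items_only": ("items",),
    "shipping_only": ("shipping",),
}

def refund_totals(refunds, definitions):
    """Return the total refunded, in cents, under each definition."""
    return {
        name: sum(record[part] for record in refunds for part in parts)
        for name, parts in definitions.items()
    }

totals = refund_totals(refunds, DEFINITIONS)
for name, cents in totals.items():
    print(f"{name}: ${cents / 100:.2f}")
```

The code is the easy part. Deciding which of the three totals belongs in front of the important people is the analysis.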

Necessary Technical Skills

The top 3 Excel features that consistently show up as most helpful to people who are new to working with data:

A surprising number of these parvenu analysts don’t need much right away, especially when they are within established processes. The questions they’re trying to answer are basics like: How much? When? How many? How long? Who?

Eventually, all analysts are faced with data cleansing:

  • Clearing duplicates
  • Correcting misspelled city names
  • Peeling apart phone numbers and zip codes that got into the same field
  • Figuring out if Bob Jones and Robert Jones are the same person

That stuff is hard, no one is formally trained to do it, and most hate doing it. We all figure it out on the job. This is finally being talked about as the dark side of Big Data’s glamour.
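For a sense of what those four chores look like when written down, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the lookup tables, the helper names, and the US-style phone and 5-digit ZIP patterns. Real data will need longer tables and more patterns, and a nickname match only flags two records as possibly the same person; someone still has to confirm it.

```python
import re

# Hypothetical lookup tables; real ones grow out of the data you actually see.
CITY_FIXES = {"chicgo": "Chicago", "san fransisco": "San Francisco"}
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def drop_duplicates(rows):
    """Keep the first copy of each exact-duplicate row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fix_city(city):
    """Return the corrected spelling if the city is a known misspelling."""
    cleaned = city.strip()
    return CITY_FIXES.get(cleaned.lower(), cleaned)

def split_phone_zip(field):
    """Pull a US-style phone number and a 5-digit ZIP out of one field."""
    phone = re.search(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}", field)
    # Remove the phone number first so its digits are not mistaken for a ZIP.
    rest = field.replace(phone.group(), " ") if phone else field
    zip_code = re.search(r"\b\d{5}\b", rest)
    return (
        phone.group() if phone else None,
        zip_code.group() if zip_code else None,
    )

def name_key(full_name):
    """Normalize a name so known nicknames compare equal to formal names."""
    parts = full_name.lower().split()
    if not parts:
        return ""
    return " ".join([NICKNAMES.get(parts[0], parts[0]), *parts[1:]])

print(fix_city(" chicgo "))
print(split_phone_zip("312-555-0142 60614"))
print(name_key("Bob Jones") == name_key("Robert Jones"))
```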

Getting to the Sexy Data is a blogpost based on a New York Times article that accurately describes the dirty “janitor work” that comes with managing data. Some people spend 20% of their time on analysis and 80% on cleansing. So, imagine someone who loved social media, got promoted, and is now spending a huge amount of time cleaning data. That can feel like a dirty trick, especially when they didn’t expect it and it’s not a strength of theirs.

The solution isn’t always found in a subscription service that’ll create pretty reports. No. We’ve got to prepare, train and empower our people so that they develop a thought process, and get the skills.

Small Data Is Very Much Alive

The third thing that I have seen is that, in spite of big data’s sex appeal, small data hasn’t gone anywhere. In fact, we’ve got more of it, requiring more parvenu analysts.

According to the US Small Business Administration, 54% of sales are generated by small businesses. The IRS shows that 68% of active nonprofits generate less than $250,000. These are small entities that aren’t part of the big data conversation.

Also, add in the people who are in small departments in large companies. Those small departments aren’t always the IT team’s priority. Still, the small businesses, nonprofits, and small departments all have data that we rely on.

What’s the Point?

A lot of the conversation about big data, data warehousing, and anti-spreadsheet sentiment is unfair because little is said about who should be in the conversation and who shouldn’t. Caught up in it are businesses and nonprofits that are too small for, or haven’t matured to the level of, affording and sustaining centralized solutions, along with the additional person on the payroll who would be needed to run something like Salesforce.

A fantastic article that’s worth reading is Technology Can’t Ever Solve the Information Management Problem by Joe Shepley, VP at Doculabs. It’s a reminder of the people-processes-tools trinity that’s required to make anything work.

So, let’s give some attention to the parvenu analysts and the integrity of our small data.

Image credits: Numbers graphic courtesy of disco-ball; Cheshire Cat image courtesy of feliciacano.

Oz du Soleil
Oz du Soleil is a Microsoft Excel MVP. He's an Excel trainer whose courses have been described as fun and as getting people to relax about using Excel. Oz is a U.S. Navy veteran who loves good bourbon and spicy food. Above everything else, Oz's commitment is to clean data. Oz is the author of Guerrilla Data Analysis Using Microsoft Excel, 2nd Ed., and his YouTube channel is at: https://www.youtube.com/c/OzduSoleilDATA