You Were Big Data Before Big Data
Updated: Feb 4
“Like sands through the hourglass, so are the days of our lives.”
It’s weird to segue from soap operas to data mining, but then again so am I. I started programming from data tapes at the Bureau of Labor Statistics in 1988. We used Job Control Language to tell someone at Boeing Computer Services to load a tape for us, and then execute our commands to analyze hundreds of thousands of records almost instantaneously.
From Day One on the job, I thought that if the University of Maryland had anything remotely this wonderful, I wouldn’t have traded away my PhD fellowship, a four-year free ride, to enter federal service, coach the Labor Pains softball team in the Congressional B League, and learn to program in SAS.
Later I used these skills in my first association job at NAHB, where we had a Data General “minicomputer” that, despite the name, was a cabinet 4 feet tall and 2 feet wide and deep, with a tape drive. I had a huge office on the first floor where homeless guys mooned me during late-night projects, but I could easily race up the back stairwell to rewind and reload tapes.
A lot of people cared then, and still care now, about what NAHB had to say about housing starts, housing affordability, and the impact of regulations. We did contract work for HUD on housing costs by housing market, for DOD on base housing allowances, and even for DOE on the efficiency of various modular housing. Almost all of it was “original research” based on tabulations of the American Housing Surveys, the population and economic Censuses, and sometimes our own surveys of builders and consumers.
It is odd to flash forward to today and hear so many people promise so much in an era when “big data” is allegedly easy to access, analyze, and turn into intelligence and decision support. Frankly, it always was. My SAS license costs me about $5,000 a year. Programming in it is no different from what it was more than 30 years ago.
Most of us are sitting on mounds of data in our association management system (AMS). It’s generally easy to download, and most AMS platforms have very similar structures. It’s absurdly easy to summarize the data for easier analysis. It can also be merged with survey data, which we do routinely to avoid asking members the same questions over and over.
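To make that merge concrete, here is a minimal sketch using only Python’s standard library. The field names (`member_id`, `satisfaction`) and the CSV contents are hypothetical stand-ins for an AMS export and a survey file; the point is simply that once both sources share a member ID, attaching survey answers to behavioral records is a few lines of work:

```python
import csv
import io

# Hypothetical AMS export: one row per member, keyed on member_id.
ams_csv = """member_id,join_year,member_type
101,2015,Regular
102,2019,Student
103,2008,Regular
"""

# Hypothetical survey responses; not every member responded.
survey_csv = """member_id,satisfaction
101,4
103,5
"""

# Index the AMS records by member_id, then attach survey answers where present.
ams = {row["member_id"]: row for row in csv.DictReader(io.StringIO(ams_csv))}
for row in csv.DictReader(io.StringIO(survey_csv)):
    ams[row["member_id"]]["satisfaction"] = row["satisfaction"]

merged = list(ams.values())
# Member 102 never answered the survey, so that record simply lacks the field.
```

In practice you would read real exported files instead of inline strings, but the join logic, and the 80% of the effort spent making sure the IDs and fields actually line up, is the same.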
If your data coverage sucks, we can do the opposite and share new data with you to upload or enter. Generally I think of an AMS as a black box that is hard to modify safely; if anything, data quality has declined as more members have gained easy access to enter new records or modify existing ones.
The work time spent on a data mining project is often 80% data hygiene and learning the data structures, 20% analysis and interpretation. This is why, in theory, you would be better off with a Power BI-type user on staff, and there seem to be more of them on Collaborate every year.
But when it comes to using bits and pieces to fill in the blanks, to integrate opinion research and qualitative insights with the behavior stored as data, we’re often better off handling those grains of sand ourselves and turning them into castles. As the analogy implies, the sand castle doesn’t stand long, but it looks pretty, and it’s even enjoyable to build.