How Does HR Support Data Scientists to Build Inclusive AI Training Sets?



Every company is a tech company these days. With the rise of e-commerce, cloud computing, big data, machine learning and mobile devices, companies are realizing the value not just of a presence on the web, but of empowering their customers and partners to do business directly over the internet.

One of the biggest benefits of the new world of cloud-based business operations is the ability to collect transactional data in real time. When a consumer walks down the aisles of the local retailer, there’s no way for the business to accurately understand their interests. Shopping online is different because each page or product that is rendered on the web leaves a digital footprint. Companies can aggregate this information to determine, for example, that 80% of their visitors checked out the new sweater or book or coffee maker – and of those 80%, 66% put it in their online cart for potential purchase, and 40% went through with a purchase. This level of insight into consumer and partner behavior has been a game changer for years. Obviously, there are privacy concerns and lots of discussion about what is appropriate.
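To make the arithmetic concrete, here is a minimal sketch of that funnel. It assumes each percentage is a share of the previous stage, which is only one possible reading of the example, and the 10,000-visitor starting figure and the function name are made up for illustration.

```python
def funnel_counts(visitors, stage_rates):
    """Return how many visitors remain after each stage of the funnel."""
    counts = []
    remaining = visitors
    for rate in stage_rates:
        # Assumption: each rate is a share of the stage before it.
        remaining = remaining * rate
        counts.append(round(remaining))
    return counts


# Hypothetical starting audience of 10,000 visitors.
viewed, carted, purchased = funnel_counts(10_000, [0.80, 0.66, 0.40])

# Share of all visitors who ended up buying, under the same assumption.
overall_rate = purchased / 10_000
print(viewed, carted, purchased, overall_rate)
```

Under that reading, roughly one visitor in five completes a purchase. If the 40% were instead measured against everyone who viewed the product, the final count would be higher.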

Huh? What do Artificial Intelligence (AI), Machine Learning (ML), and Predictive Analytics have to do with this?

Think about your daily commute. Do you drive? Take the bus? Stop for a coffee? You probably do roughly the same thing almost every day. How about the grocery store? Over the long term, you probably tend to buy many of the same items. If you buy generic brand 2% milk every two weeks, it won’t take long for someone to be able to predict when you will buy milk again. This is the essence of ML and predictive analytics. What’s different now is the wealth of behavioral data – big data.
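The milk example can be sketched in a few lines. This is not how a production ML system works; it is a deliberately simple illustration that averages the gap between past purchases and projects it forward. The dates and the function name are invented for the example.

```python
from datetime import date, timedelta


def predict_next_purchase(purchase_dates):
    """Estimate the next purchase date from the average gap between past ones."""
    if len(purchase_dates) < 2:
        raise ValueError("need at least two purchases to estimate an interval")
    ordered = sorted(purchase_dates)
    # Days between each pair of consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps) / len(gaps)
    return ordered[-1] + timedelta(days=round(average_gap))


# Hypothetical history: milk bought every two weeks.
history = [date(2024, 1, 1), date(2024, 1, 15), date(2024, 1, 29), date(2024, 2, 12)]
next_purchase = predict_next_purchase(history)
print(next_purchase.isoformat())
```

With a perfectly regular two-week pattern, the estimate is simply two weeks after the last purchase. Real behavior is noisier, which is where more data comes in.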

What is a Training Set?

So, if you were to ask someone to predict your future milk purchases today, without any data, they would be guessing. Do you buy a half gallon? Is it fat-free? Organic?  The likelihood of their predictive “algorithms” being accurate would be very low.

However, if they had the history of your milk purchase patterns for the last five years, that would improve the accuracy of their prediction. Your milk purchase history is a “set” of “training” data that would train the machine algorithms to help predict your behavior. But it’s more than that – because data from other sources can improve accuracy in ways that your own data cannot. What if you are married and about to have your first child? Or moving to a new home? How does your milk-purchasing behavior change? Do you switch from generic to brand name? Organic for the new baby? Your own personal historical data won’t help there, but if there were training set data from a million other milk purchasers, then the algorithm could predict your most likely behavior.
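As a rough sketch of how other people’s data can fill that gap, the snippet below looks up what shoppers in the same situation most often did. The records, field names, and function are all hypothetical, and a simple majority vote stands in for whatever model a data scientist would actually train.

```python
from collections import Counter


def most_likely_choice(training_set, life_event):
    """Return the most common milk choice among shoppers with the same life event."""
    matches = [row["milk_choice"] for row in training_set if row["life_event"] == life_event]
    if not matches:
        return None  # no comparable shoppers, so no basis for a prediction
    return Counter(matches).most_common(1)[0][0]


# Hypothetical training set drawn from other milk purchasers.
other_shoppers = [
    {"life_event": "new_baby", "milk_choice": "organic whole"},
    {"life_event": "new_baby", "milk_choice": "organic whole"},
    {"life_event": "new_baby", "milk_choice": "generic 2%"},
    {"life_event": "moved_home", "milk_choice": "generic 2%"},
]

prediction = most_likely_choice(other_shoppers, "new_baby")
print(prediction)
```

The prediction is only as good as the training set behind it, which is why the choice of what goes into that set matters so much.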

How can HR Professionals Help?

As an individual data scientist who is gathering data to train the machine algorithms to make these predictions, you are interested in an accurate prediction and want as much data as you can get your hands on. You probably want your training data to include everything you can get hold of as influential factors in your model: gender, race, age, work history, criminal history, salary information, marital status, family size, economic status, employment status, political affiliation, or educational background, to name a few.

As an HR professional, would you like to advise the data scientist about whether to consider these sources? The data scientist’s model might show that criminal history is the most significant factor in predicting milk purchase behavior for new families. Is that ethical? What if your company is pushing ads or coupons for expensive brand name organic milk to unemployed single moms with a high school education, because that’s what the algorithm predicted based on the selection of the training data? This is where HR professionals can take a role in shaping the future of AI, ML, and predictive analytics!
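One way this kind of advice could show up in practice is a review step that holds back sensitive fields before a record reaches the training set. The sketch below is an assumption about how such a step might look, not a description of any particular tool. Which fields belong on the list is a policy decision for people to make, and the list and record shown here are purely illustrative.

```python
# Hypothetical review list. Deciding what counts as sensitive is a policy
# question for HR and the data team, not something the code can settle.
SENSITIVE_FIELDS = {
    "gender",
    "race",
    "age",
    "criminal_history",
    "marital_status",
    "political_affiliation",
}


def split_for_review(record, sensitive_fields=SENSITIVE_FIELDS):
    """Separate a record into usable features and fields flagged for review."""
    allowed = {key: value for key, value in record.items() if key not in sensitive_fields}
    flagged = sorted(key for key in record if key in sensitive_fields)
    return allowed, flagged


# Hypothetical shopper record.
shopper = {
    "purchase_interval_days": 14,
    "milk_type": "2%",
    "criminal_history": "none",
    "political_affiliation": "unknown",
}

features, needs_review = split_for_review(shopper)
print(features)
print(needs_review)
```

Only the purchase-related fields pass through to the model, and the flagged field names give HR and the data scientist a concrete list to discuss.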


We live in a brave new world and the coming machine age will change things dramatically, sometimes in a frightening and draconian way. The seed of the machine intelligence lies in the individual engineer who is writing the code, the data scientist who is performing the queries and creating the algorithms.

HR professionals can help guide and advise, not just from a privacy perspective, but from a human perspective: how do we ensure a future in which values and integrity are built into the algorithms that change the world?

The machines are going to learn from training data. HR professionals owe it to the world to oversee the collection and creation of the data sets that the machines will learn from. If humanity is going to avoid falling prey to machine overlords, we have to start with HR oversight of the data and algorithms from which the machines will learn.


