
INDUSTRY INSIGHT
Harnessing the power of machine learning for improved decision-making
Across government, IT managers are looking to harness the power of artificial intelligence and machine learning techniques (AI/ML) to extract and analyze data to support mission delivery and better serve citizens.
Practically every large federal agency is executing some type of proof of concept or pilot project related to AI/ML technologies. According to Government by Algorithm: Artificial Intelligence in Federal Administrative Agencies, a report commissioned by the Administrative Conference of the United States (ACUS), the government's AI toolkit is diverse and spans the federal administrative state. Nearly half of the 142 federal agencies canvassed have experimented with AI/ML tools, the report states.
Moreover, AI tools are already improving agency operations across the full range of governance tasks: enforcing regulatory mandates, adjudicating government benefits and privileges, monitoring and analyzing risks to public safety and health, providing weather forecasts and extracting information from troves of government data to address consumer complaints.
Agencies with mature data science practices are further along in their AI/ML exploration. However, because agencies are at different stages in their digital journeys, many federal decision-makers still struggle to understand AI/ML. They need a better grasp of the skill sets and best practices required to derive meaningful insights from data with AI/ML tools.
Understanding how AI/ML works
AI mimics human cognitive functions such as the ability to sense, reason, act and adapt, giving machines the ability to act intelligently. Machine learning is a subfield of AI in which algorithms, or models, are trained to make predictions about data they have not yet observed. ML models are not programmed like conventional software. They are trained on data, such as words, log data, time series or images, and learn to predict which actions to perform.
Within the field of machine learning, there are two main types of tasks: supervised and unsupervised.
With supervised learning, data analysts have prior knowledge of what the output values for their samples should be. The AI system is specifically told what to look for, so the model is trained until it can detect underlying patterns and relationships. For example, an email spam filter is a machine learning program that can learn to flag spam after being given examples of spam emails that are flagged by users and examples of regular non-spam emails. The examples the system uses to learn are called the training set.
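To make the example concrete, here is a minimal supervised-learning sketch of such a spam filter; scikit-learn and the toy emails are assumptions for illustration, since the article names no specific tools:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Training set: emails users flagged as spam (1) or left alone (0).
emails = [
    "WIN a FREE prize now, click here",
    "Cheap meds, limited time offer!!!",
    "Meeting moved to 3pm, agenda attached",
    "Quarterly budget review notes",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(emails, labels)  # learn patterns that separate spam from non-spam

# Predict on a message the model has never seen.
print(model.predict(["Click here for your FREE prize"]))  # expected: [1]
```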
Unsupervised learning looks for previously undetected patterns in a dataset with no pre-existing labels and with a minimum of human supervision. For instance, data points with similar characteristics can be automatically grouped into clusters for anomaly detection, such as in fraud detection or identifying defective mechanical parts in predictive maintenance.
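A comparable unsupervised sketch, again assuming scikit-learn and invented features, groups unlabeled points into clusters and flags outliers:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

# Unlabeled points: [amount, days_open] (hypothetical transaction features).
X = np.array([[100, 1], [105, 2], [98, 1], [5000, 40], [102, 2], [4950, 38]])

# Group similar points into clusters with no labels provided.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Flag anomalies, as in fraud detection or predictive maintenance (-1 = outlier).
outliers = IsolationForest(random_state=0).fit_predict(X)
print(clusters, outliers)
```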
Supervised, unsupervised in action
It is not a matter of which approach is better. Both supervised and unsupervised learning are needed for machine learning to be effective.
Both approaches were applied recently to help a large defense financial management and comptroller office resolve over $2 billion in unmatched transactions in an enterprise resource planning system. Many tasks required significant manual effort, so the organization implemented a robotic process automation solution to automatically access data from various financial management systems and process transactions without human intervention. However, RPA fell short when data variances exceeded tolerance for matching data and documents, so AI/ML techniques were used to resolve the unmatched transactions.
The data analyst team used supervised learning to model the preexisting rules that had produced these unmatched transactions. The team was then able to provide additional value by applying unsupervised ML techniques to find patterns in the data that they were not previously aware of.
To get a better sense of how AI/ML can help agencies better manage data, it is worth considering these three steps:
- Start with existing domain-specific knowledge. This includes processes and rules that employees handle manually. The analysts working with the defense comptroller office knew that transactions open for more than five days were a problem that had to be addressed. From that knowledge, they built a model with features for identifying unmatched transactions.
- Automate to find new patterns. Applying automation and unsupervised learning, the team found that some reference numbers were incorrect, something they were not aware of at the outset of the project.
- Validate patterns against business relevance. Analysts might discover many patterns, but that does not mean all of them are valuable. Each pattern must be validated against common-sense checks and business relevance, because some patterns may be statistical anomalies.
Data analysts should think of these steps as a continuous loop. If the output from unsupervised learning is meaningful, they can incorporate it into the supervised learning modeling. Thus, they are involved in a continuous learning process as they explore the data together.
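A hedged sketch of that loop, with pandas, scikit-learn and all column names invented for illustration (the article does not describe the actual system):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical slice of unmatched-transaction data (all columns invented).
txns = pd.DataFrame({
    "amount":     [120.0, 130.0, 118.0, 5000.0, 125.0, 4980.0],
    "days_open":  [2, 7, 1, 4, 3, 2],
    "ref_length": [10, 10, 10, 7, 10, 7],
})

# Step 1: existing domain knowledge -- open more than five days is a known problem.
known_issue = txns[txns["days_open"] > 5]

# Step 2: automate to find new patterns in the remaining transactions.
rest = txns[txns["days_open"] <= 5].copy()
rest["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(rest)

# Step 3: validate against business relevance -- analysts review each cluster
# (here one cluster surfaces short, malformed reference numbers); validated
# patterns feed back into the supervised rules, closing the loop.
print(len(known_issue), "known issues;", rest.groupby("cluster").size().to_dict())
```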
Avoiding pitfalls
It is important for IT teams to realize they cannot simply feed data into machine learning models. This is especially true with unsupervised learning, which is more art than science, and it is where humans really need to be involved. Analysts should also avoid overfitting models by trying to derive too much insight from limited data.
Remember: AI/ML and RPA are meant to augment humans in the workforce, not merely replace people with autonomous robots or chatbots. To be effective, agencies must strategically organize around the right people, processes and technologies to harness the power of innovative technologies such as AI/ML to achieve the performance they need at scale.
About the Author
Samuel Stewart is a data scientist with World Wide Technology.
6 Ways to Combat Bias in Machine Learning

Just like humans, data is deeply susceptible to bias. Humans create data, so it reflects our own biases, assumptions and blind spots. It's unavoidable, then, that social biases exist both in company data and in the background data that feeds modern natural language processing (NLP) algorithms. Despite this, there are ways to identify and counter biased decision-making by models. None of these methods are silver bullets, and they should always be used in conjunction with human input, but they can help address potential issues as they arise.
Sources of Bias in an ML Task
From corpus issues to decision-making problems, bias presents itself in numerous ways in the field of machine learning (ML). All of the following are common sources of bias.
Background Bias. Language-model-based NLP approaches consume web-scale quantities of text to give the NLP systems background knowledge about how language works. Although the benefit of this method is that a small amount of training data can produce excellent results, social biases leak in through the pre-training corpus. One example is a tendency to associate European-American names with positive sentiment and African-American names with negative sentiment.
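A toy illustration of how such associations can be measured (the three-dimensional vectors below are fabricated stand-ins; real audits, such as WEAT-style tests, load pretrained embeddings):

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Fabricated 3-d vectors standing in for a real model's word embeddings.
emb = {
    "emily":    np.array([0.9, 0.1, 0.0]),
    "jamal":    np.array([0.1, 0.9, 0.0]),
    "joy":      np.array([0.8, 0.2, 0.1]),
    "terrible": np.array([0.2, 0.8, 0.1]),
}

def association(name: str, pleasant: str, unpleasant: str) -> float:
    # Positive: the name sits closer to the pleasant word than the unpleasant one.
    return cos(emb[name], emb[pleasant]) - cos(emb[name], emb[unpleasant])

for name in ("emily", "jamal"):
    print(name, round(association(name, "joy", "terrible"), 3))
```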
Perceptive Bias. Many ML training tasks seek to replicate human judgments, and those judgments may be based on existing biases, either conscious or unconscious. For example, one study found a strong tendency for white athletes to be described as hardworking and intelligent whereas Black athletes were labeled as physically powerful and athletic. Any training data coming from human judgment is very likely to contain social biases.
Outcome Bias. Data points not obviously derived from human judgment can also reflect existing social prejudices. A loan default is a factual event that either did or did not happen. The event may still be rooted in uneven opportunities, however. For example, people of color have suffered more job losses during the recent downturn and have been slower to regain their jobs. It is important to understand that there is no clear divide between "factual" versus "biased" datasets: Social biases can affect any measurable aspect of an individual's life.
Availability Bias. Machine learning performs best with clear, frequently repeated patterns. Those who do not fit neatly into such patterns are more likely to be overlooked by ML systems. For example, a company hiring primarily from the U.S. may fail to consider attendees of foreign universities due to a lack of data. The use of different degree names and titles globally could also affect an algorithm’s decision-making.
Best Practices in Debiasing ML
Given all these issues, we should view machine learning with some suspicion – as we should human processes. To make strides in debiasing, we must actively and continually look for signs of bias, build in review processes for outlier cases and stay up to date with advances in the machine learning field.
Below are some of the techniques and processes that we can implement to address bias in ML.
6 Ways to Combat Bias
- Anonymization and Direct Calibration
- Linear Models
- Adversarial Learning
- Data Cleaning
- Audits and KPIs
- Human Exploration
Anonymization and Direct Calibration. Removing names and gendered pronouns from documents as they’re processed is a good first step, as is excluding clear markers of protected classes. Although this is an important start, it’s not a complete solution. These signals still show up in many places that are impossible to wholly disentangle. Research has shown that the bias remains in second-order associations between words: For example, “female” and “male” associated words still cluster and can still form undesired signals for the algorithms as a result. Nevertheless, randomizing names as we feed data into a model prevents the algorithm from using preconceptions about the name in its decision making. This is also a good practice in initial resume screening even when done by humans.
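A minimal anonymization pass might look like the following; the name list and regular expressions are placeholders, and a production system would likely use a proper named-entity recognizer:

```python
import re

# Placeholder name list; a real pass would use a named-entity recognizer.
KNOWN_NAMES = re.compile(r"\b(Emily|Greg|Lakisha|Jamal)\b")
PRONOUNS = re.compile(r"\b(he|she|him|her|his|hers)\b", re.IGNORECASE)

def anonymize(text: str) -> str:
    """Strip clear markers of protected classes before the model sees the text."""
    text = KNOWN_NAMES.sub("[NAME]", text)
    return PRONOUNS.sub("[PRONOUN]", text)

print(anonymize("Jamal said he led the migration project."))
# -> "[NAME] said [PRONOUN] led the migration project."
```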
Linear Models. Deep models and decision trees can more easily hide their biases than linear models, which provide direct weights for each feature under consideration. For some tasks, then, it may be appropriate to trade the accuracy of more modern methods for the simple explanations of traditional approaches. In other cases, we can use deep learning as a “teacher” algorithm for linear classifiers: a small amount of annotated data is used to train a deep network, which then generates predictions for many more documents. These then train the linear classifier. This can approach deep learning accuracy, but allows a human to view the reasons for a classification, flagging potentially biased features in use.
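A sketch of that teacher-student pattern, with scikit-learn's MLPClassifier standing in for the deep network and all documents invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Tiny labeled set plus a pool of unlabeled documents (all invented).
labeled_docs = ["great fit for the role", "poor communication skills"]
labels = [1, 0]
unlabeled_docs = ["strong leadership record", "missed every deadline",
                  "solid engineer", "weak references"]

vec = TfidfVectorizer().fit(labeled_docs + unlabeled_docs)
teacher = MLPClassifier(max_iter=1000, random_state=0)
teacher.fit(vec.transform(labeled_docs), labels)

# The teacher's predictions become labels for the explainable linear student.
pool = labeled_docs + unlabeled_docs
student = LogisticRegression().fit(
    vec.transform(pool), teacher.predict(vec.transform(pool))
)

# Direct per-word weights: unusually large weights on suspect terms can be flagged.
for word, w in zip(vec.get_feature_names_out(), student.coef_[0]):
    print(f"{word:15s} {w:+.3f}")
```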
Adversarial Learning. If a model can't reliably determine gender or race, it's difficult for it to perform in a biased manner. Adversarial learning shares the weights in a deep network between two classifiers: one solves the problem of interest, and the other tries to determine some fact about the input, such as the author's gender. The main classifier is trained as usual, but the adversarial classifier penalizes the shared weights whenever it makes a correct prediction, until it consistently fails. If the internal representation of the document contained a gender signal, we'd expect the second classifier to eventually discover it. Since it can't, we can assume the first classifier isn't making use of this information.
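One common way to implement this is a gradient-reversal layer, sketched below in PyTorch (an assumption; the article names no framework). The adversary's gradients are flipped before they reach the shared encoder, pushing the encoder to discard the gender signal:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad  # reversed gradient flows back into the shared encoder

encoder = nn.Sequential(nn.Linear(10, 16), nn.ReLU())  # shared weights
task_head = nn.Linear(16, 2)                           # classifier of interest
adv_head = nn.Linear(16, 2)                            # adversary: predicts gender

params = list(encoder.parameters()) + list(task_head.parameters()) \
       + list(adv_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy batch: random features, task labels and protected-attribute labels.
x = torch.randn(64, 10)
y_task = torch.randint(0, 2, (64,))
y_gender = torch.randint(0, 2, (64,))

for step in range(200):
    h = encoder(x)
    # The adversary trains to recover gender, but its reversed gradients
    # push the encoder to remove the gender signal from h.
    loss = loss_fn(task_head(h), y_task) \
         + loss_fn(adv_head(GradReverse.apply(h)), y_gender)
    opt.zero_grad()
    loss.backward()
    opt.step()
# After training, adv_head should do no better than chance at recovering gender.
```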
Data Cleaning. In many ways, the best way to reduce bias in our models is to reduce bias in our businesses. The datasets used in language models are too large for manual inspection, but cleaning them is worthwhile. Additionally, training humans to make less biased decisions and observations will help create data that does the same. Employee training in tandem with review of historic data is a great way to improve models while also indirectly addressing other workplace issues. Employees are taught about common biases and sources of them, and review training data looking for examples of biased decisions or language. The training examples can be pruned or corrected, while the employees hopefully become more careful in their own work in the future.
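A small review-queue sketch of that historic-data review, with the flagged terms and data layout invented for illustration:

```python
import pandas as pd

# Terms an employee review panel marked as biased (a hypothetical list).
FLAGGED_TERMS = ["aggressive", "abrasive", "articulate"]

# Historic training examples (invented) awaiting review.
reviews = pd.DataFrame({
    "text": ["abrasive in meetings", "shipped the project early", "very articulate"],
    "label": [0, 1, 1],
})

# Route any example containing a flagged term to human reviewers,
# who can prune or correct it before the model is retrained.
pattern = "|".join(FLAGGED_TERMS)
to_review = reviews[reviews["text"].str.contains(pattern, case=False)]
print(to_review)
```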
Audits and KPIs. Machine learning is complicated, and these models exist within larger human processes that have their own complexities and challenges. Each piece in a business process may look acceptable, yet the aggregate can still display bias. An audit is an occasional deeper examination of either one aspect of a business or of how an example moves through the whole process, actively looking for issues. Key performance indicators (KPIs) are values that can be monitored to observe whether things are trending in the right direction, such as the percentage of women promoted each year. Audited examples may be atypical, and KPIs fluctuate naturally, but looking for possible issues is the first step toward solving them, however that ends up being accomplished.
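A minimal KPI computation along the lines of the example above, assuming pandas and invented personnel data:

```python
import pandas as pd

# Invented personnel records; real KPIs would come from HR systems.
promotions = pd.DataFrame({
    "year":     [2019, 2019, 2019, 2020, 2020, 2020],
    "gender":   ["F", "M", "M", "F", "F", "M"],
    "promoted": [1, 1, 0, 0, 1, 1],
})

# KPI from the example above: promotion rate among women, per year.
women = promotions[promotions["gender"] == "F"]
kpi = women.groupby("year")["promoted"].mean()
print(kpi)  # monitor year over year; investigate sustained downward drift
```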
Human Exploration. This is one of the best ways of finding issues in real systems. Studies that sent employers identical resumes, varying only stereotypically white and Black names, have demonstrated hiring biases in corporate America. Allowing people to play with inputs or search through aggregate statistics is a valuable way to discover unknown systemic issues. For particularly important tasks, you can even implement "bias bounties," offering cash rewards to people who find clear flaws in an implementation. Depending on data sensitivity, that access could mean the results of models in production, so researchers can statistically demonstrate how a model has behaved in a biased way in the past, or the ability to submit synthetic inputs to show that some pattern of use causes undesired model behavior.
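A sketch of such a paired-input probe, inspired by the resume studies above; DummyModel and the template are placeholders for whatever scoring system is under audit:

```python
# Paired-input probe: identical inputs that differ only in the name.
class DummyModel:
    """Placeholder for the real scoring system under audit."""
    def predict(self, text: str) -> float:
        return 0.5  # an unbiased scorer returns the same value for both

template = "{name}, 5 years of Python, led a team of 4, MSc in CS"
pairs = [("Emily", "Lakisha"), ("Greg", "Jamal")]  # names used in audit studies

def probe(model, pairs, template):
    for a, b in pairs:
        sa = model.predict(template.format(name=a))
        sb = model.predict(template.format(name=b))
        # A systematic gap on otherwise identical inputs is evidence of bias.
        print(f"{a}: {sa:.3f}  {b}: {sb:.3f}  gap: {sa - sb:+.3f}")

probe(DummyModel(), pairs, template)
```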
Takeaway: Get Serious About Combating Bias
Clean and plentiful data, appropriate algorithms and human oversight are critical for the successful implementation of artificial intelligence applications in business processes. It may not be appropriate to apply these techniques to every problem, especially when data is insufficient. It is also important to recognize the biases in human processes, however; eschewing AI does not make those issues go away.
A commitment to debiasing as an ongoing process, both with regard to ML and human agents, is therefore vital. Doing so can help diversify organizations, mitigate the impact of hidden systemic biases, and promote fairness across an organization, fostering positive outcomes in recruitment, retention and brand awareness efforts.
