Predictive Analytics and Big Data
- Oscar E. Cariceo,
- Murali Nair, Columbia University
- and Wahaj Bokhari
Predictive analytics is a set of techniques and an advanced methodological and research approach that seeks to reach conclusions about the future, rather than explanations of specific issues or phenomena. The fast growth in popularity and application of data science to different businesses and activities, such as human services and nonprofit management, is related to the emergence and consolidation of big data. In terms of digital networking, the availability of data produced by individuals every day is enormous. Tools and techniques such as machine learning, deep learning, visualization, time series analysis, network analysis, natural language processing, and text mining may help support evidence-based practice for social workers. Predictive analytics and big data offer an opportunity to enhance innovative social change and people’s well-being.
- Macro Practice
The future implications of data are hard to ignore especially with the growth of human- and machine-generated data due to the advancement in technology (Chong et al., 2015). Currently, data is used in almost every field from business to medicine to political science in order to reach better outcomes and enhance the impact of practitioners. This literature review addresses the significance of cultivating data-driven culture and implementing predictive analytics to improve human service organizations and practitioners. Additionally, this literature review will explain the concept of “big data,” data analytics, and the challenge of privacy in implementing data science in macro social work practice from a social impact theory perspective.
One of the key techniques in predictive analytics is machine learning. Machine learning as a research approach offers a set of techniques to predict the behavior of certain problems and phenomena. Machine learning techniques can support professional decisions from data produced by organizations, agencies, and users by applying generalizable algorithms. By examining data, machine learning discovers patterns that are present within any data set and can teach computers through examples, by training data to test specific hypotheses and predict what would be a certain outcome, based on a current scenario.
For social work research, machine learning generates predictions as a key element to improving social interventions on complex social issues by providing better inferences from data and establishing more precise estimated effects, for example in organizations that seek to improve their outcomes.
Predictive analytics, big data, and data science can offer new options for evidence-based macro practice. Data visualization, as a significant source of information, and the large data sets provided by new communication technologies, such as social media and the Internet, allow the sharing of knowledge and communication of information.
Outcomes and results from social work interventions are critical for improved evidence-based practice. In terms of social work interventions, predictive analytics and big data will transform macro practice.
This article provides a general description based on specific literature reviews. Topics explained include data-driven culture, big data, analytics, machine learning, and ethical dilemmas.
Finally, this article provides a broad glossary on data science, big data, predictive analytics, projects, and websites about these topics.
Organizational culture can be defined as a system of common values, knowledge, attitudes, morals, customs, and norms that are respected by organization members as a framework for the organization’s practices, behavior, and goals (Bratasanu, 2019).
The paradigm shift in organizational management results from implementing data in the process of decision-making. Bratasanu (2019) suggests that organizational power shifts and internal bureaucracy adjustments created by the emerging technologies resulted from data transparency trends and decision-making level changes. “Traditional decision processes in organizations are based on different levels of internal bureaucracy with time-consuming pre-authorization from senior managers before acting on the decision. To use the potential of data driven decision-making, current structures need to be redesigned, empowering employees” (Bratasanu, 2019, p. 78).
Predictive analytics is part of the realm of advanced quantitative analysis. Its main feature is the application of current and retrospective data to predict the future or anticipate certain scenarios related to business activities, behaviors, and trends. Statistical analysis is the most important technique for deriving insight from predictive analytics. Tools such as analytical queries, programming, machine learning, algorithms, and data set manipulation are critical to building predictive models and obtaining numerical values or scores based on the probability of a particular event occurring.
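As an illustration of how a predictive model turns several variables into a probability score, the following minimal sketch applies a logistic function to a weighted sum. The variable names (`missed_appointments`, `months_enrolled`), the coefficients, and the scenario are hypothetical assumptions made for this example; in practice the coefficients would be estimated from historical data rather than written by hand.

```python
import math

# Hypothetical coefficients, assumed for illustration only. A real model
# would estimate them from historical records.
INTERCEPT = -2.0
WEIGHTS = {"missed_appointments": 0.8, "months_enrolled": -0.1}


def event_probability(record):
    """Return a score in (0, 1): the modeled probability that the event occurs."""
    linear = INTERCEPT + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    # The logistic function maps any real number to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-linear))


# Two hypothetical participants: one long enrolled with no missed
# appointments, one recently enrolled with several missed appointments.
low = event_probability({"missed_appointments": 0, "months_enrolled": 12})
high = event_probability({"missed_appointments": 4, "months_enrolled": 1})
```

Under these assumed weights, the second record receives a higher score than the first, which is the kind of ranking an agency could use to prioritize follow-up.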
The perception of analytics by the leadership team of the organization can determine its level of maturity in analytic-related processes. According to Bratasanu (2019), there are five levels of an organization’s analytic-related processes: (a) the organization can build reports, (b) the organization can build and deploy models, (c) the organization has repeatable processes for building and deploying analytics, (d) the organization has consistent enterprise-wide processes for analytics, and (e) the enterprise’s analytics is strategy driven. Therefore, it is essential for social sector leaders to encourage their organizations to build reports regularly as a building block of implementing analytics in the organizational culture. The absence of such reports will hinder the progress of data-driven culture in organizations. It can be beneficial for organizations to hire a data advocate in each department for the purpose of collecting data and generating reports as a first step to promote data-driven culture.
Social work practice can reach new levels of scientific and disciplinary knowledge by applying data science techniques. Thus, challenges for practitioners will be related to creating innovative approaches to making use of existing quantitative data.
Data is a crucial asset for all types of organizations. Therefore, for macro social practice, the data and the subsequent information that can be obtained from it should be treated as an asset. The development and application of new information and communication technologies have a significant impact on the lives of individuals, one consequence being the production of large volumes of data. For this reason, social connections are one of the most significant uses of Internet-related technologies and tools, which should be a concern and a topic of work for the practice of social work.
Traditionally, marketing, financial services, and insurance companies have been the main consumers of predictive analytics. Health care, retail, and manufacturing are examples of well-known industries that take advantage of the massive production of data resulting from global networking and the extensive use of the Internet. Finding potentially fraudulent financial transactions and identifying patients at risk of developing specific diseases are among the most common outcomes of predictive analytics.
Indeed, this kind of application can be translated into social work practice and social services management. Therefore, predictive analytics and big data should be matters of concern and learning for practitioners and scholars.
Bratasanu (2019) argues that the strength of the organization in the marketplace results from the organization’s understanding of its data set and how the organization can use this data set for its advantage. Most important, the author emphasizes the importance of the information technology revolution in the way that individuals access and process information, making knowledge the most powerful resource for development.
At first glance, predictive analytics uses variables that can be measured and analyzed to predict the behavior of individuals and organizations. For instance, when economic downturns emerge, a human services agency is likely to consider improving the quality of its programs by analyzing variables such as age, gender, location, socioeconomic status, level of vulnerability, exposure to violence, and so on. Thus, a model based on predictive analytics can combine multiple variables to evaluate forthcoming probabilities in terms of donors or investment.
One relevant difference between predictive analytics and standard statistical approaches, including inferential statistics, is the merging with computer science techniques, such as software engineering. Predictive models rely on advanced algorithms and methodologies such as logistic regression, time series analysis, and decision trees, which are deployed and supported by programming and software development.
It is imperative for well-established organizations to use advanced analytics in order to maximize their outcomes, and keep pace with start-ups, which tend to quickly adopt a data-driven culture. According to Bratasanu (2019), an annual survey on the top 1,000 companies shows that 84.1% of executives are enabling better decisions using advanced analytics by actively investing in big data and artificial intelligence (AI), and 73.2% of these executives report measurable results from their investment. However, issues arise from the slow adoption by the big organizations of a data-driven culture. Even though 99% of organizations are targeting a data-driven culture, only one-third of these organizations are succeeding. Thus, disruptions in big organizations can occur because of the fast adoption of such a culture by start-ups (Bratasanu, 2019). Cultivating a data-driven culture is a significant challenge to improve the macro practice in the human service sector. Start-up nonprofits are aware of the importance of this culture for the progress of their organizations as well as for the power of their social impacts.
The predictive analysis process is not always linear, which is one reason to use the idea of iteration. Correlations often occur where data scientists are not looking. For that reason, some companies are filling data scientist positions by hiring people who have academic backgrounds in physics and other hard scientific disciplines and, according to the scientific method, feel comfortable where the data takes them. Even if companies follow the more conventional path of recruiting data scientists trained in mathematics, statistics, and computer science, an open mind in data exploration is a key attribute of effective predictive analytics.
The question that may arise is how to manage the change in data-driven organizations with the everyday growth and expansion of data. According to Bratasanu (2019), there are five areas in which there is a need to manage this change effectively: (a) leadership vision, (b) finding talented management, (c) technology correlated with big data strategy, (d) decision-making processes maximizing cross-functional cooperation, and (e) a company culture that creates an environment that prioritizes moving away from acting on hunches and instinct.
The development of predictive analytics as a key element in modern organizations has been coupled with the availability of large data systems, commonly known as big data. As companies, organizations, agencies, and individuals have produced and, eventually, accumulated larger and broader pieces of data and information, they have created better opportunities to mine and extract data for generating predictive insights. Also, the development and commercialization of machine-learning tools have extended predictive analytics options and applications.
In this context, it is possible to think about an interactive process to incorporate a predictive analytics approach to social work practice. Predictive analytics requires a high level of expertise with statistics, and it is necessary to build up a team of professionals to create models. As a result, specific roles emerge within organizations. Data scientists and data engineers perform tasks related to collection of relevant data in order to process this data for analysis. Software developers and business analysts are in charge of generating data visualization, dashboards, and reports.
The social work field can leverage data science to aspire to new standards for quality performance at the macro level. One of the important concepts nowadays in data science is “big data.” The term “big data” made its academic debut in a 1988 computer science paper and first emerged in the information technology industry in the mid-1990s. Since the turn of the 21st century, the term “big data” has become extremely popular. While there is disagreement in academia about how to define “big data,” the most common definitions are based on the three Vs framework that Douglas Laney presented in an unpublished paper in 2001. Surprisingly, this paper did not mention the term “big data” at all. According to these definitions, data is big if it has three main attributes. “These attributes are high volume (the sheer size of the data set is large), velocity (data are produced in or almost in real time), and variety (data come in different types and formats and may be structured or unstructured)” (Grossman & Pedahzur, 2020, p. 228). They state, “Unstructured data may take the form of text, audio, video, or any other observable manifestation.” Unstructured data includes web pages and digital footprints in social media such as Twitter and Facebook, whereas structured data is more accessible and manageable, such as survey data (Chong et al., 2015). The United States Census Bureau defines “big data” as follows:
Big data is a term used to describe data sources that are fast-changing, large in both size and breadth of information, and come from sources other than surveys. Examples include retail and payroll transactions, satellite images, and “smart” devices. Big data also includes administrative data from federal, state, and local governments, as well as third party providers. Typically, big data is “found” or “observed,” in that it is collected passively as the digital exhausts of personal and commercial activities. Such seemingly disparate data sources and techniques can provide unique insights that were not easily observable previously (2015, p. 722).
It should be noted that the large and complex nature of “big data” prevents the traditional methods of data analysis from being effective. Therefore, big data analysis uses different algorithms and techniques to infer general trends over the entire set of data rather than looking for relationships between individual pieces of data. Consequently, in the world of big data analysis, what counts is the “quantity” rather than the quality of information, as big data analysis methods focus on finding “correlation” rather than “causation” in the general trends over the entire set of data. Equally important, big data analysis and application have become possible only because of advances in technology that allow the collection, storage, and interpretation of data at a low cost. These factors, coupled with improved analysis techniques, make big data influential in industry in a way that was not possible in the past (Electronic Privacy Information Center [EPIC], 2020).
These circumstances allow social workers and human services professionals to explore new horizons and roles within organizations. Training in data analysis, statistical models, and mathematical tools is relevant, as are critical thinking, experience, and cultural competence.
For social work practice, it is important to know that machine learning generates predictions, not explanations; thus, it does not aim to establish causal effects. Therefore, a key question is how machine learning techniques can improve social interventions on complex social issues.
Big data supports predictive analytics in terms of improving causal inferences. The analysis of large data sets produced by people through their own activities on the Internet allows the design of experiments from observational data. Predictions, as the main outcome of machine learning techniques, can improve the precision of estimated effects (Grimmer & Stewart, 2013). In general, that means, eventually, predictive analytics may not need to establish conclusions and decisions, if there is enough data available on a specific topic.
Big data must consider computation, interpretation, and transparency. In this sense, social workers should be aware that data-driven culture and predictive analytics, based on large data sets, imply the use of specific standards and protocols to design potential scenarios and issues.
Government agencies, nonprofits, and human services organizations produce information and data to understand society, ultimately improving policy design. If these recommendations are followed, interventions could have a greater likelihood of being effective:
The emphasis now should also be placed on improving socioeconomic well-being of all members of society. By focusing on the measurement of both rewards and risks for individuals and society at large, it will be possible to create a knowledge-based society where the policymaking process is transparent and supported by empirical evidence (Cariceo et al., 2018, p. 90).
Another concept of data science that can enhance the complex tasks of social work is data analytics. “Based on large data sets, data analytics integrates processes and tools, including predictive analytics, statistics, data mining, artificial intelligence, and natural language processing, building invaluable insights to improve firm decision making” (Bratasanu, 2019, p. 80). In other words, analytics helps develop insights by using data efficiently and applying quantitative and qualitative analysis. Therefore, analytics can improve planning, management, measurement, and learning by providing fact-based decisions (Islam et al., 2018).
The application of data science in macro social practice involves intervention in three domains: first, programming, as part of skills related to computer science and software engineering; second, statistics, as a key subject for the processing and analysis of quantitative data; and finally, a deep understanding of the subject under investigation from the perspective of data science. This last element involves professional experience in areas that can range from business to physics, including health care and social services. Practicing social work enables one to take advantage of the possibilities of data and, in this way, improve interventions.
Associated with these elements, it is possible to affirm that data science is an innovative approach. The collection of information and the production of knowledge are parts of its results and its objectives. As a rapidly growing methodology, data science must be part of the social worker toolbox, at every level. Social work professionals in particular should focus on providing critical insight into complex social problems that can be solved and improved with the combination of big data, machine learning models, information visualization, and data mining.
From a technical point of view, generating useful knowledge from the collection and processing of data is a key asset for organizations. For example, real-time data collection and proper analysis can save time and resources. This situation implies a deeper collaboration between organizations, consumers, and clients. In other words, data-based analysis and decision-making processes from the production of useful information are comparative advantages in social management projects aimed at permanent change.
There are three types of analytics: predictive analytics (prediction of upcoming events based on historical data), prescriptive analytics (utilization of scenarios to provide decision support), and descriptive analytics (exploration and discovery of information in the data set) (Islam et al., 2018). Predictive analytics is widely used to improve the process of decision-making for better outcomes in the future and to avoid any obstacles that may hinder the progress of the organizations by making decisions based on instincts. Hence, implementing predictive analytics in the management and leadership of nonprofit organizations can enhance the social impact of these organizations as well as bring such organizations to the next level.
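The distinction between the three types of analytics can be sketched on a single small data set. The monthly request counts, the capacity figure `REQUESTS_PER_WORKER`, and the naive trend forecast are all hypothetical assumptions chosen for illustration; they are not taken from Islam et al. (2018).

```python
import math
from statistics import mean

# Hypothetical monthly service requests received by an agency.
monthly_requests = [120, 132, 128, 141, 150, 158]

# Descriptive analytics: summarize what has already happened.
average_requests = mean(monthly_requests)

# Predictive analytics: project next month by extending the average
# month-to-month change (a deliberately naive forecasting rule).
changes = [later - earlier
           for earlier, later in zip(monthly_requests, monthly_requests[1:])]
forecast = monthly_requests[-1] + mean(changes)

# Prescriptive analytics: turn the forecast into a recommended action,
# here a staffing level under an assumed capacity per worker.
REQUESTS_PER_WORKER = 40  # hypothetical assumption
workers_needed = math.ceil(forecast / REQUESTS_PER_WORKER)
```

The same data supports all three questions: what happened, what is likely to happen, and what should be done about it.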
Predictive analytics is part of the data science approach, since both methodologies can improve social work knowledge by designing innovative outcomes from quantitative and numerical information. Thus, predictive analytics and data science focus on real issues and establish analysis from everyday scenarios, incorporating flexible frameworks (Cariceo et al., 2018).
The main difference between predictive analytics and data science is that predictive analytics involves more technical procedures, whereas data science is a broader strategy. To clarify this, it is useful to identify the goals of a specific project; in other words, to understand the mission of a business or organization.
Data science projects based on predictive analytics techniques can reach innovative and effective conclusions and program outcomes by collecting, reviewing, and understanding data.
For social work practice, data science can support interventions by achieving new standards for communication. Therefore, social workers can take advantage of data science combining “scientific knowledge and practice intuition to tackle and resolve significant social issues and conflicts” (Cariceo et al., 2018, p. 2).
Predictive analytics includes several steps. Peng and Matsui (2016) explain that data-driven analysis is a “highly iterative and non-linear process, better reflected by a series of epicycles” (p. 4). This idea refers to the concept of epicycles, in which information is learned at each step.
An epicycle is a small circle whose center moves around the circumference of a larger circle. In data analysis, the iterative process that is applied to all steps of the data analysis can be conceived of as an epicycle that is repeated for each step along the circumference of the entire data analysis process. Some data analyses appear to be fixed and linear, such as algorithms embedded into various software platforms, including apps. However, these algorithms are final data analysis products that have emerged from the very non-linear work of developing and refining a data analysis so that it can be “algorithmized” (Peng & Matsui, 2016, p. 85).
This proposal is useful for social work practice, which works to tackle complex issues. Epicycles represent an easy way to innovate and improve social work research.
Dan Hurley (2018) documented the experience of the Allegheny County Office of Children, Youth and Families (C.Y.F.) with data and forecasting methods. Hurley concluded that predictive analytics is a significant innovation in social issues intervention, such as child protection.
Machine learning is an innovative technique to promote social work interventions and can support the decision-making process of practitioners in order to predict new behaviors based on data produced by organizations, service agencies, users, clients, or individuals. Machine learning techniques include a set of generalizable algorithms that are data driven, which means that rules and solutions are derived by examining data, based on the patterns that exist within any data set. In other words, the goal of machine learning is teaching computers through “examples,” by training data to test specific hypotheses and predict what a certain outcome would be, based on a current scenario, and improving that experience.
Data should be viewed as an asset for human service agencies, especially in the child welfare field. In other words, data collection and proper analysis can save time and resources. Thus, social work research must improve its standards and treat data as a product. This goal can be supported by applying computational social science tools, such as machine learning algorithms.
In terms of innovation, machine learning implies a deep sociocultural transformation based on the emergence and consolidation of the information society, but it also challenges social work approaches and research methods. Rodriguez and Storer (2019) point out that mathematical procedures are progressively replacing the value of, for instance, clinical prediction in child welfare policy. This new scenario tends to establish a proactive decision-making process. It is possible to argue that delivery of more efficient public services is possible through data-driven interventions and modern computing techniques.
Machine learning can be classified into two general categories depending on the nature of the problem that the technique needs to tackle. First, supervised learning involves a data set whose outputs are already known. Supervised learning problems are categorized into regression problems, which involve predicting a quantitative variable using a continuous function, and classification problems, which seek to predict discrete qualitative outcomes. Second, unsupervised learning involves data whose outputs are not known or labeled, as in cluster analysis.
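The two kinds of supervised learning problem can be sketched with plain Python. The example below fits a least-squares line (regression) and assigns a label with a nearest-neighbor rule (classification). Both techniques are chosen here only for brevity, and the data, variable names, and labels are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def nearest_neighbor_label(training, point):
    """Classify `point` with the label of the closest training example."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(training, key=lambda pair: squared_distance(pair[0], point))
    return label


# Regression: predict a continuous outcome score from hours of service.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]  # hypothetical data lying on y = 2x + 50
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6 + intercept

# Classification: predict a discrete category from two numeric features.
labeled_cases = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((6.0, 5.0), "high"),
    ((7.0, 6.5), "high"),
]
predicted_label = nearest_neighbor_label(labeled_cases, (6.5, 6.0))
```

In both cases the known outputs (scores or labels) in the training data are what make the problem supervised.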
For social work research, machine learning generates predictions as a key element to improve social interventions on complex social issues by providing better inferences from data and establishing more precise estimated effects.
One of the most interesting techniques of machine learning is text mining. Text mining has become an extensively used approach to identify and extract information from unstructured text. Text mining is applied to extract facts and relationships in a structured form that can be used to annotate specialized databases and to transfer knowledge between domains and more generally within management research to support organizational and strategic decision-making. According to Salloum et al. (2017), data and information produced by organizations and agencies, such as universities or government entities, are stored in digital or electronic documents. This situation implies a significant amount of information in the form of unstructured data. Thus, text mining seeks to detect unknown information by extracting it automatically from diverse text-based data sources. Text mining deals with specific features that generally require processing. In order to achieve this step, text mining is related to natural language processing (NLP). NLP is a field of artificial intelligence that provides techniques to allow machines to read and understand meaning from human languages (Salloum et al., 2017, p. 141). NLP methods provide the background to analyze data and reveal human languages.
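A first step in many text mining workflows is turning unstructured text into structured counts. The following minimal sketch tokenizes a few hypothetical case notes, removes a small hand-picked stop-word list, and counts term frequencies. The notes, the stop-word list, and the function names are assumptions made for illustration; this is not the method described by Salloum et al. (2017).

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real projects use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "was", "with"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def top_terms(documents, n=3):
    """Return the n most frequent terms across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts.most_common(n)


# Hypothetical unstructured case notes.
notes = [
    "Client reported housing instability and missed rent.",
    "Referred the client to housing assistance.",
    "Follow-up: housing application submitted for the client.",
]
frequent = top_terms(notes, n=2)
```

The resulting counts are structured data that can feed the kinds of models discussed elsewhere in this article.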
Big Data and the Ethical Dilemma
Since people produce a large amount of data from their daily activities, social service agencies, organizations, and professionals must incorporate collaboration with all stakeholders, as a regular and permanent practice, aimed at collection and processing of the data. Thus, collaborative strategies and tools such as crowdsourcing are key elements to improve data management and to achieve sustainable change in organizational environments. The key element of new technologies is how people collect and share information, which changes the way people interact with each other, including the manifestation of social problems.
The growth of big data usage since the turn of the 21st century aligns with the increased instances of actual or perceived ethical violations. White and colleagues (2019) identified four themes in relation to ethics and big data: privacy, security, ownership, and evidence-based decision-making.
Privacy can be defined as the nondisclosure of personal information to the public (White et al., 2019). Privacy protection fails in big data analytics, as the notion of anonymity is eroded in the big data paradigm. The reason is that even if every piece of data or information is stripped of personal information, the relationships and trends between the individual pieces of data can reveal an individual’s identity (EPIC, 2020). Therefore, privacy protection is a challenge in social work practice, as more information is generated daily and shared on different networks.
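The re-identification risk described above can be illustrated with a simple uniqueness check: even after names are removed, a combination of ordinary attributes may match only one person. The records and the choice of attributes below are hypothetical, and the check is a minimal sketch of the idea rather than a complete privacy assessment.

```python
from collections import Counter

# Hypothetical records from which names have already been removed.
records = [
    {"zip": "10027", "birth_year": 1984, "gender": "F"},
    {"zip": "10027", "birth_year": 1984, "gender": "F"},
    {"zip": "10027", "birth_year": 1991, "gender": "M"},
    {"zip": "10031", "birth_year": 1975, "gender": "F"},
]

# Attributes that are not identifying alone but may be in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def unique_combinations(rows, keys):
    """Return the attribute combinations that match exactly one record."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return [combination for combination, n in counts.items() if n == 1]


at_risk = unique_combinations(records, QUASI_IDENTIFIERS)
```

Any combination returned here points to a single record, so someone who knows those three facts about a person could single out that person's data.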
In this context, machine learning projects use the same algorithms and techniques that companies use to increase profits, based on information produced and systematized from people’s activity. Therefore, it is possible to enhance the missions of human service organizations and agencies through data-driven planning. It is important to indicate, as Garcia (2016) affirms, that the manipulation of sensitive information and the construction of algorithms can generate distortions and biases. It is even possible that prejudices are replicated, which puts at risk one of the main elements associated with technology as a platform to improve social welfare. According to the American Academy of Social Work and Social Welfare (AASWSW), “Information and communication technologies can be implemented to improve the effectiveness of social programs” (AASWSW, 2018, p. 60).
Analysis of big data to create a positive social impact has some authenticity issues due to the lack of infrastructure and capabilities in humanitarian organizations to conduct big data analysis. A study by White et al. (2019) stated the following:
Social impact theory provides some explanation to consider how big data insights could lead to positive or negative social impacts and the ethics behind it. Social impact theory highlights that people’s actions affect others in social situations and the impact of their actions can be measured visually along with measures that include three laws: social forces, psychosocial law, and multiplication/division of impact.
The authenticity issue in big data depends on how big data is used and manipulated to generate positive or negative social impacts. Hence, it is imperative to ask the following questions: Was the decision reached from big data analysis ethical? Who did it affect? White et al. (2019, p. 13) concluded:
Depending on how big data is used, it could create fear or euphoria. The fear or euphoria can be used to initiate obedience and a need for immediate action. The action can either grow or reduce the number of people involved and be responsible for behavioral changes in the affected groups. The impact of the behavioral changes applied directly affects the social impact and ethics of the situation.
Biases in machine learning applications, for example, can lead to conflicts regarding the availability of open data. Thus, standards must be developed for the collection and manipulation of data and sensitive information within organizations. In short, the challenge for social work is to acquire new skills based on new technologies. Research on new technologies is increasing; however, knowledge of the specific data science approach is limited, particularly in professions such as social work.
There needs to be a broad understanding of data science as part of new communication technologies and as a tool, both in research and in social work practice. It is also important to identify the social workers who have the greatest ability to enter technology-based practice. Data can help build cost-effective models of social work practice to improve service delivery. This literature review highlights the importance as well as the challenges of implementing data science to improve macro social work practice.
Machine learning refers to the application of massive amounts of data to achieve responses to complex issues. It is also a subset of artificial intelligence, focused on analyzing patterns present in data to improve the decision-making process and, ultimately, organizational learning. In other words, machine learning is a computer science technique that is capable of teaching machines to learn without programming them explicitly (Mohammed et al., 2016).
Deep learning is a subfield of machine learning techniques, based on artificial neural network algorithms. Its main feature is a learning process based on human learning patterns (Alom et al., 2019). Deep learning offers a large set of techniques to design artificial intelligence models and projects.
Data visualization is the process of converting pieces of data, in a systematic and logical way, into the visual elements that make up the final graphic. The ultimate goal of data visualization is to report results quickly and to retrieve insights from data (Wilke, 2019).
Data engineering is a multidisciplinary field that includes statistics, database management, data visualization, optimization, and information theory. A data engineer, as an information technology professional, works on transforming data into a useful format for analysis (Sadiku et al., 2018). According to Kretz (2019), data engineering is “the link between the management’s big data strategy and the data scientists that need to work with data” (p. 12).
Cluster analysis is a family of machine learning algorithms that analyze unlabeled data. It focuses on retrieving information from unseen patterns in data sets. In general, clustering methods decompose a data set into a flat partition consisting of clusters to produce a local approximation of a global objective function (Usama et al., 2019).
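One common way to produce such a flat partition is k-means clustering. The sketch below is a minimal version of Lloyd's algorithm for two-dimensional points, written under simplifying assumptions: the starting centroids are supplied by hand, the points are hypothetical, and the function name kmeans is illustrative. It is offered as one instance of clustering, not as the method used in the cited work.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels).

    Assumes iterations >= 1 and at least one starting centroid.
    """
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                            + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabeled observations forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)
```

Each pass lowers (or leaves unchanged) the total squared distance between points and their assigned centroids, so the procedure settles on a local optimum of that global objective; a different starting position can lead to a different partition.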
Time Series Analysis
A time series is a set of observations, each one recorded at a specific time. A time series model for observed data is a specification of the joint distributions of the random variables that the observations are assumed to realize. Forecasting and control are applications of time series analysis that draw insights from data (Madsen, 2018).
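Forecasting can be illustrated with simple exponential smoothing, one of the most basic forecasting techniques. The sketch below is a minimal example under stated assumptions: the series values and the smoothing weight alpha are hypothetical, and the function name is illustrative. It produces a one-step-ahead forecast in which recent observations count more than older ones.

```python
def exponential_smoothing_forecast(series, alpha):
    """One-step-ahead forecast by simple exponential smoothing.

    alpha in (0, 1] controls how strongly recent observations are weighted.
    """
    level = series[0]                      # start from the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level                           # forecast for the next time point

# Hypothetical monthly counts, ordered in time.
monthly_counts = [10, 12, 14]
forecast = exponential_smoothing_forecast(monthly_counts, alpha=0.5)
print(forecast)
```

The order of the observations matters here: shuffling the series would change the forecast, which is what distinguishes time series data from an unordered sample.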
Blockchain technologies are distributed digital ledgers of cryptographically signed transactions that are grouped into blocks. Each block is “cryptographically linked to the previous one after validation and undergoing a consensus decision” (Yaga et al., 2018, p. 1). The most famous application of blockchain is cryptocurrency, for example, bitcoin.
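The cryptographic linking of blocks can be sketched with Python's standard hashlib module. This is a deliberately reduced illustration: it shows only the hash link between consecutive blocks and omits transaction signatures, distribution across nodes, and the consensus step described above. The block layout and the function names are assumptions made for the example.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 digest of a block's contents (keys sorted for stable output)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, transactions):
    """Append a block that stores the hash of the block before it."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "index": len(chain),
        "transactions": transactions,
        "previous_hash": previous,
    })

def is_valid(chain):
    """Check that every block still points at the true hash of its predecessor."""
    return all(chain[i]["previous_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

# Hypothetical ledger entries.
chain = []
add_block(chain, ["A pays B 5"])
add_block(chain, ["B pays C 2"])
print(is_valid(chain))                       # True for an untouched chain

chain[0]["transactions"] = ["A pays B 500"]  # tamper with an earlier block
print(is_valid(chain))                       # False: the stored link no longer matches
```

Because each block records the hash of the one before it, altering an earlier block invalidates every later link, which is why tampering is detectable.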
Network analysis is a statistical approach that uses graphical representations of variables, which are conceptualized as nodes, and of the relationships between them, known as edges. Network analysis provides the capacity to estimate complex patterns of relationships, and the network structure can be analyzed to reveal core features of the network (Hevey, 2018).
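A simple way to reveal a core feature of a network is to count how many edges touch each node, a measure known as degree. The sketch below assumes an undirected network given as a list of edges; the node labels and the function name degree are hypothetical. Estimating which edges exist in the first place, as statistical network models do, is outside the scope of this example.

```python
from collections import Counter

def degree(edges):
    """Count the number of edges attached to each node of an undirected network."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

# Hypothetical network: each pair is one edge between two nodes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degrees = degree(edges)
print(degrees.most_common(1))   # the most connected node and its degree
```

Nodes with a high degree are often read as central to the network, although other centrality measures can rank nodes differently.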
Natural Language Processing (NLP)
Natural language processing is a set of computational techniques that allows one to analyze, understand, and obtain meaning from human language in a useful way. By using NLP, researchers and developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, voice recognition, and topic segmentation (Malak & Ogurek, 2019).
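Sentiment analysis, one of the tasks listed above, can be sketched in its simplest form as counting words from a positive list and a negative list. The word lists, the sample sentences, and the function names below are hypothetical and far too small for real use; practical systems rely on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Tiny illustrative lexicons (assumptions, not a validated resource).
POSITIVE = {"good", "helpful", "safe"}
NEGATIVE = {"bad", "afraid", "alone"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Positive-word count minus negative-word count; above 0 leans positive."""
    counts = Counter(tokenize(text))
    return (sum(counts[w] for w in POSITIVE)
            - sum(counts[w] for w in NEGATIVE))

print(sentiment_score("The counselor was helpful and I feel safe"))
print(sentiment_score("I feel afraid and alone"))
```

Even this crude score shows the general pattern of NLP work: raw text is first broken into tokens, and structured information is then derived from those tokens.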
This section describes three projects that have been using data science. Their focus is to tackle social problems by the application of big data.
R-Ladies is a worldwide organization focused on promoting gender diversity in the data science field. As a diversity initiative, the mission of R-Ladies is to achieve proportionate representation by encouraging, inspiring, and empowering people of genders currently underrepresented in the R community. R is a programming language used to implement predictive analytics and big data projects. This project applies big data in its website interaction and database sharing.
DataKind is a nonprofit that promotes social change by supporting organizations in achieving long-term projects through data science and big data. Its mission is to help organizations translate their needs into data science problems and solve them with advanced analytics and big data mining.
Crisis Text Line is a free support service for people confronting crisis episodes, delivered through text messages from cell phones and through Facebook. A trained counselor conducts this process, supported by complex algorithms that help to predict crisis situations and suggest courses of action based on data analysis.
The following discusses predictive analytics initiatives that seek to improve social intervention and overcome complex social issues.
Crime: By manipulating data from the U.S. Census, the FBI created machine learning models to discover patterns of crime in U.S. cities. The main outcomes had implications for public policy in terms of the number of police officers who should be allocated to larger cities and the budget for law enforcement.
Public health: Clinical data can help to improve information and knowledge about complex diseases like Crohn’s disease and diabetes. By applying machine learning techniques, it is possible to predict compounds that are likely to be of therapeutic value. In addition, global pandemics, like MERS, SARS, and recently COVID-19, which jump from species to species and ultimately affect humans, can be tackled using data from health institutions and scientific research centers. For instance, it is possible to predict which species are likely to be infected in the next pandemic.
Society and scientific collaboration networks are an important component of predictive analytics. The combination of proper data and specific predictive models can help expand knowledge about the economy and gross domestic product of nations. It is possible to highlight inequalities in global networks of scientific collaboration to make economic progress. Another example is the data about violence and assaults on women on U.S. college campuses.
Data-driven culture is a key element for enhancing macro social work practice. This approach is critical for the social work profession and practitioners. Thus, it is important to improve technical training and data-oriented knowledge for social work research and practice.
In this context, predictive analytics is one of the most significant skills that can help to improve social work practice. Predictive analytics allows one to discover patterns, anticipate complex social issues, and act accordingly.
Social workers must take advantage of new technologies provided by data science, data engineering, machine learning, and artificial intelligence. These innovative technologies from computer science can be applied to social issues and complex economic and inequality problems.
It is important to acknowledge that predictive analytics and data science aim to anticipate specific results, rather than offer explanations about social problems. This situation represents a challenge to new research topics in the social work profession and education.
Big data, as a general term to explain the massive production of information from individuals on a daily basis, is a key element. Ethical concerns emerge from this new scenario. Social workers, practitioners, students, and scholars must be aware of new vulnerabilities linked to big data management, in particular in terms of discrimination based on data model construction and biases that could remain in the design of algorithms.
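One basic way to look for such biases is to compare how often a model flags members of different groups. The sketch below is a minimal check under stated assumptions: the group labels, the records, and the function name positive_rate_by_group are hypothetical, and a gap between groups is a prompt for closer review rather than proof of discrimination.

```python
from collections import defaultdict

def positive_rate_by_group(records):
    """Share of cases flagged by a model within each group.

    records: iterable of (group, flagged) pairs, where flagged is True/False.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, flagged in records:
        totals[group] += 1
        positives[group] += int(flagged)
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical model outputs for two groups of clients.
records = [("A", True), ("A", False), ("A", False), ("A", False),
           ("B", True), ("B", True), ("B", False), ("B", False)]
rates = positive_rate_by_group(records)
print(rates)
```

In this made-up example, group B is flagged twice as often as group A, the kind of disparity that would warrant examining both the training data and the model design.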
This article represents a general reflection and a descriptive narrative about predictive analytics, big data, and data science as a new paradigm of social work research. All of the techniques and tools described, including advanced topics such as artificial intelligence, can represent a set of new skills for practitioners.
Projects were presented in order to demonstrate initiatives that can inspire new projects and innovative interventions.
This definition has significant implications for social work. Data can change the relationship between agencies, programs, and clients. Large data sets produced by clients and organizations can also deeply modify the culture within agencies, in terms of public relations, human resources, supervision, and mission.
Since data is already produced by clients, organizations no longer need to conduct expensive and time-consuming surveys to access opinions and needs. Rather than ask clients what they feel, think, and expect, data offers enough information in real time. Moreover, machine learning is grounded on this data and information.
Predictive analytics and big data involve different domains, including programming and computational tools, as well as statistics. In addition, professional expertise, which can range from business to physics, including health care and social services, is a key element in data science. Therefore, social work management can take advantage of data possibilities and improve interventions, and it is mandatory to perform new and innovative research on this matter.
Data science is, no doubt, an innovative approach to collect data and produce information in order to achieve actionable knowledge. Macro practice in the social work field should focus on providing critical knowledge of complex social problems, combining it with big data, machine learning, visualizations, and predictive analytics to learn about this new approach.
That means that human services need to learn from data, and identify it as an asset for their missions and values. Thus, social workers at a macro practice level should treat data as a product, as it is important to enhance collaboration among organizations, consumers, and clients.
Social services and programs should be based on data-driven analysis and best-informed decision-making processes to reach better outcomes. Currently, data is provided by people, social service agencies, and organizations. Therefore, practitioners should be aware of collaboration in terms of data production. Thus, crowdsourcing is a key element in improving data management to reach sustainable change in organizational settings. Further research on this idea is suggested.
Indeed, ethical dilemmas arise from this new approach. Biases in big data and predictive analytics and machine learning applications could represent serious conflicts. The goal for social work is to gain new skills based on big data and data science.
Full understanding of predictive analytics and big data as a tool in social work practice is needed more now than at any time in the past. It is important to identify practitioners and students who are interested in getting into technology and data-based practice. Data can help to build social work practice based on cost-effective models in order to enhance service delivery.
The goal for practitioners and researchers should be to learn how to create and organize equitable models. This can be done only if social workers and human services professionals get training on programming and data science tools and techniques, including a basic approach to computational social science.
Training in data science, machine learning, and societal computing is a new field of intervention and a topic to develop advanced social work research.
Links to Digital Materials
The Independent Institute is a nonprofit, nonpartisan, public-policy research and educational organization that shapes ideas into lasting impact through publications, conferences, and multimedia programs. Its mission is to advance peaceful, prosperous, and free societies grounded in a commitment to human worth and dignity.
Applying independent thinking to issues that matter, it creates transformational ideas for today’s most pressing social and economic challenges. By connecting these ideas with organizations and networks, the institute inspires action that can unleash an era of unparalleled human flourishing at home and around the globe.
The Data Science Association is a nonprofit professional association of data scientists that serves its members, improving the data science profession, eliminating bias and enhancing diversity, and advancing ethical data science internationally. The DSA is committed to supporting the data science profession with practical resources for data professionals while improving the practice of data science, accrediting schools, and establishing model ethical codes. Membership is open to data scientists, students, academics, and others interested in science and the data science profession.
The Digital Analytics Association advances the use of data to understand and improve the digital world through professional development and community. Professionals in the digital analytics industry committed to growing as analysts and advancing their career may enroll in the DAA Mentoring Program.
The DAA’s expanded mentoring program is open to all DAA members who wish to strengthen their skills in the analytics profession. All interested members are invited to participate as mentees and mentors. Mentors will be guiding analysts.
UNICEF is a UN initiative that uses new approaches and technologies to increase access to essential services, use scarce resources more efficiently, and communicate life-saving information. It seeks to engage young people with technologies, and connect them to their governments and to opportunities.
Oddschile is a nonprofit focused on transforming data into useful information and knowledge to improve decision-making. It develops data analysis and visualization projects to respond to complex social problems and improve interventions.
Good By Data is an initiative to leverage data storytelling for social good. Good By Data works with NPOs, foundations, social enterprises, and corporate social responsibility teams to measure their social impact using data, machine learning, and artificial intelligence.
- Alom, M., Taha, T. M., Yakopcic, C., Westberg, S., & Sidike, P. (2019). A state-of-the-art survey on deep learning theory and architectures. Electronics, 8(3), 292.
- American Academy of Social Work and Social Welfare. (2018). Harness technology for social work: Grand challenges in social work, using evidence to promote social justice and well-being (Fact Sheet No. 8).
- Bratasanu, V. (2019). Leadership decision-making process in the context of data driven tools. Quality: Access to Success, 19, 77–87.
- Cariceo, O., Nair, M., & Lytton, J. (2018). Data science for social work practice. Methodological Innovations, 11(3), 205979911881439.
- Chong, W. K., Man, K. L., & Rho, S. (2015). Big data technology adoption in Chinese small and medium-sized enterprises. Lecture Notes in Engineering and Computer Science, 2, 722–723.
- David, C. C., Corrales, J. C., & Ledezma, A. (2018). How to address the data quality issues in regression models: A guided process for data cleaning. Symmetry, 10(4), 99.
- Electronic Privacy Information Center. (2020). Big data and the future of privacy.
- Garcia, M. (2016). Racist in the machine: The disturbing implications of algorithmic bias. World Policy Journal, 33(4), 111–117.
- Grimmer, J., & Stewart, B. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297.
- Grossman, J., & Pedahzur, A. (2020). Political science and big data: Structured data, unstructured data, and how to use them. Political Science Quarterly, 135(2), 225–257.
- Hevey, D. (2018). Network analysis: A brief overview and tutorial. Health Psychology and Behavioral Medicine, 6(1), 301–328.
- Hurley, D. (2018, January 2). Can an algorithm tell when kids are in danger? New York Times Magazine, 36–48.
- Islam, M. S., Hasan, M. M., Wang, X., Germack, H. D., & Noor-E-Alam, M. (2018). A systematic review on healthcare analytics: Application and theoretical perspective of data mining. Healthcare (Basel), 6(2), 54.
- Kretz, A. (2019, October 16). The data engineering cookbook. Mastering the plumbing of data science.
- Madsen, H. (2018). Time series analysis. Chapman & Hall/CRC Press.
- Malak, P., & Ogurek, A. (2019). Including natural language processing and machine learning into information retrieval. In 8th International Conference on Natural Language Processing (pp. 14–18). Institute of Information Science and Book Studies, University of Wrocław.
- Mohammed, M. M. Z. E., Khan, M. B., & Bashier, E. B. M. (2016). Machine learning: Algorithms and applications. CRC Press.
- Peng, D., & Matsui, E. (2016). The art of data science. Skybrude Consulting.
- Rodriguez, M., & Storer, H. (2019). A computational social science perspective on qualitative data exploration: Using topic models for the descriptive analysis of social media data. Journal of Technology in Human Services, 38(1), 54–86.
- Sadiku, M., Eze, K., & Musa, S. (2018). The essence of data engineering. International Journal of Trend in Research and Development, 5(3), 253–255.
- Salloum, S. A., Al-Emran, M., Monem, A. A., & Shaalan, K. (2017). Using text mining techniques for extracting information from research articles. Intelligent Natural Language Processing: Trends and Applications Studies in Computational Intelligence, 740, 373–397.
- United States Census Bureau. (2018, October 29). Big data.
- Usama, M., Hunain, J., Yau, K., Elkhatib, Y., & Hussain, A. (2019). Unsupervised machine learning for networking: Techniques, applications and research challenges. IEEE Access, 7, 65579–65615.
- White, G., Ariyachandra, T., & White, D. (2019). Big data, ethics, and social impact theory—A conceptual framework. Journal of Management & Engineering Integration, 12(1), 9–15.
- Wilke, C. (2019). Fundamentals of data visualization: A primer on making informative and compelling figures. O’Reilly.
- Yaga, D., Mell, P., Roby, N., & Scarfone, K. (2018). Blockchain technology overview. National Institute of Standards and Technology.