An In-Depth Guide to AI Data Readiness
Introduction to AI Data Readiness
The Rise of AI in Business
In the last decade, artificial intelligence (AI) has evolved from a visionary concept into a critical strategic component across a wide variety of industries. With advances in technology, AI adoption in business has skyrocketed. Companies increasingly use AI to streamline operations, improve customer experience, and gain a competitive edge. AI’s ability to analyze data, recognize patterns, automate tasks, and make predictions is transforming how businesses operate, interact with customers, and make strategic decisions. As a result, AI’s influence is pervasive and growing in sectors ranging from healthcare and finance to retail and manufacturing.
The Importance of Data in AI-First Enterprises
As enterprises transition toward becoming AI-First, data becomes paramount. An AI-First enterprise is one where AI is not an add-on but an integral part of business strategy and operations. For these organizations, data is the fuel that powers AI technologies.
Data feeds the algorithms that AI uses to learn and make decisions. AI systems cannot develop the insights needed to provide value without substantial, high-quality data. Furthermore, diverse and robust data sets enable AI systems to identify complex patterns and deliver more accurate predictions. Thus, data quality and management become central to any organization’s AI readiness.
Understanding the Current State of Data in Many Enterprises
Despite data’s crucial role in driving AI, the reality is that many organizations are struggling with their data. In a typical legacy enterprise, data is often siloed across various departments, existing in disparate formats and systems. This fragmentation not only creates barriers to access but also hampers the overall quality and reliability of the data.
Moreover, inadequate data governance often results in inconsistencies, duplication, and errors. These issues make the data less valuable for AI and can lead to misguided decisions and strategies. On top of these challenges, enterprises must also navigate data privacy concerns and regulatory requirements, which add another layer of complexity to data management.
Understanding your enterprise’s current state of data is the first step in overcoming these obstacles. It’s an integral part of the journey toward becoming an AI-First enterprise, a journey that begins with a commitment to treating data as a strategic asset. This book aims to guide you on this path, offering insights, strategies, and tools for turning data challenges into opportunities for AI-powered transformation.
Understanding the Importance of Data and Data Readiness in AI
Introduction to Data as the Lifeblood of AI
Artificial Intelligence (AI) has often been likened to a high-powered engine capable of driving businesses into a future of efficiency, innovation, and growth. However, even the most potent engine cannot function without fuel. In the world of AI, data is that vital fuel.
Data feeds into AI systems, driving their ability to learn, adapt, and produce results. Whether helping a retailer predict next season’s hottest trends or assisting a healthcare provider in diagnosing a rare disease, AI relies on data to provide solutions and insights. Without data, AI would be akin to a car without gas – structurally impressive but fundamentally powerless.
How AI Uses Data
AI uses data through machine learning, a subset of AI in which algorithms parse data, learn from it, and make predictions or decisions. In general, the more relevant, high-quality data these algorithms are given, the more they can learn and the more accurate their outputs can become.
Training an AI model involves feeding it large amounts of data, which it uses to recognize patterns and make connections. For example, an AI model might be trained to recognize spam emails by analyzing thousands of examples. Over time, the AI “learns” what characterizes a spam email and can accurately identify and filter them.
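The spam example can be made concrete with a small sketch. It assumes a tiny, made-up training set and uses multinomial naive Bayes, one classic technique for spam filtering chosen here purely for illustration; production filters use far more data and often other models. All function names and messages are hypothetical.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; real systems use richer tokenization.
    return text.lower().split()

def train(examples: list[tuple[str, str]]):
    """Learn word counts per label ("spam" or "ham") from labeled messages."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def predict(model, text: str) -> str:
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: how common this label was in the training data.
        score = math.log(label_counts[label] / total)
        label_total = sum(word_counts[label].values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (label_total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled examples.
training_data = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the quarterly report", "ham"),
]
model = train(training_data)
print(predict(model, "free prize money"))          # spam
print(predict(model, "review the meeting notes"))  # ham
```

The "learning" here is nothing more than counting which words appear in which class; with more labeled examples, those counts become more representative, which is why data volume and quality matter.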
AI also uses data for ongoing improvement. When models are retrained or updated with new data from real-world applications, they can refine their predictions and decisions. This continuous feedback loop enables AI to deliver increasingly sophisticated insights and automation over time.
Real-World Examples of AI and Data Synergy
The power of AI and data synergy can be seen across industries. In healthcare, for example, AI algorithms analyze large datasets to help diagnose diseases or predict patient outcomes. AI can examine hundreds of thousands of medical images to detect abnormalities, such as signs of cancer, that a human might miss.
In the retail industry, AI uses customer data to deliver personalized experiences. For example, based on a customer’s past purchases, browsing history, and other behavioral data, AI can generate product recommendations tailored to that individual’s preferences.
In finance, AI algorithms pore over vast amounts of market data, historical trends, and economic indicators to produce forecasts and risk assessments. These predictions can then guide investment decisions and risk management strategies.
Challenges in Data Management
Data quality is a common issue. Inaccurate, incomplete, or outdated data can lead to faulty AI outputs, a classic case of “garbage in, garbage out.”
Furthermore, ensuring data privacy and compliance with regulations like the General Data Protection Regulation (GDPR) can be complex. Businesses must balance the need for comprehensive data with the responsibility to protect personal information.
Finally, many companies face a skills gap in data management and AI. The specialized knowledge required to handle large datasets and run AI models is in high demand, which makes qualified personnel difficult to find and retain.
Despite these challenges, effective data management is not just possible but necessary for any enterprise aiming to leverage the power of AI. By understanding these issues and implementing targeted strategies to address them, businesses can unlock the full potential of their data and propel their AI initiatives forward.
Identifying the Data Challenges in Your Enterprise
Common Data Issues in Legacy Companies
Legacy companies, often characterized by longstanding operations and established infrastructures, can face unique challenges when managing data. For example, data in these companies are frequently segregated across various systems and departments, resulting in “data silos.” These silos can obstruct a holistic view of the company’s operations and hamper effective decision-making.
Data quality is another prevalent issue in legacy companies. Without a proper data governance framework in place, issues such as inconsistencies, errors, and duplication become commonplace. Additionally, legacy systems might lack the agility to handle today’s data volume, velocity, and variety effectively, leading to bottlenecks and inefficiencies.
Assessing Your Company’s Data Readiness
Assessing your company’s data readiness is crucial before embarking on the journey toward becoming an AI-First enterprise. This includes understanding the current state of your data infrastructure, the quality and consistency of your data, and your existing data management practices.
Conducting a comprehensive data audit is a practical first step. This process should identify where data resides, who has access to it, how it’s currently used, and the existing data collection, storage, and analysis processes. Additionally, the audit should evaluate data quality, privacy and security protocols, and regulatory compliance.
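Part of the audit's data-quality evaluation can be automated. Below is a minimal sketch using only Python's standard library, assuming each dataset can be loaded as a list of dictionaries (for example, from a CSV export). The profile_dataset helper and the sample records are hypothetical; locating data, mapping access, and checking compliance still require human review.

```python
from collections import Counter

def profile_dataset(records: list[dict], key: str) -> dict:
    """Summarize basic quality signals for one dataset during a data audit."""
    columns = sorted({col for row in records for col in row})
    # Count empty or missing values per column.
    missing = {
        col: sum(1 for row in records if row.get(col) in (None, ""))
        for col in columns
    }
    # Identify key values that appear more than once (possible duplicates).
    key_counts = Counter(row.get(key) for row in records)
    duplicate_keys = sorted((k for k, n in key_counts.items() if n > 1), key=str)
    return {
        "row_count": len(records),
        "columns": columns,
        "missing_by_column": missing,
        "duplicate_keys": duplicate_keys,
    }

# Hypothetical customer records pulled from one source system.
customers = [
    {"customer_id": "C1", "email": "ana@example.com", "region": "EU"},
    {"customer_id": "C2", "email": "", "region": "US"},
    {"customer_id": "C2", "email": "bo@example.com", "region": None},
]
report = profile_dataset(customers, key="customer_id")
print(report["missing_by_column"])  # {'customer_id': 0, 'email': 1, 'region': 1}
print(report["duplicate_keys"])     # ['C2']
```

Running the same profile against every source system gives the audit a comparable, repeatable baseline of data quality.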
Understanding the Impact of Poor Data Management on AI Efforts
Poor data management can profoundly affect AI efforts. AI relies on quality data to function well, so if the data is inaccurate, inconsistent, or incomplete, AI models may produce misleading or erroneous results. This often leads to poor decisions and can erode confidence in AI initiatives across the organization.
Additionally, if data is not easily accessible or is locked in silos, it becomes challenging for AI models to leverage it. AI systems cannot deliver comprehensive insights or make fully informed decisions without a unified, holistic view of the data.
Case Study Analysis
Examining case studies of companies that have successfully navigated their data challenges can provide valuable insights. For instance, consider a major financial institution that struggled with siloed and inconsistent data. Recognizing this issue, they implemented a company-wide data governance framework, established a data management team, and invested in modern data infrastructure. As a result, they significantly improved their data quality and readiness, allowing their AI initiatives to thrive.
Another example might be a healthcare provider that grappled with data privacy concerns while wanting to leverage AI for patient diagnosis. By working closely with legal, ethical, and AI experts, they developed stringent data anonymization protocols and rigorous data access policies. This allowed them to use AI effectively while ensuring the utmost privacy and protection for their patients’ data.
These case studies demonstrate that data management challenges can be complex but not insurmountable. With a strategic approach and commitment to improving data readiness, enterprises can set a solid foundation for successful AI implementation.
Creating a Data-Driven Culture
The Importance of a Data-Driven Mindset
A data-driven culture is a crucial component of a successful AI-First enterprise. This culture involves prioritizing data in decision-making processes, aligning operations around data, and promoting data literacy across all levels of the organization.
A data-driven mindset means data is more than just a byproduct of operations; it’s a valued asset and a source of strategic insights. This approach facilitates evidence-based decision-making, encourages innovation, and supports continuous improvement. Furthermore, it plays a significant role in enabling effective data management and optimizing AI initiatives.
Steps to Build a Data-Driven Culture
- Leadership Buy-In: Leadership must endorse and exemplify the importance of data. They should communicate the benefits of a data-driven approach and make it a core part of the company’s strategy.
- Promote Data Literacy: Invest in training programs to improve employees’ understanding and usage of data. Equip them with the skills to interpret and utilize data in their daily work.
- Incorporate Data into Decision-Making Processes: Encourage the use of data in decision-making at all levels. Create a culture where opinions and hunches are backed by data.
- Democratize Data Access: Enable employees to access relevant data while maintaining necessary security and privacy controls. This empowers them to leverage data independently for insights and decision-making.
- Reward Data-Driven Actions: Recognize and reward employees who leverage data effectively to drive business results. This encourages others to follow suit.
Engaging Employees in Data Management and AI Initiatives
Employee engagement is a crucial aspect of creating a data-driven culture. Start by explaining the benefits of data management and AI at the organizational and individual levels. Next, provide opportunities for employees to get involved in these initiatives.
For instance, you could establish cross-functional teams to lead data management projects or set up workshops where employees can learn about AI and its applications. Encouraging employees to participate in data and AI projects improves their understanding and fosters a sense of ownership and investment in the company’s data-driven future.
Case Study: Successful Implementation of Data-Driven Culture
Consider the example of a global e-commerce company that successfully built a data-driven culture. Realizing the potential of AI to improve customer experience, the company decided to shift toward becoming an AI-First enterprise.
Leadership communicated this vision clearly to employees, explaining how AI and data could enhance their operations and decision-making. The company then launched a company-wide training program to improve data literacy and integrated data into its strategic decision-making processes.
To engage employees, it appointed “AI Champions” across the organization: individuals responsible for promoting and supporting AI initiatives. The company also launched an internal data platform that gave employees easy access to relevant data while maintaining rigorous security measures.
As a result, the company successfully created a culture where data is viewed as a critical asset, and AI is seen as a strategic tool for growth. This culture shift was pivotal in the company’s successful transition to becoming an AI-First enterprise, significantly improving customer experience, operational efficiency, and business performance.
Principles of Effective Data Management
Understanding Data Governance
Data governance is the set of policies, roles, and processes that determine how an organization’s data is collected, managed, used, and protected. A robust data governance strategy should define clear roles and responsibilities for data management within the company, establish guidelines for data collection and use, and create procedures for resolving data-related issues. Moreover, it should align with the company’s overall business strategy and support its transition toward becoming an AI-First enterprise.
Importance of Data Quality and Consistency
Data quality and consistency are cornerstones of effective data management. Without reliable and consistent data, AI models can produce erroneous outputs, leading to misguided decisions and diminished trust in AI initiatives.
Promoting data quality involves ensuring that data is accurate, up-to-date, and complete. Implementing data validation processes, routinely cleaning data, and checking for errors and inconsistencies are critical to maintaining data quality.
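A data validation step can be expressed as a set of rules applied to each record, one per quality dimension. The sketch below assumes hypothetical field names (email, order_total, updated_at) and an illustrative 365-day freshness threshold; real rules would come from your own data governance standards.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=365)  # Illustrative freshness threshold (assumption).

def validate(record: dict, today: date) -> list[str]:
    """Return a list of rule violations; an empty list means the record passed."""
    errors = []
    # Completeness: required fields (hypothetical) must be present and non-empty.
    for field in ("email", "order_total", "updated_at"):
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")
    # Accuracy: values must be plausible.
    email = record.get("email")
    if email and "@" not in email:
        errors.append("malformed email")
    total = record.get("order_total")
    if isinstance(total, (int, float)) and total < 0:
        errors.append("negative order_total")
    # Timeliness: records not updated recently are flagged as stale.
    updated = record.get("updated_at")
    if isinstance(updated, date) and today - updated > MAX_AGE:
        errors.append("stale record")
    return errors

today = date(2024, 6, 1)
print(validate({"email": "ana@example.com", "order_total": 42.0,
                "updated_at": date(2024, 5, 20)}, today))  # []
print(validate({"email": "not-an-email", "order_total": -5,
                "updated_at": date(2022, 1, 1)}, today))
# ['malformed email', 'negative order_total', 'stale record']
```

Returning a list of violations, rather than a single pass/fail flag, makes it easier to route records to the right fix and to report which rules fail most often.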
Consistency refers to maintaining a uniform format and structure across all data. This uniformity makes data easier to analyze and use, enabling more effective AI applications. A consistent data model, together with standards for data entry and formatting, can significantly improve data consistency.
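Formatting standards are easiest to enforce when every incoming value passes through one normalization function. This sketch assumes two hypothetical source conventions for dates (ISO "2024-03-05" and US-style "03/05/2024") and a small, illustrative mapping of country names to ISO 3166-1 alpha-2 codes; a real system would use a complete reference table.

```python
from datetime import datetime

# Illustrative mapping (assumption); unknown names raise KeyError for manual review.
COUNTRY_CODES = {"united states": "US", "usa": "US",
                 "germany": "DE", "deutschland": "DE"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # Accepted source formats (assumption).

def normalize_date(value: str) -> str:
    """Convert any accepted date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def normalize_country(value: str) -> str:
    """Map free-text country names to a two-letter code."""
    cleaned = value.strip().lower()
    if len(cleaned) == 2:
        return cleaned.upper()  # Already a two-letter code.
    return COUNTRY_CODES[cleaned]

def normalize_record(record: dict) -> dict:
    # Hypothetical field names: signup_date and country.
    return {
        **record,
        "signup_date": normalize_date(record["signup_date"]),
        "country": normalize_country(record["country"]),
    }

print(normalize_record({"id": 1, "signup_date": "03/05/2024", "country": " USA "}))
# {'id': 1, 'signup_date': '2024-03-05', 'country': 'US'}
```

Failing loudly on unrecognized values, instead of guessing, keeps new inconsistencies from silently entering the data.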
The Role of Metadata in Effective Data Management
Metadata, or data about data, is crucial to effective data management. It provides contextual information about data, such as its source, format, owner, and creation or modification date. This context can be invaluable for managing, interpreting, and using data.
Metadata can also aid data discovery, helping users locate the data they need and understand its relevance. Additionally, it can support data governance efforts by providing insights into data usage, quality, and lineage. Maintaining comprehensive and accurate metadata should be a core part of any data management strategy.
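To illustrate the kinds of metadata described above, here is a minimal, hypothetical catalog that records source, format, owner, timestamps, and lineage, with a simple keyword search for data discovery. The class, field, and dataset names are assumptions for illustration, not any particular catalog product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DatasetMetadata:
    # Field names are illustrative, not a specific catalog product's schema.
    name: str
    source: str            # system the data comes from
    data_format: str       # e.g. "csv", "parquet", "table"
    owner: str             # accountable team or person
    created_at: datetime
    modified_at: datetime
    description: str = ""
    upstream: list[str] = field(default_factory=list)  # lineage: input datasets
    tags: list[str] = field(default_factory=list)

catalog: dict[str, DatasetMetadata] = {}

def register(meta: DatasetMetadata) -> None:
    catalog[meta.name] = meta

def search(keyword: str) -> list[str]:
    """Data discovery: names of datasets whose name, description, or tags match."""
    kw = keyword.lower()
    return sorted(
        m.name for m in catalog.values()
        if kw in m.name.lower()
        or kw in m.description.lower()
        or any(kw in t.lower() for t in m.tags)
    )

def lineage(name: str) -> list[str]:
    """Walk upstream dependencies to trace where a dataset's data originates."""
    seen: list[str] = []
    stack = list(catalog[name].upstream)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            if parent in catalog:
                stack.extend(catalog[parent].upstream)
    return seen

# Hypothetical datasets.
now = datetime(2024, 6, 1)
register(DatasetMetadata("crm_customers", "CRM", "table", "sales-ops", now, now,
                         tags=["customer"]))
register(DatasetMetadata("customer_360", "warehouse", "table", "data-eng", now, now,
                         description="Unified customer view",
                         upstream=["crm_customers"]))
print(search("customer"))       # ['crm_customers', 'customer_360']
print(lineage("customer_360"))  # ['crm_customers']
```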
Implementing Data Security and Privacy Measures
Implementing robust data security and privacy measures is paramount in an era of increasing cyber threats and strict data privacy regulations. This involves protecting data from unauthorized access, breaches, and theft while ensuring compliance with privacy laws.
Data security measures might include encryption, strong access controls, and network security practices. Data privacy involves anonymizing or pseudonymizing personal data, obtaining necessary consent for data collection and use, and adhering to data minimization principles.
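Pseudonymization and data minimization can be combined in one preprocessing step before data reaches an AI pipeline. This minimal sketch assumes a keyed hash (HMAC-SHA256 from Python's standard library) as the pseudonymization technique and a hypothetical allow-list of the fields a given AI use case needs. In practice the key would be kept in a secrets manager rather than in source code, and pseudonymized data is still treated as personal data under the GDPR.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-key-from-a-secrets-manager"  # placeholder (assumption)
ALLOWED_FIELDS = {"customer_id", "age_band", "region", "total_spend"}  # hypothetical

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable keyed hash so records can still be joined."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def prepare_for_ai(record: dict) -> dict:
    # Data minimization: drop every field that is not explicitly needed.
    minimized = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    # Pseudonymization: the raw customer ID never leaves this function.
    minimized["customer_id"] = pseudonymize(minimized["customer_id"])
    return minimized

raw = {"customer_id": "C-1001", "name": "Ana Silva", "email": "ana@example.com",
       "age_band": "30-39", "region": "EU", "total_spend": 1250.0}
safe = prepare_for_ai(raw)
print(sorted(safe))  # ['age_band', 'customer_id', 'region', 'total_spend']
```

A keyed hash is used instead of a plain hash because, without the key, common identifiers cannot simply be hashed and looked up to reverse the mapping.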
Data security and privacy should not be afterthoughts but integral aspects of data management. Companies can better protect their data assets and avoid potential reputational damage and regulatory penalties by building these considerations into data management processes.
Building a Data Infrastructure for AI
Overview of a Robust Data Infrastructure
A robust data infrastructure serves as the backbone for AI initiatives. It involves the technologies and systems needed to collect, store, manage, and analyze data effectively. This includes databases, data warehouses, data lakes, data management systems, and data processing and analytics tools.
A well-designed data infrastructure should be secure, scalable, and flexible. It should enable easy access to data, support data governance, and facilitate the integration of diverse data sources. Importantly, it should be able to handle the volume, velocity, and variety of data necessary for AI applications.
Selecting the Right Data Storage and Management Systems
Choosing the proper data storage and management systems is critical to building a data infrastructure for AI. The selection should consider the types of data the company handles, the required processing capabilities, and the necessary scalability.
Traditional relational databases or modern cloud-based data warehouses might be suitable for structured data. For unstructured or semi-structured data, a data lake could be more appropriate. Finally, some companies might find a hybrid approach beneficial, leveraging data warehouses and data lakes to meet diverse data needs.
Data management systems should be chosen based on their ability to support data governance, facilitate data access and integration, and handle the required data volume and complexity.
Importance of Scalable Data Infrastructure for AI
AI initiatives often require large volumes of data and can necessitate processing this data at high speed. Therefore, data infrastructure must be scalable to accommodate growing data volumes and processing requirements.
Scalability ensures that the infrastructure can handle increased data loads without compromising performance. In addition, it allows the infrastructure to grow with the company’s data needs, supporting the ongoing development and refinement of AI models.
A scalable data infrastructure is often achieved with cloud-based solutions, which provide virtually unlimited storage and processing capacity. They also offer the flexibility to scale up or down as needed, ensuring efficient resource utilization.
Integrating Data Sources and Building Data Pipelines
Data integration involves combining data from different sources into a unified view. This can be crucial for AI applications, which often benefit from having access to diverse, comprehensive data.
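A unified view often starts with joining records from different systems on a shared key. This sketch assumes two hypothetical sources, a CRM export and a billing export, that both carry a customer_id and otherwise have no overlapping field names. Customers missing from one source are kept, with None for the absent fields, similar to a full outer join.

```python
def unify(crm: list[dict], billing: list[dict], key: str = "customer_id") -> dict:
    """Full-outer-join style merge of two sources into one record per key."""
    crm_by_key = {row[key]: row for row in crm}
    billing_by_key = {row[key]: row for row in billing}
    crm_fields = {f for row in crm for f in row} - {key}
    billing_fields = {f for row in billing for f in row} - {key}
    unified = {}
    # Every key seen in either source gets exactly one unified record.
    for k in sorted(crm_by_key.keys() | billing_by_key.keys()):
        merged = {key: k}
        for f in sorted(crm_fields):
            merged[f] = crm_by_key.get(k, {}).get(f)
        for f in sorted(billing_fields):
            merged[f] = billing_by_key.get(k, {}).get(f)
        unified[k] = merged
    return unified

# Hypothetical source extracts.
crm = [{"customer_id": "C1", "name": "Ana", "segment": "SMB"},
       {"customer_id": "C2", "name": "Bo", "segment": "Enterprise"}]
billing = [{"customer_id": "C1", "lifetime_value": 1200.0},
           {"customer_id": "C3", "lifetime_value": 300.0}]
view = unify(crm, billing)
print(view["C1"])  # {'customer_id': 'C1', 'name': 'Ana', 'segment': 'SMB', 'lifetime_value': 1200.0}
print(view["C3"])  # {'customer_id': 'C3', 'name': None, 'segment': None, 'lifetime_value': 300.0}
```

Keeping unmatched records, rather than dropping them, surfaces gaps between systems (such as C3 existing in billing but not in the CRM) that an inner join would hide.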
Building data pipelines is a key part of data integration. Data pipelines automate the flow of data from source systems to databases or data warehouses. They can extract, transform, and load data, ensuring it arrives in the right format and location for analysis.
Data pipelines should be reliable, efficient, and capable of handling the necessary data volume and complexity. They should also support data governance by maintaining data quality and facilitating tracking and auditing of data flows.
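The extract, transform, and load stages can be sketched with Python's built-in sqlite3 module standing in for a data warehouse. The CSV layout, the orders table, and the run-summary format are hypothetical; the summary illustrates how a pipeline can support auditing by recording what each run read, rejected, and loaded.

```python
import csv
import io
import sqlite3

def extract(csv_text: str) -> list[dict]:
    # Extract: read raw rows from a source export (here, an in-memory CSV).
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows: list[dict]) -> tuple[list[tuple], int]:
    # Transform: cast types, standardize text, and reject rows with a non-numeric amount.
    clean, rejected = [], 0
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            rejected += 1
            continue
        clean.append((row["order_id"].strip(), row["region"].strip().upper(), amount))
    return clean, rejected

def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    # Load: write to the target table; INSERT OR REPLACE keeps reruns idempotent.
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id TEXT PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_pipeline(conn: sqlite3.Connection, csv_text: str) -> dict:
    raw = extract(csv_text)
    clean, rejected = transform(raw)
    load(conn, clean)
    # Audit summary: what came in, what was rejected, what was loaded.
    return {"read": len(raw), "rejected": rejected, "loaded": len(clean)}

source = "order_id,region,amount\nA1, eu ,19.99\nA2,US,not-a-number\nA3,us,5\n"
conn = sqlite3.connect(":memory:")
print(run_pipeline(conn, source))  # {'read': 3, 'rejected': 1, 'loaded': 2}
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [('A1', 'EU', 19.99), ('A3', 'US', 5.0)]
```

Separating the three stages keeps each one testable on its own, and making the load idempotent means a failed run can be retried without creating duplicates.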
Building a robust data infrastructure for AI is a complex process, but it’s a fundamental step toward becoming an AI-First enterprise. By carefully selecting systems, ensuring scalability, and prioritizing integration, companies can set the stage for successful AI initiatives.
Implementing Data Cleaning and Preparation Processes
Understanding the Importance of Clean and Prepared Data for AI
Data cleaning and preparation are vital steps in becoming an AI-First enterprise. AI models thrive on high-quality, well-structured data. Without clean and prepared data, these models may deliver inaccurate or unreliable results, leading to suboptimal decisions.
Data cleaning involves removing errors, inconsistencies, and redundancies, while data preparation involves transforming raw data into a format suitable for analysis. These processes improve data quality, enhance data consistency, and facilitate more effective AI initiatives.
Techniques and Tools for Data Cleaning and Preparation
Data cleaning techniques include deduplication, outlier detection, and handling of missing values. Deduplication eliminates duplicate entries, outlier detection identifies and deals with anomalous values, and missing value handling involves imputing or disregarding incomplete data entries.
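The three cleaning techniques can be chained in a few lines. This sketch assumes records with a hypothetical numeric amount field. It removes exact duplicates, flags outliers with the interquartile-range (IQR) rule (one common method among several), and imputes missing values with the median of the observed values.

```python
import statistics

def deduplicate(rows: list[dict]) -> list[dict]:
    """Drop exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

def iqr_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clean(rows: list[dict], field: str = "amount") -> list[dict]:
    rows = deduplicate(rows)
    observed = [r[field] for r in rows if r[field] is not None]
    low, high = iqr_bounds(observed)
    median = statistics.median(observed)
    cleaned = []
    for r in rows:
        value = r[field]
        if value is None:
            r = {**r, field: median, "imputed": True}  # missing-value handling
        elif not low <= value <= high:
            r = {**r, "outlier": True}                 # flag for review, not delete
        cleaned.append(r)
    return cleaned

# Hypothetical records.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 11.0},
    {"id": 2, "amount": 11.0},   # exact duplicate
    {"id": 3, "amount": 11.0},
    {"id": 4, "amount": None},   # missing value
    {"id": 5, "amount": 12.0},
    {"id": 6, "amount": 12.0},
    {"id": 7, "amount": 13.0},
    {"id": 8, "amount": 500.0},  # anomalous value
]
result = clean(rows)
print(len(result))                                                      # 8
print([r["id"] for r in result if r.get("outlier")])                    # [8]
print([(r["id"], r["amount"]) for r in result if r.get("imputed")])     # [(4, 12.0)]
```

Flagging outliers instead of deleting them is a deliberate choice here: an extreme value may be an error or a genuine large order, and only a business rule can tell which.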
For data preparation, techniques might include data transformation, normalization, and feature engineering. Data transformation alters data to make it suitable for analysis (for instance, converting categorical data into numerical data), normalization adjusts values to a standard scale, and feature engineering creates new variables from existing data to improve model performance.
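Two of these preparation techniques, converting a categorical field to numbers and rescaling a numeric field, look like this in plain Python. The sketch uses one-hot encoding (one common way to convert categories) and min-max normalization to the range [0, 1]; the channel and spend values are hypothetical.

```python
def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Encode each category as a vector with a single 1 in its own column."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

def min_max(values: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # Constant column: no spread to scale.
    return [(v - lo) / (hi - lo) for v in values]

channels = ["web", "store", "web", "app"]  # hypothetical categorical field
spend = [20.0, 80.0, 50.0, 20.0]           # hypothetical numeric field
print(one_hot(channels))
# (['app', 'store', 'web'], [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
print(min_max(spend))  # [0.0, 1.0, 0.5, 0.0]
```

In a real workflow, the category list and the minimum and maximum would be learned from training data and reused unchanged on new data, so that the model sees consistently encoded inputs.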
Several tools can facilitate data cleaning and preparation. These range from programming languages like Python and R, which offer libraries for data manipulation, to specialized data prep tools like Alteryx, Trifacta, and Talend.
Establishing an Ongoing Data Maintenance Process
Data cleaning and preparation should not be one-off activities but part of an ongoing data maintenance process. As new data comes in, it must be cleaned and prepared to ensure consistency and quality.
Automating these processes can significantly enhance efficiency and accuracy. For example, this might involve creating scripts for data cleaning, utilizing ETL (Extract, Transform, Load) tools for data preparation, or implementing data quality management tools for continuous monitoring and cleaning of data.
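Continuous monitoring often takes the form of a quality gate that runs on every incoming batch and stops it from reaching downstream AI models when quality drops. In this minimal sketch, the metric (share of rows with missing required fields), the 5% threshold, and the field names are all assumptions for illustration.

```python
REQUIRED_FIELDS = ("customer_id", "event_time", "event_type")  # hypothetical
MAX_MISSING_RATE = 0.05  # Illustrative threshold: at most 5% incomplete rows.

def missing_rate(batch: list[dict]) -> float:
    """Share of rows in which at least one required field is missing or empty."""
    if not batch:
        return 0.0
    incomplete = sum(
        1 for row in batch
        if any(row.get(f) in (None, "") for f in REQUIRED_FIELDS)
    )
    return incomplete / len(batch)

def quality_gate(batch: list[dict]) -> tuple[bool, str]:
    """Return (passed, message) so a scheduler can halt the run or raise an alert."""
    rate = missing_rate(batch)
    if rate > MAX_MISSING_RATE:
        return False, f"blocked: {rate:.0%} of rows incomplete (limit {MAX_MISSING_RATE:.0%})"
    return True, f"passed: {rate:.0%} of rows incomplete"

good = [{"customer_id": i, "event_time": "2024-06-01T10:00", "event_type": "view"}
        for i in range(20)]
bad = good[:18] + [{"customer_id": 99, "event_time": "", "event_type": "view"}] * 2
print(quality_gate(good))  # (True, 'passed: 0% of rows incomplete')
print(quality_gate(bad))   # (False, 'blocked: 10% of rows incomplete (limit 5%)')
```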
In addition, maintaining an updated data dictionary or catalog can provide valuable context for data, aiding in the data preparation process and enhancing the overall usefulness of the data.
Case Study: Transforming Raw Data into AI-Ready Data
Consider the case of an online retailer looking to leverage AI for personalized product recommendations. The company had vast amounts of raw customer data, including browsing history, purchase history, and customer feedback. However, this data was messy and inconsistent, with many missing values and duplicate entries.
Recognizing the importance of clean and prepared data for their AI initiative, the company implemented a rigorous data-cleaning process. They used deduplication techniques to remove duplicate entries, imputed missing values based on business rules, and utilized outlier detection to identify and handle anomalous data points.
For data preparation, the company transformed categorical data into numerical data, normalized purchase values, and engineered new features, such as average purchase value and time since the last purchase.
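The two engineered features in this case study could be computed from a purchase history roughly as follows. The transaction layout and the reference date are hypothetical; the sketch groups purchases by customer and derives average purchase value and days since last purchase.

```python
from collections import defaultdict
from datetime import date

def customer_features(purchases: list[dict], as_of: date) -> dict:
    """Aggregate raw purchases into per-customer model features."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    last_seen: dict[str, date] = {}
    for p in purchases:
        cid = p["customer_id"]
        totals[cid] += p["amount"]
        counts[cid] += 1
        if cid not in last_seen or p["date"] > last_seen[cid]:
            last_seen[cid] = p["date"]
    return {
        cid: {
            "avg_purchase_value": totals[cid] / counts[cid],
            "days_since_last_purchase": (as_of - last_seen[cid]).days,
        }
        for cid in sorted(counts)
    }

# Hypothetical transactions.
purchases = [
    {"customer_id": "C1", "amount": 40.0, "date": date(2024, 5, 1)},
    {"customer_id": "C1", "amount": 60.0, "date": date(2024, 5, 20)},
    {"customer_id": "C2", "amount": 15.0, "date": date(2024, 3, 2)},
]
print(customer_features(purchases, as_of=date(2024, 6, 1)))
# {'C1': {'avg_purchase_value': 50.0, 'days_since_last_purchase': 12},
#  'C2': {'avg_purchase_value': 15.0, 'days_since_last_purchase': 91}}
```

Passing the reference date explicitly, rather than reading the current date inside the function, keeps the features reproducible when a model is retrained on historical data.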
Additionally, they automated their data cleaning and preparation processes to handle incoming data and maintain high data quality continuously. This resulted in AI-ready data and set the stage for a successful AI initiative, leading to more accurate product recommendations and improved customer satisfaction.
Bridging the Skills Gap
Identifying Necessary Skills for an AI-First Enterprise
The first step in bridging the skills gap is identifying the necessary skills for an AI-First enterprise. This will likely include data-related competencies like data analysis, data engineering, and data science, as well as AI-specific skills such as machine learning, deep learning, and natural language processing.
Understanding data privacy and security principles, familiarity with AI and data technologies and platforms, and the ability to interpret and apply AI outputs are also essential. In addition, softer skills like problem-solving, critical thinking, and the ability to communicate complex data and AI concepts clearly should not be overlooked.
Training and Upskilling Current Staff
Training and upskilling current staff is an effective way to address the skills gap. Companies can provide data- and AI-related training programs to help employees expand their skill sets. These could include workshops, online courses, certifications, or even advanced degree programs in relevant fields.
However, training should not be limited to technical skills. Equally important is training employees to think critically about AI and data, understand their potential and limitations, and make decisions based on data and AI insights.
Hiring Strategies for Data and AI Roles
When it comes to hiring new talent, companies need a strategic approach. This means clearly defining the skills and qualifications required for each role, seeking out candidates with these skills, and creating a hiring process that can effectively evaluate these skills.
Companies should also consider how to attract top talent. This could involve offering competitive salaries and benefits, creating a positive work environment, or showcasing the company’s commitment to AI and data innovation.
Given the demand for data and AI skills, companies should consider non-traditional talent sources, such as boot camp graduates, self-taught individuals, or professionals transitioning from other fields.
Collaborating with External AI and Data Experts
Collaborating with external AI and data experts is another way to bridge the skills gap. This could involve partnering with consulting firms, engaging freelance specialists, or working with AI research institutions.
External experts can provide valuable insights and skills, support the development and implementation of AI initiatives, and help transfer knowledge to internal teams. They can also offer an outside perspective, which can be beneficial for identifying new opportunities or addressing challenges.
However, companies should ensure such collaborations are based on mutual respect and learning. The goal should be not only to benefit from the experts’ skills but also to build the company’s internal AI and data capabilities over time.
Bridging the skills gap is crucial for becoming an AI-First enterprise. Companies can build the talent they need to leverage AI and data effectively by identifying necessary skills, investing in training, and strategically hiring and collaborating with external experts.
Building and Implementing an AI Strategy
Defining an AI Vision for Your Company
Defining an AI vision involves setting clear objectives for what you hope to achieve with AI. This vision should be aligned with your company’s broader mission, goals, and strategic priorities. Consider which areas of your business could benefit most from AI and how AI could support your business model.
Your AI vision might involve improving decision-making, enhancing customer experience, optimizing operations, or driving innovation. Importantly, your AI vision should be communicated clearly and consistently to all stakeholders to foster understanding and buy-in.
Developing a Roadmap for AI Implementation
A roadmap for AI implementation is a detailed plan that outlines the steps needed to realize your AI vision. This includes identifying necessary resources, defining key activities, and establishing a timeline for implementation.
Your roadmap should consider both technical and non-technical aspects of AI implementation. On the technical side, this might involve building data infrastructure, developing AI models, or integrating AI solutions into existing systems. On the non-technical side, this could include training employees, establishing data governance protocols, or managing change.
Remember that implementing AI is a journey, not a one-time event. Your roadmap should therefore be flexible and adaptable, able to accommodate new insights, changes in business conditions, or shifts in AI technologies.
Identifying and Prioritizing AI Projects
Identifying and prioritizing AI projects involves selecting specific use cases for AI and deciding in what order to tackle them. Consider the potential impact of each project, its alignment with your AI vision, and the feasibility of implementation.
Prioritization should consider the resources required for each project, the expected benefits, and the level of risk involved. It’s often effective to start with small, manageable tasks to deliver quick wins and build momentum for larger, more complex initiatives.
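One simple way to make these trade-offs explicit is a weighted scoring model. The sketch below is illustrative only: the 1-to-5 scales, the weights, and the sample projects are assumptions, and a single score cannot replace the judgment involved in real prioritization.

```python
from dataclasses import dataclass

# Illustrative weights (assumption); risk counts against a project.
WEIGHTS = {"impact": 0.4, "alignment": 0.2, "feasibility": 0.3, "risk": -0.1}

@dataclass
class Project:
    name: str
    impact: int       # expected business benefit, 1-5
    alignment: int    # fit with the AI vision, 1-5
    feasibility: int  # data and resource readiness, 1-5
    risk: int         # delivery and compliance risk, 1-5

    def score(self) -> float:
        return round(sum(w * getattr(self, attr) for attr, w in WEIGHTS.items()), 2)

# Hypothetical candidate projects.
projects = [
    Project("Invoice data extraction", impact=3, alignment=3, feasibility=5, risk=1),
    Project("Demand forecasting", impact=5, alignment=5, feasibility=1, risk=4),
    Project("Churn prediction", impact=4, alignment=4, feasibility=4, risk=2),
]
for p in sorted(projects, key=lambda p: p.score(), reverse=True):
    print(f"{p.name}: {p.score()}")
# Churn prediction: 3.4
# Invoice data extraction: 3.2
# Demand forecasting: 2.9
```

With these assumed weights, the high-impact but low-feasibility forecasting project ranks last, which mirrors the advice to build momentum with manageable projects first.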
Measuring the Success of Your AI Initiatives
Finally, it’s crucial to measure the success of your AI initiatives. This involves defining key performance indicators (KPIs) that align with your AI vision and objectives and regularly monitoring these KPIs.
Success measures might include operational metrics, such as reduced processing time or improved accuracy, and business outcomes, like increased sales, improved customer satisfaction, or reduced costs.
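Tracking a KPI usually means comparing a post-deployment value against a baseline measured before the AI initiative. This small sketch uses hypothetical metrics and numbers; for metrics where lower is better, such as processing time or cost, a negative change is the improvement.

```python
def kpi_change(baseline: float, current: float, higher_is_better: bool = True) -> dict:
    """Percent change versus a non-zero baseline, and whether it moved the desired way."""
    pct = (current - baseline) / baseline * 100
    improved = pct > 0 if higher_is_better else pct < 0
    return {"pct_change": round(pct, 1), "improved": improved}

# Hypothetical before/after measurements.
kpis = {
    "avg_processing_minutes": kpi_change(40.0, 30.0, higher_is_better=False),
    "customer_satisfaction": kpi_change(7.5, 8.1),
}
print(kpis)
# {'avg_processing_minutes': {'pct_change': -25.0, 'improved': True},
#  'customer_satisfaction': {'pct_change': 8.0, 'improved': True}}
```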
In addition to these quantitative measures, qualitative feedback can provide valuable insights into the effectiveness of your AI initiatives. This could involve surveying employees about their experiences with AI or seeking feedback from customers about AI-driven products or services.
Measuring success helps demonstrate the value of your AI initiatives and provides insights for refining your AI strategy and improving future AI projects. Of course, building and implementing an AI strategy is a significant undertaking. Still, with a clear vision, a well-planned roadmap, prioritized tasks, and robust success measures, companies can successfully navigate this journey and emerge as AI-First enterprises.
The Future is AI-First
Reflecting on the Journey to Becoming an AI-First Enterprise
The journey to becoming an AI-First enterprise is neither quick nor easy. It demands substantial investment in time, resources, and effort. But reflecting on this journey, it’s clear that the rewards are well worth the challenges. The power of AI to transform business processes, improve decision-making, and drive innovation is immense, making the journey worthwhile and essential in today’s competitive business environment.
It’s also a journey that’s unique to each enterprise. Every company starts from a different place, with different resources, skills, and challenges. But any enterprise can become AI-First with a clear vision, an actionable roadmap, a committed team, and determination.
Exploring Future Trends in AI and Data Management
As we look to the future, we can expect the role of AI and data management in enterprises to continue to grow. Several trends are worth noting.
First, AI adoption will likely become even more widespread, with AI technologies increasingly integrated into everyday business processes. Second, demand for data and AI skills will likely continue to rise, underscoring the importance of ongoing training and upskilling.
Furthermore, the evolution of AI technologies will present new opportunities and challenges. For example, advances in autonomous systems, AI explainability, and AI ethics will continue to shape the AI landscape. Similarly, trends such as data privacy and security, data democratization, and real-time data processing will become increasingly important in data management.
Encouragement for Continued Growth and Development
While the journey to becoming an AI-First enterprise is challenging, it’s essential to remember that it’s also an ongoing process. Even when you’ve achieved your initial AI goals, there will always be new opportunities to explore, new skills to learn, and new challenges to overcome.
So, keep learning, exploring, and pushing the boundaries of what’s possible with AI. Continue to build on your successes, learn from your failures, and strive for continual improvement. The AI journey doesn’t have an endpoint – it’s about continuous growth and development.
In closing, the future of business is undeniably AI-First. The enterprises that embrace this future, committing to leveraging AI and data to drive decision-making, innovation, and value creation, will be the ones that succeed in this new era.
Becoming an AI-First enterprise is no small task, but it’s achievable with the right approach. So, here’s to your journey toward becoming an AI-First enterprise – may it be challenging, rewarding, and filled with exciting discoveries.
Top AI Data Platforms
- Google Cloud AI Platform: Offers a comprehensive suite of machine learning products and services designed to help businesses build, deploy, and scale AI models.
- Amazon Web Services (AWS) AI: Provides machine learning and deep learning services that let developers build applications that make predictions based on data.
- Microsoft Azure AI: Offers a range of AI services and tools for building AI solutions, integrated with other Azure services.
- IBM Watson: A suite of AI tools and applications that allows users to apply AI to their business processes, products, and services.
- Oracle AI: Offers a comprehensive suite of AI tools and capabilities integrated across Oracle Cloud applications and infrastructure.
- Tableau: A leading data visualization tool for interactive dashboards, real-time analytics, and more.
- SAS: Offers a wide range of software and services for data management and analytics, including AI and machine learning.
- Informatica: Provides a complete suite of data management products, including data integration, data quality, and master data management.
- Alteryx: Provides a platform that allows data analysts and scientists to blend data, perform predictive analytics, and create visualizations.
- TensorFlow: An open-source platform developed by Google for creating and training machine learning and deep learning models.
- PyTorch: An open-source machine learning library originally developed by Facebook’s AI Research lab, widely used for applications such as computer vision and natural language processing.
- DataRobot: A platform that automates much of the work of building and deploying machine learning models.
- RapidMiner: Offers a data science platform that provides data preparation, machine learning, deep learning, text mining, and predictive model deployment.
- KNIME: An open-source, user-friendly, and comprehensive data analytics platform that lets users analyze and model data through visual programming.
- Databricks: Provides a unified data analytics platform for large-scale data engineering and collaborative data science.