Proposed Usecase: Data Science

The Market Viewpoint

Data Science — AKA DataOps, MLOps, etc

Common challenges

Common challenges in data science are described on the Data Science with GitLab use case page and generally include being cross-functional, agile, and iterative while unlocking the value in an organization’s data. To do this, data teams need to:

  • Collaborate both inside and outside their teams, and often inside and outside their organization
  • Plan and manage projects and sprints, with tools flexible enough to support scrum, kanban, and more
  • Version control everything: manage and track different versions of files, models, test cases, data sets
  • Automate key workflow steps that are otherwise slow and prone to manual errors
  • Streamline testing and validation of work, making it much faster and more repeatable
  • Simplify infrastructure management, often across multiple cloud providers

To be fleshed out as with other use cases.

Keywords and definitions

Personas

User Personas

Who are the users

  • Data Analyst
  • Data Engineer
  • Data Scientist
  • Platform (or DataOps) Engineer
  • AI Engineer

Buyer Personas

Who are the buyers

  • Data Consumer (executive level)
  • Enterprise Data Steward
  • Data Team Manager
  • Engineering Manager of Data Products, or of Data Infrastructure
  • Director of Data and Analytics, or Chief Data Officer

Message House

The message house provides a structure to describe and discuss value and differentiators for data science.

Discovery Questions

  • list key discovery questions

Analyst Coverage

Media coverage (20 Feb.) of Gartner’s 2020 Magic Quadrant for Data Science and Machine Learning Platforms.

Internal to GitLab:

AR Plan: We will not have an AR Plan for this usecase at this time. Generally, usecase AR plans provide key details on how we intend to engage with the analyst community.

Market Requirements

| Capability | Description | Typical features enabling this capability | Value/ROI |
| --- | --- | --- | --- |
| Data Science tools integrations | Solution supports strong integrations both upstream and downstream, such as with ETL, data warehouses, artifact repositories, security scanning, compliance management, etc. There is flexibility for users who need or want a balance between native capabilities and integrations. | Particular need for Open Core and Open Source support, including Git and Docker, Kubernetes, Jupyter Notebooks, Python and R, Hadoop and Spark. Integrations generally with binary repos, IDEs, APIs, third-party libraries, or extensibility via plugins. | Increases efficiency. Lessens cost and the extra work that comes along with potential migrations. |
| Protect and secure assets | The solution provides mechanisms to host assets (repos), to set and manage change permissions for the users who access those repos, and to keep a detailed chain of custody of all changes made to these assets. | Single sign-on, code ownership, change reviews, change approvals, managing allowed IPs, activity stream, GPG-signed commits, rejecting unsigned commits, protected branches, branching, committer access rules, compliance dashboard, etc. | Secures IP and valuable assets. Provides information on project history changes. |
| Supports numerous assets | The solution is able to manage and maintain the version history of diverse assets and to support the development patterns that each asset implies. | Component reuse, traceability, design management, branching, diffing, merging, object storage, design versioning. | Able to manage assets and files for the entire development team, no matter how diverse, creating a single source of truth for the product configuration and making visibility and communication available at every level. |
| Foster Collaboration | The solution is designed to enable and foster collaboration among team members. It also streamlines agreed collaboration workflows by automating repetitive tasks. | Quickly create new branches of the project, add new files/assets, collaborate on proposed changes, review comments, suggest changes, Web IDE, suggestion approvals, conflict resolution, merge, diffing, hand-offs, design management and operations, workflow automation, wiki, snippets, version-controlled snippets, automatically update or close related issue(s) when a merge request is merged, configurable issue closing pattern, display merge request status for builds in CI system, visibility into security scans and build stats. | Higher code quality and improved release velocity through team review and validation. |
| Build and test automation | Streamlines application development workflow by connecting simple, repeatable, automated tasks into a series of interdependent automatic builds and associated tests. Run and manage automated tasks in the background; preview and validate changes before they are merged to production. Ensure software is consistently built and tested without manual intervention, enabling developers to get rapid feedback if their code changes introduce defects or vulnerabilities. Teams have control over where automated jobs run, either in the cloud (public/private/hybrid) or using shared infrastructure. | CI/CD pipelines, scalable resources, job orchestration/work distribution, automated deployments, caching, external repository integrations, and the ability to run isolated, automated tests (such as unit tests, regression tests, etc.). | Development teams can work with speed and efficiency. Catch potential errors sooner rather than later, before they intensify. |
| Cloud-agnostic deploy and manage | TBD | TBD | TBD |

Top 3 Differentiators

| Differentiator | Value | Proof Point |
| --- | --- | --- |
| Leading SCM and CI in one application | GitLab enables streamlined code reviews and collaboration at proven enterprise scale, making development workflows easier to manage and minimizing the context switching required between tools in complex DevOps toolchains. Users can release software faster and outpace the competition with the ability to quickly respond to changes in the market. | Forrester names GitLab among the leaders in Continuous Integration Tools in 2017. Alteryx uses GitLab to have code reviews, source control, CI, and CD all tied together. Axway overcomes legacy SCM and complex toolchain. |
| Open Source; Everyone Can Contribute | Open core development model allows anyone to contribute to the functionality of the product. Uniquely transparent product development process engaging customers, partners, and the community. | Strong and growing community - thousands of organizations and millions of users. Over 3,000 active community code contributors. Siemens needed to improve and enhance their developer tools, and actively contributes to the GitLab project with upstream commits. |
| Deploy Your Software Anywhere | Deploy and manage your models in any environment, including any cloud, with support for GCP, AWS, Azure, OpenShift, VMware, On Prem, Bare Metal, etc. Gain workflow portability - one deployment workflow regardless of destination. Provides a complete DevOps platform that allows teams to have the same productivity metrics, governance, and other connective tissue, no matter what cloud they use. | Ask Media Group found it difficult to manage the process of building and deploying microservices. With GitLab Premium, their developers can immediately begin to contribute a new service that can be deployed to AWS as soon as they start. Gartner’s 2019 Hype Cycle for Infrastructure and Operations Automation: GitLab helped to define the market, and is recognized as a relevant vendor for both Continuous Delivery and Toolchain Orchestration. |

The GitLab Solution

Competitive Comparison

TBD - will be a comparison grid leveraging the capabilities

Proof Points - customers

Quotes and reviews

  • List of customer quotes/reviews from public sites

Case Studies

  • List of case studies. NOTE: Use a short, concise value/proof point format

References to help you close

  • Link to SFDC list of usecase specific references

Partners

  • Describe how key partners help enable this usecase

Key Value (at tiers)

Premium

  • Describe the value proposition of why Premium for this usecase

Ultimate

  • Describe the value proposition of why Ultimate for this usecase

Resources

Presentations

  • LINK

Whitepapers and infographics

  • LINK

Videos (including basic demo videos)

  • LINK

Integrations Demo Videos

  • LINK

Clickthrough & Live Demos

  • LINK

Data Science Usecase: Keywords

Keywords for data science

terms are linked to their Wikipedia articles

  • data science: using scientific methods, algorithms, and systems to extract knowledge and insights from data
  • decision science: for business problems, data science combined with behavioral science and design thinking to understand end users
  • business intelligence (BI): analyzing and reporting historical data, like sales statistics and operational metrics, to guide strategic decision-making
  • data analysis: inspecting, cleansing, transforming, and modeling data, with the goal of discovering useful information
  • data mining: discovering patterns in data with methods and tools like machine learning, statistics, and database systems
  • exploratory data analysis (EDA): summarizing a dataset’s main characteristics and informing the development of more complex models or logical next steps
  • data engineering: building infrastructure with which data are gathered, cleaned, stored, and prepped for data science
  • DataOps: automated, process-oriented methodologies to improve quality and reduce cycle time in data analytics; akin to DevOps for data, with some key differences
  • artificial intelligence (AI): computer systems that can perform tasks that normally require human intelligence, using human reasoning as a model
  • AIOps: DataOps at the intersection of AI and big data, often using machine learning with the intent to feed continuous insights into continuous improvement, and often including collaborative automation, performance monitoring, and event correlations
  • machine learning (ML): a subset of AI in which a system learns from input by identifying patterns in that data, then applies those patterns to new problems or requests, allowing data scientists to teach a computer to carry out tasks rather than programming it step-by-step
  • supervised learning: a subset of ML with a data scientist guiding or teaching the desired conclusion to the algorithm, such as a system learning to identify problems by being trained on a dataset of correctly labeled and characterized problems
  • deep learning: a subset of ML built on neural networks with multiple layers between input and output, as opposed to shallow systems with a single layer
  • MLOps: akin to DevOps or DataOps, collaboration and communication between data scientists and operations professionals to manage the production ML lifecycle, with increased automation and improved quality per business and regulatory requirements

Data Science Usecase: Message House

The message house provides a structure to describe and discuss value and differentiators for data science.

We will not have a Positioning Statement for this usecase at this time. Generally, this would describe how GitLab fits and is differentiated in the market for this usecase.

| Key-Values | Value 1: <List a key message/value proposition> | Value 2: | Value 3: |
| --- | --- | --- | --- |
| Promise | (list and describe the positive business outcomes) | | |
| Pain points | (describe common pain points) | | |
| Why GitLab | (list specific features that support this value) | | |

| Proof points | (list specific analyst reports, case studies, testimonials, etc.) |

Last modified September 19, 2024: Fix broken links (38406a39)