Data Science Usecase: Keywords

Keywords for data science

terms are linked to their Wikipedia articles

data science: using scientific methods, algorithms, and systems to extract knowledge and insights from data
decision science: for business problems, data science combined with behavioral science and design thinking to understand end users
business intelligence (BI): analyzing and reporting historical data, like sales statistics and operational metrics, to guide strategic decision-making
data analysis: inspecting, cleansing, transforming, and modeling data, with the goal of discovering useful information
data mining: discovering patterns in data with methods and tools like machine learning, statistics, and database systems
exploratory data analysis (EDA): summarizing a dataset’s main characteristics and informing the development of more complex models or logical next steps
data engineering: building infrastructure with which data are gathered, cleaned, stored, and prepped for data science
DataOps: automated, process-oriented methodologies to improve quality and reduce cycle time in data analytics; akin to DevOps for data, though with some key differences
artificial intelligence (AI): computer systems that can perform tasks that normally require human intelligence, using human reasoning as a model
AIOps: DataOps at the intersection of AI and big data, often using machine learning with the intent to feed continuous insights into continuous improvement, and often including collaborative automation, performance monitoring, and event correlations
machine learning (ML): a subset of AI in which a system learns from input data by identifying patterns in it, then applies those patterns to new problems or requests, allowing data scientists to teach a computer to carry out tasks rather than programming it step by step
supervised learning: a subset of ML in which the algorithm is trained on examples labeled with the desired output, so that a data scientist effectively teaches it the correct conclusions; for example, a system can learn to identify problems by being trained on a dataset of correctly labeled and characterized problems
deep learning: machine learning based on neural networks with multiple layers between the input and output layers, as opposed to shallow networks with at most one such intermediate layer
MLOps: akin to DevOps or DataOps, collaboration and communication between data scientists and operations professionals to manage the production ML lifecycle, with increased automation and improved quality per business and regulatory requirements
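As a minimal illustration of the exploratory data analysis entry, the sketch below summarizes one numeric column with Python's standard-library statistics module. The summarize function and the order values are hypothetical, chosen only to show what a first EDA pass might report.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers (a first EDA pass)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation; needs at least 2 values
        "min": min(values),
        "max": max(values),
    }

# Hypothetical order totals in dollars
orders = [12.5, 18.0, 22.0, 25.5, 19.0, 48.0]
summary = summarize(orders)
for name, value in summary.items():
    print(f"{name:>6}: {value:g}")
```

Here the mean (about 24.17) sits above the median (20.5), a hint that the single large order skews the distribution; that kind of observation is what EDA feeds into the choice of later models.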

ETL (extract, transform, load): data integration from multiple sources, normalized or transformed into a common or standardized format, often to build a data warehouse
data visualization (dataviz): graphical representation of data and information, to help recognize patterns, trends, and correlations and to generally understand the significance of data
data model: defines how datasets are connected to each other and how they are processed and stored
data warehouse: central repository that stores data integrated from across an organization's sources for reporting and analysis, used to guide business decisions
R: programming language for statistical computing, used by statisticians and data miners for data analysis and developing statistical software
Python: programming language popular for manipulating and storing data, as well as for general-purpose programming
SQL (Structured Query Language): declarative language for managing data in relational databases, used for tasks such as retrieving or updating data
big data: data sets too large or complex to be dealt with by traditional data-processing software
classification: an example of supervised learning in which an algorithm assigns new data to one of a set of predefined categories, based on characteristics learned from data whose categories are already known; for example, classification can be used to determine whether a customer is likely to spend over $20 online, based on similarity to other customers who have previously spent that amount
cluster analysis: like classification, but the algorithm receives unlabeled input data and finds similarities in the data itself, grouping together data points that are alike, i.e., classification without supervised learning
cross-validation: method to estimate the accuracy and stability of machine-learning models on unseen data, often by splitting a dataset in two, training an algorithm on one subset, and then evaluating it on the second
linear regression: modeling the relationship between two variables by fitting a linear equation to the observed data, enabling prediction of an unknown variable based on its related, known variable
causal inference: process of determining whether an observed relationship reflects cause and effect, often requiring subject matter expertise in addition to good data and algorithms
hypothesis testing: use of statistics to decide whether observed data provide enough evidence to reject a null hypothesis, typically by computing a p-value (the probability of results at least as extreme as those observed if the null hypothesis were true); often used in science
statistical power: the probability of correctly rejecting the null hypothesis when it is false, so higher statistical power means a lower likelihood of incorrectly concluding that a variable has no effect
standard error: a measure of the statistical precision of an estimate (the standard deviation of its sampling distribution); larger sample sizes generally decrease the standard error
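The ETL entry can be illustrated with a minimal sketch that uses only Python's standard library. It assumes two hypothetical CSV exports that record sales in different formats (dollar strings versus cents, ISO versus day-first dates), normalizes both into one schema, and loads them into an in-memory SQLite table standing in for a data warehouse. The source data, the function names, and the sales table are illustrative assumptions, not a prescribed pipeline.

```python
import csv
import io
import sqlite3

# Hypothetical source A: comma-separated, amounts as dollar strings, ISO dates
SOURCE_A = """region,date,sales
North,2024-01-15,"$1,200.50"
South,2024-01-15,$980.00
"""

# Hypothetical source B: semicolon-separated, day-first dates, amounts in cents
SOURCE_B = """area;day;sales_cents
East;15/01/2024;150000
"""

def extract(text, delimiter=","):
    """Extract: parse raw CSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def transform_a(row):
    """Transform source A rows into (region, ISO date, amount in dollars)."""
    amount = float(row["sales"].replace("$", "").replace(",", ""))
    return (row["region"], row["date"], amount)

def transform_b(row):
    """Transform source B rows into the same (region, ISO date, amount) schema."""
    day, month, year = row["day"].split("/")
    return (row["area"], f"{year}-{month}-{day}", int(row["sales_cents"]) / 100)

def load(conn, rows):
    """Load: insert normalized rows into a single warehouse-style table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, [transform_a(r) for r in extract(SOURCE_A)])
load(conn, [transform_b(r) for r in extract(SOURCE_B, delimiter=";")])

for row in conn.execute("SELECT region, date, amount FROM sales ORDER BY region"):
    print(row)
```

Keeping one transform function per source is a deliberate choice: each source's quirks stay isolated, and the load step only ever sees the common format.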
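The classification entry's customer-spend example can be sketched with k-nearest neighbors, one simple classifier among many (the glossary does not name a specific algorithm). The two features, visits per month and average minutes per visit, and all training values are hypothetical; a new customer receives the majority label of the most similar known customers.

```python
import math
from collections import Counter

# Hypothetical labeled customers: (visits per month, avg minutes per visit) -> spent over $20?
training = [
    ((2, 3.0), False),
    ((3, 4.5), False),
    ((1, 2.0), False),
    ((8, 12.0), True),
    ((10, 15.5), True),
    ((7, 9.0), True),
]

def knn_classify(point, labeled, k=3):
    """Predict the majority label among the k labeled examples nearest to `point`."""
    nearest = sorted(labeled, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((9, 11.0), training))  # True
print(knn_classify((2, 2.5), training))   # False
```

With two classes and an odd k, a vote can never tie. In practice the features would usually be standardized first, because Euclidean distance otherwise lets the feature with the larger numeric range dominate.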
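Cluster analysis can be sketched with k-means, one widely used clustering algorithm. Unlike the classifier, it receives no labels: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The points, the choice of k = 2, and the hand-picked starting centroids are assumptions for illustration; library implementations typically choose starting points more carefully.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Group 2-D points around the given starting centroids using k-means."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D data with two visible groups
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```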
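Linear regression and cross-validation fit naturally into one sketch. fit_line uses the ordinary least squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). k_fold_mse generalizes the two-way split described for cross-validation: each of k folds is held out once while the model trains on the rest, and the held-out errors are averaged. Assigning folds by index rather than by random shuffling is a simplifying assumption, and the spend and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_mse(xs, ys, k=3):
    """Average held-out mean squared error of fit_line over k interleaved folds."""
    fold_errors = []
    for fold in range(k):
        # Every k-th point (starting at `fold`) is held out; the rest train the model
        test_idx = set(range(fold, len(xs), k))
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        test = [(xs[i], ys[i]) for i in sorted(test_idx)]
        intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
        errors = [(y - (intercept + slope * x)) ** 2 for x, y in test]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Hypothetical data: advertising spend (x) vs. sales (y), both in $1,000s
spend = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sales = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.0]
intercept, slope = fit_line(spend, sales)
print(f"fitted: sales = {intercept:.2f} + {slope:.2f} * spend")
print(f"3-fold cross-validated MSE: {k_fold_mse(spend, sales):.3f}")
```

The cross-validated error is measured only on points the model did not see during fitting, so it is a more honest estimate of predictive accuracy than the error on the training data.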
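Hypothesis testing can be made concrete with a permutation test, one of several possible tests for a difference in means (the glossary does not prescribe one). Under the null hypothesis of no difference, group labels are interchangeable, so the sketch shuffles them many times and counts how often a shuffled difference is at least as large as the observed one. That proportion approximates the p-value: the probability, assuming the null hypothesis is true, of a result at least this extreme. It is not the probability that the hypothesis is true. The checkout times and the seed are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns an approximate p-value."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the labels are exchangeable, so shuffle them
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance guards against float rounding
            at_least_as_extreme += 1
    # The +1 terms count the observed split itself, so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Hypothetical checkout times (minutes) for two versions of a site
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
variant = [13.0, 13.4, 12.9, 13.6, 13.1, 13.3]
p_value = permutation_test(control, variant)
print(f"p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) is read as evidence against the null hypothesis; the 0.05 cutoff is a convention, not a property of the test.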
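Statistical power can be computed in closed form for a simple case. The sketch assumes a one-sided, one-sample z-test with a known population standard deviation sigma, for which power = 1 - Phi(z_(1-alpha) - effect * sqrt(n) / sigma), where Phi is the standard normal CDF. The effect size, sigma, and sample sizes are hypothetical; other tests need other formulas or simulation.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test (H0: mu = mu0 vs H1: mu > mu0).

    effect: assumed true difference mu - mu0
    sigma:  assumed known population standard deviation
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when the z statistic exceeds this
    shift = effect * n ** 0.5 / sigma       # expected z statistic when H1 is true
    return 1 - std_normal.cdf(z_crit - shift)

for n in (10, 25, 50):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

For a true effect of half a standard deviation, power rises from about 0.47 at n = 10 to about 0.80 at n = 25 and about 0.97 at n = 50. With no true effect, the function returns alpha, since every rejection is then a false positive.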
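For the most common case, the standard error of a sample mean, the estimate is the sample standard deviation divided by the square root of the sample size. Because of that square root, quadrupling the sample roughly halves the standard error. The sketch below assumes simulated draws from a hypothetical population with mean 50 and standard deviation 10, seeded for repeatability.

```python
import math
import random
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical measurements from the same population (mean 50, sd 10)
rng = random.Random(42)
small = [rng.gauss(50, 10) for _ in range(25)]
large = [rng.gauss(50, 10) for _ in range(400)]
print(f"n=25:  SE = {standard_error_of_mean(small):.2f}")
print(f"n=400: SE = {standard_error_of_mean(large):.2f}")
```

With sigma = 10, the theoretical values are 10 / sqrt(25) = 2 and 10 / sqrt(400) = 0.5. The printed estimates should land near those, with the exact numbers depending on the random draws.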

Last modified June 27, 2024: Fix various vale errors (46417d02)