Also referred to as “metadata” or “data about data” – typically a list of data elements, and for each a description of what the element means. Can include units of measure (e.g., ounces or pounds), the meaning of any special attribute (e.g., how a missing value is coded), and often a source if sources vary across the data set.
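As a rough illustration of the definition above, a data dictionary can be sketched as a small lookup of element descriptions. The class name, field names, and sample entries below are hypothetical and chosen only for illustration; real data dictionaries often live in spreadsheets or catalog tools rather than code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One data dictionary entry (field names are illustrative)."""
    name: str
    description: str
    unit: Optional[str] = None          # unit of measure, if any
    missing_code: Optional[str] = None  # how a missing value is coded
    source: Optional[str] = None        # where the element comes from


# Hypothetical entries; the element names, codes, and sources are made up.
ENTRIES = [
    DataElement("package_weight", "Shipping weight of the package",
                unit="ounces", missing_code="-1", source="warehouse feed"),
    DataElement("customer_state", "Two-letter state of the billing address",
                missing_code="ZZ", source="billing system"),
]

# Index the entries by element name for quick lookup.
DATA_DICTIONARY = {entry.name: entry for entry in ENTRIES}


def describe(name: str) -> str:
    """Return a one-line, human-readable summary of an element."""
    entry = DATA_DICTIONARY[name]
    unit = f" ({entry.unit})" if entry.unit else ""
    return f"{entry.name}{unit}: {entry.description}"


print(describe("package_weight"))
```

Keeping the missing-value code next to the description lets anyone reading the data check how gaps are represented before analyzing it.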
Data use guidelines
The set of rules about the use of data – what data can be used for what purpose. (Purposes could include internal analysis, externally published analysis, models in general, models used for a given purpose, etc.)
Outcome data
Historical data that is used as the dependent variable in a predictive setting. That is, the data that indicates the event, behavior, or state you set out to predict. For example, if you are using consumer credit report patterns to predict future bankruptcy, then public record bankruptcy data (such as who filed for bankruptcy and when) would be used as the outcome data. Predictive analytics projects need both predictive data and outcome data to fuel modeling efforts.
Outlier
An observation that appears well outside the norm for its context, usually isolated from the other observations. A very warm day in winter might be considered an “outlier.” For example, a reading of 95 degrees Fahrenheit in San Francisco in December would be an outlier.
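One common way to flag such observations is Tukey's interquartile range (IQR) rule, sketched below. This is only one convention among several, not a definition taken from this glossary; the function name, the multiplier of 1.5, and the temperature readings are assumptions made up for illustration.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up December high temperatures (degrees Fahrenheit), for illustration.
december_highs = [57, 58, 56, 59, 55, 60, 57, 58, 95]

print(find_outliers(december_highs))
```

With these made-up readings, only the 95-degree day falls outside the fences. The same reading in a set of July temperatures from a hot climate would not be flagged, which is why context matters.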
Predictive analytics
The practice of using data resources (usually historical) to predict or rank-order the occurrence of future outcomes of interest. Good prediction algorithms improve results for businesses by focusing limited resources, taking more profitable actions, or limiting/curtailing risk.
Predictive data
Data that is useful to indicate the probability of a future outcome. For example, high blood pressure readings can be predictive of future heart disease.
Univariate analysis
The analysis of a single data element, or attribute, relative to a future outcome. This is in contrast to a multivariate analysis, which would examine multiple data elements and their interrelationships relative to the future outcome. Using the blood pressure example, a univariate analysis could relate systolic blood pressure readings to the future occurrence of heart disease, and a multivariate analysis might relate systolic and diastolic readings, age, and gender to the future occurrence of heart disease.
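The contrast can be sketched with a simple outcome-rate table. The records, the cutoffs (140 for systolic pressure, 60 for age), and the function names below are all made up for illustration. A real multivariate analysis would more typically fit a statistical model such as logistic regression; the cross-tabulation here is a simplified stand-in.

```python
from collections import defaultdict

# Made-up records for illustration only: (systolic, age, heart_disease).
RECORDS = [
    (150, 67, 1),
    (145, 70, 1),
    (155, 45, 0),
    (142, 62, 1),
    (120, 66, 0),
    (118, 40, 0),
    (125, 71, 1),
    (115, 35, 0),
]


def systolic_band(rec):
    # The 140 cutoff is an arbitrary assumption for this sketch.
    return "high" if rec[0] >= 140 else "normal"


def age_band(rec):
    # The 60 cutoff is an arbitrary assumption for this sketch.
    return "60+" if rec[1] >= 60 else "under 60"


def outcome_rate_by(records, key):
    """Share of records with the outcome, grouped by key(record)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [events, count]
    for rec in records:
        group = key(rec)
        totals[group][0] += rec[2]
        totals[group][1] += 1
    return {g: events / count for g, (events, count) in totals.items()}


# Univariate: one attribute (systolic band) against the outcome.
univariate = outcome_rate_by(RECORDS, systolic_band)

# Multivariate (simplified): two attributes considered jointly.
multivariate = outcome_rate_by(
    RECORDS, lambda rec: (systolic_band(rec), age_band(rec))
)

print(univariate)
print(multivariate)
```

In this toy data the univariate view shows a higher outcome rate for the high systolic band, while the joint view shows that the rate also depends on age within each band, which a single-attribute analysis cannot reveal.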