An algorithm is mathematical “logic” or a set of rules used to make calculations. Starting with an initial input (which may be zero or null), the logic or rules are coded or written into software as a set of steps to be followed in conducting calculations, processing data or performing other functions, eventually leading to an output.
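
The definition above — an initial input, a fixed set of steps, an eventual output — can be made concrete with a classic example. This is a minimal sketch of Euclid's algorithm for the greatest common divisor (the function name is our own):

```python
def gcd(a, b):
    """Euclid's algorithm: a fixed set of rules that turns
    an initial input (a, b) into an output (their GCD)."""
    while b != 0:          # repeat the rule until nothing remains
        a, b = b, a % b    # replace the pair with (b, a mod b)
    return a

print(gcd(48, 18))  # → 6
```

Every algorithm, however complex, follows this same shape: input, repeatable steps, output.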

Teradata Take: Within the context of big data, algorithms are the primary means for uncovering insights and detecting patterns. Thus, they are essential to realizing the big data business case.

Back to the top

Analytics Platform

An analytics platform is a full-featured technology solution designed to address the needs of large enterprises. Typically, it joins different “tools and analytics systems together with an engine to execute, a database or repository to store and manage the data, data mining processes, and techniques and mechanisms for obtaining and preparing data that is not stored. This solution can be conveyed as a software-only application or as a cloud-based software as a service (SaaS) provided to organizations in need of contextual information that all their data points to, in other words, analytical information based on current data records.” Source: Techopedia

Back to the top

Apache Hive

Apache Hive is an open-source data warehouse infrastructure that provides tools for data summarization, query and analysis. It is specifically designed to support the analysis of large datasets stored in Hadoop files and compatible file systems, such as Amazon S3. Hive was initially developed by data engineers at Facebook in 2008, but is now used by many other companies.

Back to the top

Artificial Intelligence (AI)

AI is a long-established branch of computer science concerned with software that simulates human decision-making. It mimics "learning" and "problem solving" through advanced algorithms and machine learning. AI has grown popular across many industries, with use case examples that include personalization of marketing offers and sales promotions, anti-virus security, equities trading, medical diagnosis, fraud detection and self-driving cars. Big data coupled with deep neural networks and fast parallel processing are currently driving AI growth.

Teradata Take: Teradata’s Sentient Enterprise vision recommends widespread use of automated machine learning algorithms. Business leaders should focus on specific use cases, not the term “AI” itself. After all, algorithms are not human: they don’t think, and they are not truly intelligent or conscious. Applications of AI require fresh data and ongoing program maintenance to improve accuracy and reduce risk, so it’s best to be skeptical of Hollywood renderings of AI and of general marketing hype.

Back to the top

Behavioral Analytics

Behavioral Analytics is a subset of business analytics that focuses on understanding what consumers and applications do, as well as how and why they act in certain ways. It is particularly prevalent in the realm of eCommerce and online retailing, online gaming and Web applications. In practice, behavioral analytics seeks to connect seemingly unrelated data points and explain or predict outcomes, future trends or the likelihood of certain events. At the heart of behavioral analytics is such data as online navigation paths, clickstreams, social media interactions, purchases or shopping cart abandonment decisions, though it may also include more specific metrics.
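
As a small illustration of connecting such data points, here is a sketch that measures shopping cart abandonment from clickstream data (the session data and event names are hypothetical):

```python
# Toy clickstream: one sequence of page events per visitor (hypothetical data).
sessions = {
    "v1": ["home", "product", "cart", "checkout"],
    "v2": ["home", "product", "cart"],            # abandoned cart
    "v3": ["home", "search", "product", "cart"],  # abandoned cart
}

def cart_abandonment_rate(sessions):
    """Share of sessions that reached the cart but never checked out."""
    carts = [s for s in sessions.values() if "cart" in s]
    abandoned = [s for s in carts if "checkout" not in s]
    return len(abandoned) / len(carts)

print(cart_abandonment_rate(sessions))  # 2 of 3 cart sessions were abandoned
```

Real behavioral analytics works the same way at vastly larger scale, joining navigation paths, purchases and social interactions to explain or predict outcomes.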

Teradata Take: But behavioral analytics can be more than just tracking people. Its principles also apply to the interactions and dynamics between processes, machines and equipment, even macroeconomic trends.

Back to the top

Big Data

“Big data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data-processing applications.” Source: Wikipedia

Teradata take: What is big data? Big data is often described in terms of several “V’s” – volume, variety, velocity, variability, veracity – which speak collectively to the complexity and difficulty in collecting, storing, managing, analyzing and otherwise putting big data to work in creating the most important “V” of all – value.

Back to the top

Big Data Analytics

“Big data analytics refers to the strategy of analyzing large volumes of data … gathered from a wide variety of sources, including social networks, videos, digital images, sensors and sales transaction records. The aim in analyzing all this data is to uncover patterns and connections that might otherwise be invisible, and that might provide valuable insights about the users who created it. Through this insight, businesses may be able to gain an edge over their rivals and make superior business decisions.” Source: Techopedia

Teradata Take: What is big data analytics? Big data analytics isn’t one practice or one tool. Big data visualizations are needed in some situations, while connected analytics are the right answer in others.


Back to the top

Business Intelligence

“Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.” Source: Gartner

“Companies use BI to improve decision making, cut costs and identify new business opportunities. BI is more than just corporate reporting and more than a set of tools to coax data out of enterprise systems. CIOs use BI to identify inefficient business processes that are ripe for re-engineering.” Source:

Back to the top


Cascading

Cascading is a platform for developing Big Data applications on Hadoop. It offers a computation engine, systems integration framework, data processing and scheduling capabilities. One important benefit of Cascading is that it offers development teams portability so they can move existing applications without incurring the cost to rewrite them. Cascading applications run on and can be ported between different platforms, including MapReduce, Apache Tez and Apache Flink.

Back to the top

Cloud Computing

Cloud computing refers to the practice of using a network of remote servers to store, manage and process data (rather than an on-premise server or a personal computer) with access to such data provided through the Internet (the cloud). Programs, applications and other services may also be hosted in the cloud, which frees companies from the task and expense of building and maintaining data centers and other infrastructure.

There are a few common types of cloud computing model. Private clouds provide access to data and services via dedicated data centers or servers for specific audiences (e.g., a company’s employees). They may offer customized infrastructure, storage and networking configurations. Public clouds, often used by small and medium-sized businesses with fluctuating computing requirements, are typically based on shared hardware, offering data and services on demand, usually through “pay-as-you-go” models that eliminate maintenance costs. Hybrid clouds combine aspects of both private and public clouds. For example, companies can use the public cloud for data, applications and operations that are not considered mission critical and the private cloud to ensure dedicated resources are available to support core processes and essential computing tasks.

Teradata take: Effective cloud computing capabilities have become essential elements in the most effective Big Data environments.

Back to the top

Cluster Analysis

Cluster analysis or clustering is a statistical classification technique or activity that involves grouping a set of objects or data so that those in the same group (called a cluster) are similar to each other, but different from those in other clusters. It is essential to data mining and discovery, and is often used in the context of machine learning, pattern recognition, image analysis and in bioinformatics and other sectors that analyze large data sets.
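
The most widely taught clustering technique is k-means. This is a minimal one-dimensional sketch, written in plain Python for illustration (real work would use a statistics library):

```python
def kmeans_1d(points, centroids, iterations=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned cluster."""
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.4]
print(kmeans_1d(data, [0.0, 5.0]))  # two clusters, centered near 1.0 and 10.1
```

Points in the same cluster end up similar to each other and different from those in the other cluster, exactly as the definition above describes.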

Back to the top

Cognitive computing

Cognitive computing is a subset of artificial intelligence. It combines natural language processing with machine learning, rules, and interactive “stateful” programming. It is often used in spoken question-and-answer dialogs. Interactive cognitive systems “remember” the context of the current dialog and use that information to refine the next answer. Cognitive computing requires constant program maintenance and new data to improve the knowledge base. Examples of cognitive technology include Apple Siri, Amazon Alexa and IBM Watson.

Teradata Take: Cognitive computing is still in the early stages of maturity. It requires enormous investment, skill and patience for businesses to apply it effectively. Cognitive systems typically make many mistakes when interacting with humans. We expect cognitive computing to mature rapidly for specific tasks in the next decade. But, again, it’s best to be wary of Hollywood and marketing hype about cognitive computing.

Back to the top

Comparative Analysis

Comparative analysis refers to the comparison of two or more processes, documents, data sets or other objects. Pattern analysis, filtering and decision-tree analytics are forms of comparative analysis. In healthcare, comparative analysis is used to compare large volumes of medical records, documents, images, sensor data and other information to assess the effectiveness of medical diagnoses.

Back to the top

Connection Analytics

Connection analytics is an emerging discipline that helps to discover interrelated connections and influences between people, products, processes, machines and systems within a network by mapping those connections and continuously monitoring interactions between them. It has been used to address difficult and persistent business questions relating to, for instance, the influence of thought leaders, the impact of external events or players on financial risk, and the causal relationships between nodes in assessing network performance.

Back to the top

Concurrency/Concurrent computing

Concurrency or concurrent computing refers to the form of computing in which multiple computing tasks occur simultaneously or at overlapping times. These tasks can be handled by individual computers, specific applications or across networks. Concurrent computing is often used in Big Data environments to handle very large data sets. For it to work efficiently and effectively, careful coordination is necessary between systems and across Big Data architectures relative to scheduling tasks, exchanging data and allocating memory.
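
A small sketch of the idea, using Python's standard thread pool to process chunks of a data set at overlapping times (the chunking and task are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Stand-in for a heavier task, e.g. parsing or aggregating records."""
    return sum(chunk)

# Split one large data set into chunks that can be worked on concurrently.
chunks = [range(0, 1000), range(1000, 2000), range(2000, 3000)]

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(process_chunk, chunks))  # tasks overlap in time

print(sum(partials))  # same answer as a serial sum over 0..2999
```

Note the coordination the prose mentions: the pool schedules the tasks, and the partial results must be combined at the end.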

Back to the top

Correlation Analysis

Correlation analysis refers to the application of statistical analysis and other mathematical techniques to evaluate or measure the relationships between variables. It can be used to define the most likely set of factors that will lead to a specific outcome – like a customer responding to an offer or the performance of financial markets.
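
The workhorse measure here is the Pearson correlation coefficient. A sketch in plain Python, with illustrative made-up data (a statistics package would normally do this):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length variables:
    covariance of the deviations divided by the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

offers_sent = [10, 20, 30, 40, 50]   # hypothetical campaign data
responses   = [1, 3, 2, 5, 6]
print(round(pearson_r(offers_sent, responses), 3))  # strong positive link
```

Values near +1 or -1 indicate a strong relationship between the variables; values near 0 indicate little or none.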

Back to the top

Data Analyst

The main tasks of data analysts are to collect, manipulate and analyze data, as well as to prepare reports, which may include graphs, charts, dashboards and other visualizations. Data analysts also generally serve as guardians or gatekeepers of an organization's data, ensuring that information assets are consistent, complete and current. Many data analysts and business analysts are known for having considerable technical knowledge and strong industry expertise.

Teradata Take: Data analysts serve the critical purpose of helping to operationalize big data within specific functions and processes, with a clear focus on performance trends and operational information.

Back to the top

Data Architecture

“Data architecture is a set of rules, policies, standards and models that govern and define the type of data collected and how it is used, stored, managed and integrated within an organization and its database systems. It provides a formal approach to creating and managing the flow of data and how it is processed across an organization’s IT systems and applications.” Source: Techopedia

Teradata Take: Teradata Unified Data Architecture is the first comprehensive big data architecture. This framework harnesses relational and non-relational repositories via SQL and non-SQL analytics. Consolidating data into data warehouses and data lakes enables enterprise-class architecture. Teradata unifies big data architecture through cross-platform data access for all analytic tools and the ability to “push down” functions to the data, rather than moving data to the function. See data gravity.

Back to the top

Data Cleansing

Data cleansing, or data scrubbing, is the process of detecting and correcting or removing inaccurate data or records from a database. It may also involve correcting or removing improperly formatted or duplicate data or records. Such data removed in this process is often referred to as “dirty data.” Data cleansing is an essential task for preserving data quality. Large organizations with extensive data sets or assets typically use automated tools and algorithms to identity such records and correct common errors (such as missing zip codes in customer records).
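
The steps described above can be sketched as a small routine that drops incomplete and duplicate records and normalizes formatting (record shapes and field names are hypothetical):

```python
def cleanse(records):
    """Drop records missing a zip code, normalize name case,
    and remove duplicates that normalization reveals."""
    seen, clean = set(), []
    for rec in records:
        name = rec.get("name", "").strip().title()
        zip_code = rec.get("zip", "").strip()
        if not name or not zip_code:     # incomplete ("dirty") record
            continue
        key = (name, zip_code)
        if key in seen:                  # duplicate record
            continue
        seen.add(key)
        clean.append({"name": name, "zip": zip_code})
    return clean

dirty = [
    {"name": "ada lovelace", "zip": "10001"},
    {"name": "Ada Lovelace", "zip": "10001"},  # duplicate after normalizing
    {"name": "Alan Turing", "zip": ""},        # missing zip code
]
print(cleanse(dirty))  # only one clean record survives
```

At enterprise scale the same logic runs as automated pipelines over millions of records rather than a hand-written loop.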

Teradata take: The strongest Big Data environments have rigorous data cleansing tools and processes to ensure data quality is maintained at scale and confidence in data sets remains high for all types of users.

Back to the top

Data Gravity

Data gravity emerges as the volume of data in a repository grows along with the number of uses for it. At some point, copying or migrating the data becomes onerous and expensive, so the data tends to pull services, applications and other data into its repository. Data warehouses and data lakes are primary examples: the data in these systems has inertia. Growing data volumes often break existing infrastructure and processes, requiring risky and expensive remedies. Thus, the best-practice design is to move processing to the data, not the other way around.

Teradata Take: Data gravity has affected terabyte- and petabyte-size data warehouses for many years. It is one reason scalable parallel processing of big data is required. This principle is now extending to data lakes which offer different use cases. Teradata helps clients manage data gravity.

Back to the top

Data Mining

“Data mining is the process of analyzing hidden patterns of data according to different perspectives for categorization into useful information, which is collected and assembled in common areas, such as data warehouses, for efficient analysis, data mining algorithms, facilitating business decision making and other information requirements to ultimately cut costs and increase revenue. Data mining is also known as data discovery and knowledge discovery.” Source: Techopedia

Back to the top

Data Model / Data Modeling

“Data modeling is the analysis of data objects that are used in a business or other context and the identification of the relationships among these data objects. A data model can be thought of as a diagram or flowchart that illustrates the relationships between data.” Source: TechTarget

Teradata Take: Data models that are tailored to specific industries or business functions can provide a strong foundation or “jump-start” for big data programs and investments.

Back to the top

Data Warehouse

“In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons. The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc.)” Source: Wikipedia

Back to the top

Descriptive Analytics

Considered the most basic type of analytics, descriptive analytics involves the breaking down of big data into smaller chunks of usable information so that companies can understand what happened with a specific operation, process or set of transactions. Descriptive analytics can provide insight into current customer behaviors and operational trends to support decisions about resource allocations, process improvements and overall performance management. Most industry observers believe it represents the vast majority of the analytics in use at companies today.

Teradata Take: A strong foundation of descriptive analytics – based on a solid and flexible data architecture – provides the accuracy and confidence in decision making most companies need in the big data era (especially if they wish to avoid being overwhelmed by large data volumes). More importantly, it ultimately enables more advanced analytics capabilities – especially predictive and prescriptive analytics.

Back to the top


Extract, Transform and Load (ETL)

Extract, Transform and Load (ETL) refers to the process in data warehousing that concurrently reads (or extracts) data from source systems; converts (or transforms) the data into the proper format for querying and analysis; and loads it into a data warehouse, operational data store or data mart. ETL systems commonly integrate data from multiple applications or systems that may be hosted on separate hardware and managed by different groups or users. ETL is commonly used to assemble a temporary subset of data for ad-hoc reporting, migrate data to new databases or convert databases into a new format or type.
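
A miniature end-to-end sketch of the three stages, using an in-memory CSV as the source and SQLite standing in for the warehouse (table and column names are illustrative):

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (in-memory here).
raw = io.StringIO("id,amount,date\n1, 19.99 ,2024-01-05\n2, 5.00 ,2024-01-06\n")
rows = list(csv.DictReader(raw))

# Transform: cast types and strip stray whitespace.
cleaned = [(int(r["id"]), float(r["amount"].strip()), r["date"]) for r in rows]

# Load: insert into a warehouse-style table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER, amount REAL, sale_date TEXT)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)

total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(round(total, 2))  # 24.99 — the loaded data is now queryable
```

Production ETL adds scheduling, error handling and incremental loads, but the extract/transform/load skeleton is the same.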

Back to the top


Exabyte

An extraordinarily large unit of digital data, one Exabyte (EB) is equal to 1,000 Petabytes or one billion gigabytes (GB). Some technologists have estimated that all the words ever spoken by mankind would be equal to five Exabytes.

Back to the top


Hadoop

Hadoop is a distributed data management platform or open-source software framework for storing and processing big data. It is sometimes described as a cut-down distributed operating system. It is designed to manage and work with immense volumes of data, and scale linearly to large clusters of thousands of commodity computers. It was originally developed for Yahoo!, but is now available free and publicly through Apache Software Foundation, though it usually requires extensive programming knowledge to be used.

Back to the top

Internet of Things (IOT)

A concept that describes the connection of everyday physical objects and products to the Internet so that they are recognizable by, and can relate to, other devices through unique identifiers. The term is closely identified with machine-to-machine communications and the development of, for example, “smart grids” for utilities, remote monitoring and other innovations. Gartner estimates 26 billion devices will be connected by 2020, including cars and coffee makers.

Teradata Take: Big data will only get bigger in the future and the IOT will be a major driver. The connectivity from wearables and sensors mean bigger volumes, more variety and higher-velocity feeds.

Back to the top

Machine Learning

“Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. The process of machine learning is similar to that of data mining. Both systems search through data to look for patterns. However, instead of extracting data for human comprehension – as is the case in data mining applications – machine learning uses that data to improve the program's own understanding. Machine learning programs detect patterns in data and adjust program actions accordingly.” Source: TechTarget
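
The "learning from data rather than explicit rules" idea can be shown with one of the simplest learners, linear regression fitted by gradient descent. The program is never told the rule y = 2x; it adjusts itself from the examples (this is an illustrative sketch, not any particular library's method):

```python
def fit_line(xs, ys, lr=0.01, epochs=2000):
    """Learn a slope and intercept from examples by gradient descent:
    repeatedly nudge (w, b) to reduce the prediction error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]      # hidden pattern: y = 2x, never stated in code
w, b = fit_line(xs, ys)
print(w, b)                # w approaches 2.0 and b approaches 0.0
```

Exposed to new example pairs, the same code would adjust its parameters again — the "grow and change when exposed to new data" behavior the definition describes.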

Teradata Take: Machine learning is especially powerful in a big data context in that machines can test hypotheses using large data volumes, refine business rules as conditions change and identify anomalies and outliers quickly and accurately.

Back to the top


Metadata

“Metadata is data that describes other data. Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created and date modified and file size are very basic document metadata. In addition to document files, metadata is used for images, videos, spreadsheets and web pages.” Source: TechTarget
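
A quick illustration of "data about data": the operating system keeps metadata such as size and modification time for every file, separate from the file's contents.

```python
import os
import tempfile
import datetime

# Create a file, then read the metadata the OS keeps about it.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"hello metadata")
    path = f.name

info = os.stat(path)
metadata = {
    "file_size_bytes": info.st_size,
    "modified": datetime.datetime.fromtimestamp(info.st_mtime).isoformat(),
}
print(metadata)  # describes the file without reading its contents
os.remove(path)
```

Searching or filtering on fields like these is exactly how metadata "makes finding and working with particular instances of data easier."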

Teradata Take: The effective management of metadata is an essential part of solid and flexible big data “ecosystems” in that it helps companies more efficiently manage their data assets and make them available to data scientists and other analysts.

Back to the top


MongoDB

MongoDB is a cross-platform, open-source database that uses a document-oriented data model, rather than a traditional table-based relational database structure. This type of database structure is designed to make the integration of structured and unstructured data in certain types of applications easier and faster.
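
To illustrate the document model (plain Python dicts stand in for a live MongoDB collection here; the `find` helper is a hypothetical stand-in, not MongoDB's API): documents in one collection are JSON-like, can nest related data, and need not share the same fields.

```python
# Two documents in the same "collection" with different shapes — no fixed
# table schema, and related order data nests inside the customer document.
customers = [
    {"_id": 1, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Alan", "email": "alan@example.com"},
]

def find(collection, query):
    """Tiny stand-in for a MongoDB-style find(): match on equal fields."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

print(find(customers, {"name": "Ada"}))  # nested order data rides along
```

In a relational design the same information would be split across normalized tables and joined back together at query time.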

Back to the top

Natural Language Processing

A branch of artificial intelligence, natural language processing (NLP) deals with making human language (in both written and spoken forms) comprehensible to computers. As a scientific discipline, NLP involves tasks such as identifying sentence structures and boundaries in documents, detecting key words or phrases in audio recordings, extracting relationships between documents, and uncovering meaning in informal or slang speech patterns. NLP can make it possible to analyze and recognize patterns in verbal data that is currently unstructured.
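
Two of the tasks just named — sentence boundary detection and keyword detection — can be sketched with simple rules (real NLP systems handle abbreviations, quotes and slang that this naive approach ignores):

```python
import re
from collections import Counter

text = ("Natural language is messy. Sentences end with periods, "
        "question marks, or exclamation points! Can software find them? Yes.")

# Sentence boundary detection: split after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

# Keyword detection: count the longer words, ignoring short function words.
words = re.findall(r"[a-z]+", text.lower())
keywords = Counter(w for w in words if len(w) > 4).most_common(3)

print(len(sentences))  # 4 sentences found
print(keywords)
```

The gap between this toy and production NLP — ambiguity, context, informal speech — is precisely what the discipline works on.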

Teradata Take: NLP holds a key for enabling major advancements in text analytics and for garnering deeper and potentially more powerful insights from social media data streams, where slang and unconventional language are prevalent.

Back to the top

Pattern Recognition

Pattern recognition occurs when an algorithm locates recurrences or regularities within large data sets or across disparate data sets. It is closely linked and even considered synonymous with machine learning and data mining. This visibility can help researchers discover insights or reach conclusions that would otherwise be obscured.

Back to the top


An extremely large unit of digital data, one Petabyte is equal to 1,000 Terabytes. Some estimates hold that a Petabyte is the equivalent of 20 million tall filing cabinets or 500 billion pages of standard printed text.

Back to the top

Predictive Analytics

Predictive analytics refers to the analysis of big data to make predictions and determine the likelihood of future outcomes, trends or events. In business, it can be used to model various scenarios for how customers react to new product offerings or promotions and how the supply chain might be affected by extreme weather patterns or demand spikes. Predictive analytics may involve various statistical techniques, such as modeling, machine learning and data mining.

Back to the top

Prescriptive Analytics

A type or extension of predictive analytics, prescriptive analytics is used to recommend or prescribe specific actions when certain information states are reached or conditions are met. It uses algorithms, mathematical techniques and/or business rules to choose among several different actions that are aligned to an objective (such as improving business performance) and that recognize various requirements or constraints.
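
In its simplest form, the "business rules choosing among actions" idea looks like this sketch (the thresholds and action names are illustrative, not a real inventory policy):

```python
def prescribe(inventory, demand_forecast):
    """Recommend an action once predicted conditions are met."""
    if inventory < demand_forecast * 0.5:
        return "expedite-restock"      # severe shortfall predicted
    if inventory < demand_forecast:
        return "schedule-restock"      # mild shortfall predicted
    if inventory > demand_forecast * 2:
        return "run-promotion"         # overstock: stimulate demand
    return "no-action"

print(prescribe(inventory=40, demand_forecast=100))  # → expedite-restock
```

Real prescriptive systems replace these hand-set thresholds with optimization against an objective and its constraints, but the shape — predicted state in, recommended action out — is the same.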

Back to the top


R

R is an open-source programming language for statistical analysis. It includes a command line interface and several graphical interfaces. Popular algorithm types include linear and nonlinear modeling, time-series analysis, classification and clustering. According to Gartner research, more than 50% of data science teams now use R in some capacity. R language competes with commercial products such as SAS and Fuzzy Logix.

Teradata Take: Many R language algorithms yield inaccurate results when run in parallel. Teradata partnered with Revolution Analytics to convert many R algorithms to run correctly in parallel. Teradata Database runs R in-parallel via its scripting and language support feature. Teradata Aster R runs in-parallel as well. Both solutions eliminate open source R’s limitations around memory, processing and data.

Back to the top

Semi-structured Data

Semi-structured data refers to data that is not captured or formatted in conventional ways, such as those associated with traditional database fields or common data models. It is also not raw or totally unstructured and may contain some data tables, tags or other structural elements. Graphs and tables, XML documents and email are examples of semi-structured data, which is very prevalent across the World Wide Web and is often found in object-oriented databases.
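
JSON and XML show the idea well: tags and nesting provide some structure, without the fixed rows-and-columns schema of a relational table (the order records below are made up):

```python
import json
import xml.etree.ElementTree as ET

# The same order as JSON and as XML: self-describing tags, flexible shape.
order_json = '{"id": 7, "items": [{"sku": "A1", "qty": 2}], "note": "gift"}'
order = json.loads(order_json)

order_xml = "<order id='7'><item sku='A1' qty='2'/><note>gift</note></order>"
root = ET.fromstring(order_xml)

# The structural elements let software navigate the data reliably.
print(order["items"][0]["sku"], root.find("item").get("sku"))  # A1 A1
```

Neither document would fit a fixed table schema as-is, yet neither is raw text — that middle ground is what "semi-structured" names.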

Teradata Take: As semi-structured data proliferates and because it contains some relational data, companies must account for it within their big data programs and data architectures.

Back to the top

Sentiment Analysis

Sentiment analysis involves the capture and tracking of opinions, emotions or feelings expressed by consumers in various types of interactions or documents, including social media, calls to customer service representatives, surveys and the like. Text analytics and natural language processing are typical activities within a process of sentiment analysis. The goal is to determine or assess the sentiments or attitudes expressed toward a company, product, service, person or event.
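
The simplest approach is lexicon-based scoring: count positive and negative words. This toy sketch (with a made-up six-word lexicon) shows the mechanics; production systems use far larger lexicons and models that handle negation, sarcasm and context:

```python
# Tiny illustrative lexicon: word -> sentiment weight.
LEXICON = {"great": 1, "love": 1, "fast": 1,
           "terrible": -1, "slow": -1, "broken": -1}

def sentiment(text):
    """Classify text by summing the weights of its lexicon words."""
    words = (w.strip(".,!?") for w in text.lower().split())
    score = sum(LEXICON.get(w, 0) for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Love the new app, support was great!"))      # positive
print(sentiment("Checkout is slow and the cart is broken."))  # negative
```

Run over millions of social posts or call transcripts, scores like these roll up into the attitude-toward-a-brand measurements described above.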

Teradata Take: Sentiment analysis is particularly important in tracking emerging trends or changes in perceptions on social media. Within big data environments, sentiment analysis combined with behavioral analytics and machine learning is likely to yield even more valuable insights.

Back to the top

Structured Data

Structured data refers to data sets with strong and consistent organization. Structured data is organized into rows and columns with known and predictable contents. Each column contains a specific data type, such as dates, text, money or percentages. Data not matching that column’s data type is rejected as an error. Relational database tables and spreadsheets typically contain structured data. A higher semantic level of structure combines master data and historical data into a data model. Data model subject areas include topics such as customers, inventory, sales transactions, prices and suppliers. Structured data is easy to use and data integrity can be enforced. Structured data becomes big data as huge amounts of historical facts are captured.

Teradata Take: All important business processes and decisions depend on structured data. It is the foundation of data warehouses, data lakes and applications. When integrated into a data model, structured data provides exponential business value.

Back to the top


Terabyte

A relatively large unit of digital data, one Terabyte (TB) equals 1,000 Gigabytes. It has been estimated that 10 Terabytes could hold the entire printed collection of the U.S. Library of Congress, while a single TB could hold 1,000 copies of the Encyclopaedia Britannica.

Back to the top

Unstructured Data

Unstructured data refers to unfiltered information with no fixed organizing principle. It is often called raw data. Common examples are web logs, XML, JSON, text documents, images, video and audio files. Unstructured data is searched and parsed to extract useful facts. As much as 80% of enterprise data is unstructured, making it the most visible form of big data to many people. Its sheer size requires scalable analytics to produce insights. Because of the lower cost of storage, unstructured data is found in most (but not all) data lakes.
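
"Searched and parsed to extract useful facts" looks like this in miniature: raw web-server log lines (made up here) have no fields until a parser pulls them out.

```python
import re

# Raw log text: just lines of characters until we parse them.
log = """\
203.0.113.9 - - [05/Feb/2024:10:01:44] "GET /cart HTTP/1.1" 200
198.51.100.2 - - [05/Feb/2024:10:01:45] "GET /missing HTTP/1.1" 404
203.0.113.9 - - [05/Feb/2024:10:02:01] "POST /checkout HTTP/1.1" 500
"""

# Extract structured facts (IP, path, status) from the unstructured text.
pattern = re.compile(r'^(\S+).*"[A-Z]+ (\S+) [^"]*" (\d{3})', re.MULTILINE)
hits = [(ip, path, int(status)) for ip, path, status in pattern.findall(log)]

errors = [h for h in hits if h[2] >= 400]
print(len(hits), len(errors))  # 3 requests parsed, 2 of them errors
```

Once extracted, the facts behave like structured data and can feed ordinary analytics; at enterprise scale this parsing is exactly where scalable processing is needed.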

Teradata Take: There is more noise than value in unstructured data. Extracting the value hidden in such files requires strong skills and tools. There is a myth that relational databases cannot process unstructured data. Teradata's Unified Data Architecture embraces unstructured data in several ways. Teradata Database and competitors can store and process XML, JSON, Avro and other forms of unstructured data.

Back to the top

The V’s

Big data – and the business challenges and opportunities associated with it – are often discussed or described in the context of multiple V’s:

  • Value: the most important “V” from the perspective of the business, the value of big data usually comes from insight discovery and pattern recognition that lead to more effective operations, stronger customer relationships and other clear and quantifiable business benefits
  • Variability: the changing nature of the data companies seek to capture, manage and analyze – e.g., in sentiment or text analytics, changes in the meaning of key words or phrases
  • Variety: the diversity and range of different data types, including unstructured data, semi-structured data and raw data
  • Velocity: the speed at which companies receive, store and manage data – e.g., the specific number of social media posts or search queries received within a day, hour or other unit of time
  • Veracity: the “truth” or accuracy of data and information assets, which often determines executive-level confidence
  • Volume: the size and amounts of big data that companies manage and analyze

Back to the top