Glossary of Data Analysis and Visualization Terms
Latest revision as of 19:26, 20 December 2013
Aggregate Calculations
In a pivot or chart, data can be aggregated using the following types of calculations. Note that counts can apply to any type of field, but most other calculations apply to numeric fields only.
- sum - field values are added together. When data is shown in categories, the values are summed for each category.
- count rows - a simple count of rows. When data is shown in categories, it counts the rows in the table that belong to each category.
- count distinct values - a count of the number of distinct (unique) values of this field. When data is shown in categories, it counts the number of distinct values in each category. NULL values are not counted.
- count non-empty values - a count of the number of rows where the field has a value and the value is not blank.
- average - field values are averaged.
- max - the highest numerical field value.
- min - the lowest numerical field value.
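The calculations above can be sketched in a few lines of Python. This is only an illustration, not how any particular tool implements them: the function name `aggregate`, the sample field names `region` and `amount`, and the choice to represent NULL as `None` are all assumptions made for the example.

```python
from collections import defaultdict

def aggregate(rows, category_field, value_field):
    """Compute the glossary's aggregate calculations for each category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_field]].append(row.get(value_field))

    result = {}
    for category, values in groups.items():
        # Assumption: NULL is represented as None and is skipped by
        # every calculation except the plain row count.
        known = [v for v in values if v is not None]
        # Non-empty values additionally exclude blank strings.
        non_empty = [v for v in known if v != ""]
        numeric = [v for v in non_empty if isinstance(v, (int, float))]
        result[category] = {
            "sum": sum(numeric),
            "count_rows": len(values),
            "count_distinct": len(set(known)),
            "count_non_empty": len(non_empty),
            "average": sum(numeric) / len(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
            "min": min(numeric) if numeric else None,
        }
    return result

# Hypothetical sample data.
rows = [
    {"region": "East", "amount": 10},
    {"region": "East", "amount": 30},
    {"region": "East", "amount": None},
    {"region": "West", "amount": 5},
    {"region": "West", "amount": 5},
]
summary = aggregate(rows, "region", "amount")
```

In this sample, the East category has three rows but only two distinct values, because the NULL amount is not counted.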
Candlestick Chart
A candlestick chart is a style of bar chart used primarily to describe price movements of a security, such as a stock, over time, with each bar representing the range of price movement over a given time interval.
A candlestick usually consists of a body and an upper and a lower wick: the area between the open and the close is called the body, and price excursions above and below the body form the upper and lower wicks. The wicks illustrate the highest and lowest prices during the time interval. The body illustrates the opening and closing trades. If the security closed higher than it opened, the body is white or unfilled, with the opening price at the bottom of the body and the closing price at the top. If the security closed lower than it opened, the body is black, with the opening price at the top and the closing price at the bottom.
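As a rough sketch, the geometry of a single candlestick can be derived from its four prices as follows. The `Candle` class and the `describe` function are hypothetical names for illustration, and the handling of an unchanged price (`"none"`) is an assumption, since the definition above does not cover that case.

```python
from dataclasses import dataclass

@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

def describe(candle):
    """Return the body, wicks, and fill style of one candlestick."""
    body_bottom = min(candle.open, candle.close)
    body_top = max(candle.open, candle.close)
    if candle.close > candle.open:
        fill = "unfilled"   # closed higher: white or unfilled body
    elif candle.close < candle.open:
        fill = "filled"     # closed lower: black body
    else:
        fill = "none"       # assumption: unchanged price has no body to fill
    return {
        "body": (body_bottom, body_top),
        "upper_wick": (body_top, candle.high),
        "lower_wick": (candle.low, body_bottom),
        "fill": fill,
    }

rising = describe(Candle(open=10, high=15, low=8, close=13))
falling = describe(Candle(open=13, high=15, low=8, close=10))
```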
Category Chart
A category chart is a visualization that specializes in breaking down data by category. It allows you to easily compare data and focus on the categories of interest that explain the drivers behind the data. Data is typically presented as bars or pie slices.
CSV File
A CSV File is a popular data file that uses the Comma Separated Values (CSV) format. It holds tabular data in plain text form. A CSV file consists of any number of records, separated by line breaks; each record consists of fields, separated by a separator character such as a comma or tab. All records have an identical sequence of fields. The first record can optionally have the names of the fields.
Many data providers and applications allow users to download their data in CSV format, and many analytical tools allow users to import data in this format.
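A minimal sketch of reading such a file with Python's standard `csv` module is shown below. The function name `read_records`, the generated `field1`, `field2` names for files without a header, and the sample data are assumptions for the example.

```python
import csv
import io

def read_records(text, delimiter=",", has_header=True):
    """Split CSV text into a list of field names and a list of records."""
    records = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not records:
        return [], []
    if has_header:
        header, data = records[0], records[1:]
    else:
        # Assumption: invent positional names when the file has no header.
        header = [f"field{i + 1}" for i in range(len(records[0]))]
        data = records
    # Every record is expected to have the same sequence of fields.
    for number, record in enumerate(data, start=1):
        if len(record) != len(header):
            raise ValueError(f"record {number} has {len(record)} fields, "
                             f"expected {len(header)}")
    return header, data

comma_text = "name,city,amount\nAda,London,10\nGrace,New York,20\n"
header, data = read_records(comma_text)

tab_text = "Ada\tLondon\nGrace\tNew York\n"
tab_header, tab_data = read_records(tab_text, delimiter="\t", has_header=False)
```

Note that every field is read as text; converting `amount` to a number is left to the caller.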
Data Source
A data source is a database or application that holds a set of tables with data. In Explore Analytics, you can define data sources that correspond to data sources inside your organization or anywhere on the internet.
Data Visualization
Visual representation of data that's designed to:
- make data comparisons easy
- reveal trends and changes over time
- uncover correlations between different variables in the data
- uncover patterns in the data
Good data visualization allows you to better understand the drivers behind the data and to make predictions based on that understanding. Common Data Visualizations include Timeline Chart, Category Chart, XY Chart, and Geographical Chart.
Geographical Chart
A geographical chart is a visualization that specializes in showing data by location, address, or geographical coordinates (longitude and latitude). It allows you to detect geographical patterns in your data and explain the location-related drivers behind the data. Data is typically presented using a geographical map.
List
A list is a tabular presentation of the data. In a list, each column corresponds to a table field and each row corresponds to a table row (record).
LOESS
LOESS, also known as LOWESS (Locally Weighted Scatterplot Smoothing), is a method of drawing a regression trend line in a scatterplot. As the name suggests, it uses the surrounding points to calculate a Y value for every X value and thus draw a line. You may think of it as similar to a moving average, except that nearby points are weighted more heavily than distant ones.
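The idea can be sketched in plain Python as a weighted straight-line fit around each X value. This is a simplified illustration under stated assumptions, not a reference implementation: the function name `loess`, the `frac` parameter (the share of points used for each local fit), and the tricube weight function are choices made for the example, and the robustness iterations found in full implementations are omitted.

```python
def loess(xs, ys, frac=0.5):
    """Smooth ys against xs with a locally weighted linear fit."""
    n = len(xs)
    # Number of neighbours per local fit (at least 3, at most n).
    k = min(n, max(3, round(frac * n)))
    fitted = []
    for x0 in xs:
        distances = [abs(x - x0) for x in xs]
        h = sorted(distances)[k - 1]  # bandwidth: distance to k-th nearest point
        if h == 0:
            weights = [1.0 if d == 0 else 0.0 for d in distances]
        else:
            # Tricube weights: close points count more, points at or beyond h count zero.
            weights = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in distances]
        total = sum(weights)
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_y = sum(w * y for w, y in zip(weights, ys)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y)
                  for w, x, y in zip(weights, xs, ys))
        # Fall back to the weighted mean when the local slope is undefined.
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted

xs = list(range(10))
line = loess(xs, [2 * x + 1 for x in xs])        # perfectly linear input
wavy = loess(xs, [0, 3, 1, 4, 2, 5, 3, 6, 4, 7])  # zig-zag input
```

Because each local fit is a straight line, data that already lies on a line is returned unchanged, while the zig-zag input comes back as a smoother curve.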
Moving Average
A moving average is a method of drawing a smooth trend line by calculating a Y value for every X value as the average of the Y values of the n points leading up to and including that X value. For example, in a timeline, a 50-day moving average calculates the average value of the trailing 50 points (days) and uses that to draw the trend line.
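A trailing moving average can be sketched as below. The function name `trailing_moving_average` is hypothetical, and returning `None` for the first n - 1 points, where a full window is not yet available, is an assumption; tools may instead average the points available so far.

```python
def trailing_moving_average(values, n):
    """Average each value with the n - 1 values that precede it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    averages = []
    window_sum = 0.0
    for i, value in enumerate(values):
        window_sum += value
        if i >= n:
            window_sum -= values[i - n]  # drop the value that left the window
        # Assumption: no average is reported until the window is full.
        averages.append(window_sum / n if i >= n - 1 else None)
    return averages

smoothed = trailing_moving_average([1, 2, 3, 4, 5], 3)
# smoothed == [None, None, 2.0, 3.0, 4.0]
```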
NULL Value
A value of NULL means that the value is unknown or unspecified. In a database, the NULL value represents the absence of a value and is often treated differently from a specified blank (empty) value. When loading data from a CSV File, it is sometimes impossible to tell whether a value is missing or simply blank. Therefore, in data analysis we often treat NULLs and blanks as belonging to the same category.
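One way to fold NULLs and blanks into a single category when loading CSV data is sketched below. The `(empty)` label, the function name `category_of`, and the decision to treat whitespace-only values as blank are assumptions for the example.

```python
import csv
import io
from collections import Counter

EMPTY = "(empty)"  # hypothetical label for the combined NULL/blank category

def category_of(value):
    """Map NULL (None) and blank strings to one shared category."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return EMPTY
    return value

# The last record has no city field at all; csv.DictReader fills it with None.
text = "name,city\nAda,London\nGrace,\nLinus,   \nAlan\n"
rows = list(csv.DictReader(io.StringIO(text)))
counts = Counter(category_of(row["city"]) for row in rows)
```

Here the blank, whitespace-only, and missing city values all end up counted under the same label.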
OHLC Chart
An open-high-low-close chart (OHLC chart) is typically used to illustrate movements in the price of a financial instrument, such as a stock, over time. Each vertical line on the chart shows the price range (the highest and lowest prices) over one unit of time, e.g., one day or one hour. Tick marks project from each side of the line, indicating the opening price on the left and the closing price for that time period on the right. The bars are shown in green if the stock closed higher, in red if it closed lower, and in gray if unchanged.
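As an illustration, one bar can be built from the sequence of prices observed during a time unit. The function name `ohlc_bar` and the input format (a time-ordered list of prices) are assumptions for this sketch.

```python
def ohlc_bar(prices):
    """Summarize the time-ordered prices of one time unit as an OHLC bar."""
    if not prices:
        raise ValueError("at least one price is required")
    open_price, close_price = prices[0], prices[-1]
    if close_price > open_price:
        color = "green"  # closed higher
    elif close_price < open_price:
        color = "red"    # closed lower
    else:
        color = "gray"   # unchanged
    return {
        "open": open_price,
        "high": max(prices),
        "low": min(prices),
        "close": close_price,
        "color": color,
    }

bar = ohlc_bar([10, 12, 9, 11])
```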
Pivot
A Pivot is a tabular data presentation in which data is summarized by one or more categories. The labels for these categories are arranged across the top or down the side, and the table is populated with aggregate numerical calculations such as sums, averages, or counts that correspond to these categories. A pivot table makes it easy to see a high-level aggregate view and break it down by various categories to understand the drivers behind the data and make comparisons.
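A small sketch of a pivot that sums a value by two categories, one arranged down the side and one across the top, might look like this. The function name `pivot_sum` and the sample fields `region`, `quarter`, and `amount` are hypothetical.

```python
from collections import defaultdict

def pivot_sum(rows, row_field, column_field, value_field):
    """Sum value_field for every (row category, column category) pair."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_field]][row[column_field]] += row[value_field]
    # Convert to plain dicts so missing cells are absent rather than zero.
    return {label: dict(cells) for label, cells in table.items()}

# Hypothetical sample data.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 10},
    {"region": "East", "quarter": "Q1", "amount": 5},
    {"region": "East", "quarter": "Q2", "amount": 7},
    {"region": "West", "quarter": "Q1", "amount": 3},
]
pivot = pivot_sum(sales, "region", "quarter", "amount")

# Print the pivot with quarters across the top and regions down the side.
columns = sorted({row["quarter"] for row in sales})
print("region", *columns)
for region in sorted(pivot):
    print(region, *(pivot[region].get(column, "") for column in columns))
```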
Table
A table refers to a data table in a data source. This normally corresponds to a database table or an equivalent application concept.
Timeline Chart
A timeline chart is a visualization that specializes in temporal data and is good for spotting trends. It shows data over time by putting the date/time on the horizontal axis and other variables on the vertical axis. A timeline chart offers various data presentations, including lines and bars, that can be shown on the same scale or on different scales, in any combination, to highlight changes and allow easy comparison. The main characteristic of a timeline chart is that the time dimension is linear on the horizontal axis, going from left to right, with scrolling and zooming available to focus on a particular time period.
XY Chart
An XY chart is a visualization that specializes in studying the relationship between numeric variables. Data is shown as a graph where points are drawn based on two variables in the data (two fields or calculations). These two variables are mapped to the X and Y axes, respectively. A Bubble Chart is a specialized kind of XY chart in which a third variable is represented by varying the area of the points (bubbles) based on its value. Category data can also be presented by varying the color or shape of the points (markers).
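Because a bubble's area, not its radius, tracks the third variable, the radius has to grow with the square root of the value. A minimal sketch, where the function name `bubble_radius` and the `max_radius` parameter are assumptions for the example:

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius that makes the bubble's area proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # area = pi * r**2, so r must scale with sqrt(value) for area to scale with value.
    return max_radius * math.sqrt(value / max_value)

small = bubble_radius(25, 100)   # a quarter of the largest value
large = bubble_radius(100, 100)  # the largest value
area_ratio = (small / large) ** 2
```

A value that is one quarter of the maximum gets half the radius, which gives it one quarter of the area.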