Difference between revisions of "Glossary of Data Analysis and Visualization Terms"

From Explore Analytics: The Wiki
Line 1: Line 1:
 +
 +
=====Aggregate Calculations=====
 +
 +
In a [[#Pivot|pivot]] or [[#Data Visualization|chart]], data can be aggregated using the following types of calculations. Note that counts can apply to any type of field, but most other calculations apply to numeric fields only.
 +
 +
* '''sum''' - field values are added together. When data is shown in categories, the values are summed for each category.
 +
* '''count rows''' - a simple count of rows. When data is shown in categories, it counts the rows in the table that belong to each category.
 +
* '''count distinct values''' - a count of the number of distinct (unique) values of this field. When data is shown in categories, it counts the number of distinct values in each category. [[#NULL Value|NULL]] values are not counted.
 +
* '''count non-empty values''' - a count of the number of rows where the field has a value and the value is not blank.
 +
* '''average''' - field values are averaged.
 +
* '''max''' - the highest numerical field value.
 +
* '''min''' - the lowest numerical field value.
  
 
=====Category Chart=====
 
=====Category Chart=====
Line 23: Line 35:
  
 
A geographical chart is a [[#Data Visualization|visualization]] that specializes in showing data by location, address, or geographical coordinates (longitude and latitude). It allows you to detect geographical patterns in your data and explain the location-related drivers behind the data. Data is typically presented using a geographical map.
 
A geographical chart is a [[#Data Visualization|visualization]] that specializes in showing data by location, address, or geographical coordinates (longitude and latitude). It allows you to detect geographical patterns in your data and explain the location-related drivers behind the data. Data is typically presented using a geographical map.
 +
 +
=====NULL Value=====
 +
 +
A value of NULL means that the value is unknown or unspecified. In a database, the NULL value represents the absence of a value and is often treated differently than a specified blank (empty) value. When loading data from a [[#CSV File|CSV File]] it is sometimes impossible to tell whether the value is missing or simply blank. Therefore, in data analysis we often treat NULLs and blanks as being in the same category.
  
 
=====Pivot=====
 
=====Pivot=====

Revision as of 07:43, 13 June 2012

Aggregate Calculations

In a pivot or chart, data can be aggregated using the following types of calculations. Note that counts can apply to any type of field, but most other calculations apply to numeric fields only.

  • sum - field values are added together. When data is shown in categories, the values are summed for each category.
  • count rows - a simple count of rows. When data is shown in categories, it counts the rows in the table that belong to each category.
  • count distinct values - a count of the number of distinct (unique) values of this field. When data is shown in categories, it counts the number of distinct values in each category. NULL values are not counted.
  • count non-empty values - a count of the number of rows where the field has a value and the value is not blank.
  • average - field values are averaged.
  • max - the highest numerical field value.
  • min - the lowest numerical field value.
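The calculations listed above can be sketched in a few lines of Python. This is only an illustration, not how any particular tool implements them: the sample rows and the `aggregate` function are hypothetical, `None` stands in for NULL, and the choices to skip blanks in "count distinct values" and to ignore empty values when averaging are assumptions.

```python
from collections import defaultdict

# Hypothetical sample rows: (category, amount).
# None stands in for NULL and "" for a blank value.
rows = [
    ("East", 10),
    ("East", 30),
    ("East", None),
    ("West", 5),
    ("West", 5),
    ("West", ""),
]

def aggregate(rows):
    """Compute the glossary's aggregate calculations for each category."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)

    result = {}
    for category, values in groups.items():
        # Non-empty values: neither NULL (None) nor blank ("").
        # Assumption: blanks are skipped the same way NULLs are.
        present = [v for v in values if v is not None and v != ""]
        result[category] = {
            "sum": sum(present),
            "count_rows": len(values),
            "count_distinct": len(set(present)),
            "count_non_empty": len(present),
            # Assumption: empty values are ignored when averaging.
            "average": sum(present) / len(present) if present else None,
            "max": max(present) if present else None,
            "min": min(present) if present else None,
        }
    return result

summary = aggregate(rows)
print(summary["East"])
print(summary["West"])
```

Note how "count rows" and "count non-empty values" differ for both categories: each has three rows but only two non-empty values.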
Category Chart

A category chart is a visualization that specializes in breaking down data by category. It allows you to easily compare data and focus on the categories of interest that explain the drivers behind the data. Data is typically presented as bars or pie slices.

CSV File

A CSV File is a popular data file that uses the Comma Separated Values (CSV) format. It holds tabular data in plain text form. A CSV file consists of any number of records, separated by line breaks; each record consists of fields, separated by a separator character such as a comma or tab. All records have an identical sequence of fields. The first record can optionally have the names of the fields.

Many data providers and applications allow users to download their data in CSV format, and many analytical tools allow users to import data in this format.
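As a rough illustration of the record and field structure described above, the following sketch reads CSV text with Python's standard `csv` module. The sample data and the `read_records` helper are hypothetical; the first record is assumed to hold the field names.

```python
import csv
import io

# Hypothetical CSV text; the first record holds the field names.
csv_text = "name,score\nalpha,10\nbeta,20\n"

def read_records(text, separator=","):
    """Parse CSV text into a list of dicts keyed by the header record."""
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    return [dict(record) for record in reader]

# Comma-separated records.
records = read_records(csv_text)

# The same structure with a tab as the separator character.
tab_records = read_records("name\tscore\nalpha\t10\n", separator="\t")

print(records)
print(tab_records)
```

All field values come back as text, so numeric fields need to be converted before they can be aggregated.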

Data Visualization

A visual representation of data that is designed to:

  • make data comparisons easy
  • reveal trends and changes over time
  • discover correlations between different variables in the data
  • discover patterns in the data

Good data visualization allows you to better understand the drivers behind the data and to make predictions based on that understanding. Common Data Visualizations include Timeline Chart, Category Chart, XY Chart, and Geographical Chart.

Geographical Chart

A geographical chart is a visualization that specializes in showing data by location, address, or geographical coordinates (longitude and latitude). It allows you to detect geographical patterns in your data and explain the location-related drivers behind the data. Data is typically presented using a geographical map.

NULL Value

A value of NULL means that the value is unknown or unspecified. In a database, the NULL value represents the absence of a value and is often treated differently than a specified blank (empty) value. When loading data from a CSV File it is sometimes impossible to tell whether the value is missing or simply blank. Therefore, in data analysis we often treat NULLs and blanks as being in the same category.
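One way to put NULLs and blanks in the same category when loading a CSV file is sketched below. The sample data and the `normalize` helper are hypothetical, `None` stands in for NULL, and treating whitespace-only values as blank is an assumption.

```python
import csv
import io

# Hypothetical CSV text: record B has nothing in the region field,
# and record C has only spaces there.
csv_text = "name,region,amount\nA,North,10\nB,,20\nC,  ,30\n"

def normalize(value):
    """Map missing, empty, or whitespace-only values to None (NULL)."""
    if value is None or value.strip() == "":
        return None
    return value

rows = [
    {key: normalize(value) for key, value in record.items()}
    for record in csv.DictReader(io.StringIO(csv_text))
]

# Both B and C now fall into the same "no value" category.
missing_region = [row["name"] for row in rows if row["region"] is None]
print(missing_region)
```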

Pivot

A Pivot is a tabular data presentation in which data is summarized by one or more categories. The labels for these categories are arranged across the top or down the side, and the table is populated with aggregate numerical calculations such as sums, averages, or counts that correspond to these categories. A pivot table makes it easy to see a high-level aggregate view and break it down by various categories to understand the drivers behind the data and make comparisons.
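A minimal sketch of the idea, using sums as the aggregate calculation: one category is arranged down the side and another across the top. The sample rows and the `pivot_sum` function are hypothetical and are not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical rows: (region, product, amount).
sales = [
    ("East", "Apples", 10),
    ("East", "Pears", 4),
    ("West", "Apples", 7),
    ("East", "Apples", 5),
]

def pivot_sum(rows):
    """Sum amounts with the first category down the side and the second across the top."""
    table = defaultdict(lambda: defaultdict(int))
    for row_label, column_label, amount in rows:
        table[row_label][column_label] += amount
    return {row_label: dict(columns) for row_label, columns in table.items()}

pivot = pivot_sum(sales)

# Print the pivot as a simple text table; empty cells have no matching rows.
columns = sorted({product for _, product, _ in sales})
print("Region".ljust(8) + "".join(c.rjust(8) for c in columns))
for region in sorted(pivot):
    cells = "".join(str(pivot[region].get(c, "")).rjust(8) for c in columns)
    print(region.ljust(8) + cells)
```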

Timeline Chart

A timeline chart is a visualization that specializes in temporal data and is good for spotting trends. It shows data over time by putting the date/time on the horizontal axis and other variables on the vertical axis. A timeline chart offers various data presentations, including lines and bars, that can be shown on the same scale or on different scales, in any combination, to highlight changes and allow easy comparison. The main characteristic of a timeline chart is that the time dimension is linear on the horizontal axis, running from left to right, with scrolling and zooming to focus on a particular time period.

XY Chart

An XY chart is a visualization that specializes in studying the relationship between numeric variables. Data is shown as a graph where points are drawn based on two variables in the data (two fields or calculations). These two variables are mapped to the X and Y axes respectively. A Bubble Chart is a specialized kind of XY chart in which a third variable is presented by varying the area of the points (bubbles) based on the value of the third variable. Category data can also be presented by varying the color or shape of the points (markers).
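Because a bubble chart varies the area of each point, the radius has to grow with the square root of the third variable, not with the variable itself. The sketch below shows that arithmetic; the `bubble_radius` function and its `max_radius` parameter are hypothetical names chosen for illustration.

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Return a radius such that bubble area is proportional to value.

    The largest value maps to max_radius. Since area = pi * r**2,
    the radius must scale with the square root of the value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)

# A value one quarter of the maximum gets half the radius,
# and therefore one quarter of the area.
small = bubble_radius(25, 100)
large = bubble_radius(100, 100)
print(small, large)
```

Scaling the radius directly by the value would exaggerate differences, since a point with twice the value would cover four times the area.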