Types of Data Analysis: A Guide



Data analysis is an aspect of data science concerned with analyzing data for different purposes. It involves inspecting, cleaning, transforming and modeling data to derive useful insights.

What are the different types of data analysis?

  1. Descriptive analysis
  2. Exploratory analysis
  3. Inferential analysis
  4. Predictive analysis
  5. Causal analysis
  6. Mechanistic analysis

With its many facets, methodologies and techniques, data analytics is used in a variety of fields, including business, science and social science. As businesses thrive under the influence of numerous technological advancements, data analytics plays a huge role in decision making, providing a better, faster and more efficient system that minimizes risk and reduces human bias.

That said, there are different types of analysis serving different purposes. We will examine each below.

Two data analysis camps

Data analysis can be divided into two camps, according to the book R for Data Science:

  1. Generation of hypotheses: This involves looking deep into the data and combining it with your domain knowledge to generate hypotheses about why the data behaves the way it does.
  2. Confirmation of hypotheses: This involves using a precise mathematical model to generate falsifiable predictions with statistical sophistication to confirm your prior hypotheses.

Types of data analysis

Data analysis can be separated and organized into six types, ranked in increasing order of complexity.

  1. Descriptive analysis
  2. Exploratory analysis
  3. Inferential analysis
  4. Predictive analysis
  5. Causal analysis
  6. Mechanistic analysis

1. Descriptive analysis

The objective of descriptive analysis is to describe or summarize a set of data. Here’s what you need to know:

  • Descriptive analysis is the very first analysis performed.
  • It generates simple summaries of samples and measurements.
  • These are usually descriptive statistics such as measures of central tendency, variability, frequency and position.

Example of descriptive analysis

Take the COVID-19 statistics page on Google, for example. The line chart is a pure summary of cases and deaths: a presentation and description of the population of a particular country infected with the virus.

Descriptive analysis is the first step of analysis where you summarize and describe the data you have using descriptive statistics, and the result is a simple presentation of your data.
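
As a minimal sketch of the descriptive statistics mentioned above (central tendency, variability, frequency and position), the snippet below summarizes a small sample using only Python's standard library. The `daily_cases` numbers and the `describe` helper are made up for illustration and do not come from any real dataset.

```python
import statistics
from collections import Counter

# Hypothetical sample: daily new cases over ten days (made-up numbers).
daily_cases = [12, 15, 15, 18, 20, 22, 22, 22, 30, 44]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),               # central tendency
        "median": statistics.median(values),           # central tendency
        "mode": statistics.mode(values),               # central tendency
        "stdev": statistics.stdev(values),             # variability (sample)
        "range": max(values) - min(values),            # variability
        "quartiles": statistics.quantiles(values, n=4),  # position
        "frequency": Counter(values),                  # frequency of each value
    }


summary = describe(daily_cases)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here models or predicts anything; the output is only a compact description of the data you already have.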


2. Exploratory Analysis (EDA)

Exploratory analysis involves examining or exploring data and finding relationships between variables that were previously unknown. Here’s what you need to know:

  • EDA helps you discover relationships between measurements in your data, but those relationships are not proof that causation exists, as the phrase “Correlation does not imply causation” suggests.
  • It is useful for discovering new connections and formulating hypotheses. It drives design planning and data collection.

Example of exploratory analysis

Climate change is an increasingly important topic as the global temperature gradually rises over the years. An example of exploratory analysis of climate change data is to take the increase in temperature from 1950 to 2020 and the increase in human activities and industrialization over the same period, and look for relationships in the data. For example, you can look at the growth in the number of factories, cars on the road and airplane flights to see how it correlates with the rise in temperature.

Exploratory analysis explores the data to find relationships between measurements without identifying the cause. This is very useful when formulating hypotheses.
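
One common way to look for such a relationship is the Pearson correlation coefficient. The sketch below computes it by hand so that it needs no third-party library. The `factory_index` and `temp_anomaly` values are invented for illustration and are not real climate measurements, and `pearson_r` is a helper name chosen here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, one per decade (made-up numbers, not real data).
factory_index = [10, 14, 19, 25, 33, 41, 52, 60]
temp_anomaly = [-0.02, 0.03, 0.02, 0.26, 0.45, 0.39, 0.72, 1.01]

r = pearson_r(factory_index, temp_anomaly)
print(f"correlation: {r:.2f}")
```

A value of r close to 1 says the two series move together. It says nothing about which one, if either, drives the other.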

3. Inferential analysis

Inferential analysis involves using a small sample of data to infer information about a larger population of data.

The goal of statistical modeling itself is to use a small amount of information to extrapolate and generalize to a larger group. Here’s what you need to know:

  • Inferential analysis involves using data that is representative of a population to estimate a quantity, and attaching a measure of uncertainty, such as a standard deviation, to that estimate.
  • The accuracy of the inference depends heavily on your sampling scheme. If the sample is not representative of the population, the generalization will be inaccurate. This problem is known as sampling bias.

Example of inferential analysis

The idea of drawing an inference about the population as a whole from a smaller sample is intuitive. Many statistics you see in the media and on the internet are inferential: a prediction of an event based on a small sample. For example, a psychological study on the benefits of sleep might involve a total of 500 people. When the researchers followed up with the participants, those who slept seven to nine hours reported better overall attention spans and well-being, while those who slept less or more than that range suffered from reduced attention spans and energy. This study of 500 people covered only a tiny fraction of the 7 billion people in the world, so its conclusions are an inference about the broader population.

Inferential analysis uses a smaller sample to extrapolate and generalize information about the larger group, generating analyses and predictions.
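
To make "an estimate plus a measure of uncertainty" concrete, here is a rough sketch that estimates a population mean from a random sample of 500 and reports a 95% confidence interval. It assumes a normal approximation, which is reasonable for a sample of this size but only rough for very small samples. The simulated sleep-hours population and the `mean_confidence_interval` helper are assumptions made for illustration.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Sample mean with a normal-approximation confidence interval."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error: the standard deviation of the sample mean.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean, mean - margin, mean + margin


# Hypothetical population: hours of sleep for 100,000 people (simulated).
rng = random.Random(42)
population = [rng.gauss(7.0, 1.2) for _ in range(100_000)]

# A simple random sample, so every person is equally likely to be chosen.
sample = rng.sample(population, 500)

mean, low, high = mean_confidence_interval(sample)
print(f"estimated mean: {mean:.2f} hours (95% CI {low:.2f} to {high:.2f})")
```

The interval is only trustworthy because the sample was drawn at random. A sample taken from, say, night-shift workers alone would give a tight but misleading interval.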

4. Predictive analysis

Predictive analysis involves using historical or current data to find patterns and make predictions about the future. Here’s what you need to know:

  • The accuracy of the predictions depends on the input variables.
  • The accuracy also depends on the types of models. A linear model may work well in some cases, and in other cases it may not.
  • Using one variable to predict another does not denote a causal relationship.

Example of predictive analysis

The 2020 US election was a popular topic, and many prediction models were built to predict the winning candidate. FiveThirtyEight did this for both the 2016 and 2020 elections. Predictive analysis for an election requires input variables such as historical polling data, trends and current polling data in order to return a good prediction. Something as important as an election would not rely on a simple linear model alone, but on a more complex model with some tweaks to better serve its purpose.

Predictive analytics takes data from the past and present to make predictions about the future.
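
The simplest instance of this idea is a straight line fitted to past observations and extended one step forward. The sketch below uses ordinary least squares written out by hand. The `weeks` and `poll_share` values are invented for illustration, and `fit_line` and `predict` are helper names chosen here. This is not how any real election forecast works.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical weekly polling share for one candidate (made-up numbers).
weeks = [1, 2, 3, 4, 5, 6]
poll_share = [44.0, 44.5, 45.1, 45.4, 46.2, 46.5]

slope, intercept = fit_line(weeks, poll_share)
forecast = predict(7, slope, intercept)
print(f"trend: {slope:+.2f} points per week, week 7 forecast: {forecast:.1f}")
```

The fitted slope describes how the outcome moved with time in the past. It does not show that the passage of time causes the change, and a line that fits past data well can still extrapolate poorly.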


5. Causal analysis

Causal analysis looks at cause-and-effect relationships between variables and focuses on finding the cause of a correlation. Here’s what you need to know:

  • To find the cause, you need to ask yourself if the observed correlations leading to your conclusion are valid. Just looking at the surface data won’t help you uncover the hidden mechanisms underlying the correlations.
  • Causal analysis is applied in randomized studies focusing on the identification of causality.
  • Causal analysis is the gold standard in data analysis and scientific studies where the cause of the phenomenon must be extracted and distinguished, like separating the wheat from the chaff.
  • Good data is hard to find and requires expensive research and study. These studies are analyzed in an aggregated way (multiple groups), and the relationships observed are only average effects (mean) of the entire population. This means that the results may not apply to everyone.

Example of causal analysis

Suppose you want to test whether a new drug improves human strength and concentration. To do this, you run a randomized controlled trial to test the drug's effect. You compare the participants receiving the new drug with participants receiving a placebo, using a few tests that focus on strength and on overall concentration and attention. This allows you to observe how the drug affects the result.

Causal analysis involves discovering the causal relationship between variables and examining how a change in one variable affects another.
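
The logic of a randomized controlled trial can be sketched with a small simulation: participants are assigned to treatment or control by a coin flip, and the causal effect is estimated as the difference between the two group means. Everything here is assumed for illustration, including the `TRUE_EFFECT` value, the score distribution, and the `run_trial` and `difference_in_means` helpers.

```python
import random
import statistics

TRUE_EFFECT = 5.0  # assumed effect of the drug on the test score (simulation only)


def run_trial(n_participants, rng):
    """Randomly assign participants and return (treated, control) scores."""
    treated, control = [], []
    for _ in range(n_participants):
        baseline = rng.gauss(50.0, 10.0)  # score the person would get untreated
        if rng.random() < 0.5:            # coin-flip assignment
            treated.append(baseline + TRUE_EFFECT)
        else:
            control.append(baseline)
    return treated, control


def difference_in_means(treated, control):
    """Estimated average treatment effect."""
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(7)
treated, control = run_trial(2000, rng)
estimate = difference_in_means(treated, control)
print(f"estimated average effect: {estimate:.2f} (simulated truth: {TRUE_EFFECT})")
```

Because assignment is random, the two groups have similar baselines on average, so the difference in means can be attributed to the drug. The estimate is an average effect across the whole group and may not hold for any single participant.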

6. Mechanistic analysis

Mechanistic analysis is used to understand the exact changes in variables that lead to changes in other variables. Here’s what you need to know:

  • It is applied in the physical sciences and engineering, in situations that require high accuracy and leave little margin for error. The only noise in the data is measurement error.
  • It is designed to understand a biological or behavioral process, the pathophysiology of a disease or the mechanism of action of an intervention.

Example of mechanistic analysis

Lots of university-level research and complex topics are apt examples, but to put it in simple terms, let’s say an experiment is performed to simulate safe and efficient nuclear fusion to power the world. A mechanistic analysis of the study would involve a fine balance between control and manipulation of variables with very precise measurements of variables and desired outcomes. It is this complex and meticulous modus operandi towards these great subjects that enables scientific breakthroughs and the advancement of society.

Mechanistic analysis is in some ways similar to predictive analysis, but adapted to studies in the physical sciences or engineering that require high precision and meticulous methodologies.
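
A much simpler illustration than fusion research is a system whose governing equation is already known, where the only noise is measurement error. The sketch below assumes free fall from rest, d = (g / 2) * t², simulates distance measurements with a small error, and recovers g by least squares through the origin. The value of `G_TRUE`, the noise level, and the `estimate_g` helper are assumptions chosen for illustration.

```python
import random

G_TRUE = 9.81  # m/s^2, assumed value used only to simulate the measurements


def estimate_g(times, distances):
    """Least-squares fit of d = (g / 2) * t**2 through the origin."""
    xs = [t ** 2 for t in times]
    slope = sum(x * d for x, d in zip(xs, distances)) / sum(x * x for x in xs)
    return 2.0 * slope  # slope equals g / 2


rng = random.Random(1)
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # seconds
# Simulated measurements: exact physics plus 1 cm of measurement error.
measured = [0.5 * G_TRUE * t ** 2 + rng.gauss(0.0, 0.01) for t in times]

g_hat = estimate_g(times, measured)
print(f"estimated g: {g_hat:.3f} m/s^2")
```

Because the mechanism is known exactly, a handful of precise measurements is enough to pin down the parameter. That is the opposite of the exploratory and inferential settings above.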


When to use the different types of data analysis

  • Descriptive analysis summarizes the available data and presents your data in an understandable way.
  • Exploratory data analysis helps you discover correlations and relationships between variables in your data.
  • Inferential analysis generalizes to a larger population from a smaller data sample.
  • Predictive analysis helps you make predictions about the future with data.
  • Causal analysis focuses on finding the cause of a correlation between variables.
  • Mechanistic analysis measures the exact changes in variables that lead to changes in other variables.

Here are some important tips to remember:

  • Correlation does not imply causation.
  • EDA helps to discover new connections and formulate hypotheses.
  • The accuracy of the inference depends on the sampling scheme.
  • A good prediction depends on the right input variables.
  • A simple linear model with enough data usually does the trick.
  • Using one variable to predict another does not denote causal relationships.
  • Good data is hard to find and producing it requires expensive research.
  • Study results are reported in aggregate as average effects, so they may not apply to everyone.
