Data science and big data are vast fields, and this study guide provides a starting point. It's
important to continuously learn, explore, and keep up with advancements in the
field.
What Is Data Science?
Data science is a multidisciplinary field of study that applies techniques and
tools to draw meaningful information and actionable insights out of noisy data.
Involving subjects like mathematics, statistics, computer science and
artificial intelligence, data science is used across a variety of industries
for smarter planning and decision making.
Data science is the realm of data scientists, who often rely on artificial
intelligence, especially its subfields of machine learning and deep learning,
to create models and make predictions using algorithms and other techniques.
What Is Data Science Used for?
Data science is used by businesses of all kinds, from Fortune 50 companies to
fledgling startups, to look for connections and patterns and deliver
breakthrough insights. That explains why data science is a rapidly growing
field that is revolutionizing many industries. More specifically, data science is used
for complex data analysis, predictive modeling, recommendation generation and
data visualization.
Analysis of Complex Data
Data science allows for quick and precise analysis. With various software tools
and techniques at their disposal, data analysts can easily identify trends and
detect patterns within even the largest and most complex datasets. This enables
businesses to make better decisions, whether that means segmenting customers
more effectively or conducting a thorough market analysis.
Predictive Modeling
Data science can also be used for predictive modeling. In essence, by finding
patterns in data through the use of machine learning, analysts can forecast
possible future outcomes with some degree of accuracy. These models are
especially useful in industries like insurance, marketing, healthcare and
finance, where anticipating the likelihood of certain events happening is
central to the success of the business.
Recommendation Generation
Some companies, such as Netflix, Amazon and Spotify, rely on data science and
big data to generate recommendations for their users based on their past
behavior. It’s thanks to data science that users of these and similar platforms
can be served up content that is uniquely tailored to their preferences and
interests.
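As a rough illustration of recommending items from past behavior, the sketch
below uses user-based collaborative filtering with cosine similarity. It is only
one simple approach and is not a description of how Netflix, Amazon or Spotify
build their systems. The user names, playlist names and play counts are all
hypothetical.

```python
import math

# Hypothetical play counts per user (assumed data, for illustration only).
plays = {
    "ana": {"jazz_mix": 5, "lofi_beats": 3},
    "ben": {"jazz_mix": 4, "lofi_beats": 2, "piano_focus": 5},
    "cara": {"metal_hits": 5, "rock_anthems": 4},
}


def cosine(a, b):
    """Cosine similarity between two sparse play-count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, plays):
    """Rank items the user has not played, weighted by similar users' plays."""
    scores = {}
    for other, items in plays.items():
        if other == user:
            continue
        similarity = cosine(plays[user], items)
        if similarity <= 0:
            continue  # ignore users with no overlap in listening history
        for item, count in items.items():
            if item not in plays[user]:
                scores[item] = scores.get(item, 0.0) + similarity * count
    return sorted(scores, key=scores.get, reverse=True)


suggestions = recommend("ana", plays)
```

Here "ana" shares listening history with "ben" but not with "cara", so only
items from "ben" are suggested.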
Data Visualization
Data science is also used to create data visualizations — think graphs, charts,
dashboards — and reporting, which helps non-technical business leaders and busy
executives easily understand otherwise complex information about the state of
their business.
Data Science Tools
Data science professionals typically require an arsenal of data science tools
and programming languages to use throughout their careers. These are some of
the more popular options being used today:
Common Data Science Programming Languages:
• Python
• R
• SQL
• C/C++
Popular Data Science Tools:
• Apache Spark (data analytics tool)
• Apache Hadoop (big data tool)
• KNIME (data analytics tool)
• Microsoft Excel (data analytics tool)
• Microsoft Power BI (business intelligence data analytics and data
visualization tool)
• MongoDB (database tool)
• Qlik (data analytics and data integration tool)
• QlikView (data visualization tool)
• SAS (data analytics tool)
• scikit-learn (machine learning tool)
• Tableau (data visualization tool)
• TensorFlow (machine learning tool)
Data Science Lifecycle
Data science can be thought of as having a five-stage lifecycle:
Capture
This stage is when data scientists gather raw and unstructured data. The
capture stage typically includes data acquisition, data entry, signal reception
and data extraction.
Maintain
This stage is when data is put into a form that can be utilized. The
maintenance stage includes data warehousing, data cleansing, data staging, data
processing and data architecture.
Process
This stage is when data is examined for patterns and biases to see how it will
work as a predictive analysis tool. The process stage includes data mining,
clustering and classification, data modeling and data summarization.
Analyze
This stage is when multiple types of analyses are performed on the data. The
analysis stage typically includes exploratory and confirmatory analysis,
predictive analysis, regression, text mining and qualitative analysis.
Communicate
This stage is when data scientists and analysts showcase the data through
reports, charts and graphs. The communication stage involves data reporting,
data visualization, business intelligence and decision making.
What Are Data Science Techniques?
There are lots of data science techniques with which data science professionals
must be familiar in order to do their jobs. These are some of the most popular
techniques:
Regression
A type of supervised learning, regression analysis in data science allows you
to predict an outcome based on multiple variables and how those variables
affect each other. Linear regression is the most commonly used regression
analysis technique.
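The simplest case, linear regression with a single input variable, can be
sketched in plain Python using the least-squares formulas. This is a simplified
illustration (one variable rather than several), and the advertising-spend
figures are made up.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: advertising spend (x) and units sold (y), following y = 3 + 2x.
spend = [1, 2, 3, 4, 5]
units = [5, 7, 9, 11, 13]

slope, intercept = fit_line(spend, units)
predicted = intercept + slope * 6  # predicted units sold for a spend of 6
```

In practice a library such as scikit-learn would usually be used, and it handles
multiple input variables as well.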
Classification
Classification in data science refers to the process of predicting the category
or label of different data points. Like regression, classification is a
subcategory of supervised learning. It’s used for applications such as email
spam filters and sentiment analysis.
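A spam filter is a convenient way to see classification in action. The sketch
below is a very small naive Bayes classifier with add-one smoothing, written
from scratch. The four training messages are invented, and a real filter would
need far more data and better text preprocessing.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior probability of the label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training messages, labeled "spam" or "ham" (not spam).
model = train([
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
])

label = classify("claim your free money", model)
```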
Clustering
Clustering, or cluster analysis, is a data science technique used in
unsupervised learning. During cluster analysis, closely associated objects
within a data set are grouped together, and then each group is assigned
characteristics. Clustering is done to reveal patterns within data — typically
with large, unstructured data sets.
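One widely used clustering method is k-means, which the text above does not
name but which fits the description. The sketch below is a bare-bones version
with fixed starting centroids so that the result is repeatable. The points are
made up, and production code would normally rely on a library implementation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Group points around the nearest centroid, then move each centroid."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Made-up 2D points forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

After grouping, each cluster can be described by its centroid, which is one way
of assigning characteristics to a group.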
Anomaly Detection
Anomaly detection, sometimes called outlier detection, is a data science
technique in which data points with relatively extreme values are identified.
Anomaly detection is used in industries like finance and cybersecurity.
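A simple way to flag extreme values is the z-score: how many standard deviations
a value lies from the mean. The sketch below assumes a threshold of 2.0, which
is an arbitrary choice for this example, and uses invented transaction amounts.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Invented transaction amounts with one suspiciously large payment.
amounts = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 95]
outliers = find_outliers(amounts)
```

Z-scores are sensitive to the outliers themselves, so more robust methods are
often preferred on real data.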
What is Big Data Analytics?
Big Data analytics is the process of examining large, varied data sets to
extract meaningful insights, such as hidden patterns, unknown correlations,
market trends, and customer preferences. It offers several advantages, including
better decision making and the prevention of fraudulent activities.
Why Is Big Data Analytics Important?
In today’s world, Big Data analytics fuels much of what we do online, in
every industry.
Take the music streaming platform Spotify for example. The
company has nearly 96 million users that generate a tremendous amount of data
every day. Through this information, the cloud-based platform automatically
generates suggested songs—through a smart recommendation engine—based on likes,
shares, search history, and more. The techniques, tools, and frameworks of Big
Data analytics are what make this possible.
What is Big Data?
Big Data refers to data sets so large and complex that they cannot be
stored, processed, or analyzed using traditional tools.
Today, there are millions of data sources that generate data
at a very rapid rate. These data sources are present across the world. Some of
the largest sources of data are social media platforms and networks. Let’s use
Facebook as an example—it generates more than 500 terabytes of data every day.
This data includes pictures, videos, messages, and more.
Data also exists in different formats, like structured data,
semi-structured data, and unstructured data. For example, in a regular Excel
sheet, data is classified as structured data—with a definite format. In
contrast, emails fall under semi-structured, and your pictures and videos fall
under unstructured data. All this data combined makes up Big Data.
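The three formats can be seen side by side in a few lines of Python. The sample
values below are invented. The point is only that structured data has fixed
columns, semi-structured data carries its own flexible field names, and
unstructured data is raw content with no built-in fields.

```python
import csv
import io
import json

# Structured: every row has the same named columns, as in a spreadsheet.
structured_text = "customer_id,amount\n1,9.99\n2,24.50\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: fields are labeled but can vary from record to record.
email_text = '{"from": "a@example.com", "subject": "Hi", "attachments": []}'
email = json.loads(email_text)

# Unstructured: raw bytes (here, the start of a PNG image file signature).
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])

column_names = list(rows[0].keys())
```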
Uses and Examples of Big Data Analytics
There are many ways that Big Data analytics can be used to
improve businesses and organizations. Here are some examples:
• Using analytics to understand customer behavior to optimize the customer
experience
• Predicting future trends to make better business decisions
• Improving marketing campaigns by understanding what works and what doesn't
• Increasing operational efficiency by understanding where bottlenecks are and
how to fix them
• Detecting fraud and other forms of misuse sooner
These are just a few examples — the possibilities are endless
when it comes to Big Data analytics. It all depends on how you want to use it to
improve your business.
History of Big Data Analytics
The history of Big Data analytics can be traced back to the
early days of computing, when organizations first began using computers to
store and analyze large amounts of data. However, it was not until the late
1990s and early 2000s that Big Data analytics really began to take off, as
organizations increasingly turned to computers to help them make sense of the
rapidly growing volumes of data being generated by their businesses.
Today, Big Data analytics has become an essential tool for
organizations of all sizes across a wide range of industries. By harnessing the
power of Big Data, organizations can gain insights into their customers, their
businesses, and the world around them that were simply not possible before.
As the field of Big Data analytics continues to evolve, we can
expect to see even more amazing and transformative applications of this
technology in the years to come.
Benefits and Advantages of Big Data Analytics
1. Risk Management
Use Case: Banco de Oro, a Philippine banking company, uses
Big Data analytics to identify fraudulent activities and discrepancies. The
organization leverages it to narrow down a list of suspects or root causes of
problems.
2. Product Development and Innovations
Use Case: Rolls-Royce, one of the largest manufacturers of
jet engines for airlines and armed forces across the globe, uses Big Data
analytics to analyze how efficient its engine designs are and whether any
improvements are needed.
3. Quicker and Better Decision Making Within Organizations
Use Case: Starbucks uses Big Data analytics to make
strategic decisions. For example, the company leverages it to decide whether a
particular location would be suitable for a new outlet. It analyzes
several factors, such as population, demographics, accessibility of
the location, and more.
4. Improve Customer Experience
Use Case: Delta Air Lines uses Big Data analysis to improve
customer experiences. It monitors tweets to learn about customers’
experiences with their journeys, delays, and so on. The airline identifies
negative tweets and does what’s necessary to remedy the situation. Publicly
addressing these issues and offering solutions helps the airline build good
customer relations.
The Lifecycle Phases of Big Data Analytics
Now, let’s review how Big Data analytics works:
• Stage 1 - Business case evaluation: The Big Data analytics lifecycle begins
with a business case, which defines the reason and goal behind the analysis.
• Stage 2 - Identification of data: Here, a broad variety of data sources are
identified.
• Stage 3 - Data filtering: All the identified data from the previous stage is
filtered here to remove corrupt data.
• Stage 4 - Data extraction: Data that is not compatible with the tool is
extracted and then transformed into a compatible form.
• Stage 5 - Data aggregation: In this stage, data with the same fields across
different datasets are integrated.
• Stage 6 - Data analysis: Data is evaluated using analytical and statistical
tools to discover useful information.
• Stage 7 - Visualization of data: With tools like Tableau, Power BI, and
QlikView, Big Data analysts can produce graphic visualizations of the analysis.
• Stage 8 - Final analysis result: This is the last step of the Big Data
analytics lifecycle, where the results of the analysis are made available to
business stakeholders who will take action.
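Stages 3 through 6 can be sketched on a toy scale in plain Python. The two
order lists, their field names and the validity rule are all assumptions made
for this example. Real Big Data pipelines run the same kinds of steps on
distributed tools such as Hadoop or Spark.

```python
from statistics import mean

# Two hypothetical data sources that share the same fields.
web_orders = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 2, "amount": None},  # corrupt: missing amount
    {"customer_id": 3, "amount": "80.00"},
]
store_orders = [
    {"customer_id": 1, "amount": "30.00"},
    {"customer_id": 3, "amount": "n/a"},  # corrupt: not a number
]


def is_valid(record):
    """A record is usable if its amount can be read as a number."""
    try:
        float(record["amount"])
        return True
    except (TypeError, ValueError):
        return False


# Stage 3 - Data filtering: drop corrupt records.
clean = [r for r in web_orders + store_orders if is_valid(r)]

# Stage 4 - Data extraction: convert text amounts into numbers.
converted = [
    {"customer_id": r["customer_id"], "amount": float(r["amount"])} for r in clean
]

# Stage 5 - Data aggregation: combine records that share a customer_id.
totals = {}
for r in converted:
    totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]

# Stage 6 - Data analysis: a simple statistic over the integrated data.
average_spend = mean(totals.values())
```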
Different Types of Big Data Analytics
Here are the four types of Big Data analytics:
1. Descriptive Analytics
This summarizes past data into a form that people can easily
read. This helps in creating reports, like a company’s revenue, profit, sales,
and so on. Also, it helps in the tabulation of social media metrics.
Use Case: The Dow Chemical Company analyzed its past data to
increase facility utilization across its office and lab space. Using
descriptive analytics, Dow was able to identify underutilized space. This space
consolidation helped the company save nearly US $4 million annually.
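At its simplest, descriptive analytics means reducing past records to a few
readable figures. The sketch below does this for monthly revenue. The numbers
are invented and are not related to the Dow example.

```python
from statistics import mean

# Invented monthly revenue figures.
monthly_revenue = {"Jan": 120_000, "Feb": 90_000, "Mar": 150_000}

summary = {
    "total": sum(monthly_revenue.values()),
    "average": mean(monthly_revenue.values()),
    "best_month": max(monthly_revenue, key=monthly_revenue.get),
    "worst_month": min(monthly_revenue, key=monthly_revenue.get),
}
```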
2. Diagnostic Analytics
This is done to understand what caused a problem in the
first place. Techniques like drill-down, data mining, and data discovery are all
examples. Organizations use diagnostic analytics because it provides
in-depth insight into a particular problem.
Use Case: An e-commerce company’s report shows that its
sales have gone down, although customers are adding products to their carts.
This can be due to various reasons: the form didn’t load correctly, the
shipping fee is too high, or there are not enough payment options available.
This is where you can use diagnostic analytics to find the reason.
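A drill-down for a case like this could start by counting how many sessions
reach each checkout step and finding where the largest share is lost. The step
names and session counts below are hypothetical.

```python
# Hypothetical number of sessions reaching each checkout step.
funnel = [("cart", 1000), ("shipping", 800), ("payment", 300), ("purchase", 270)]


def biggest_drop(funnel):
    """Return (drop_rate, from_step, to_step) for the largest step-to-step loss."""
    drops = []
    for (prev_step, prev_count), (step, count) in zip(funnel, funnel[1:]):
        drops.append((1 - count / prev_count, prev_step, step))
    return max(drops)


rate, from_step, to_step = biggest_drop(funnel)
```

With these numbers, most sessions are lost between the shipping and payment
steps, which would point the investigation toward causes such as shipping fees
rather than the cart page.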
3. Predictive Analytics
This type of analytics examines historical and present data to make
predictions about the future. It uses data mining, AI, and machine learning to
forecast customer trends, market trends, and so on.
Use Case: PayPal determines what kind of precautions it must
take to protect its clients against fraudulent transactions. Using predictive
analytics, the company combines historical payment data and user behavior
data to build an algorithm that predicts fraudulent activities.
4. Prescriptive Analytics
This type of analytics prescribes the solution to a
particular problem. Prescriptive analytics works with both descriptive and
predictive analytics. Most of the time, it relies on AI and machine learning.
Use Case: Prescriptive analytics can be used to maximize an
airline’s profit. This type of analytics is used to build an algorithm that
will automatically adjust the flight fares based on numerous factors, including
customer demand, weather, destination, holiday seasons, and oil prices.
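The core idea of fare adjustment can be sketched as choosing, from a set of
candidate fares, the one with the highest expected revenue. The linear demand
model, the parameter names and every number below are assumptions made for this
example. A real airline system would use far richer models and inputs.

```python
def expected_revenue(fare, seats, base_demand, sensitivity):
    """Revenue if demand falls linearly as the fare rises (assumed model)."""
    demand = max(0.0, base_demand - sensitivity * fare)
    return fare * min(seats, demand)


def best_fare(candidates, seats, base_demand, sensitivity):
    """Prescribe the candidate fare with the highest expected revenue."""
    return max(
        candidates,
        key=lambda fare: expected_revenue(fare, seats, base_demand, sensitivity),
    )


candidates = [100, 120, 150, 180, 200]

# Ordinary week: lower base demand.
regular_fare = best_fare(candidates, seats=180, base_demand=300, sensitivity=1.0)

# Holiday season: higher base demand shifts the recommendation upward.
holiday_fare = best_fare(candidates, seats=180, base_demand=400, sensitivity=1.0)
```

The predictive part is the demand estimate. The prescriptive part is the step
that turns that estimate into a recommended action.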
Big Data Analytics Tools
Here are some of the key big data analytics tools:
• Hadoop - helps in storing and analyzing data
• MongoDB - used on datasets that change frequently
• Talend - used for data integration and management
• Cassandra - a distributed database used to handle large volumes of data
• Spark - used for real-time processing and analysis of large amounts of data
• Storm - an open-source real-time computation system
• Kafka - a distributed streaming platform that is used for fault-tolerant storage
Big Data Industry Applications
Here are some of the sectors where Big Data is actively
used:
• E-commerce - Predicting customer trends and optimizing prices are a few of the
ways e-commerce uses Big Data analytics
• Marketing - Big Data analytics helps to drive high ROI marketing campaigns,
which result in improved sales
• Education - Used to develop new and improve existing courses based on market
requirements
• Healthcare - With the help of a patient’s medical history, Big Data analytics
is used to predict how likely they are to have health issues
• Media and entertainment - Used to understand the demand for shows, movies,
songs, and more to deliver a personalized recommendation list to users
• Banking - Customer income and spending patterns help to predict the likelihood
of choosing various banking offers, like loans and credit cards
• Telecommunications - Used to forecast network capacity and improve customer
experience
• Government - Big Data analytics helps governments in law enforcement, among
other things
Thank you for reading. Happy Learning!