9th Computer Unit 4 Data and Analysis

Unit 4: Data and Analysis

Give short answers to the following short response questions (SRQs).

Q.1) Define data analytics and data science. Are they similar or different? Give reason.

Ans: Data Analytics:

Data Analytics refers to the process of carefully examining and studying data to identify patterns, draw conclusions, or make the data meaningful.

Data Science:

Data Science is an interdisciplinary field that uses mathematics, statistics, data analysis, and machine learning to analyze data and extract knowledge and insights from it.

Data analytics and data science are related but distinct fields:

Similarities:

i. Both involve analyzing data to extract insights and inform decision-making.

ii. Both fields utilize statistical and mathematical techniques to uncover patterns and trends within datasets.

Differences:

i. Data analytics typically focuses on analyzing existing datasets to answer specific questions. Data science encompasses a broader range of activities, including data collection, preprocessing, etc.

ii. Data analytics often involves descriptive and diagnostic analytics, aiming to understand past events. Data science goes beyond this to include predictive and prescriptive analytics, forecasting future trends and making recommendations.

Q.2) Can you relate how data science is helpful in solving business problems?

Ans: Yes. Data science helps businesses by analyzing vast amounts of data to uncover insights, trends, and patterns. It enables informed decision-making, enhances customer experiences, improves operational efficiency, identifies opportunities, and predicts outcomes.

Q.3) Database is useful in the field of data science. Defend this statement.

Ans: Databases serve as the foundation for storing, managing, and accessing large volumes of structured and unstructured data, which is essential for data science tasks such as analysis, modelling, and machine learning.

Q.4) Compare machine learning and deep learning, in the context of formal & informal education.

Ans: Comparison of machine learning and deep learning:

Formal Education:

Machine Learning:

  • It provides students with foundational knowledge and skills in data analysis.
  • Advanced courses and degree programs focus on machine learning techniques, algorithms, and applications.
  • In academic settings, machine learning research contributes to the advancement of knowledge and technology.

Deep Learning:

  • Deep learning is covered in advanced courses at the graduate level due to its complexity and prerequisites in machine learning.
  • Formal education provides opportunities for students to engage in deep learning research projects under the guidance of faculty members.

Informal Education:

Machine Learning:

  • Online courses are provided to individuals seeking to acquire practical skills in data analysis, machine learning algorithms, and model deployment.
  • Self-study resources: informal learners can access a wealth of online resources, including textbooks, blogs, and video tutorials, to deepen their understanding of machine learning concepts and techniques.

Deep Learning:

  • Specialized courses on deep learning are available on online platforms.
  • Informal learners can participate in online forums, discussion groups, and social media communities dedicated to deep learning.

Q.5) What is meant by sources of data? Give three sources of data excluding those mentioned in the book.

Ans: Sources of data refer to the various origins or channels from which data is collected or obtained for analysis.

Here are three sources of data:

i. Transaction Records:

Data generated from transactional activities, such as purchases, sales, and financial transactions, are captured in databases or records systems.

ii. Social Media:

Social media data offers insights into consumer preferences, sentiment analysis, and brand perception, which can inform marketing strategies and customer engagement efforts.

iii. Government Databases:

Government databases provide valuable data for research, policy analysis and decision-making in various sectors such as healthcare, education, and public administration.

Q.6) Differentiate between database and dataset.

Ans:  Difference between Database and Dataset:

| Database | Dataset |
| --- | --- |
| A structured collection of data organized in a way that allows for efficient storage, retrieval, and management. | A structured or unstructured collection of data, often organized into rows and columns or files, used for analysis, research, or machine learning tasks. |
| Designed to handle ongoing updates, inserts, and deletions of data. | Can be static, with fixed data, or dynamic, with updates. |
| Designed to store and manage large volumes of data. | Used for analysis, research, or training machine learning models. |
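The difference can be sketched roughly in Python. This is only an illustration: it uses Python's built-in sqlite3 module for the database side and a plain list of records for the dataset side. The table name `students` and all values are made up for the example.

```python
import sqlite3

# Database: supports ongoing inserts, updates, and deletions.
conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.execute("INSERT INTO students VALUES ('Ali', 78)")
conn.execute("INSERT INTO students VALUES ('Sara', 91)")
conn.execute("UPDATE students SET marks = 80 WHERE name = 'Ali'")
conn.execute("DELETE FROM students WHERE name = 'Sara'")
rows = conn.execute("SELECT name, marks FROM students").fetchall()
print(rows)
conn.close()

# Dataset: a fixed collection of records used for analysis (hypothetical values).
dataset = [("Ali", 78), ("Sara", 91), ("Hina", 85)]
average = sum(marks for _, marks in dataset) / len(dataset)
print(average)
```

The database part changes its contents over time, while the dataset part is simply read and analyzed.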

Q.7) Argue about the trends, outliers, and distribution of values in a data set. Describe.

Ans:

Trends:

Trends in a dataset refer to the general direction in which the data is moving over time or across different variables. Identifying trends helps in understanding patterns and making predictions.

Outliers:

Outliers are data points that significantly deviate from the rest of the dataset. They can skew statistical analyses and distort interpretations if not handled properly. Outliers may represent rare events.

Distribution of Values:

The distribution of values in a dataset refers to how the data is spread or arranged across different values or categories. Common distributions include normal, uniform, skewed, or multimodal distributions.
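A minimal Python sketch of these three ideas follows. The temperature readings are made up, and the checks used here (comparing the averages of two halves for the trend, and the 1.5 × IQR rule for outliers) are common rules of thumb chosen for the example, not methods taken from the book.

```python
import statistics
from collections import Counter

# Hypothetical daily temperatures (°C); 45 is an unusual reading.
temps = [20, 21, 21, 22, 23, 23, 24, 45]

# Trend: compare the average of the first half with the second half.
half = len(temps) // 2
first_avg = statistics.mean(temps[:half])
second_avg = statistics.mean(temps[half:])
trend = "rising" if second_avg > first_avg else "falling or flat"

# Outliers: values far outside the middle 50% of the data (IQR rule).
q1, _, q3 = statistics.quantiles(temps, n=4)
iqr = q3 - q1
outliers = [t for t in temps if t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr]

# Distribution: how often each value occurs.
distribution = Counter(temps)

print(trend, outliers, distribution)
```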

Q.8) Why are summary statistics needed?

Ans: Summary statistics give condensed information about the data in a sample, which makes the values easier to understand. They may include the total number of values, the minimum and maximum values, the mean, and the standard deviation of a data collection.
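As a small illustration, these statistics can be computed with Python's built-in statistics module. The marks below are made-up values.

```python
import statistics

marks = [65, 70, 72, 80, 88]  # hypothetical test marks

summary = {
    "count": len(marks),
    "minimum": min(marks),
    "maximum": max(marks),
    "mean": statistics.mean(marks),
    "std_dev": statistics.stdev(marks),  # sample standard deviation
}
for name, value in summary.items():
    print(name, value)
```

Five numbers are enough here to describe the whole list without reading every value.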

Q.9) Express big data in your own words. Explain the three V’s of big data with reference to email data. (Hint: An email box that contains hundreds of emails)

Ans:

Big data refers to large volumes of structured and unstructured data that cannot be easily processed or analyzed using traditional methods. It encompasses massive datasets that require advanced tools and techniques to extract insights and value effectively.

Now, let’s relate the three V’s of big data to an email inbox containing hundreds of emails:

i. Volume:

Volume refers to the total amount of data generated and stored. In the context of the email inbox, the volume would be the hundreds of emails received, sent, and stored within the inbox. Managing this volume requires efficient storage systems and processing capabilities to handle the large influx of emails.

ii. Velocity:

Velocity represents the speed at which data is generated, collected, and processed. In the case of an email inbox, velocity would refer to the rate at which emails are received, sent, and responded to. With hundreds of emails coming in daily, the velocity of email data is high, requiring timely processing and response to ensure efficient communication.

iii. Variety:

Variety refers to the diverse types and sources of data, including structured, semi-structured, and unstructured data. In an email inbox, the variety of data includes text-based messages, attachments, images, and other multimedia content. Managing this variety requires tools and algorithms capable of handling different data formats and extracting meaningful insights from them.
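The three V's can be measured on a tiny, made-up inbox as a rough sketch. The list `inbox` and its contents are hypothetical, and a real inbox would be far larger.

```python
from collections import Counter

# Hypothetical inbox: (day received, kind of content) for each email.
inbox = [
    ("Mon", "text"), ("Mon", "image"), ("Mon", "text"),
    ("Tue", "pdf"), ("Tue", "text"), ("Wed", "video"),
]

# Volume: how much data there is.
volume = len(inbox)

# Velocity: how fast data arrives (emails per day).
days = len({day for day, _ in inbox})
velocity = volume / days

# Variety: the different kinds of content.
variety = Counter(kind for _, kind in inbox)

print(volume, velocity, dict(variety))
```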

Q.10) Illustrate the purpose of data storage.

Ans:

The purpose of data storage is to securely and efficiently store data for future retrieval, analysis, and use. It serves as a centralized source for organizing, managing, and preserving data assets, ensuring data integrity, availability, and durability over time. Data storage enables quick access to information, supports data processing and analysis tasks, facilitates collaboration, and ensures compliance with regulatory requirements.

Give long answers to the following extended response questions (ERQs).

Q.1) Sketch the key concepts of data science in your own words.

Ans: The following are some key concepts or components that lay the foundation of data science:

Data:

As mentioned earlier, data is a collection of observations, facts or information collected from different sources. This data can be in the form of numbers, measurements, words, observations, or in audio or video form.

Dataset:

A dataset is a structured or processed collection of data usually associated with a unique body of work. The data in the collection is related in some way. For example, a collection of brain CT scans of brain tumor patients is a dataset that can be used to evaluate patterns or trends common to the entire dataset.

Statistics and Probability:

Statistics is the analysis of data about past events, such as how frequently they occurred, while probability is used to predict the likelihood of future events. Data scientists use statistics and probability to find patterns and trends in the data.

Mathematics:

Mathematics is a fundamental part of data science. It helps to solve problems, optimize model performance, and turn large, complex data into simple and clear results for decision making.

Machine Learning:

Machine learning is a branch of artificial intelligence and computer science that emphasizes the use of data and algorithms to enable computers to imitate human learning.

Deep Learning:

Deep learning is a subset of machine learning, with emphasis on the simulation or imitation of the human brain’s behaviour by using artificial neural networks.

Data Mining:

Data mining is a subset of data science that primarily focuses on discovering patterns and relationships in existing datasets. Data mining uses a more limited set of techniques and tools than data science.

Data Visualization:

Data visualization is the graphical representation of data using common charts, plots, infographics, and animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.

Big Data:

Big data refers to handling large volumes of data. Data scientists use big data to find patterns and trends in datasets, to obtain more accurate and reliable results. The huge size of data provides more opportunities for machine learning and provides better results.

Predictive Analysis:

Predictive analysis is the use of data to predict future trends and events based on historical data.

Natural Language Processing (NLP):

It is the study of the interaction between human language and computers. The common uses of NLP are chatbots, language translators and sentiment analysis.
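Of the concepts above, predictive analysis is easy to illustrate with a few lines of Python. The sketch below fits a straight line to made-up monthly sales figures and uses it to estimate the next month. A straight-line fit is only one simple technique chosen for the example, and real data rarely follows a perfect line.

```python
# Hypothetical monthly sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 120, 140, 160, 180]

# Fit a straight line (least squares): sales = slope * month + intercept.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x

# Use the historical pattern to predict month 6.
prediction = slope * 6 + intercept
print(prediction)
```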

Q.2) Develop your own thinking on the various data types used in data science.

Ans: In data science, data can be classified into two main types: qualitative (categorical) and quantitative (numeric).

i. Qualitative or Categorical data

It describes an object or a group of objects that can be labelled according to some group or category. It cannot be represented in numerical form. Examples include colors and places. It is further subdivided into two types:

a. Ordinal data

b. Nominal data

a. Ordinal Data:

Ordinal data follows a specific order or ranking. It uses a certain scale or measure to group data into categories, such as test grades, economic status, or military rank.

b. Nominal Data:

Nominal data does not have any order. It can be labelled into mutually exclusive categories that cannot be ordered meaningfully. For example, the categories of transportation can be car, bus, or train. Similarly, gender, city, colour, and employment status are also examples of nominal data.

ii. Quantitative or Numerical data

It deals with numeric values that can be used in calculations to draw conclusions. Examples of numeric data are height, weight, the number of students in a school, and the number of fruits in a basket. Quantitative data can be further divided into two types:

a. Discrete data

b. Continuous data

a. Discrete Data:

It includes data which can only take certain values and cannot be further subdivided into smaller units. This data can be counted and has a finite number of values.

For example, the number of product reviews, tickets sold, computers in certain departments, employees in a company etc.

b. Continuous Data:

It refers to data that can take any of an unlimited number of possible values between two realistic points or numbers. For example, daily wind speed, the weight of newborn babies, and a freezer’s temperature.
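The four data types can be shown side by side in a short Python sketch. The student record, the grade scale, and the heights are all made-up values used only for illustration.

```python
# One hypothetical student record with each of the four data types.
student = {
    "city": "Lahore",      # nominal: a label with no order
    "grade": "B",          # ordinal: categories with a meaningful order
    "books_borrowed": 3,   # discrete: counted in whole numbers
    "height_cm": 152.4,    # continuous: measured, any value in a range
}

# Ordinal data can be ranked because its categories have an order.
grade_order = ["F", "D", "C", "B", "A"]  # assumed scale, lowest to highest
grades = ["B", "A", "C", "B"]
ranked = sorted(grades, key=grade_order.index)

# Quantitative data supports arithmetic such as averaging.
heights = [152.4, 148.0, 160.1]
average_height = sum(heights) / len(heights)

print(ranked, average_height)
```

Sorting the grades works only because an order was supplied; there is no meaningful way to do the same for city names.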

Q.3) Compare how big data is applicable to various fields of life. Illustrate your answer with suitable examples.

Ans: Big Data helps companies make smart decisions by using lots of data from various sources. This data can come from things like social media, weblogs, texts, and more. Big Data is used in many important areas:

Healthcare:

Big Data helps doctors keep track of patient information securely. It helps in using devices to monitor a patient’s health and suggest treatment.

Media and Entertainment:

Companies use Big Data to understand what people want to watch or read. It helps them create and share content that people will like.

Internet of Things (IoT):

Big data plays an important role in enhancing the capabilities of IoT devices. IoT devices generate continuous data. The analytics based on this huge data helps in personalized customer experience.

Manufacturing:

Big data helps manufacturing companies make better products and smarter decisions. It helps in predicting when machines might need maintenance (predictive maintenance), so that they do not stop working unexpectedly. Big data is also used to analyze how products can be made better and at lower cost.

Q.4) Relate the advantages and challenges of big data.

Ans:

Advantages and benefits of big data

Big data contains more information, so it helps individuals, organizations, and businesses to optimize their work and generate cost-effective solutions. Big data has many advantages for the betterment and progress of business. Some of them are as follows:

Product development:

Developing and creating new products, services or brands is much easier when based on data collected from customers’ needs and wants. Companies use big data to anticipate customer demand. They build predictive models for new products and services by classifying key attributes of past and current products.

Predictive maintenance:

It is a proactive maintenance strategy that uses the analysis of existing data to predict when equipment, machinery, or a product is likely to fail. It therefore flags potential issues before problems occur.

Customer experience/satisfaction:

A clearer view of customer experience is more possible now than ever before. Big data enables businesses to gather data from social media, web visits, call logs, and other sources to improve customer satisfaction.

Fraud and compliance:

Big data analytics can identify and detect unusual suspicious

Big data challenges

Although big data has many advantages, businesses also encounter many challenges when using it. Some of them are as follows:

i. Data Quality: Poor quality of data may lead to errors, inefficiency, and misleading insight after data analysis.

ii. Data Security and privacy: It is difficult to manage the protection and privacy of massive datasets to prevent unauthorized access.

iii. Rapid growth of data: Making systems that can handle more and more data as it keeps on growing without slowing down is challenging.

iv. Big data tool selection: It is challenging to ensure compatibility and seamless interaction between different big data tools and platforms.

v. Data integration: Creating harmony among diverse data formats and structures is a difficult task.
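Two of these challenges, data quality and data integration, can be illustrated on a very small scale. The sketch below uses made-up customer records; real big data systems face the same problems across millions of records and many more formats.

```python
# Hypothetical customer records collected from two sources.
records = [
    {"name": "Ali", "age": 15, "city": "Lahore"},
    {"name": "Sara", "age": None, "city": "karachi"},  # missing age
    {"name": "", "age": 14, "city": "Lahore"},         # missing name
]

# Data quality: find records with missing values.
incomplete = [r for r in records if any(v in (None, "") for v in r.values())]

# Data integration: bring city names into one consistent format.
for r in records:
    r["city"] = r["city"].title()

cities = {r["city"] for r in records}
print(len(incomplete), cities)
```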

Q.5) Design a case study about how data science and big data have revolutionized the field of healthcare.

Case Study: Transforming Healthcare with Data Science and Big Data

Introduction:

Data science and big data have significantly revolutionized healthcare by enabling predictive analytics, personalized medicine, and improved operational efficiency. With vast amounts of data from electronic health records (EHRs), genomic research, and wearable devices, healthcare providers can make more informed decisions that enhance patient care.

Challenge:

Healthcare traditionally relied on retrospective data and generalized treatment approaches. This led to inefficiencies, high operational costs, and inconsistent patient outcomes, especially in chronic disease management and treatment personalization.

Solution:

1. Predictive Analytics in Patient Care

Data science allows hospitals to use machine learning algorithms to predict patient outcomes and potential risks, such as readmissions or disease onset. For example, analyzing real-time data from wearable devices and EHRs can help predict and prevent critical conditions like heart attacks.

2. Personalized Medicine

Big data allows doctors to tailor treatments based on a patient’s genetic information, lifestyle, and medical history, improving the efficacy of treatments.

Outcome:

  • Reduced patient mortality rates through early detection of critical conditions.
  • Higher treatment success rates by tailoring medical care to individual patient profiles.
  • Improved hospital efficiency, reducing patient wait times and enhancing resource management.

Conclusion:

Data science and big data have revolutionized healthcare by enabling predictive, personalized, and efficient medical care, ultimately leading to better patient outcomes and more cost-effective healthcare systems.

Select the suitable answer for the following Multiple-choice questions.

1. ___________ is a structured or processed collection of data usually associated with a unique body of work.

a) Database

b) Dataset

c) Data and Information

d) Information

2. ________ refers to the process of carefully examining and studying data to identify patterns, draw conclusions, or make the data meaningful.

a) Data analytics

b) Data Predictions

c) Dataset

d) Database

3. ______ is the graphical representation of data through use of common charts, plots, infographics, and animations.

a) Data cleaning

b) Missing values

c) Data visualization

d) Data hiding

4. __________ is a subset of Machine learning, with emphasis on the simulation or imitation of the human brain’s behavior by using artificial neural networks.

a) Data visualization

b) Computer vision

c) Deep learning

d) Big Data

5. __________ is the use of data to predict future trends and events based on historical data.

a) Statistical analysis

b) Predictive analysis

c) Graphical analysis

d) Deep learning

6. _________ is the fast rate at which data is received and acted on.

a) Volume

b) Velocity

c) Variety

d) Vision

7. __________ includes the data which can only take certain values and cannot be further subdivided into smaller units.

a) Discrete data

b) Continuous data

c) Ordinal data

d) Referral data

8. __________ is a limitation of big data.

a) Statistical data

b) Unlimited growth of data

c) Data visualization

d) Predictive maintenance

9. Customer satisfaction levels such as satisfied, dissatisfied, and neutral are examples of which data type?

a) Ordinal data

b) Continuous data

c) Numeric data

d) Discrete data

10. _________ is a method of collecting information from individuals.

a) Survey

b) Data hiding  

c) Data visualization

d) Data finding

Muhammad Hussain
