Data Mining MCQ

1) What is Data mining?

Process of discovering or finding knowledge from a large amount of data
Processing antique artifacts
Using the internet to learn new things
None of the above

Answer: a) Process of discovering or finding knowledge from a large amount of data.

Explanation: Data mining is the process of discovering or mining knowledge from a large amount of existing data using tools such as statistics and artificial intelligence.

2) What does KDD stand for?

Knowledge discovery database
Knowledge-driven from data
Knowledge discovery definition
None of the above

Answer: a) Knowledge discovery database.

Explanation: KDD stands for Knowledge Discovery in Databases, the overall process of discovering knowledge from data.

3) What is the need for data mining?

To find hidden patterns and trends in large databases
Exploratory data analysis and deductive learning
Automatic exploration of data
Manual analysis is not possible for a large amount of data
1. I, II
2. II
3. III
4. All of the above

Answer: d) All of the above

Explanation: Data mining is needed to find hidden patterns and trends in large databases, to explore data automatically, and to support later data analysis reports, because manual analysis is not practical for such a large amount of data.

4) What technologies comprise data mining?

Regression analysis
Cluster analysis
Standard deviation
Machine learning and artificial intelligence
1. I, II
2. II
3. III
4. All of the above

Answer: d) All of the above

Explanation: The primary technologies used in data mining are Regression analysis, cluster analysis, standard deviation, artificial intelligence, and machine learning.

5) What are the different types of databases where data mining is performed?

Relational databases
Data warehouses
Data Repositories
Object-Relational Databases
Transactional Databases
1. I, III
2. I, II, III
3. IV, V
4. All of the above

Answer: d) All of the above

Explanation: The different types of databases where data mining can take place are

Relational databases – data are organized in tables.
Data warehouses – technology that collects data from various sources.
Data Repositories – a group of databases.
Object-Relational Databases – a combination of relational database and object-oriented database.
Transactional Databases – databases that record transactions such as purchases, bookings, or payments.

6) What are the advantages of using Data mining?

Enables an organization to have a knowledge-based database
It makes decision making easy
A quick and efficient method
It enables a user to find hidden patterns
1. I, III
2. I, II, III
3. IV
4. All of the above

Answer: d) All of the above

Explanation: The significant advantages of using Data mining include

Enables an organization to have a knowledge-based database.
It makes decision making easy.
A quick and efficient method.
It enables a user to find hidden patterns.

7) What are the significant challenges of performing Data mining?

Noisy data
Data privacy
Data visualization
Complex data
Data distribution
Performance
1. I, III
2. I, II, III
3. IV, V, VI
4. All of the above

Answer: d) All of the above

Explanation: The significant challenges faced during Data mining

Noisy data
Data privacy
Data visualization
Complex data
Data distribution
Performance

8) What are the major Data Mining Techniques?

Classification
Clustering
Regression
Outlier detection
Sequential Patterns
Predictions
1. I, III
2. I, II, III, VII
3. IV, V, VI
4. All of the above

Answer: d) All of the above

Explanation: The major Data Mining Techniques

Classification – a technique to classify data into different classes.
Clustering – a division of information into groups of similar items.
Regression – a process of identifying and analyzing relationships between variables.
Outlier detection – observation of data items in a database that do not match the expected pattern or behavior.
Sequential Patterns – evaluating sequential data to find recurring patterns.
Predictions – predicting future values by combining the previous techniques.
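As one illustration of the techniques listed above, the sketch below flags unusual data items, a task often called outlier detection. It uses a simple z-score rule; the function name `find_outliers`, the sample values, and the threshold of 2.0 are assumptions made for this example, not part of the quiz material.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical daily order counts; 50 is far from the rest.
orders = [10, 11, 9, 10, 12, 50]
print(find_outliers(orders))  # [50]
```

A z-score rule is only one of several ways to find outliers; it works best when the data is roughly bell-shaped.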

9) What is CRISP-DM?

The cross-industry standard process for data mining
Cooperate industry standard process for data mining
Cross-industry procedure for data mining
None of the above

Answer: a) Cross-industry standard process for data mining.

Explanation: CRISP-DM stands for Cross-industry standard process for data mining, which involves six steps: Business understanding, Data understanding, Data preparation, Modeling, Evaluation, and Deployment.

10) Data mining architecture?

Data Source
Cleaning, integrating, selecting
Database or data warehouse servers
Data mining engines
Pattern evaluation
GUI
1. I, III
2. I, II, III, VII
3. IV, V, VI
4. All of the above

Answer: d) All of the above

Explanation: The architecture of Data mining

Data Source – retrieving data from miscellaneous sources.
Cleaning, integrating, selecting – retrieved data is processed through methods such as cleaning, integrating, and selecting, and the processed data is then forwarded to a single database.
Database or data warehouse servers – holds the original processed data.
Data mining engines – contains modules to perform data mining tasks.
Pattern evaluation – finding a specific pattern in the processed data.
GUI – the graphical user interface helps the user to communicate with the data mining system.

11) Steps involved in CRISP-DM?

Business understanding
Data understanding
Data preparation
Modeling
Evaluation
Deployment
1. I, III
2. I, II, III, VII
3. IV, V, VI
4. All of the above

Answer: d) All of the above

Explanation: The process of Cross-industry standard process for data mining involves

Business understanding – understanding the goals and principles of the business.
Data understanding – collecting initial data, exploring and describing it, and verifying its quality.
Data preparation – selecting data, cleaning it, integrating data from different sources, and then formatting it into defined tables and structures.
Modeling – building a model for the prepared data using specific modeling techniques such as decision trees and neural networks.
Evaluation – the stage where the effectiveness of the constructed model is tested before deployment.
Deployment – the last stage, where the outcome is put to use.

12) Majorly used data mining tools?

RapidMiner
DataMelt data mining
Rattle data mining
Orange
SAS
1. I, III
2. I, II, III, VII
3. IV, V, VI
4. All of the above

Answer: d) All of the above

Explanation: Commonly used data mining tools

Orange
SAS
DataMelt
Rattle
RapidMiner

13) Methods for text data mining?

Document classification analysis
Data mining procedure to sort data
A method to find data
None of the above

Answer: a) Document classification analysis

Explanation: This method is used to analyze a large number of online text documents.

14) What is pre-processing?

Setting up target data
Data mining procedure to sort data
A method to find data
To represent the derived data with visualization and reports.

Answer: a) Setting up target data

Explanation: Pre-processing is the method of setting up the target data on which the data mining algorithms will run.
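A minimal sketch of what setting up target data can look like in practice: incomplete records are dropped and each numeric column is rescaled to the range 0 to 1. The function name `preprocess`, the use of `None` for missing values, and the sample records are assumptions made for this example.

```python
def preprocess(records):
    """Drop records with missing values, then min-max scale each column to [0, 1]."""
    # Cleaning: keep only complete records (None marks a missing value here).
    complete = [r for r in records if None not in r]
    if not complete:
        return []
    scaled_columns = []
    for col in zip(*complete):  # iterate column by column
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column is mapped to 0.0 to avoid dividing by zero.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back to one list per record.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical (age, income) records; the second one is incomplete.
raw = [(20, 1000), (30, None), (40, 3000), (30, 2000)]
print(preprocess(raw))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```

Real pre-processing usually involves more steps, such as integrating sources and selecting attributes, but the idea of preparing clean, comparable inputs is the same.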

15) What is classification in data mining?

Setting up target data
Data mining procedure to sort data
A method to find data
Generalizing structures

Answer: d) Generalizing structures

Explanation: Classification generalizes a known structure so that it can be applied to new data. In practice, it is the method of categorizing data into certain sets and subsets.
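To make the idea of assigning data to categories concrete, here is a small sketch of a nearest-neighbor classifier: a new point receives the label of the closest labeled example. Nearest neighbor is just one classification technique chosen for brevity; the function name and the training data are hypothetical.

```python
import math

def nearest_neighbor_classify(training, point):
    """Return the label of the training example closest to `point`."""
    _, closest_label = min(
        training, key=lambda example: math.dist(example[0], point)
    )
    return closest_label

# Hypothetical labeled examples: (features, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]
print(nearest_neighbor_classify(training, (2.0, 1.5)))  # low
print(nearest_neighbor_classify(training, (8.5, 8.0)))  # high
```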

16) What is Regression analysis?

A model with the least error to derive a relationship in data
Data mining procedure to sort data
A method to find data
None of the above

Answer: a) A model with the least error to derive a relationship in data.

Explanation: Regression analysis is the process of finding the model with the least error to derive a specific relationship in data.

17) What is Summarization in data mining?

Setting up target data
Data mining procedure to sort data
A method to find data
To represent the derived data with visualization and reports.

Answer: d) To represent the derived data with visualization and reports.

Explanation: Summarization refers to representing the processed data compactly through visualizations and reports.

18) What is OLAP?

Online analytical pre-processing
Offline analytical pre-processing
Offline analytical processing
Online analytical processing

Answer: d) Online analytical processing

Explanation: OLAP stands for online analytical processing. It is used with data warehouses to run fast analytical queries over massive amounts of data.

19) What is OLTP?

Online transactional pre-processing
Offline transactional pre-processing
Offline transactional processing
Online Transactional processing

Answer: d) Online transactional processing

Explanation: OLTP stands for online transactional processing. It is used in operational databases to handle a large number of short, day-to-day transactions.

20) What is a data warehouse?

More storage
Process for determining data patterns
Database system designed for analytics
None of the above

Answer: c) Database system designed for analytics

Explanation: A data warehouse is a database system designed for analytics. It combines relevant data from different sources and organizes it so that business data can be sorted and analyzed efficiently.

21) Advantages of using a data warehouse?

More storage
Easy to access
More accurate, improved performance, cost-efficient
None of the above

Answer: c) More accurate, improved performance, cost-efficient.

Explanation: A data warehouse is more accurate, offers improved performance in terms of speed, is cost-efficient, and provides quality data.

22) What is Web Mining?

Application of data mining for web operations
Application of data mining for programming languages
Finding data through different websites
None of the above

Answer: a) Application of data mining for web operations.

Explanation: Web mining is the application of data mining to web operations, such as collecting data about current topics and analyzing the results to find what works best for a website.

23) Types of web mining?

Web content mining
Web usage mining
Web structure mining
Web data mining
1. I, II, III
2. I, II, IV
3. All of the above
4. None of the above

Answer: a) I, II, III

Explanation: Different types of Web mining are

Web content mining – finding suitable data and information for the web page content.
Web usage mining – tracking visitors and their activity.
Web structure mining – analyzing the hyperlinks that connect web pages.

24) What is Web content mining?

Find suitable data and information for the web page content
Find suitable data and information for the web page design
Find suitable data and information for the web page architecture
none of the above

Answer: a) Find suitable data and information for the web page content.

Explanation: Web content mining is the process of finding information for a web page. Its primary use is retrieving data from multiple sources.

25) What is Web usage mining?

Find suitable data and information about the user’s information
Find suitable data and information for the web page design
Find suitable data and information for the web page architecture
none of the above

Answer: a) Find suitable data and information about the user's information.

Explanation: Web usage mining finds information about users from user logs, user information, and activity logs.
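A rough sketch of how activity logs can be turned into usage information: it counts hits per page and records which pages each user visited. The log format (one `user page` pair per line), the function name `summarize_usage`, and the sample entries are assumptions made for illustration; real server logs carry more fields.

```python
from collections import Counter, defaultdict

def summarize_usage(log_lines):
    """Count hits per page and collect the pages each user visited, in order."""
    hits_per_page = Counter()
    pages_per_user = defaultdict(list)
    for line in log_lines:
        user, page = line.split()  # assumed format: "<user> <page>"
        hits_per_page[page] += 1
        pages_per_user[user].append(page)
    return hits_per_page, dict(pages_per_user)

log = ["alice /home", "alice /products", "bob /home", "carol /home", "bob /cart"]
hits, paths = summarize_usage(log)
print(hits.most_common(1))  # [('/home', 3)]
print(paths["bob"])         # ['/home', '/cart']
```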

26) What is Web structure mining?

Find hyperlinks between web pages and servers
Find suitable data and information for the web page design
Find suitable data and information for the web page architecture
none of the above

Answer: a) Find hyperlinks between web pages and servers.

Explanation: Web structure mining consists of analyzing hyperlinks to find out whether connected data is linked to a website or directly linked to a network.
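Finding the hyperlinks on a page is the first step of this kind of analysis. The sketch below collects the `href` targets of anchor tags using Python's standard `html.parser` module; the class name `LinkExtractor`, the helper `extract_links`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

page = '<p><a href="/about.html">About</a> and <a href="https://example.org/">partner</a></p>'
print(extract_links(page))  # ['/about.html', 'https://example.org/']
```

Repeating this over many pages gives a graph of pages and links, which is the structure that web structure mining studies.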

27) Significant challenges faced while web mining?

Complex web pages
Massive data
Relevant data
Diverse network
1. I, II, IV
2. II, III
3. I, IV
4. All of the above

Answer: d) All of the above

Explanation: Significant challenges faced while web mining are

Complex web pages – many websites are not structured and contain complex design.
Massive data – the internet is vast and complex, with an enormous amount of data to search within.
Relevant data – due to enormous data present, most of the data is irrelevant to the user.
Diverse network – it contains data with different interests and purposes.

28) The primary application of web mining?

Marketing tool
Website analysis
Audience analysis
Advertisement data
Site testing
1. IV, V
2. I, II, III, IV
3. I, III, IV
4. All of the above

Answer: d) All of the above

Explanation: The standard application of web mining is

Marketing tool
Website analysis
Audience analysis
Advertisement data
Site testing

29) What is a cluster?

Machine learning algorithm to sort objects that belong to the same group
Data mining procedure to sort data
A method to find data
None of the above

Answer: a) Machine learning algorithm to sort objects that belong to the same group.

Explanation: Clustering is a machine-learning technique that sorts objects belonging to the same group together. It splits the data into subsets, each containing a similar type of data.

30) What are the different types of clusters?

Well separated clusters
Prototype-based cluster
Center-based cluster
Graph-based cluster
Density-based cluster
1. I, II, III, V
2. IV, V
3. I, II, IV, V
4. All of the above

Answer: d) All of the above

Explanation: The different types of clusters are

Well separated clusters – every object is closer to the objects in its own cluster than to any object outside it.
Prototype-based cluster – each object is placed with the prototype it most resembles.
Center-based cluster – each object is closer to the center of its own cluster than to the center of any other cluster.
Graph-based cluster – data of the cluster is represented as a graph.
Density-based cluster – data is separated according to density.

31) What is cluster analysis?

Machine learning algorithm to sort objects that belong to the same group
Data mining procedure to sort data
Analyzing a group of data that is sorted in the same group and are more similar
None of the above

Answer: c) Analyzing a group of data that is sorted in the same group and are more similar.

Explanation: Cluster analysis is the task of grouping a set of objects or data in such a way that similar objects or data lie in the same group, called a cluster.

32) What are the different types of cluster analysis used in data mining?

Centroid clustering
Density clustering
Distribution clustering
Connectivity clustering
1. III, IV
2. I, II, III
3. All of the above
4. None of the above

Answer: c) All of the above

Explanation: Types of cluster analysis used in data mining

Centroid clustering – here, we choose the number of clusters we want. For example, the owner of a car store who wants to analyze customers can segment the customer list according to who buys a sports car or an economy car.
Density clustering – clusters are defined by how densely populated the data points are in a region.
Distribution clustering – this identifies the probability that a data point belongs to a specific cluster.
Connectivity clustering – in this, each data point starts as its own cluster, and the distance between data points is measured: the closer two data points are, the more similar they are.
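Centroid clustering can be sketched with a small k-means routine, where the number of clusters is fixed by the number of starting centroids. To keep the example short it works on one-dimensional values; the function name `k_means_1d`, the purchase prices, and the starting centroids are assumptions loosely modeled on the car store example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move each centroid to its cluster mean."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical purchase prices in thousands: economy cars and sports cars.
prices = [18, 20, 22, 60, 62, 64]
centers, groups = k_means_1d(prices, centroids=[18, 64])
print(centers)  # [20.0, 62.0]
print(groups)   # [[18, 20, 22], [60, 62, 64]]
```

The result depends on the starting centroids, so practical implementations usually try several starting points.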

33) Application of cluster analysis?

Data analysis
Market research
Pattern recognition
Image processing
1. I, III, IV
2. I
3. II
4. All of the above

Answer: d) All of the above

Explanation: The primary applications of cluster analysis include data analysis, market research, pattern recognition, and image processing.

34) Advantages of using cluster analysis in data mining?

Scalable
Ability to deal with noisy data
Interpretability
Easy to access
Ability to deal with different types of attributes
1. I
2. II
3. I, II, III, V
4. All of the above

Answer: c) I, II, III, V.

Explanation: Advantages of using cluster analysis in data mining are

Scalable
Ability to deal with noisy data
Interpretability
Ability to deal with different types of attributes

35) What is big data?

A vast amount of data with high volume and high velocity, usually in terabytes
Data mining procedure to sort data
A method to find data
Uses of tools like stats, ML, and visualization to extract valuable data

Answer: a) A vast amount of data with high volume and high velocity, usually in terabytes

Explanation: Big data refers to a vast amount of data with high volume and high velocity, usually in terabytes, which cannot be processed on an ordinary computer or in a conventional database. Much big data is processed with Hadoop, an open-source framework.

36) What is a decision tree?

Machine learning algorithm to sort objects that belong to the same group
Data mining procedure to sort data
A method to find data
Learning procedure used for classification and regression

Answer: d) Learning procedure used for classification and regression

Explanation: A decision tree refers to a learning procedure mainly used in data mining for classification and regression. It helps mainly in the decision-making process. The created structure is in the form of a tree.

37) Uses of decision tree?

Helps to identify possible consequences of a decision
Measure probability
Helps to speculate data
Easy to access
Ability to deal with different types of attributes
1. I
2. II
3. I, II, III
4. All of the above

Answer: c) I, II, III.

Explanation: Significant uses of the decision tree are

It helps the user to analyze all the possible consequences of a specific decision.
We can easily measure the probability of accomplishing a task and the possible value of outcomes.
We can make the best possible decision based on existing data using a decision tree.

38) What is Entropy?

Used to measure randomness and impurity
Data mining procedure to sort data
A method to find data
None of the above

Answer: a) used to measure randomness and impurity.

Explanation: Entropy is a measure used when building decision trees in data mining. It measures the randomness and impurity present in a data set.
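The measure can be written down in a few lines. The sketch below computes Shannon entropy in bits for a list of class labels: an evenly mixed set scores 1.0 and a pure set scores 0.0. The function name `entropy` and the sample labels are choices made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the class labels in a data set."""
    total = len(labels)
    proportions = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)

print(entropy(["yes", "yes", "no", "no"]))    # 1.0 (evenly mixed, maximum impurity)
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure, no randomness)
```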

39) Parameters of decision tree algorithm?

D
Attribute_list
Attribute_selection_method
Attribute_sdetection_method
Analysed_list
1. I
2. II
3. I, II, III
4. All of the above

Answer: c) I, II, III.

Explanation: The main parameters used in the decision tree algorithm are

D – the data partition, that is, the set of training tuples or training data.
Attribute_list – set of attributes that define tuples.
Attribute_selection_method – a method that chooses the best attribute.

40) What is the attribute selection method?

Method to select set of attributes that define tuples
Data mining procedure to sort data
A method to find data
A method that chooses the best attribute

Answer: d) Method that chooses the best attribute

Explanation: The decision tree algorithm takes three parameters: D, attribute_list, and Attribute_selection_method. The attribute selection method applies an attribute selection measure and chooses the attribute that best separates the given tuples into classes.
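One common attribute selection measure is information gain, which picks the attribute whose split reduces entropy the most. The sketch below assumes that measure; the tuple layout (dictionaries with a `"class"` key), the helper names, and the toy data are hypothetical, and other measures such as the Gini index could be used instead.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(D, attribute):
    """Entropy of D minus the weighted entropy after splitting D on `attribute`."""
    partitions = {}
    for row in D:
        partitions.setdefault(row[attribute], []).append(row["class"])
    after_split = sum(len(p) / len(D) * entropy(p) for p in partitions.values())
    return entropy([row["class"] for row in D]) - after_split

def attribute_selection_method(D, attribute_list):
    """Choose the attribute with the highest information gain."""
    return max(attribute_list, key=lambda a: information_gain(D, a))

# Toy training tuples: "outlook" separates the classes perfectly, "windy" not at all.
D = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rainy", "windy": "no", "class": "stay"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
print(attribute_selection_method(D, ["outlook", "windy"]))  # outlook
```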

41) What are the advantages of using a decision tree?

No need for scaling
Missing data does not influence the tree
Automatic
Easy to access
Ability to deal with different types of attributes
1. I
2. II
3. I, II, III
4. All of the above

Answer: c) I, II, III.

Explanation: Advantages of using a decision tree in data mining are

It does not require information scaling.
Missing data values do not influence decision trees.
It is an automatic process and is easily explainable.

42) What is DMQL?

Data mining query language
Data measuring query language
Data manipulating query language
None of the above

Answer: a) Data mining query language.

Explanation: DMQL stands for Data Mining Query Language. It is a language based on SQL and is used to define data mining tasks.

43) Some primary applications of data mining?

Corporate analysis
Risk management
Market analysis
Managing elongated data
Fraud detection
Finding suitable audience
1. I
2. II, III, VI
3. I, II, III, V
4. All of the above

Answer: d) All of the above

Explanation: Some significant applications of data mining are:

Corporate analysis – analyzing aspects of a business.
Risk management – using algorithms to calculate risks.
Market analysis – making a thorough study of the market.
Managing elongated data – finding patterns between data.
Fraud detection – detecting suspicious or fraudulent activity in collected data.
Finding suitable audience – getting results after market and audience surveys.

44) How to classify the Data mining system involved?

Database technologies
Information science
Machine learning
All of the above

Answer: d) All of the above

Explanation: A data mining system can be classified into different categories according to the database technologies, information science, and machine learning techniques it involves.

45) What is regression?

Database technologies
Information used to calculate funds
Mathematical method to derive cost
The statistical method used in finance, investing, etc.

Answer: d) Statistical method used in finance, investing, etc.

Explanation: Regression is a statistical method used in finance, investing, accounting, etc., to determine the strength and character of relationships between variables.

46) What is regression analysis?

Database technologies
Information used to calculate funds
Mathematical method to derive cost
The statistical method used to estimate relationships

Answer: d) Statistical method used to estimate relationships

Explanation: Regression analysis is a statistical method used in finance, investing, accounting, etc., to establish a relationship between dependent and independent variables.

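Formula:

Y = f(X, β) + e

Here Y is the dependent variable, X is the independent variable (or variables), β stands for the unknown parameters, and e is the error term.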
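A minimal sketch of the simplest case of this formula, where f is a straight line, Y = β0 + β1·X + e, and the parameters are estimated by ordinary least squares. The function name `fit_line` and the sample data are assumptions made for this example; the data lies exactly on a line so the fitted values are easy to check by hand.

```python
def fit_line(xs, ys):
    """Estimate intercept and slope of y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data that follows y = 1 + 2x exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)       # 1.0 2.0
print(intercept + slope * 5)  # 11.0, the predicted value at x = 5
```

With real data the points do not sit exactly on the line, and the leftover differences between observed and fitted values play the role of the error term e.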

47) How to use regression analysis in data mining?

Database technology
It is used to calculate funds
It is used to derive cost
It is used to predict a range of numeric values

Answer: d) It is used to predict a range of numeric values

Explanation: Regression Analysis is used to predict a range of numeric values in a dataset. For example, a regression can predict a reasonable cost for a new product.

48) Types of regression techniques?

Standard multiple regression
Stepwise multiple regression
Hierarchical regression
Stepwise regression
Risk management regression
Market analysis regression
1. I
2. II, III, VI
3. I, II, III, IV
4. All of the above

Answer: c) I, II, III, IV.

Explanation: Types of regression techniques are: -

Standard multiple regression.
Stepwise multiple regression.
Hierarchical regression.
Stepwise regression.

49) Uses of data mining in health management?

Healthcare Management
Treatment effectiveness
Customer relationship management
Fraud and abuse
1. II, III, VI
2. I, II, IV
3. All of the above
4. None of the above

Answer: c) All of the above

Explanation: Uses of data mining in healthcare management:

Healthcare Management – data mining applications to track different diseases.
Treatment effectiveness – data mining applications to track how to proceed with treatment, etc.
Customer relationship management – data mining applications to interact with customers.
Fraud and abuse – data mining applications to track fraud and abuse.

50) What is EDM in data mining?

Electronic dance music
Educational data mining
Electoral data mining
Efficient data mining

Answer: b) Educational data mining

Explanation: EDM stands for educational data mining, which applies data mining methods to data from educational settings in order to improve education.
