Data Mining MCQ

1) What is Data mining?

  1. Process of discovering or finding knowledge from a large amount of data
  2. Processing antique artifacts
  3. Using the internet to learn new things
  4. None of the above
   

Answer:  a) Process of discovering or finding knowledge from a large amount of data.

Explanation: Data mining is the process of discovering knowledge in a large amount of existing data using tools such as statistics and artificial intelligence.

2) What does KDD stand for?

  1. Knowledge discovery database
  2. Knowledge-driven from data
  3. Knowledge discovery definition
  4. None of the above
   

Answer:  a) Knowledge discovery database.

Explanation: KDD stands for knowledge discovery in databases, also expanded as knowledge discovery from data. Option a is the closest match.

3) What is the need for data mining?

  1. To find hidden patterns and trends in large databases
  2. Exploratory data analysis and deductive learning
  3. Automatic exploration of data
  4. Manual analysis is not possible for a large amount of data
    1. I, II
    2. II
    3. III
    4. All of the above
   

Answer:  d) All of the above

Explanation: Data mining is needed to find hidden patterns and trends in large databases, to explore data automatically, and to produce analysis reports afterwards. None of this is feasible to do manually on a large amount of data.

4) What technologies comprise data mining?

  1. Regression analysis
  2. Cluster analysis
  3. Standard deviation
  4. Machine learning and artificial intelligence
    1. I, II
    2. II
    3. III
    4. All of the above
   

Answer: d) All of the above

Explanation: The primary technologies used in data mining are Regression analysis, cluster analysis, standard deviation, artificial intelligence, and machine learning.

5) What are the different types of databases where data mining is performed?

  1. Relational databases
  2. Data warehouses
  3. Data Repositories
  4. Object-Relational Databases
  5. Transactional Databases
    1. I, III
    2. I, II, III
    3. IV, V
    4. All of the above
   

Answer: d) All of the above

Explanation: The different types of databases where data mining can take place are

  1. Relational databases – data are organized in tables.
  2. Data warehouses – technology that collects data from various sources.
  3. Data Repositories – a group of databases.
  4. Object-Relational Databases – a combination of relational database and object-oriented database.
  5. Transactional Databases – databases, managed by a database management system, that record transactions.

6) What are the advantages of using Data mining?

  1. Enables an organization to have a knowledge-based database
  2. It makes decision making easy
  3. A quick and efficient method
  4. It enables a user to find hidden patterns
    1. I, III
    2. I, II, III
    3. IV
    4. All of the above
   

Answer: d) All of the above

Explanation: The significant advantages of using Data mining include

  1. Enables an organization to have a knowledge-based database.
  2. It makes decision making easy.
  3. A quick and efficient method.
  4. It enables a user to find hidden patterns.

7) What are the significant challenges of performing Data mining?

  1. Noisy data
  2. Data privacy
  3. Data visualization
  4. Complex data
  5. Data distribution
  6. Performance
    1. I, III
    2. I, II, III
    3. IV, V, VI
    4. All of the above
   

Answer: d) All of the above

Explanation: The significant challenges faced during Data mining

  1. Noisy data
  2. Data privacy
  3. Data visualization
  4. Complex data
  5. Data distribution
  6. Performance

8) What are the major Data Mining Techniques?

  1. Classification
  2. Clustering
  3. Regression
  4. Outlier detection
  5. Sequential Patterns
  6. Predictions
    1. I, III
    2. I, II, III, VII
    3. IV, V, VI
    4. All of the above
   

Answer: d) All of the above

Explanation: The major Data Mining Techniques

  1. Classification – technique to classify data in different classes.
  2. Clustering – a division of information according to specific groups.
  3. Regression – a process of identifying and analyzing relations and patterns between data.
  4. Outlier detection – observation of data items in a database that do not match the expected pattern.
  5. Sequential Patterns – evaluating sequential data.
  6. Predictions – to predict the future based on the previous techniques.
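
As a rough illustration of item 4 in the list above (spotting data items that deviate from the rest), the following sketch flags values whose z-score exceeds a threshold. The function name `find_outliers`, the threshold of 2.0, and the sample readings are assumptions made for this example, not part of the quiz.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one value far from the others
readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(find_outliers(readings))
```

A z-score test is only one simple way to detect outliers; it assumes the data is roughly centered around a single mean.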

9) What is CRISP-DM?

  1. The cross-industry standard process for data mining
  2. Cooperate industry standard process for data mining
  3. Cross-industry procedure for data mining
  4. None of the above
   

Answer: a) Cross-industry standard process for data mining.

Explanation: CRISP-DM stands for cross-industry standard process for data mining. It involves six steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

10) Data mining architecture?

  1. Data Source
  2. Cleaning, integrating, selecting
  3. Database or data warehouse servers
  4. Data mining engines
  5. Pattern evaluation
  6. GUI
    1. I, III
    2. I, II, III, VII
    3. IV, V, VI
    4. All of the above
   

Answer: d) All of the above

Explanation: The architecture of Data mining

  1. Data Source – retrieving data from miscellaneous sources.
  2. Cleaning, integrating, selecting – retrieved data is processed through methods such as cleaning, integrating, and selecting, and the processed data is then forwarded to a single database.
  3. Database or data warehouse servers – holds the original processed data.
  4. Data mining engines – contains modules to perform data mining tasks.
  5. Pattern evaluation – finding a specific pattern in the processed data.
  6. GUI – the graphical user interface helps the user to communicate with the data mining system.

11) Steps involved in CRISP-DM?

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment
    1. I, III
    2. I, II, III, VII
    3. IV, V, VI
    4. All of the above
   

Answer: d) All of the above

Explanation: The process of Cross-industry standard process for data mining involves

  1. Business understanding – understanding the goals and principles of the business.
  2. Data understanding – collecting initial data, exploring and describing it, and verifying its quality.
  3. Data preparation – selecting data, cleaning it, integrating data from different sources, and later formatting it in defined tables and defined structures.
  4. Modeling – defining a model for derived data using specific modeling techniques like a decision tree and neural network.
  5. Evaluation – the stage where the effectiveness of the constructed model is tested before deployment.
  6. Deployment – how to utilize the outcome.

12) Majorly used data mining tools?

  1. RapidMiner
  2. DataMelt data mining
  3. Rattle data mining
  4. Orange
  5. SAS
    1. I, III
    2. I, II, III, VII
    3. IV, V, VI
    4. All of the above
   

Answer: d) All of the above

Explanation: Commonly used data mining tools

  1. Orange
  2. SAS
  3. DataMelt
  4. Rattle
  5. RapidMiner

13) Methods for text data mining?

  1. Document classification analysis
  2. Data mining procedure to sort data
  3. A method to find data
  4. None of the above
   

Answer:  a) Document classification analysis

Explanation: This method is used to analyze a large number of online text documents.

14) What is pre-processing?

  1. Setting up a target data
  2. Data mining procedure to sort data
  3. A method to find data
  4. To represent the derived data with visualization and reports.
   

Answer: a) Setting up a target data

Explanation: Pre-processing is the method of setting up target data to perform the data mining algorithms.

15) What is classification in data mining?

  1. Setting up a target data
  2. Data mining procedure to sort data
  3. A method to find data
  4. Generalizing structures
   

Answer:  d) Generalizing structures

Explanation: Classification of data is the method of categorizing data in certain sets and subsets.

16) What is Regression analysis?

  1. A model with the least error to derive a relationship in data
  2. Data mining procedure to sort data
  3. A method to find data
  4. None of the above
   

Answer:  a) A model with the least error to derive a relationship in data.

Explanation: Regression analysis is the process of finding the model with the least error to describe a specific relationship in data.

17) What is Summarization in data mining?

  1. Setting up a target data
  2. Data mining procedure to sort data
  3. A method to find data
  4. To represent the derived data with visualization and reports.
   

Answer: d) To represent the derived data with visualization and reports.

Explanation: Summarization refers to representing the processed data with valid graphical and data representation.

18) What is OLAP?

  1. Online analytical pre-processing
  2. Offline analytical pre-processing
  3. Offline analytical processing
  4. Online analytical processing
   

Answer:  d) Online analytical processing

Explanation: OLAP stands for online analytical processing. It is used with data warehouses to run fast analytical queries over a massive amount of stored data.

19) What is OLTP?

  1. Online transactional pre-processing
  2. Offline transactional pre-processing
  3. Offline transactional processing
  4. Online Transactional processing
   

Answer: d) Online transactional processing

Explanation: OLTP stands for online transactional processing. It is used in operational databases to handle a large number of short transactions, such as inserts and updates.
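
The contrast between questions 18 and 19 can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `sales` table, its columns, and the sample rows are hypothetical, and a single in-memory SQLite database stands in for what would normally be separate OLTP and OLAP systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# OLTP-style work: many short transactions, each touching a single row
for region, amount in [("north", 120.0), ("south", 80.0), ("north", 30.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sales VALUES (?, ?)", (region, amount))

# OLAP-style work: one analytical query that aggregates over all rows
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
totals = dict(conn.execute(query))
print(totals)
```

The inserts model day-to-day transaction processing, while the grouped sum models the kind of summary question a data warehouse is built to answer.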

20) What is a data warehouse?

  1. More storage
  2. Process for determining data patterns
  3. Database system designed for analytics
  4. None of the above
   

Answer: c) Database system designed for analytics

Explanation: A data warehouse is a database system designed for analytics. It combines all the relevant data from different sources and then sorts and streamlines that business data.

21) Advantages of using a data warehouse?

  1. More storage
  2. Easy to access
  3. More accurate, improved performance, cost-efficient
  4. None of the above
   

Answer: c) More accurate, improved performance, cost-efficient.

Explanation: A data warehouse is more accurate, offers improved performance in terms of speed, is cost-efficient, and also provides quality data.

22) What is Web Mining?

  1. Application of data mining for web operations
  2. Application of data mining for programming languages
  3. Finding data through different websites
  4. None of the above
   

Answer:  a) Application of data mining for web operations.

Explanation: Web mining is an application of data mining for web operations, such as collecting data regarding current topics and analyzing the result to find what works best for the web.

23) Types of web mining?

  1. Web content mining
  2. Web usage mining
  3. Web structure mining
  4. Web data mining
    1. I, II, III
    2. I, II, IV
    3. All of the above
    4. None of the above
   

Answer: a) I, II, III

Explanation: Different types of Web mining are

  1. Web content mining – find suitable data and information for the web page content.
  2. Web usage mining – tracking visitors and their responses.
  3. Web structure mining – data to maintain a stable structure.

24) What is Web content mining?

  1. Find suitable data and information for the web page content
  2. Find suitable data and information for the web page design
  3. Find suitable data and information for the web page architecture
  4. None of the above
   

Answer: a) Find suitable data and information for the web page content.

Explanation: Web content mining is the process of finding information for a web page. The primary usage of content mining is retrieving data from multiple sources.

25) What is Web usage mining?

  1. Find suitable data and information about the user’s information
  2. Find suitable data and information for the web page design
  3. Find suitable data and information for the web page architecture
  4. None of the above
   

Answer: a) Find suitable data and information about the user's information.

Explanation: Web usage mining finds information in user logs, user information, and activity logs.
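
A minimal sketch of mining an activity log for usage information might count visits per page. The log format (`<user> <path>` on each line), the sample entries, and the function name `page_visits` are assumptions made for this example; real server logs carry more fields.

```python
from collections import Counter

# Hypothetical activity log: one "<user> <path>" entry per line
log_lines = [
    "alice /home",
    "bob /home",
    "alice /products",
    "carol /home",
    "bob /checkout",
]

def page_visits(lines):
    """Count how many times each path appears in the log."""
    counts = Counter()
    for line in lines:
        _user, path = line.split()
        counts[path] += 1
    return counts

visits = page_visits(log_lines)
print(visits.most_common(1))  # the most visited page and its count
```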

26) What is Web structure mining?

  1. Find hyperlinks between web pages and servers
  2. Find suitable data and information for the web page design
  3. Find suitable data and information for the web page architecture
  4. None of the above
   

Answer:  a) Find hyperlinks between web pages and servers.

Explanation: Web structure mining consists of finding hyperlinks to determine whether the connected data is linked with a website or directly linked to a network.
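
Finding the hyperlinks on a page is the first step of this kind of analysis. The sketch below uses Python's standard `html.parser` module; the class name `LinkCollector`, the helper `extract_links`, and the sample page are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links

# Hypothetical page fragment with one internal and one external link
page = '<p><a href="/about">About</a> and <a href="https://example.com/">Example</a></p>'
print(extract_links(page))
```

Links that start with `/` point back into the same site, while absolute URLs may point elsewhere, which is the distinction the explanation above draws.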

27) Significant challenges faced while web mining?

  1. Complex web pages
  2. Massive data
  3. Relevant data
  4. Diverse network
    1. I, II, IV
    2. II, III
    3. I, IV
    4. All of the above
   

Answer: d) All of the above

Explanation: Significant challenges faced while web mining are

  1. Complex web pages – many websites are not structured and contain complex design.
  2. Massive data – the internet is vast and complex, with an enormous amount of data to search within.
  3. Relevant data – due to enormous data present, most of the data is irrelevant to the user.
  4. Diverse network – it contains data with different interests and purposes.

28) The primary application of web mining?

  1. Marketing tool
  2. Website analysis
  3. Audience analysis
  4. Advertisement data
  5. Site testing
    1. IV, V
    2. I, II, III, IV
    3. I, III, IV
    4. All of the above
   

Answer:  d) All of the above

Explanation: The standard applications of web mining are

  1. Marketing tool
  2. Website analysis
  3. Audience analysis
  4. Advertisement data
  5. Site testing

29) What is a cluster?

  1. Machine learning algorithm to sort objects that belong to the same group
  2. Data mining procedure to sort data
  3. A method to find data
  4. None of the above
   

Answer: a) Machine learning algorithm to sort objects that belong to the same group.

Explanation: Clustering is a machine-learning algorithm to sort objects that belong to the same group. It helps to split the data into different subsets with a similar type of data.

30) What are the different types of clusters?

  1. Well separated clusters
  2. Prototype-based cluster
  3. Center-based cluster
  4. Graph-based cluster
  5. Density-based cluster
    1. I, II, III, V
    2. IV, V
    3. I, II, IV, V
    4. All of the above
   

Answer: d) All of the above

Explanation: The different types of clusters are

  1. Well separated clusters – every object is closer to the objects in its own cluster than to any object in another cluster.
  2. Prototype-based cluster – in this type of cluster, an object is placed with the prototype it is most similar to.
  3. Center-based cluster – the cluster is concentrated around its center.
  4. Graph-based cluster – the data of the cluster is represented as a graph.
  5. Density-based cluster – data is separated according to density.

31) What is cluster analysis?

  1. Machine learning algorithm to sort objects that belong to the same group
  2. Data mining procedure to sort data
  3. Analyzing a group of data that is sorted in the same group and are more similar
  4. None of the above
   

Answer:  c) Analyzing a group of data that is sorted in the same group and are more similar.

Explanation: Cluster analysis is the task of grouping a specific set of objects or data in such a way that similar data or objects lie in the same group, called a cluster.

32) What are the different types of cluster analysis used in data mining?

  1. Centroid clustering
  2. Density clustering
  3. Distribution clustering
  4. Connectivity clustering
    1. III, IV
    2. I, II, III
    3. All of the above
    4. None of the above
   

Answer: c) All of the above

Explanation: Types of cluster analysis used in data mining

  1. Centroid clustering – here, we choose the number of clusters we want. For example, the owner of a car store could segment the customer list according to who buys a sports car and who buys an economy car.
  2. Density clustering – clusters are defined by how densely populated the area around a data point is.
  3. Distribution clustering – this identifies the probability that a data point belongs to a specific cluster.
  4. Connectivity clustering – in this, each data point starts as its own cluster, and the distance between data points is measured: the closer two data points are, the more similar they are.
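
Centroid clustering from item 1 can be sketched with a small one-dimensional k-means loop. This is a simplified illustration, not a production algorithm: the function name `k_means_1d`, the starting centroids, and the hypothetical car prices are all assumptions.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assign every point to its nearest centroid
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical purchase prices in thousands: economy cars and sports cars
prices = [18, 20, 22, 75, 80, 85]
centroids, clusters = k_means_1d(prices, centroids=[0, 100])
print(centroids, clusters)
```

The number of starting centroids fixes the number of clusters in advance, which is the defining choice of centroid clustering.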

33) Application of cluster analysis?

  1. Data analysis
  2. Market research
  3. Pattern recognition
  4. Image processing
    1. I, III, IV
    2. I
    3. II
    4. All of the above
   

Answer: d) All of the above

Explanation: A primary application of cluster analysis includes data analysis, market research, pattern recognition, and image processing.

34) Advantages of using cluster analysis in data mining?

  1. Scalable
  2. Ability to deal with noisy data
  3. Interpretability
  4. Easy to access
  5. Ability to deal with different types of attributes
    1. I
    2. II
    3. I, II, III, V
    4. All of the above
   

Answer: c) I, II, III, V.

Explanation: Advantages of using cluster analysis in data mining are

  1. Scalable
  2. Ability to deal with noisy data
  3. Interpretability
  4. Ability to deal with different types of attributes

35) What is big data?

  1. A vast amount of data with high volume and high velocity, usually in terabytes
  2. Data mining procedure to sort data
  3. A method to find data
  4. Uses of tools like stats, ML, and visualization to extract valuable data
   

Answer:  a) A vast amount of data with high volume and high velocity, usually in terabytes

Explanation: Big data refers to a vast amount of data with high volume and high velocity, usually in terabytes, which can't be processed on an ordinary computer or in an ordinary database. Much big data is processed with Hadoop, which is an open-source framework.

36) What is a decision tree?

  1. Machine learning algorithm to sort objects that belong to the same group
  2. Data mining procedure to sort data
  3. A method to find data
  4. Learning procedure used for classification and regression
   

Answer:  d) Learning procedure used for classification and regression

Explanation: A decision tree refers to a learning procedure mainly used in data mining for classification and regression. It helps mainly in the decision-making process. The created structure is in the form of a tree.

37) Uses of decision tree?

  1. Helps to identify possible consequences of a decision
  2. Measure probability
  3. Helps to speculate data
  4. Easy to access
  5. Ability to deal with different types of attributes
    1. I
    2. II
    3. I, II, III
    4. All of the above
   

Answer: c) I, II, III.

Explanation: Significant uses of the decision tree are

  1. It helps the user to analyze all the possible consequences of a specific decision.
  2. We can easily measure the probability of accomplishing a task and the possible value of outcomes.
  3. We can make the best possible decision based on existing data using a decision tree.

38) What is Entropy?

  1. Used to measure randomness and impurity
  2. Data mining procedure to sort data
  3. A method to find data
  4. None of the above
   

Answer: a) used to measure randomness and impurity.

Explanation: Entropy is a part of the decision tree in data mining used to measure the randomness and impurity present in a data set.
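
As a worked illustration, the entropy of a set of class labels can be computed with the standard Shannon formula, the sum of p * log2(1/p) over the label proportions p. The function name `entropy` and the sample labels are assumptions for this sketch.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Each label with proportion p = n / total contributes p * log2(1 / p)
    return sum((n / total) * log2(total / n) for n in counts.values())

mixed = ["yes", "yes", "no", "no"]   # evenly split, maximally impure for two classes
pure = ["yes", "yes", "yes", "yes"]  # a single class, no impurity
print(entropy(mixed), entropy(pure))
```

An evenly split set gives the highest value for two classes and a pure set gives zero, which is why decision trees prefer splits that lower entropy.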

39) Parameters of decision tree algorithm?

  1. D
  2. Attribute_list
  3. Attribute_selection_method
  4. Attribute_sdetection_method
  5. Analysed_list
    1. I
    2. II
    3. I, II, III
    4. All of the above
   

Answer: c) I, II, III.

Explanation: The main parameters used in the decision tree algorithm are

  1. D – data partition, that is, the set of training tuples or training data.
  2. Attribute_list – set of attributes that define tuples.
  3. Attribute_selection_method – a method that chooses the best attribute.

40) What is the attribute selection method?

  1. Method to select set of attributes that define tuples
  2. Data mining procedure to sort data
  3. A method to find data
  4. A method that chooses the best attribute
   

Answer: d) Method that chooses the best attribute

Explanation: The decision tree algorithm takes three parameters: D, attribute_list, and attribute_selection_method. The attribute selection method applies an attribute selection measure and chooses the attribute that best describes a given tuple.
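
One way to picture the three parameters is the sketch below, which uses information gain as the attribute selection measure. That measure is an assumption chosen for illustration (the quiz does not name one), as are the sample tuples, the `"class"` label key, and the helper `information_gain`.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def information_gain(D, attribute, label="class"):
    """Entropy of D minus the weighted entropy after splitting on attribute."""
    base = entropy([row[label] for row in D])
    partitions = defaultdict(list)
    for row in D:
        partitions[row[attribute]].append(row[label])
    remainder = sum(len(p) / len(D) * entropy(p) for p in partitions.values())
    return base - remainder

def attribute_selection_method(D, attribute_list):
    """Choose the attribute with the highest information gain."""
    return max(attribute_list, key=lambda a: information_gain(D, a))

# D: hypothetical training tuples; attribute_list: the candidate attributes
D = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rainy", "windy": "no", "class": "stay"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
attribute_list = ["outlook", "windy"]
best = attribute_selection_method(D, attribute_list)
print(best)
```

In this toy data `outlook` separates the classes perfectly while `windy` tells us nothing, so the selection method picks `outlook`.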

41) What are the advantages of using a decision tree?

  1. No need for scaling
  2. Missing data does not influence the tree
  3. automatic
  4. Easy to access
  5. Ability to deal with different types of attributes
    1. I
    2. II
    3. I, II, III
    4. All of the above
   

Answer: c) I, II, III.

Explanation: Advantages of using a decision tree in data mining are

  1. It does not require information scaling.
  2. Missing data values do not influence decision trees.
  3. It is an automatic process and is easily explainable.

42) What is DMQL?

  1. Data mining query language
  2. Data measuring query language
  3. Data manipulating query language
  4. None of the above
   

Answer: a) Data mining query language.

Explanation: DMQL stands for Data Mining Query Language. It is a language based on SQL and is used to define data mining tasks.

43) Some primary applications of data mining?

  1. Corporate analysis
  2. Risk management
  3. Market analysis
  4. Managing elongated data
  5. Fraud detection
  6. Finding suitable audience
    1. I
    2. II, III, VI
    3. I, II, III, V
    4. All of the above
   

Answer: d) All of the above

Explanation: Some significant applications of data mining are:

  1. Corporate analysis – analyzing aspects of business.
  2. Risk management – using algorithms to calculate risks.
  3. Market analysis – having a thorough study of the market.
  4. Managing elongated data – finding patterns between data.
  5. Fraud detection – detecting the negative aspects of collected data.
  6. Finding suitable audience – getting results after market and audience surveys.

44) How to classify the Data mining system involved?

  1. Database technologies
  2. Information science
  3. Machine learning
  4. All of the above
   

Answer: d) All of the above

Explanation: Data mining systems can be classified into different categories based on the database technologies used, the information science involved, and the machine learning techniques applied.

45) What is regression?

  1. Database technologies
  2. Information used to calculate funds
  3. Mathematical method to derive cost
  4. The statistical method used in finance, investing, etc.
   

Answer: d) Statistical method used in finance, investing, etc.

Explanation: Regression is a statistical method used in finance, investing, accounting, etc., to determine the strength and character of relationships between variables.

46) What is regression analysis?

  1. Database technologies
  2. Information used to calculate funds
  3. Mathematical method to derive cost
  4. The statistical method used to estimate relationships
   

Answer: d) The statistical method used to estimate relationships

Explanation: Regression analysis is a statistical method used in finance, investing, accounting, etc., to establish a relationship between dependent and independent variables.

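Formula:

Y = f(X, β) + e

where Y is the dependent variable, X the independent variables, β the unknown parameters, and e the error term.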
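
For the simplest case, where f is a straight line (Y = β0 + β1 X + e), the parameters can be estimated by ordinary least squares. The sketch below is one such instance under assumptions: the function name `fit_line` and the sample data are made up for illustration, and the general formula above allows other choices of f.

```python
def fit_line(xs, ys):
    """Estimate beta0 and beta1 for y = beta0 + beta1 * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx                # slope
    beta0 = mean_y - beta1 * mean_x  # intercept
    return beta0, beta1

# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
beta0, beta1 = fit_line(xs, ys)

# The residuals play the role of the error term e in the formula
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
print(beta0, beta1, residuals)
```

Once the parameters are estimated, a prediction for a new X is simply `beta0 + beta1 * x`.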

47) How to use regression analysis in data mining?

  1. Database technology
  2. It is used to calculate funds
  3. It is used to derive cost
  4. It is used to predict a range of numeric values
   

Answer: d) It is used to predict a range of numeric values

Explanation: Regression Analysis is used to predict a range of numeric values in a dataset. For example, a regression can predict a reasonable cost for a new product.

48) Types of regression techniques?

  1. Standard multiple regression
  2. Stepwise multiple regression
  3. Hierarchical regression
  4. Stepwise regression
  5. Risk management regression
  6. Market analysis regression
    1. I
    2. II, III, VI
    3. I, II, III, IV
    4. All of the above
   

Answer: c) I, II, III, IV.

Explanation: Types of regression techniques are:

  1. Standard multiple regression.
  2. Stepwise multiple regression.
  3. Hierarchical regression.
  4. Stepwise regression.

49) Uses of data mining in health management?

  1. Healthcare Management
  2. Treatment effectiveness
  3. Customer relationship management
  4. Fraud and abuse
    1. II, III, VI
    2. I, II, IV
    3. All of the above
    4. None of the above
   

Answer: c) All of the above

Explanation: Uses of data mining in healthcare management:

  1. Healthcare Management – data mining applications to track different diseases.
  2. Treatment effectiveness – data mining applications to track how to proceed with treatment, etc.
  3. Customer relationship management – data mining applications to interact with customers.
  4. Fraud and abuse – data mining applications to track fraud and abuse.

50) What is EDM in data mining?

  1. Electronic dance music
  2. Educational data mining
  3. Electoral data mining
  4. Efficient data mining
   

Answer: b) Educational data mining

Explanation: EDM stands for educational data mining, which is a set of algorithms applied to educational data to improve education.