Java Program to Find the Most Repeated Word in a Text File
In this tutorial, we will write a program that reads a text file, tokenizes the words, removes non-alphabetic characters, and counts the occurrences of each word using a HashMap. It then identifies and prints the word with the highest count, representing the most repeated word in the text file.
Approach 1: Finding the Most Repeated Word in a Text File
Step 1: Open the text file for reading. Create a BufferedReader wrapped around a FileReader to read the file line by line.
Step 2: For each line in the file, split the line into words using a regular expression. Remove non-alphabetic characters from each word and convert it to lowercase.
Step 3: Count word occurrences. Create a HashMap that maps each word to its frequency, and for every word in the tokenized text, increment its count in the HashMap.
Step 4: Initialize maxFrequency to 0 and mostRepeatedWord to null. For each entry in the HashMap, if its frequency is greater than maxFrequency, update maxFrequency and mostRepeatedWord.
Step 5: Display the result. Print mostRepeatedWord and its frequency.
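To make Step 2 concrete, here is a minimal sketch of the tokenizing and cleaning logic on a single line of text. The class name TokenizeSketch and the helper tokenize are hypothetical names introduced only for this illustration. The sketch also drops tokens that become empty after cleaning (a token such as "1995," contains no letters), which is an assumption about the desired behavior rather than something the steps above spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Hypothetical helper: splits a line on whitespace, strips every
    // non-alphabetic character, lowercases the result, and drops tokens
    // that are empty after cleaning.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String raw : line.trim().split("\\s+")) {
            String word = raw.replaceAll("[^a-zA-Z]", "").toLowerCase();
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Released in 1995, Java is high-level."));
        // prints [released, in, java, is, highlevel]
    }
}
```

Note that stripping all non-letters also removes hyphens, so "high-level" is counted as the single word "highlevel".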
File name: Wordrepeat.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Wordrepeat {
    public static void main(String[] args) {
        // Specify the path to the text file
        String filePath = "file.txt";
        // Map to store word frequencies
        Map<String, Integer> wordCountMap = new HashMap<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
            // Read each line from the file
            String line;
            while ((line = reader.readLine()) != null) {
                // Split the line into words on runs of whitespace
                String[] words = line.split("\\s+");
                // Count the frequency of each word
                for (String word : words) {
                    // Remove non-alphabetic characters and convert to lowercase
                    word = word.replaceAll("[^a-zA-Z]", "").toLowerCase();
                    // Skip tokens that are empty after cleaning, such as "1995,"
                    if (word.isEmpty()) {
                        continue;
                    }
                    // Update the word count in the map
                    wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        // Find the most repeated word
        String mostRepeatedWord = null;
        int maxFrequency = 0;
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            if (entry.getValue() > maxFrequency) {
                maxFrequency = entry.getValue();
                mostRepeatedWord = entry.getKey();
            }
        }
        // Display the result
        if (mostRepeatedWord != null) {
            System.out.println("Most repeated word: " + mostRepeatedWord);
            System.out.println("Frequency: " + maxFrequency);
        } else {
            System.out.println("No words found in the file.");
        }
    }
}
file.txt:
Java is a high-level, object-oriented programming language designed for cross-platform compatibility. Developed by Sun Microsystems and released in 1995, Java allows developers to write code that can run on any device with a Java Virtual Machine (JVM). Its platform independence stems from compiling code into bytecode, facilitating portability. Java features automatic memory management, reducing concerns about memory leaks, and supports multi-threading, aiding concurrent programming. Known for its rich standard library, Java simplifies common tasks and offers robust security features, including secure execution environments. Widely used in web development, enterprise systems, and Android app development, Java has a vast ecosystem of libraries and frameworks. With an active community and continuous updates, Java remains a versatile and enduring language in the software development landscape.
Output:
Most repeated word: java
Frequency: 7
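In the sample file, "java" and "and" each occur 7 times. Because the program only replaces the current best when a count is strictly greater, it reports whichever of the tied words the HashMap happens to iterate over first. If every tied word should be reported, one possible approach is sketched below. The class name TopWordsSketch and the helper topWords are hypothetical, and returning the ties in alphabetical order is a design choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TopWordsSketch {
    // Hypothetical helper: returns every word whose count equals the
    // maximum count, sorted alphabetically so the result is deterministic.
    static List<String> topWords(Map<String, Integer> counts) {
        int max = 0;
        for (int count : counts.values()) {
            max = Math.max(max, count);
        }
        TreeSet<String> top = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == max) {
                top.add(entry.getKey());
            }
        }
        return new ArrayList<>(top);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : "java and java and code".split(" ")) {
            // merge inserts 1 for a new word or adds 1 to an existing count
            counts.merge(word, 1, Integer::sum);
        }
        System.out.println(topWords(counts)); // prints [and, java]
    }
}
```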
Approach 2: Finding All Repeated Words in the Text File and Their Occurrences
Step 1: Create a HashMap named wordCountMap to store words and their occurrences. Initialize a BufferedReader named reader to read from the input text file.
Step 2: Read the first line of the file into the currentLine variable.
While the currentLine is not null:
- Convert the currentLine to lowercase.
- Split the currentLine into an array of words using space as the delimiter.
- Iterate through each word:
If the word is already present in wordCountMap, update its count.
If the word is absent, insert it into the wordCountMap with a count of 1.
- Read the next line into currentLine.
Step 3: Get all entries from wordCountMap as a set of key-value pairs. Create a list (List<Entry<String, Integer>>) from the entry set. Using a custom comparator, sort the list in decreasing order based on the values (word occurrences).
Step 4: Iterate through the sorted list:
If the word occurrence is greater than 1, print the word and its count.
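Steps 3 and 4 can also be sketched in isolation on a small, hand-built map. The version below uses Map.Entry.comparingByValue and comparingByKey from the standard library instead of a hand-written comparator. The class name SortByCountSketch and the helper repeated are hypothetical, and breaking ties alphabetically is an addition for this sketch; the steps above leave the order of tied words unspecified.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByCountSketch {
    // Hypothetical helper: keeps entries with a count above 1 and sorts them
    // by count, highest first; words with equal counts are ordered by key.
    static List<Map.Entry<String, Integer>> repeated(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        list.removeIf(entry -> entry.getValue() <= 1);
        list.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        return list;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("java", 7);
        counts.put("and", 7);
        counts.put("a", 4);
        counts.put("is", 1);
        for (Map.Entry<String, Integer> entry : repeated(counts)) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
        // prints:
        // and : 7
        // java : 7
        // a : 4
    }
}
```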
The following complete Java program implements the steps above.
File name: Wordrepeat2.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Set;

public class Wordrepeat2
{
    public static void main(String[] args)
    {
        //Creating wordCountMap which holds words as keys and their occurrences as values
        HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
        //Creating BufferedReader object; try-with-resources closes it automatically,
        //even if opening or reading the file fails
        try (BufferedReader reader = new BufferedReader(new FileReader("C:/Users/9194/Desktop/java/file.txt.txt")))
        {
            //Reading the first line into currentLine
            String currentLine = reader.readLine();
            while (currentLine != null)
            {
                //Splitting the lowercased currentLine into words on single spaces
                String[] words = currentLine.toLowerCase().split(" ");
                for (String word : words)
                {
                    //Skipping empty tokens produced by blank lines or consecutive spaces
                    if (word.isEmpty())
                    {
                        continue;
                    }
                    //If word is already present in wordCountMap, updating its count
                    if (wordCountMap.containsKey(word))
                    {
                        wordCountMap.put(word, wordCountMap.get(word) + 1);
                    }
                    //Otherwise inserting the word with a count of 1
                    else
                    {
                        wordCountMap.put(word, 1);
                    }
                }
                //Reading the next line into currentLine
                currentLine = reader.readLine();
            }
            //Getting all the entries of wordCountMap in the form of Set
            Set<Entry<String, Integer>> entrySet = wordCountMap.entrySet();
            //Copying the entries into a List so that they can be sorted
            List<Entry<String, Integer>> list = new ArrayList<Entry<String, Integer>>(entrySet);
            //Sorting the list in the decreasing order of values
            Collections.sort(list, new Comparator<Entry<String, Integer>>()
            {
                @Override
                public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2)
                {
                    return e2.getValue().compareTo(e1.getValue());
                }
            });
            //Printing the repeated words in input file along with their occurrences
            System.out.println("Repeated Words In Input File Are :");
            for (Entry<String, Integer> entry : list)
            {
                if (entry.getValue() > 1)
                {
                    System.out.println(entry.getKey() + " : " + entry.getValue());
                }
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
file.txt:
Java is a high-level, object-oriented programming language designed for cross-platform compatibility. Developed by Sun Microsystems and released in 1995, Java allows developers to write code that can run on any device with a Java Virtual Machine (JVM). Its platform independence stems from compiling code into bytecode, facilitating portability. Java features automatic memory management, reducing concerns about memory leaks, and supports multi-threading, aiding concurrent programming. Known for its rich standard library, Java simplifies common tasks and offers robust security features, including secure execution environments. Widely used in web development, enterprise systems, and Android app development, Java has a vast ecosystem of libraries and frameworks. With an active community and continuous updates, Java remains a versatile and enduring language in the software development landscape.
Output:
Repeated Words In Input File Are :
java : 7
and : 7
a : 4
in : 3
language : 2
its : 2
code : 2
memory : 2
for : 2
development, : 2
with : 2
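The entry "development, : 2" keeps its trailing comma because this approach splits on spaces without removing punctuation, so "development," and "development" are counted as different words. If they should be merged, one option is to trim punctuation from both ends of each token before counting, as in the sketch below. The class name NormalizedCountSketch and the helper count are hypothetical, and the regular expression reflects an assumption about what counts as punctuation (anything other than a lowercase letter or digit at the start or end of a token). A TreeMap is used only so the printed order is predictable.

```java
import java.util.Map;
import java.util.TreeMap;

public class NormalizedCountSketch {
    // Hypothetical helper: lowercases the text, splits it on whitespace, and
    // trims leading and trailing characters that are not letters or digits.
    // Characters inside a word, such as the hyphen in "high-level", are kept.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : text.toLowerCase().split("\\s+")) {
            String word = raw.replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "web development, app development, software development landscape.";
        System.out.println(count(sample));
        // prints {app=1, development=3, landscape=1, software=1, web=1}
    }
}
```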