Counting the number of words in a file is a common programming task. It comes up both in programming exercises and in real-world applications. This post describes several ways to count the words in a file in Java.
All of these methods need to read a file, so you should know how to read a text file in Java. A separate post on reading text files in Java covers that in detail.
Method 1: Using BufferedReader
Read the file line by line using java.io.BufferedReader. Break each line into words with the split method of java.lang.String. This method takes a regular expression as its delimiter, splits the string around every match of it, and returns the resulting pieces as an array of strings. For example, if the string “abcdbe” is split on “b”, the resulting array is {“a”, “cd”, “e”}. Since the words in a line are separated by whitespace, splitting a line on whitespace gives an array of its words. Add the length of each line's array to a running total; once the whole file has been read, this variable holds the total number of words in the file.
Be careful when splitting on a single space (" "): consecutive spaces, leading spaces, and blank lines all produce empty strings, which would be counted as words. It is safer to trim each line, skip blank lines, and split on the regular expression "\\s+", which matches one or more whitespace characters.
```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCounter {
    public static void main(String[] args) throws IOException {
        usingBufferedReader();
    }

    private static void usingBufferedReader() throws IOException {
        String filePath = "testfile.txt";
        // variable for holding word count
        int wordCount = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            String line;
            // read file line by line
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                // skip blank lines, which would otherwise count as one word
                if (trimmed.isEmpty()) {
                    continue;
                }
                // split on runs of whitespace to get the words in the current line
                String[] words = trimmed.split("\\s+");
                wordCount += words.length;
            }
        }
        System.out.println("Number of words in the file is: " + wordCount);
    }
}
```
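Splitting on a single space has some surprising edge cases. The small sketch below compares line.split(" ") with trimming and splitting on "\\s+" for a few sample lines; the class name SplitDemo and both helper methods are made up for this illustration and are not part of any library.

```java
import java.util.Arrays;

// Illustration only: compares two ways of splitting a line into words.
public class SplitDemo {

    // Naive approach: split on exactly one space character.
    static int countBySingleSpace(String line) {
        return line.split(" ").length;
    }

    // Safer approach: trim the line, treat a blank line as zero words,
    // then split on runs of one or more whitespace characters.
    static int countByWhitespaceRuns(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // The example from the text: splitting "abcdbe" on "b"
        System.out.println(Arrays.toString("abcdbe".split("b"))); // [a, cd, e]

        String[] samples = {"one two", "  two   words", ""};
        for (String sample : samples) {
            System.out.println("\"" + sample + "\": single space = "
                    + countBySingleSpace(sample)
                    + ", whitespace runs = " + countByWhitespaceRuns(sample));
        }
    }
}
```

For "  two   words" the naive count is 6 (four of the pieces are empty strings) instead of 2, and an empty line counts as 1 word instead of 0.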
Method 2: Using Scanner class
This method works like the previous one, except that it reads the file with a java.util.Scanner instead of a java.io.BufferedReader. It is simpler because no string splitting is needed: by default, a java.util.Scanner breaks its input into tokens separated by whitespace, so it can iterate over the words of the file directly. Runs of spaces, tabs, and line breaks are all treated as a single separator.
```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

public class WordCounter {
    public static void main(String[] args) throws IOException {
        usingScanner();
    }

    private static void usingScanner() throws IOException {
        String filePath = "testfile.txt";
        // variable for holding word count
        int wordCount = 0;
        // initialize scanner for reading file; try-with-resources closes it
        try (Scanner fileReader = new Scanner(new FileInputStream(filePath))) {
            // iterate over whitespace-separated tokens (words), not lines
            while (fileReader.hasNext()) {
                // consume the next word; only the count is needed
                fileReader.next();
                wordCount++;
            }
        }
        System.out.println("Number of words in the file is: " + wordCount);
    }
}
```
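Because a Scanner can read from a String as well as from a file, the same counting loop can be tried without creating a file. The sketch below assumes only the standard java.util.Scanner API; the class and method names are hypothetical.

```java
import java.util.Scanner;

// Illustration only: the Scanner loop from above, applied to an in-memory string.
public class ScannerTokenDemo {

    // Counts tokens; Scanner's default delimiter is one or more whitespace characters.
    static int countTokens(Scanner scanner) {
        int count = 0;
        while (scanner.hasNext()) {
            scanner.next(); // consume the token; only the count matters
            count++;
        }
        return count;
    }

    // Convenience overload that scans a string and closes the scanner afterwards.
    static int countTokens(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return countTokens(scanner);
        }
    }

    public static void main(String[] args) {
        // Extra spaces, a blank line, and a tab do not produce empty tokens.
        System.out.println(countTokens("  two   words\n\nand\tthree more  ")); // 5
    }
}
```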
Method 3: Using the Apache Commons Lang Library
This method uses the initials method of the org.apache.commons.lang.WordUtils class from the Apache Commons Lang library. It takes a string as its argument and returns the first letter of each word in it. It treats whitespace (as defined by Character.isWhitespace) as the separator between words, so any run of characters separated by whitespace counts as a separate word. For example, given the string “this is a test”, initials returns “tiat”, the first letter of every word. The length of the returned string is therefore the number of words in the file.
In Commons Lang 3 this class lives in org.apache.commons.lang3.text.WordUtils, which has since been deprecated in favor of org.apache.commons.text.WordUtils from Apache Commons Text.
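To see why the length of the initials equals the word count, here is a dependency-free sketch of the same idea: keep the first character of every run of non-whitespace characters. This is our own illustration, not Apache's source; initialsOf is a hypothetical name, and it assumes Character.isWhitespace as the separator test.

```java
// Illustration only: mimics the idea behind WordUtils.initials without the library.
public class InitialsDemo {

    // Collects the first character of every run of non-whitespace characters.
    static String initialsOf(String text) {
        StringBuilder initials = new StringBuilder();
        boolean lastWasGap = true; // the start of the text counts as a gap
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isWhitespace(ch)) {
                lastWasGap = true;
            } else if (lastWasGap) {
                // first character of a new word
                initials.append(ch);
                lastWasGap = false;
            }
        }
        return initials.toString();
    }

    public static void main(String[] args) {
        String initials = initialsOf("this is a test");
        System.out.println(initials + " -> " + initials.length() + " words"); // tiat -> 4 words
    }
}
```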
```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.WordUtils;

public class WordCounter {
    public static void main(String[] args) throws IOException {
        usingWordUtils();
    }

    private static void usingWordUtils() throws IOException {
        String filePath = "testfile.txt";
        String fileContent;
        // read the content of file as string; try-with-resources closes the reader
        try (Reader reader = new FileReader(filePath)) {
            fileContent = IOUtils.toString(reader);
        }
        // get initials of words in string (one character per word)
        String initials = WordUtils.initials(fileContent);
        System.out.println("Number of words in the file is: " + initials.length());
    }
}
```
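Note that the above code uses the toString method of the org.apache.commons.io.IOUtils class from the Apache Commons IO library to get the contents of the file as a string. This library is not needed if the contents of the file are already available as a string. You can also read the file line by line, append each line followed by a line separator to a java.lang.StringBuffer (or java.lang.StringBuilder), and then pass the result to the initials method. The separator matters: without it, the last word of one line and the first word of the next line would be joined into a single word.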
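Here is a minimal sketch of that line-by-line alternative using only the standard library. It uses a StringBuilder, the unsynchronized counterpart of StringBuffer, and wraps I/O errors in UncheckedIOException to keep the example short; the class and method names are hypothetical, and the WordUtils call is left out so the sketch runs without Apache Commons.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustration only: builds the file content line by line without Commons IO.
public class LineJoinDemo {

    // Appends a newline after every line so words on adjacent lines stay separate.
    static String readAll(Reader source) {
        StringBuilder content = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                content.append(line).append('\n');
            }
        } catch (IOException e) {
            // wrap the checked exception so callers of this sketch need not declare it
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        String withNewlines = readAll(new StringReader("first line\nsecond line"));
        // Joining without a separator glues "line" and "second" together.
        String withoutSeparator = "first line" + "second line";
        System.out.println(withNewlines.trim().split("\\s+").length);     // 4
        System.out.println(withoutSeparator.trim().split("\\s+").length); // 3
    }
}
```

In practice you would pass new FileReader(filePath) instead of the StringReader and hand the returned string to WordUtils.initials.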
Method 4: Using a Regular Expression
This method reads the contents of the file and splits them around matches of a regular expression that matches whitespace characters.
It uses the java.util.regex.Pattern class. Its static compile method takes the regular expression as a string and returns a Pattern object. The split method of this object then takes the input string, splits it around matches of the pattern, and returns an array of the resulting strings.
The pattern should be "\\s+" (one or more whitespace characters) rather than "\\s" (exactly one): with "\\s", every extra space or blank line produces an empty string that is counted as a word. For the same reason, trim the content first, since leading whitespace produces an empty first element, and treat an empty file as having zero words.
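The following sketch shows the difference between the two patterns on a short string, using only Pattern.compile and Pattern.split from the standard library; the class, field, and method names are made up for this illustration.

```java
import java.util.regex.Pattern;

// Illustration only: why "\\s+" and trimming matter when splitting with a Pattern.
public class RegexSplitDemo {
    // Matches exactly one whitespace character.
    static final Pattern SINGLE_WHITESPACE = Pattern.compile("\\s");
    // Matches a run of one or more whitespace characters.
    static final Pattern WHITESPACE_RUNS = Pattern.compile("\\s+");

    // Splits the text as-is and reports how many pieces come back.
    static int rawSplitCount(Pattern pattern, String text) {
        return pattern.split(text).length;
    }

    // Trims first so leading whitespace does not produce an empty first piece,
    // and treats blank input as zero words.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : WHITESPACE_RUNS.split(trimmed).length;
    }

    public static void main(String[] args) {
        String text = "one  two\nthree";
        System.out.println(rawSplitCount(SINGLE_WHITESPACE, text)); // 4: "one", "", "two", "three"
        System.out.println(rawSplitCount(WHITESPACE_RUNS, text));   // 3
        System.out.println(rawSplitCount(WHITESPACE_RUNS, " one")); // 2: "", "one"
        System.out.println(countWords(" one"));                     // 1
        System.out.println(countWords(""));                         // 0
    }
}
```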
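```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;

import org.apache.commons.io.IOUtils;

public class WordCounter {
    public static void main(String[] args) throws IOException {
        usingRegex();
    }

    private static void usingRegex() throws IOException {
        String filePath = "testfile.txt";
        String fileContent;
        // read the content of file as string; try-with-resources closes the reader
        try (Reader reader = new FileReader(filePath)) {
            fileContent = IOUtils.toString(reader);
        }
        // create a pattern matching one or more whitespace characters
        Pattern pattern = Pattern.compile("\\s+");
        // trim so leading whitespace does not yield an empty first element
        String trimmed = fileContent.trim();
        // split the content over the pattern; an empty file has no words
        String[] words = trimmed.isEmpty() ? new String[0] : pattern.split(trimmed);
        System.out.println("Number of words in the file is: " + words.length);
    }
}
```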
This method also uses the toString method of the org.apache.commons.io.IOUtils class from the Apache Commons IO library to get the contents of the file as a string. The library is not required; any other way of reading the file into a string works just as well.
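As one example of reading the file without Commons IO, the sketch below uses java.nio.file.Files.readString, which assumes Java 11 or later and decodes the file as UTF-8. It writes a temporary sample file so it runs without "testfile.txt"; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Illustration only: regex word count with the file read via java.nio (Java 11+).
public class NioWordCounter {
    private static final Pattern WHITESPACE_RUNS = Pattern.compile("\\s+");

    // Counts whitespace-separated words in a string; blank input has zero words.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : WHITESPACE_RUNS.split(trimmed).length;
    }

    // Reads the whole file as UTF-8 and counts its words.
    static int countWordsInFile(Path path) throws IOException {
        return countWords(Files.readString(path));
    }

    public static void main(String[] args) throws IOException {
        // Writes a temporary sample file so the sketch runs on its own.
        Path sample = Files.createTempFile("wordcount", ".txt");
        try {
            Files.writeString(sample, "this is a test\n\n  with   extra   spaces\n");
            System.out.println("Number of words in the file is: " + countWordsInFile(sample)); // 7
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```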
I hope this post helped you understand the different approaches to counting the total number of words in a file. If you know of another approach, please share it in the comments section below.
Keep visiting!!!