Various ways to count the number of words in a file using Java

Counting the number of words in a file is a common programming task. It comes up in programming exercises and it is also used in real-world applications. This post describes different methods to count the number of words in a file.
All of the methods need to read a file, so you should know how to read a text file in Java. This post will help you out with that.

Method 1: Using BufferedReader
Read the file line by line using java.io.BufferedReader. Break each line into words with the split method of java.lang.String. This method takes a regular expression, splits the string wherever the expression matches and returns an array of the resulting strings. Thus, if the string “abcdbe” is split on “b”, the resulting array will be {“a”, “cd”, “e”}. To get the words in a line, split it on the whitespace that separates them. Then add the length of the resulting array to a counter variable. When the complete file has been read, this variable will contain the total number of words in the file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCounter {

   public static void main(String[] args) throws IOException {
      usingBufferedReader();
   }

   private static void usingBufferedReader() throws IOException {
	String filePath = "testfile.txt";
	// variable for holding word count
	int wordCount = 0;
	// try-with-resources closes the readers even if reading fails
	try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
		String line;
		// read file line by line
		while ((line = br.readLine()) != null) {
			// remove leading and trailing whitespace
			line = line.trim();
			// skip blank lines, which would otherwise be counted as one word
			if (line.isEmpty()) {
				continue;
			}
			// split on runs of whitespace so that repeated spaces do not add empty words
			String[] words = line.split("\\s+");
			wordCount += words.length;
		}
	}
	System.out.println("Number of words in the file is: " + wordCount);
   }
}
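
The behaviour of split described above can be checked with a small sketch. The class SplitDemo and the helper countWordsInLine are hypothetical names used only for illustration. The sketch also shows why splitting on a single space can overcount: consecutive spaces produce empty strings, and an empty line still produces one element.

```java
import java.util.Arrays;

class SplitDemo {

    // Hypothetical helper: counts the words in one line,
    // tolerating leading, trailing and repeated whitespace
    static int countWordsInLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0; // a blank line contains no words
        }
        // "\\s+" matches one or more whitespace characters
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // the example from the text: splitting "abcdbe" on "b"
        System.out.println(Arrays.toString("abcdbe".split("b"))); // [a, cd, e]
        // two spaces in a row leave an empty string in the middle
        System.out.println("a  b".split(" ").length);             // 3
        // an empty line still yields one (empty) element
        System.out.println("".split(" ").length);                 // 1
        // trimming and splitting on "\\s+" gives the expected count
        System.out.println(countWordsInLine("  a  b "));          // 2
    }
}
```

Whether blank lines and repeated spaces matter depends on the files you expect, so treat this as one possible way to handle them.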

Method 2: Using Scanner class
This method is the same as the one above, except that it uses a java.util.Scanner instead of a java.io.BufferedReader to read the file. It is simpler because it does not involve any splitting of strings: a java.util.Scanner can itself iterate over the whitespace-separated words of its input.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

public class WordCounter {

   public static void main(String[] args) throws IOException {
      usingScanner();
   }

   private static void usingScanner() throws IOException {
	String filePath = "testfile.txt";
	// variable for holding word count
	int wordCount = 0;
	// initialize scanner for reading file; try-with-resources closes it
	try (Scanner fileReader = new Scanner(new FileInputStream(filePath))) {
	   // iterate over the whitespace-separated words of the file
	   while (fileReader.hasNext()) {
	      // consume the next word
	      fileReader.next();
	      wordCount++;
	   }
	}
	System.out.println("Number of words in the file is: " + wordCount);
   }
}
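
To see how a Scanner breaks its input into words, here is a minimal sketch that scans a string instead of a file. The class ScannerDemo and the helper countWords are hypothetical names. With its default delimiter a Scanner treats any run of whitespace (spaces, tabs, line breaks) as a single separator, so blank lines and repeated spaces do not affect the count.

```java
import java.util.Scanner;

class ScannerDemo {

    // Hypothetical helper: counts whitespace-separated tokens in a string
    static int countWords(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                scanner.next(); // consume one token
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // double space, tab and an empty line between the words
        System.out.println(countWords("this  is\ta\n\ntest")); // 4
    }
}
```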

Method 3: Using Apache Library
This method uses the initials method of the org.apache.commons.lang.WordUtils class from the Apache Commons Lang library. It takes a string as argument and returns a string made up of the first letter of each word in it. Internally, it treats whitespace as the separator between words. Thus, for the string “this is a test”, the initials method will return “tiat”, which is the first letter of every word. The length of the returned string is therefore the number of words, and when the input is the content of a file, it is the number of words in the file.

import java.io.FileReader;
import java.io.IOException;

import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.WordUtils;

public class WordCounter {

   public static void main(String[] args) throws IOException {
      usingWordUtils();
   }

   private static void usingWordUtils() throws IOException {
	String filePath = "testfile.txt";
	String fileContent;
	// read the content of file as string; try-with-resources closes the reader
	try (FileReader reader = new FileReader(filePath)) {
		fileContent = IOUtils.toString(reader);
	}
	// get initials of words in string
	String initials = WordUtils.initials(fileContent);
	System.out.println("Number of words in the file is: " + initials.length());
   }
}

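Note that the above code uses the toString method of the org.apache.commons.io.IOUtils class from the Apache Commons IO library to get the contents of the file as a string. This library is not necessary if the contents of the file are already available as a string. It is also possible to read the file line by line, append each line to a java.lang.StringBuilder (or java.lang.StringBuffer) and then supply the result to the initials method. In that case, append a line break or a space after each line so that the last word of one line does not merge with the first word of the next.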
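
The line-by-line alternative mentioned above could look like the following sketch, which uses only the standard library. The class ContentReader and the helper readAll are hypothetical names, and a StringBuilder is used as the unsynchronized counterpart of StringBuffer. Since readLine drops the line terminator, the sketch appends a line break after every line so that words on neighbouring lines stay separate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ContentReader {

    // Hypothetical helper: reads everything from the reader into one string
    static String readAll(Reader source) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so put one back
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // a StringReader stands in for new FileReader("testfile.txt")
        String content = readAll(new StringReader("first line\nsecond line"));
        System.out.print(content);
    }
}
```

The returned string can then be passed to the initials method in place of the string produced by IOUtils.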

Method 4: Using Regular Expression
This method takes the contents of the file and splits them wherever a regular expression matches. The regular expression used here matches whitespace characters.
This method uses the java.util.regex.Pattern class. Its compile method takes the regular expression as a string and returns a pattern object. Calling the split method on this pattern object with the file content returns an array of the strings found between the matches, which are the words.

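import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.commons.io.IOUtils;

public class WordCounter {

   public static void main(String[] args) throws IOException {
      usingRegex();
   }

   private static void usingRegex() throws IOException {
	String filePath = "testfile.txt";
	String fileContent;
	// read the content of file as string; try-with-resources closes the reader
	try (FileReader reader = new FileReader(filePath)) {
		// trim so that leading whitespace does not produce an empty first element
		fileContent = IOUtils.toString(reader).trim();
	}
	// create a pattern to match runs of whitespace (spaces, tabs, line breaks)
	Pattern pattern = Pattern.compile("\\s+");
	// split the content over the above regular expression; an empty file has no words
	int wordCount = fileContent.isEmpty() ? 0 : pattern.split(fileContent).length;
	System.out.println("Number of words in the file is: " + wordCount);
   }
}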

This method also uses the toString method of the org.apache.commons.io.IOUtils class from the Apache Commons IO library to get the contents of the file as a string. This library is not required: you can use any other way of getting the file content as a string.
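
As one example of getting the file content without Commons IO, the sketch below reads the file with java.nio.file.Files from the standard library and then applies the same regular expression approach. The class PlainJavaRegexCounter and the helper countWords are hypothetical names, and the sketch assumes the file is UTF-8 encoded and small enough to fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

class PlainJavaRegexCounter {

    // one or more whitespace characters (spaces, tabs, line breaks)
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: reads the whole file and counts whitespace-separated words
    static int countWords(Path file) throws IOException {
        // assumption: the file is UTF-8 encoded and fits in memory
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        if (content.isEmpty()) {
            return 0; // an empty or whitespace-only file has no words
        }
        return WHITESPACE.split(content).length;
    }

    public static void main(String[] args) throws IOException {
        // same file name as in the examples above
        int wordCount = countWords(Paths.get("testfile.txt"));
        System.out.println("Number of words in the file is: " + wordCount);
    }
}
```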

Hope this post helped you understand the different approaches that can be used to determine the total number of words in a file. If you know of another approach, point it out in the comments section below.
Keep visiting!!!
