How to get words from a sentence in java / How to count number of words in a sentence in java

Words in a sentence are separated by spaces, and every method for extracting the words of a sentence relies on this fact. Below are two methods to get the words of a sentence entered by the user.

Method 1 : Using split method of java.lang.String

The String class has a split() method which divides a string around matches of the regular expression passed to it and returns an array of strings as the result. For example, if split is called on the string “codippa.com” with a dot as the delimiter, the returned array will contain “codippa” and “com”. Because the argument is a regular expression, in which an unescaped dot matches any character, the dot has to be escaped and written as "\\." in Java source. Note that the delimiter is not included in the returned string array.
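Since split() interprets its argument as a regular expression, a literal dot needs escaping before it behaves as a plain delimiter. The following is a minimal sketch of the “codippa.com” example; the class name SplitOnDotDemo and the helper splitOnDot are illustrative names and not part of the original code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnDotDemo {

    // Hypothetical helper: splits the text on literal dots.
    static String[] splitOnDot(String text) {
        // Pattern.quote escapes the dot so it is not read as "any character"
        return text.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String[] parts = splitOnDot("codippa.com");
        System.out.println(Arrays.toString(parts)); // prints [codippa, com]

        // An unescaped dot matches every character, so no token is left over
        System.out.println("codippa.com".split(".").length); // prints 0
    }
}
```

Pattern.quote is used here instead of a hand-written "\\." only because it works for any delimiter text; either form should give the same result for a dot.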

In the example below, a String is read from the user and the split() method is called on it with a space as its argument. The returned array is iterated using a for loop. The length of this array gives the total number of words in the entered string.

// requires: import java.util.Scanner;
static void usingStringSplit() {
	Scanner scanner = new Scanner(System.in);
	System.out.println("Enter the sentence");
	// input the sentence from the user
	String sentence = scanner.nextLine();
	// split over a single space. Now we have an array of words
	// (consecutive spaces would produce empty strings in this array)
	String[] words = sentence.split(" ");
	System.out.println("Words in the given sentence are :");
	// iterate over the words
	for (String word : words) {
		System.out.println(word);
	}
	System.out.println("--------------------------------");
	// length of the array is the word count
	System.out.println("Word count is :: " + words.length);
	scanner.close();
}

Output :

Enter the sentence
Welcome to codippa.com
Words in the given sentence are :
Welcome
to
codippa.com
--------------------------------
Word count is :: 3
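Splitting on a single space counts correctly only when the words are separated by exactly one space. If the input may contain repeated or leading spaces, one possible variation is to trim the sentence and split on runs of whitespace. This is a sketch under that assumption; WordCountDemo and wordsOf are hypothetical names introduced for illustration.

```java
class WordCountDemo {

    // Hypothetical helper: returns the words of a sentence, tolerating extra whitespace.
    static String[] wordsOf(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            // a blank sentence has no words
            return new String[0];
        }
        // "\\s+" matches one or more whitespace characters
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // two spaces between "Welcome" and "to"
        String sentence = "Welcome  to codippa.com";
        // the empty string between the two spaces is counted as a word
        System.out.println(sentence.split(" ").length); // prints 4
        System.out.println(wordsOf(sentence).length);   // prints 3
    }
}
```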

Method 2 : Using java.util.StringTokenizer

java.util.StringTokenizer has a constructor which takes a String and a delimiter on the basis of which it splits the string. The resulting pieces are called tokens. For example, if the string is “codippa.com” and the delimiter is a “.” (dot), then the tokens will be “codippa” and “com”. Unlike split(), the delimiter here is not a regular expression: each character of the delimiter string is treated as a separate delimiter. Note that the delimiter is not included in the tokens.
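The “codippa.com” example can be sketched as follows. The class TokenizeOnDotDemo and the helper tokens are illustrative names only; the StringTokenizer calls themselves are the standard ones from java.util.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeOnDotDemo {

    // Hypothetical helper: collects the tokens of text, split on the given delimiter characters.
    static List<String> tokens(String text, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // the delimiter is a plain character here, so the dot needs no escaping
        System.out.println(tokens("codippa.com", ".")); // prints [codippa, com]
    }
}
```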

In the example below, a String is read from the user and passed to the constructor of java.util.StringTokenizer with a space as the delimiter. The tokens are retrieved one at a time using its nextToken() method, for as long as the hasMoreTokens() method of StringTokenizer returns true. To get the total number of tokens, its countTokens() method may be used.

// requires: import java.util.Scanner; import java.util.StringTokenizer;
static void usingStringTokenizer() {
	Scanner scanner = new Scanner(System.in);
	System.out.println("Enter the sentence");
	// input the sentence from the user
	String sentence = scanner.nextLine();
	// tokenize over space. Each token is a word
	StringTokenizer tokenizer = new StringTokenizer(sentence, " ");
	// count the tokens before iterating, while none has been consumed
	int numberOfWords = tokenizer.countTokens();
	System.out.println("Words in the given sentence are :");
	// iterate over the words
	while (tokenizer.hasMoreTokens()) {
		System.out.println(tokenizer.nextToken());
	}
	System.out.println("--------------------------------");
	// number of tokens is the word count
	System.out.println("Word count is :: " + numberOfWords);
	scanner.close();
}

Output :

Enter the sentence
Welcome to codippa.com
Words in the given sentence are :
Welcome
to
codippa.com
--------------------------------
Word count is :: 3

Let’s tweak in:

  1. java.util.StringTokenizer also has a constructor which takes only a String as an argument. It uses " \t\n\r\f" (the space, tab, newline, carriage-return, and form-feed characters) as default delimiters and splits the supplied string on these characters.
  2. The hasMoreElements() and nextElement() methods of java.util.StringTokenizer may also be used in place of the hasMoreTokens() and nextToken() methods. Note that nextElement() returns an Object rather than a String.
  3. The countTokens() method returns the number of tokens that remain, so it should be called before iterating over the tokens with nextToken(). If called after the iteration is complete, it will return 0.
  4. The split() method of java.lang.String takes a regular expression and hence can also be used to split a string on digits and various other patterns.
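The points about default delimiters, countTokens() and regular expressions in split() can be tried out with a short sketch like the one below. TweaksDemo and its two helper methods are hypothetical names used only for illustration, and the sample strings are made up for the demonstration.

```java
import java.util.StringTokenizer;

class TweaksDemo {

    // Hypothetical helper: counts words using StringTokenizer's default delimiters.
    static int countWithDefaultDelimiters(String text) {
        return new StringTokenizer(text).countTokens();
    }

    // Hypothetical helper: returns countTokens() after every token has been consumed.
    static int remainingAfterIteration(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
        }
        return tokenizer.countTokens();
    }

    public static void main(String[] args) {
        // tab and newline are default delimiters, just like the space
        System.out.println(countWithDefaultDelimiters("Welcome\tto\ncodippa.com")); // prints 3

        // no tokens remain once the iteration is over
        System.out.println(remainingAfterIteration("Welcome to codippa.com")); // prints 0

        // split() takes a regular expression, so it can split on runs of digits
        String[] parts = "one1two22three".split("\\d+");
        System.out.println(parts.length); // prints 3
    }
}
```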

Know some more methods? Don’t hesitate to write to us or add your comments.
