StringTokenizer Tokenizing String

March 28, 2011 | java.util

Having covered the data structures of the java.util package, let us look at its remaining classes. These include Date and GregorianCalendar for manipulating dates, Random for generating random numbers and StringTokenizer for splitting a string into individual words (known as tokens). Less frequently used are Observer, Observable, Timer and TimerTask. Let us go through these classes one after another.

class StringTokenizer

Earlier we met StreamTokenizer from the java.io package, which tokenizes a stream. StringTokenizer is a similar class, from the java.util package, that tokenizes a string. Each word in a file or string is called a token. StringTokenizer splits the string either on the default delimiters (whitespace: space, tab, newline, carriage return and form feed) or on delimiters supplied by the programmer.

Following is the class signature as defined in the java.util package.

public class StringTokenizer extends Object implements Enumeration<Object>

The following program on "StringTokenizer Tokenizing String" uses the default whitespace delimiters. The next program tokenizes using our own delimiters.

import java.util.StringTokenizer;
public class STDemo1
{
  public static void main(String[] args)
  {
    String str1 = "Different states, different cultures, but India is one";
    // one-argument constructor: whitespace is the delimiter
    StringTokenizer st1 = new StringTokenizer(str1);
    System.out.println("Number of tokens in str1: " + st1.countTokens());

    // print one token per line until none remain
    while(st1.hasMoreTokens())
    {
      System.out.println(st1.nextToken());
    }
  }
}

StringTokenizer st1 = new StringTokenizer(str1);

The string to be tokenized, str1, is passed as a parameter to the StringTokenizer constructor. The countTokens() method returns the number of tokens that remain to be read; called before any nextToken(), that is the total number of tokens (words) in str1. By default, StringTokenizer treats whitespace as the delimiter, so punctuation such as the comma stays attached to the word before it (for example "states,").
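The point that countTokens() reports the tokens still remaining, not a fixed total, can be illustrated with a minimal sketch. The class CountTokensSketch and the helper remainingAfter() are hypothetical names introduced only for this illustration; the StringTokenizer calls are the standard ones.

```java
import java.util.StringTokenizer;

class CountTokensSketch
{
  // Hypothetical helper: reads up to 'taken' tokens, then reports how many remain.
  static int remainingAfter(String text, int taken)
  {
    StringTokenizer st = new StringTokenizer(text);
    for (int i = 0; i < taken && st.hasMoreTokens(); i++)
    {
      st.nextToken();
    }
    return st.countTokens();
  }

  public static void main(String[] args)
  {
    String str1 = "Different states, different cultures, but India is one";
    System.out.println(remainingAfter(str1, 0)); // 8, nothing read yet
    System.out.println(remainingAfter(str1, 3)); // 5, three tokens already read
  }
}
```

For this reason, when the total is needed, it is safest to call countTokens() before the loop starts, as STDemo1 does.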

while(st1.hasMoreTokens()) { System.out.println(st1.nextToken()); }

To print the tokens held by a StringTokenizer object without knowing how many there are, the designers followed a simple programming technique, similar to the one used with java.util.Enumeration and java.sql.ResultSet. The hasMoreTokens() method returns true as long as the StringTokenizer object still has tokens to return. Once all the tokens have been returned, it returns false and the loop terminates. The nextToken() method returns the tokens one by one, one per iteration. The technique thus involves two methods: one that returns a boolean value used to control the loop and another that returns the value. Because StringTokenizer implements Enumeration, the equivalent pair hasMoreElements() and nextElement() is also available; nextElement() returns the same token typed as Object.
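The two-method technique can also be exercised through the Enumeration interface that StringTokenizer implements. The following is a small sketch, not part of the original programs; the class EnumerationSketch and the helper drain() are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.StringTokenizer;

class EnumerationSketch
{
  // Hypothetical helper: empties any Enumeration into a list.
  // One method controls the loop, the other returns the value.
  static List<Object> drain(Enumeration<Object> source)
  {
    List<Object> items = new ArrayList<>();
    while (source.hasMoreElements())
    {
      items.add(source.nextElement());
    }
    return items;
  }

  public static void main(String[] args)
  {
    // A StringTokenizer can be passed wherever an Enumeration<Object> is expected.
    System.out.println(drain(new StringTokenizer("India is one"))); // [India, is, one]
  }
}
```

The helper knows nothing about strings; it relies only on the loop-control method and the value-returning method, which is the essence of the technique.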

The following is another program on "StringTokenizer Tokenizing String" that takes a group of delimiters supplied explicitly.

import java.util.StringTokenizer;
public class STDemo2
{
  public static void main(String[] args)
  {
    String str1 = "Awake@arise*stop/notmuntil@goal*reach";
    // every character of the second argument is a delimiter
    StringTokenizer st1 = new StringTokenizer(str1, "@,m,*,/");
    System.out.println("Number of tokens in str1: " + st1.countTokens());

    // print one token per line until none remain
    while(st1.hasMoreTokens())
    {
      System.out.println(st1.nextToken());
    }
  }
}

String str1 = "Awake@arise*stop/notmuntil@goal*reach"; StringTokenizer st1 = new StringTokenizer(str1, "@,m,*,/");

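The delimiters in the string str1 are @, m, * and /, and these are supplied as the second parameter of the constructor. StringTokenizer treats every character of that parameter as a separate delimiter, so the commas in "@,m,*,/" are not separators between delimiters; they are delimiters themselves. This is harmless here because str1 contains no commas, but the string "@m*/" would be sufficient. StringTokenizer now tokenizes str1 on these delimiters, which yields seven tokens: Awake, arise, stop, not, until, goal and reach.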
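A short sketch can confirm how the delimiter string is interpreted. The class DelimiterSketch and the helper tokens() are hypothetical names for this illustration. The sketch uses the three-argument constructor, whose boolean flag asks the tokenizer to return the delimiters as tokens as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSketch
{
  // Hypothetical helper: collects every token produced for the given delimiter set.
  static List<String> tokens(String text, String delims, boolean returnDelims)
  {
    StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
    List<String> out = new ArrayList<>();
    while (st.hasMoreTokens())
    {
      out.add(st.nextToken());
    }
    return out;
  }

  public static void main(String[] args)
  {
    String str1 = "Awake@arise*stop/notmuntil@goal*reach";
    // Both delimiter strings give the same seven tokens, because str1 has no commas.
    System.out.println(tokens(str1, "@m*/", false));
    System.out.println(tokens(str1, "@,m,*,/", false));
    // With the flag set to true, the delimiter itself is returned as a token.
    System.out.println(tokens("a@b", "@", true)); // [a, @, b]
  }
}
```

Note that a letter used as a delimiter, such as m here, splits every word that contains it, so letters are rarely a good choice of delimiter for ordinary text.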

A similar class, StreamTokenizer, tokenizes a stream of data; see "StreamTokenizer – Tokenizing a Stream".