String split() Java

June 7, 2013 | String and StringBuffer

The designers of Java recognized the usefulness of a split() method, which already existed in other languages such as JavaScript, and introduced it in JDK 1.4. In earlier versions, splitting a string into independent words or tokens was done with StringTokenizer. Using split() in place of StringTokenizer is very easy.
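As a rough side-by-side sketch of the two approaches mentioned above (the class and method names here are made up for illustration), the StringTokenizer version needs a loop, while split() returns all the tokens in one call:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // StringTokenizer approach: walk the tokens one at a time.
    static List<String> withTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, " ");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // split() approach: a single call returns all tokens as an array.
    static String[] withSplit(String text) {
        return text.split(" ");
    }

    public static void main(String[] args) {
        String text = "abc def ghi";
        System.out.println(withTokenizer(text));               // prints [abc, def, ghi]
        System.out.println(Arrays.toString(withSplit(text)));  // prints [abc, def, ghi]
    }
}
```

Note that StringTokenizer takes a set of delimiter characters, whereas split() takes a regular expression, so the two are not interchangeable for every delimiter.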

The following example uses String split() in various combinations.
import java.util.Arrays;

public class Demo
{
  public static void main(String args[])
  {
    String str1 = "abc def ghi";    // string with single spaces
    String str1Array[] = str1.split(" ");  // prints [abc, def, ghi]
    System.out.println("String with single empty space: " + Arrays.toString(str1Array));

    String str2 = "abc def    ghi"; // string with 4 spaces between def and ghi
    String str2Array[] = str2.split(" ");  // prints [abc, def, , , , ghi] (three empty strings)
    System.out.println("String with 4 spaces between def and ghi: " + Arrays.toString(str2Array));

    // to get correct output with extra spaces
    String str11 = "abc def    ghi"; // string with 4 spaces between def and ghi
    String str11Array[] = str11.split("\\s+"); // prints [abc, def, ghi]
    System.out.println("String with 4 spaces between def and ghi: " + Arrays.toString(str11Array));

    String str3 = "abc:def:ghi";
    String str3Array[] = str3.split(":");  // prints [abc, def, ghi]
    System.out.println("String with : delimiter: " + Arrays.toString(str3Array));

    // SPLIT WITH SPECIAL CHARACTERS (SHOULD BE CAREFUL)
    String str4 = "abc.def.ghi";    // . is a special character
    String str4Array[] = str4.split(".");  // OBSERVE prints []
    System.out.println("String with . delimiter: " + Arrays.toString(str4Array));
    // should be coded as
    String str5 = "abc.def.ghi";
    String str5Array[] = str5.split("\\."); // prints [abc, def, ghi]
    System.out.println("String with . delimiter: " + Arrays.toString(str5Array));

    String str6 = "abc|def|ghi";            // | is a special character
    // OBSERVE prints [a, b, c, |, d, e, f, |, g, h, i] on Java 8 and later;
    // earlier versions add an empty string at the start: [, a, b, c, |, ...]
    String str6Array[] = str6.split("|");
    System.out.println("String with | delimiter: " + Arrays.toString(str6Array));
    // should be coded as
    String str7 = "abc|def|ghi";
    String str7Array[] = str7.split("\\|"); // prints [abc, def, ghi]
    System.out.println("String with | delimiter: " + Arrays.toString(str7Array));

    // string with letters enclosed within { and }; { and } are special characters
    String str8 = "{abc}{def}{ghi}";
    String str8Array[] = str8.split("[{}]"); // prints [, abc, , def, , ghi]
    System.out.println("String with {} delimiter: " + Arrays.toString(str8Array));

    // a long string split into groups of 4 characters
    String str9 = "abcdefghijklmnopqrstu";
    String str9Array[] = str9.split("(?<=\\G.{4})"); // prints [abcd, efgh, ijkl, mnop, qrst, u]
    System.out.println("String with group of 4 characters: " + Arrays.toString(str9Array));

    // a split that keeps the special character in the tokens
    String str10 = "abc*def*ghi*";
    String str10Array[] = str10.split("(?<=[*])"); // prints [abc*, def*, ghi*]
    System.out.println("String including special character: " + Arrays.toString(str10Array));
  }
}


Before explaining the above code, let us first look at the signatures of the split() method, defined in the String class since JDK 1.4.

The split() method is overloaded; there are two versions.

public String[] split(String regex): Splits this string around matches of the given regular expression. This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

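public String[] split(String regex, int limit): Splits this string around matches of the given regular expression. The limit argument controls how many times the pattern is applied and therefore affects the length of the resulting array.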

Both versions of split() take a regular expression as the first parameter.
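The effect of the limit argument is easiest to see on a string that ends with delimiters. The following is a small sketch; the class name, helper method and sample string are chosen only for illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Thin wrapper around the two-argument split(), used only for this demo.
    static String[] splitWithLimit(String text, String regex, int limit) {
        return text.split(regex, limit);
    }

    public static void main(String[] args) {
        String text = "a:b:c::";

        // limit 0: split as often as possible, drop trailing empty strings
        System.out.println(Arrays.toString(splitWithLimit(text, ":", 0)));   // prints [a, b, c]

        // positive limit: at most that many elements; the last one holds the rest
        System.out.println(Arrays.toString(splitWithLimit(text, ":", 2)));   // prints [a, b:c::]

        // negative limit: split as often as possible, keep trailing empty strings
        System.out.println(Arrays.toString(splitWithLimit(text, ":", -1)));  // prints [a, b, c, , ]
    }
}
```

A limit of 0 gives the same result as the one-argument split(regex).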

Arrays.toString(str1Array)

The split() method returns a string array. The easiest way to print the elements of a string array is to use the toString() method of the java.util.Arrays class.

// to get correct output with extra spaces
String str11 = "abc def    ghi";            // string with 4 spaces between def and ghi
String str11Array[] = str11.split("\\s+");  // prints [abc, def, ghi]

To treat a run of consecutive whitespace characters as a single delimiter, the regular expression \\s+ (one or more whitespace characters) is used (remember, split() takes a regular expression as its parameter).
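One detail worth keeping in mind with \\s+ is leading whitespace: a match at the very start of the string leaves an empty first element. A possible way around it, sketched here with a hypothetical helper named words(), is to call trim() before splitting.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Trim first so that leading whitespace does not produce an empty first token.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "  abc\tdef    ghi ";   // leading spaces, a tab, 4 spaces, trailing space

        // Without trim(), the leading whitespace yields an empty first element.
        System.out.println(Arrays.toString(text.split("\\s+")));  // prints [, abc, def, ghi]

        // With trim(), only the words remain.
        System.out.println(Arrays.toString(words(text)));         // prints [abc, def, ghi]
    }
}
```

The trailing space causes no problem either way, because split() with a limit of zero drops trailing empty strings.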

// should be coded as
String str5 = "abc.def.ghi";               // . is a special character
String str5Array[] = str5.split("\\.");    // prints [abc, def, ghi]

String str7 = "abc|def|ghi";               // | is a special character
String str7Array[] = str7.split("\\|");    // prints [abc, def, ghi]

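When the delimiter is a regular-expression metacharacter, a little care is needed, because such characters have a special meaning to the regex engine. An unescaped . matches any character, and an unescaped | is an empty alternation that matches at every position, which is why the first attempts in the program give unexpected output. To match the character literally, precede it with a backslash, as in \. or \|. Because the backslash is itself the escape character in a Java string literal, it must be doubled in source code: "\\." and "\\|". In fact, \\ is the escape sequence for a single backslash.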
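When the delimiter is not known in advance, escaping by hand is error-prone. One option, shown here as a minimal sketch with a hypothetical helper named splitLiteral(), is to let java.util.regex.Pattern.quote() (available since Java 5) turn the delimiter into a literal pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDelimiterDemo {
    // Splits on the delimiter taken literally, whatever metacharacters it contains.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("abc.def.ghi", ".")));  // prints [abc, def, ghi]
        System.out.println(Arrays.toString(splitLiteral("abc|def|ghi", "|")));  // prints [abc, def, ghi]
        System.out.println(Arrays.toString(splitLiteral("abc*def*ghi", "*")));  // prints [abc, def, ghi]
    }
}
```

For a fixed, known delimiter the hand-written "\\." or "\\|" is just as good; Pattern.quote() mainly helps when the delimiter comes from a variable.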

The braces { and } are also special characters in a regular expression. Inside a character class such as [{}] they are matched literally, so the split can be written as follows.

// string with letters enclosed within { and }; { and } are special characters
String str8 = "{abc}{def}{ghi}";
String str8Array[] = str8.split("[{}]");   // prints [, abc, , def, , ghi]

If you have a long string that must be split into fixed-size groups of characters, use a regular expression as follows.

// a long string split into groups of 4 characters
String str9 = "abcdefghijklmnopqrstu";
String str9Array[] = str9.split("(?<=\\G.{4})");   // prints [abcd, efgh, ijkl, mnop, qrst, u]

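In the above code, the string is split into groups of 4 characters, and the last group holds whatever is left over. \G anchors at the end of the previous match, so the lookbehind (?<=\G.{4}) matches at every position that lies exactly 4 characters after the previous split point.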
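The group size can be made a parameter by building the pattern from a number. The sketch below (class and method names are illustrative only) does that and, for comparison, shows a plain substring loop that gives the same result for non-empty input without any regular expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkDemo {
    // Regex variant: split at every position that lies n characters
    // after the end of the previous match (\G).
    static String[] chunksByRegex(String text, int n) {
        return text.split("(?<=\\G.{" + n + "})");
    }

    // Loop variant: cut substrings of length n; the last one may be shorter.
    static String[] chunksByLoop(String text, int n) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < text.length(); i += n) {
            parts.add(text.substring(i, Math.min(i + n, text.length())));
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "abcdefghijklmnopqrstu";
        System.out.println(Arrays.toString(chunksByRegex(text, 4)));  // prints [abcd, efgh, ijkl, mnop, qrst, u]
        System.out.println(Arrays.toString(chunksByLoop(text, 4)));   // prints [abcd, efgh, ijkl, mnop, qrst, u]
    }
}
```

The regex variant relies on . which does not match line terminators by default, so the loop may be the safer choice for text that contains newlines.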

// a split that keeps the special character in the tokens
String str10 = "abc*def*ghi*";
String str10Array[] = str10.split("(?<=[*])");   // prints [abc*, def*, ghi*]

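In the above code, the delimiter * is included in the output tokens. Normally the delimiter is consumed by the split and does not appear in the output. Here, however, the lookbehind (?<=[*]) matches only the zero-width position just after each *, so nothing is consumed and each * stays at the end of its token.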
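To make the difference concrete, the following sketch (with made-up class and method names) splits the same string three ways: with a plain delimiter, with a lookbehind, and with a lookahead, which keeps the delimiter at the start of the following token instead.

```java
import java.util.Arrays;

class KeepDelimiterDemo {
    // Plain split: the delimiter is consumed and dropped.
    static String[] delimiterDropped(String text) {
        return text.split("[*]");
    }

    // Lookbehind: split just after each *, so * ends each token.
    static String[] delimiterAtEnd(String text) {
        return text.split("(?<=[*])");
    }

    // Lookahead: split just before each *, so * starts the following token.
    static String[] delimiterAtStart(String text) {
        return text.split("(?=[*])");
    }

    public static void main(String[] args) {
        String text = "abc*def*ghi*";
        System.out.println(Arrays.toString(delimiterDropped(text)));  // prints [abc, def, ghi]
        System.out.println(Arrays.toString(delimiterAtEnd(text)));    // prints [abc*, def*, ghi*]
        System.out.println(Arrays.toString(delimiterAtStart(text)));  // prints [abc, *def, *ghi, *]
    }
}
```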