5 ways to remove duplicate characters from java string

Removing duplicate letters or characters from string is one of the most frequently appearing requirement among developers and a commonly asked programming problem in interviews.
There are different approaches of removing repeated characters or duplicates from a string in java.
Some of them will be discussed in this post.

Method 1 : Using a boolean array

public class DuplicateRemover {   
   public static void main(String[] args) {
  String stringWithDuplicates = "afsjeadrffafvgdefeverhfgberAAad";
  char[] characters = stringWithDuplicates.toCharArray();
  boolean[] found = new boolean[256];
  StringBuilder sb = new StringBuilder();
  System.out.println("String with duplicates : "
                     + stringWithDuplicates);
  for (char c : characters) {
     if (!found[c]) {
        found[c] = true;
        sb.append(c);
     }
  }
  System.out.println("String after duplicates removed : "
                     + sb.toString());
    }
}

Explanation

This approach utilizes the ASCII value of characters. The given string is converted to a character array which is iterated to determine duplicate characters.

In each iteration, the character is checked for existence in a boolean array.
This boolean array converts the character to its numeric equivalent and checks for the value at this numeric index.

If the character has not appeared previously, then the value at this position will be false(since boolean array will have all false values by default).
Now the value is set to true and the character is appended to the java.util.StringBuilderobject.
Next time when the same character appears the value at the index equal to its numeric value is found trueand nothing happens.
The boolean array is initialized to a capacity of 256 keeping in mind the ASCII values.

Output

String with duplicates : afsjeadrffafvgdefeverhfgberAAad
String after duplicates removed : afsjedrvghbA

Method 2: Using Set

This method utilized java.util.Setto detect duplicate characters.

public class DuplicateRemover {   
   public static void main(String[] args) {
     String stringWithDuplicates = "afsjeadrffafvgdefeverhfgberAAad";
     char[] characters = stringWithDuplicates.toCharArray();
     Set set=new HashSet();
     StringBuilder sb = new StringBuilder();
     System.out.println("String with duplicates : "
                        + stringWithDuplicates);
     for (char c : characters) {
        if (!set.contains(c)) {
          set.add(c);
          sb.append(c);
        }
     }
     System.out.println("String after duplicates removed : "
                        + sb.toString());
   }
}

Explanation

This method is slightly different from the previous one in that it uses a set to test for duplicate characters.
Since a set only accepts unique elements, hence in each iteration the existence of the character is checked in the set.
If it does not exist, it is added to the set and the java.lang.StringBuilderobject which holds the final string without duplicate characters.
If the character exists in the set, loop continues and next character is tested.

Output

String with duplicates : afsjeadrffafvgdefeverhfgberAAad
String after duplicates removed : afsjedrvghbA

Method 3: Using traditional loops

This approach does not involve any special technique or data structure. It simply relies on character to character comparisons.

public class DuplicateRemover {
   public static void main(String[] args) {
      String stringWithDuplicates = "afsjeadrffafvgdefeverhfgberAAad";
      char[] characters = stringWithDuplicates.toCharArray();
      int length = characters.length;
      System.out.println("String with duplicates : "
                         + stringWithDuplicates);
      for (int i = 0; i < length; i++) {
         for (int j = i + 1; j < length; j++) {
            if (characters[i] == characters[j]) {
               int temp = j;//set duplicate element index
               //delete the duplicate element by shifting the elements to left
               for (int k = temp; k < length - 1; k++) {
                 characters[k] = characters[k + 1];
               }
               j--;
               length--;//reduce char array length
            }
         }
      }
      String stringWithOutDuplicates = new String(characters);
      stringWithOutDuplicates = stringWithOutDuplicates.substring(0, length);
      System.out.println("String after duplicates removed : " +
                        stringWithOutDuplicates);
   }
}

Explanation

This method converts the given String to a character array which is then iterated.
There are two loops, the outer loop starts with first array index and an inner loop which starts with one index greater than the outer loop till the length of the array.

Basically, we are comparing each character to all the characters ahead it in the array.
When a match is found, we capture its index and start another loop which begins at the duplicate character index till the length of the loop.
In this third loop, we shift the array elements to the left so that the duplicate element is removed.

After this loop completes the duplicate element discovered in this iteration is removed and the length of the array is decreased by one.
This is done because when the elements are shifted to the left the elements at the end of the array are duplicated in each iteration of the third loop.

Finally after the loops complete, we have an array where all the duplicate elements are removed but there are some duplicate elements towards the end of the array.
The length of the array at this point contains the index of the last non-duplicate element.
Therefore, a new String is created out of this array and its substring()method is called to remove the trailing duplicate characters using the updated length.
Method 4: Using java 8 streams

Java 8 has introduced the concept of streams where an array can be represented as a sequence of elements and operations can be performed on those elements.

A new method chars() is added to java.lang.String class in java 8.
chars returns a stream of characters in the string. Invoking distinct method on this stream removes duplicate elements and returns another stream.
Use forEach() method of this stream to iterate over.
In every iteration, add the current character to a java.lang.StringBuffer.
After iteration completes, the buffer is converted to a string using toString() method.

public class DuplicateRemover {
 public class DuplicateRemover {
   public static void main(String[] args) {
      String stringWithDuplicates = "asdasdedsfrgdftg";
      StringBuffer sb = new StringBuffer(); 
      stringWithDuplicates.chars().distinct().forEach(letter->sb.append(letter));
      String duplicatesRemoved = sb.toString();
      System.out.println("String after duplicates removed : " +
                        duplicatesRemoved);
   }
 }
}

Note that the forEach method accepts a Lambda expression which adds the current character to a java.lang.StringBuffer.
Output of above program is

String with duplicates : asdasdedsfrgdftg
String after duplicates removed : asdefrgt

Method 5: Using indexOf()

This approach uses overloaded method indexOf() in java.lang.String class.
This method accepts two arguments:

a character,
numeric index

and returns the index of the character in the string after the index supplied as argument.
Thus, "rear".indexOf('r', 1) will return 3 since the character ‘r’ after index 1 is present at index 3.
indexOf() returns -1 if the character is not present in the string after the supplied index(second argument).
Using this approach, it can be figured out if a character occurs more than once or not by checking its index after its position.
Example,

public class DuplicateRemover {
   public static void main(String[] args) {
      String stringWithDuplicates = "asdasdedsfrgdftg";
      StringBuffer sb = new StringBuffer();
      for (int i = 0; i < s.length(); i++) {
        char character = stringWithDuplicates.charAt(i);
        // find index of character after the current position
        int index = stringWithDuplicates.indexOf(character, i + 1);
        if (index == -1) {
           sb.append((char) c);
        }
     }
     String duplicatesRemoved = sb.toString();
     System.out.println("String after duplicates removed : "
                        + duplicatesRemoved);
   }
}

Output of this program is

String with duplicates : asdasdedsfrgdftg
String after duplicates removed : aesrdftg

Note that the position of characters in the string with duplicate characters removed is different as compared to other methods.

Let’s tweak in

Method 1 and 2 have much better complexity as compared to Method 3.
Method 3 has a much higher complexity of O(n³) while Methods 1 and 2 have O(n) complexity.

There are numerous other methods to remove duplicate characters. Add more if you find any…

Categorized in:

Java String,

How to remove duplicate characters from a String in java

Method 1 : Using a boolean array

Explanation

Output

Method 2: Using Set

Explanation

Output

Method 3: Using traditional loops

Explanation

Method 5: Using indexOf()

Let’s tweak in

How to generate an array of random integers using Stream API Java 8

How to convert a character to String in java

Press ESC to close

Method 1 : Using a boolean array

Explanation

Output

Method 2: Using Set

Explanation

Output

Method 3: Using traditional loops

Explanation

Method 5: Using indexOf()

Let’s tweak in

How to generate an array of random integers using Stream API Java 8

How to convert a character to String in java

More in this CategoryJava String