How to Validate PO Box Address Variations With Java Regular Expressions

Address validation in Java can be tricky if you are unwilling to use regular expressions, because people write each component of an address in many different ways.

In this tutorial, we focus on validating the PO Box part of an address using regular expressions in Java. The tutorial includes test cases for several variations of the PO Box format.

package com.kushal.tools;
/**
 * @author Kushal Paudyal
 * Created on: 05/11/2011
 * Last Modified On: 05/11/2011
 * 
 * This class contains a method and unit tests to validate the PO Box in an address.
 * The regular expression recognizes several different variations of PO Boxes.
 * It should cover most common PO Box spellings; if a format is missed, the
 * pattern can be extended to handle it.
 */
import java.util.regex.Pattern;

public class RegexPOBoxValidation {

	/**
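	 * The following pattern identifies a PO Box reference anywhere in the input.
	 * Backslashes are doubled because the regex lives inside a Java string literal.
	 * Note that each unescaped '.' matches any single character, not only a period.
	 */
	static final String thePattern = 
"([\\w\\s*\\W]*(P(OST)?.?\\s*((O(FF(ICE)?)?)?.?\\s*(B(IN|OX|.?))|B(IN|OX))+))[\\w\\s*\\W]*";	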
	
	/**
	 * 
	 * @param str - PO Box string to validate
	 * @return if the input string is a valid PO Box format.
	 */
	public static boolean isValid(String str) {
		return Pattern.matches(thePattern, str);
	}

	public static void main (String [] args ) {
		/**
		 * Doing unit test with several different PO Box usage alterations.
		 */
		String [] itemsToValidate = {
				"PO Box",
				"P O Box",
				"P. O. Box",
				"P.O.Box",
				"Post Box",
				"Post Office Box",
				"Post Office",
				"P.O.B",
				"P.O.B.",
				"POB",
				"Post Office Bin",
				"Box",
				"Bin",
				"Post",
				"Postal Code",
				"100,, P O Box Des Moines",
				" P O Box DesMoines1000",
				" P O Box Des Moines 1000",
				" Post Office Box",
				" Post Office Box  ",
				"Post Box #"};										 

		for (int index = 0; index < itemsToValidate.length; index++) {
			String item = itemsToValidate[index];
			boolean isValid = isValid(itemsToValidate[index].toUpperCase());
			System.out.println(item + " : " + (isValid ? "Valid" : "Invalid"));
		}
	}
			
}


Here is the output of this Java program:

PO Box : Valid
P O Box : Valid
P. O. Box : Valid
P.O.Box : Valid
Post Box : Valid
Post Office Box : Valid
Post Office : Invalid
P.O.B : Valid
P.O.B. : Valid
POB : Valid
Post Office Bin : Valid
Box : Invalid
Bin : Invalid
Post : Invalid
Postal Code : Invalid
100,, P O Box Des Moines : Valid
 P O Box DesMoines1000 : Valid
 P O Box Des Moines 1000 : Valid
 Post Office Box : Valid
 Post Office Box   : Valid
Post Box # : Valid
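
Because the dots in the pattern above are unescaped, each `.?` accepts any single character, so an input such as "PUB" is also reported as valid. The following is a minimal sketch of a tighter variant, not the pattern used in this tutorial. The class name PoBoxMatcherSketch and the method containsPoBox are hypothetical names chosen for illustration. It assumes we only want to know whether a PO Box reference occurs somewhere in the address, so it uses find() instead of matching the whole string, and it uses CASE_INSENSITIVE instead of upper-casing the input.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original tutorial code.
class PoBoxMatcherSketch {

    // Tightened pattern (an assumption, not the article's): dots are escaped so they
    // match only a literal '.', and \b keeps the leading P at the start of a word.
    private static final Pattern PO_BOX = Pattern.compile(
            "\\bP(?:OST)?\\.?\\s*(?:O(?:FF(?:ICE)?)?\\.?\\s*)?B(?:IN|OX|\\b\\.?)",
            Pattern.CASE_INSENSITIVE);

    // Returns true if a PO Box reference appears anywhere in the address.
    static boolean containsPoBox(String address) {
        return address != null && PO_BOX.matcher(address).find();
    }

    public static void main(String[] args) {
        String[] samples = { "P.O. Box 123", "Post Office Bin 7", "Postal Code", "PUB" };
        for (String s : samples) {
            System.out.println(s + " : " + (containsPoBox(s) ? "Valid" : "Invalid"));
        }
    }
}
```

Compiling the pattern once into a static Pattern also avoids recompiling it on every call, which is what Pattern.matches(String, CharSequence) does internally.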

Replacing non-ASCII characters using Java Regular Expressions

The following Java code shows how to remove non-ASCII characters from the contents of a file using regular expressions.

package com.kushal.utils;

/**
 * ReplaceNonAsciiCharacters.java
 * @author Kushal Paudyal
 * www.sanjaal.com/java
 *
 * This class reads a file with non ASCII Characters in it.
 * Replaces the non ASCII Characters using regular expression.
 * Saves the content with non-ASCII Characters removed to a new file.
 */

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;

public class ReplaceNonAsciiCharacters {

	public static void main(String args[]) {

		/**
		 * This is the input file name with some non-ASCII characters in the
		 * content of the file.
		 */
		String fileName = "C:/Temp/WithNonASCIICharacters.txt";
		/**
		 * This is the location of the output file - the content of this file
		 * will be the input file content minus the non-ASCII characters.
		 */
		String outputFileName = "C:/Temp/WithNonASCIICharactersRemoved.txt";

		try {

			/**
			 * Create a reader to read the input file
			 */
			BufferedReader in = new BufferedReader(new FileReader(fileName));
			String line = "";

			String formattedStr = "";
			int count = 0;
			/**
			 * Iterate through each line of content and
			 * replace any non-ASCII characters with an empty string
			 * using a regular expression.
			 *
			 * Prepend a newline character to every line after the first.
			 */
			while ((line = in.readLine()) != null) {
				if (count == 0)
					formattedStr += line.replaceAll("[^\\p{ASCII}]", "");
				else
					formattedStr += "\n" + line.replaceAll("[^\\p{ASCII}]", "");

				count++;
			}

			/**
			 * Close the reader once the whole file has been read.
			 */
			in.close();

			/**
			 * Write the content to the output file using BufferedWriter object.
			 */
			BufferedWriter out = new BufferedWriter(new FileWriter(
					outputFileName));
			out.write(formattedStr);

			/**
			 * Once done, flush the writer and close it.
			 */
			out.flush();
			out.close();

		} catch (Exception e) {
			e.printStackTrace();
		}

	}

	/*
	 * SANJAAL CORPS MAKES NO REPRESENTATIONS OR WARRANTIES ABOUT THE
	 * SUITABILITY OF THE SOFTWARE, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
	 * LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
	 * PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SANJAAL CORPS SHALL NOT BE
	 * LIABLE FOR ANY DAMAGES SUFFERED BY LICENSEE AS A RESULT OF USING,
	 * MODIFYING OR DISTRIBUTING THIS SOFTWARE OR ITS DERIVATIVES.
	 *
	 * THIS SOFTWARE IS NOT DESIGNED OR INTENDED FOR USE OR RESALE AS ON-LINE
	 * CONTROL EQUIPMENT IN HAZARDOUS ENVIRONMENTS REQUIRING FAIL-SAFE
	 * PERFORMANCE, SUCH AS IN THE OPERATION OF NUCLEAR FACILITIES, AIRCRAFT
	 * NAVIGATION OR COMMUNICATION SYSTEMS, AIR TRAFFIC CONTROL, DIRECT LIFE
	 * SUPPORT MACHINES, OR WEAPONS SYSTEMS, IN WHICH THE FAILURE OF THE
	 * SOFTWARE COULD LEAD DIRECTLY TO DEATH, PERSONAL INJURY, OR SEVERE
	 * PHYSICAL OR ENVIRONMENTAL DAMAGE ("HIGH RISK ACTIVITIES"). SANJAAL CORPS
	 * SPECIFICALLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR
	 * HIGH RISK ACTIVITIES.
	 */
}
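
The same idea can be sketched with a precompiled pattern and try-with-resources, so that the reader and writer are closed even when an exception is thrown. This is only a sketch under stated assumptions: the class name AsciiFilterSketch and both method names are hypothetical, and the input file is assumed to be UTF-8 encoded (the original program relies on the platform default encoding instead).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original tutorial code.
class AsciiFilterSketch {

    // Compiled once and reused for every line.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // Removes every character outside the ASCII range from the given text.
    static String stripNonAscii(String text) {
        return NON_ASCII.matcher(text).replaceAll("");
    }

    // Copies input to output line by line, dropping non-ASCII characters.
    // Assumption: the input file is UTF-8 encoded.
    static void copyWithoutNonAscii(Path input, Path output) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.US_ASCII)) {
            String line;
            boolean first = true;
            while ((line = in.readLine()) != null) {
                if (!first) {
                    out.write('\n'); // same separator as the program above
                }
                out.write(stripNonAscii(line));
                first = false;
            }
        }
    }
}
```

Writing each line as it is read also avoids building the whole file content in one String, which matters for large files.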

US 5- and 9-Digit ZIP Code Validation in Java Using Regular Expressions

ZIP codes are a system of postal codes used by the United States Postal Service (USPS) since 1963. The term ZIP, an acronym for Zone Improvement Plan, is properly written in capital letters and was chosen to suggest that the mail travels more efficiently, and therefore more quickly, when senders use the code in the postal address. The basic format consists of five decimal numerical digits. An extended ZIP+4 code, introduced in the 1980s, includes the five digits of the ZIP code, a hyphen, and four more digits that determine a more precise location than the ZIP code alone. The term ZIP code was originally registered as a servicemark (a type of trademark) by the U.S. Postal Service, but its registration has since expired. [Wikipedia / Creative Commons]

The following Java class uses a regular expression to validate ZIP codes. It also includes tests for several sample valid and invalid ZIP codes.

package com.kushal.tools;
/**
 * @author Kushal Paudyal
 * Last Modified on 2010/12/12
 * www.sanjaal.com/java
 * This class demonstrates the use of regular expressions to validate US Zip code.
 * It basically validates two rules:
 * - Zip code should have five digits initially.
 * - The last four digits along with the hyphen sign are optional and can be present or absent from the zip code.
 */
import java.util.regex.Pattern;

public class ZipcodeValidatorUS {
	/**
	 * Regular Expression to match the US Zip-Code
	 */
	static final String regex = "^\\d{5}(-\\d{4})?$";

	/**
	 * Testing the zip code validation with some sample zip codes
	 */
	public static void main(String args[]) {

		String mixedZips[] = { 	"50266-234A", "50266-2342", "5026A-2344",
								"5026A-234A", "50266", "230" };

		int index = 0;
		boolean isMatch = false;
		while (index < mixedZips.length) {
			isMatch = isAValidZipCode(mixedZips[index]);
			System.out.println("Zip " + mixedZips[index] + " - "
					+ (isMatch ? "Valid" : "Invalid"));
			index++;
		}

	}
	
	/**
	 * This method returns true if the parameter string is a valid zip code
	 */
	public static boolean isAValidZipCode(String zip) {
		return Pattern.matches(regex, zip);
	}

}

The following is the output of this program:

Zip 50266-234A - Invalid
Zip 50266-2342 - Valid
Zip 5026A-2344 - Invalid
Zip 5026A-234A - Invalid
Zip 50266 - Valid
Zip 230 - Invalid
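
If the +4 part is also needed, capture groups can pull it out during validation. The following is a minimal sketch with hypothetical names (ZipCodeSketch, isValid, plusFour); it is a variation on the class above, not a replacement for it. The ^ and $ anchors are omitted because Matcher.matches() already requires the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original tutorial code.
class ZipCodeSketch {

    // Group 1: the five-digit ZIP code. Group 2: the optional +4 extension.
    private static final Pattern ZIP = Pattern.compile("(\\d{5})(?:-(\\d{4}))?");

    // Returns true for inputs of the form 12345 or 12345-6789.
    static boolean isValid(String zip) {
        return zip != null && ZIP.matcher(zip).matches();
    }

    // Returns the four-digit extension, or null if it is absent or the input is invalid.
    static String plusFour(String zip) {
        if (zip == null) {
            return null;
        }
        Matcher m = ZIP.matcher(zip);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(isValid("50266-2342") + " " + plusFour("50266-2342"));
        System.out.println(isValid("50266") + " " + plusFour("50266"));
    }
}
```

Note that this only checks the format; it does not tell us whether the ZIP code is actually assigned by the USPS.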