How To Read DOC file Using Java and Apache POI

One of the visitors to my blog asked me to write about how to read a Word document file using Java. I wrote the following program to demonstrate how Apache POI can be used for this purpose.

I used the following library to write this program. If you have downloaded Apache POI, you should find this jar file within the bundle.

  • poi-scratchpad-3.2-FINAL-20081019.jar

The tutorial demonstrates the following features:

  • How to read a simple Microsoft Word document (.doc) file using Java and Apache POI (.docx is not supported)
  • How to read the total number of paragraphs and the content of each paragraph
  • How to read the document headers
  • How to read the document footers
  • How to read the document summary
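Because the code below only understands the older binary .doc format, it can help to check a file before handing it to POI. The following sketch is my own illustration, not part of POI: it reads the first bytes of a file and reports whether they match the OLE2 compound-document signature used by .doc files or the ZIP signature used by .docx files. The class name DocFormatSniffer is hypothetical. Other OLE2 formats (such as .xls) and any ordinary ZIP archive will match as well, so treat the result as a hint rather than proof.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: guesses the container format from a file's leading bytes
public class DocFormatSniffer {

    // OLE2 compound document signature, used by legacy binary .doc files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature; .docx files are ZIP containers
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    public enum Format { OLE2_DOC, OOXML_ZIP, UNKNOWN }

    // Classifies an already-read header (may be shorter than 8 bytes)
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2_DOC;
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML_ZIP;
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the stream and classifies them
    public static Format detect(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) break; // end of stream before 8 bytes
            read += n;
        }
        return detect(Arrays.copyOf(header, read));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java DocFormatSniffer <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(args[0] + ": " + detect(in));
        }
    }
}
```

Running it as `java DocFormatSniffer Test.doc` should report OLE2_DOC for a classic Word file; OOXML_ZIP is a sign that HWPFDocument will not be able to open the file.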

Apache POI is not robust yet; it still has a long way to go before it can handle complex document formats. I also noticed that classes move from one package to another between versions. So if you are using an older or newer version of POI and get compilation errors on the imports, try looking for the classes in other packages.
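When an import stops compiling after a POI upgrade, one quick way to find out which package a class lives in on your classpath is to try loading candidate names with reflection. The sketch below is a hypothetical helper, not a POI API. In its example, the first name is the import used in this tutorial and the second is only a guessed alternative location; substitute whatever candidates you want to check.

```java
// Hypothetical helper: reports which of several fully qualified class names can be loaded
public class PoiClassLocator {

    // Returns the first candidate that is on the classpath, or null if none is
    public static String findFirstLoadable(String... candidates) {
        for (String name : candidates) {
            try {
                // initialize=false: only check that the class exists, do not run static initializers
                Class.forName(name, false, PoiClassLocator.class.getClassLoader());
                return name;
            } catch (ClassNotFoundException e) {
                // Not found under this name; try the next candidate
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The second name is an illustrative guess, not a documented POI location
        String found = findFirstLoadable(
                "org.apache.poi.hwpf.usermodel.HeaderStories",
                "org.apache.poi.hwpf.HeaderStories");
        System.out.println(found == null
                ? "HeaderStories is not on the classpath"
                : "Found: " + found);
    }
}
```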

/**
 * @author Kushal Paudyal
 * www.sanjaal.com/java
 * Last Modified On: 03/23/2009
 */
package com.kushal.utils;

import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import org.apache.poi.hwpf.usermodel.HeaderStories;

import java.io.*;

public class ReadDocFileFromJava {

	public static void main(String[] args) {
		// This is the document that you want to read using Java.
		// Backslashes must be doubled inside a Java string literal.
		String fileName = "C:\\Documents and Settings\\kushalp\\Desktop\\Test.doc";

		// Read the document (demonstrates some usage of POI)
		readMyDocument(fileName);
	}
	public static void readMyDocument(String fileName) {
		// try-with-resources closes the input stream once POI has read it
		try (InputStream in = new FileInputStream(fileName)) {
			POIFSFileSystem fs = new POIFSFileSystem(in);
			HWPFDocument doc = new HWPFDocument(fs);

			// Read the content
			readParagraphs(doc);

			int pageNumber = 1;

			// Try reading the header for page 1
			readHeader(doc, pageNumber);

			// Try reading the footer for page 1
			readFooter(doc, pageNumber);

			// Read the document summary
			readDocumentSummary(doc);

		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public static void readParagraphs(HWPFDocument doc) throws Exception {
		WordExtractor we = new WordExtractor(doc);

		// Get the text of every paragraph; the array length is the paragraph count
		String[] paragraphs = we.getParagraphText();
		System.out.println("Total Paragraphs: " + paragraphs.length);

		for (int i = 0; i < paragraphs.length; i++) {
			System.out.println("Length of paragraph " + (i + 1) + ": " + paragraphs[i].length());
			System.out.println(paragraphs[i]);
		}
	}

	public static void readHeader(HWPFDocument doc, int pageNumber){
		HeaderStories headerStore = new HeaderStories( doc);
		String header = headerStore.getHeader(pageNumber);
		System.out.println("Header Is: "+header);

	}

	public static void readFooter(HWPFDocument doc, int pageNumber){
		HeaderStories headerStore = new HeaderStories( doc);
		String footer = headerStore.getFooter(pageNumber);
		System.out.println("Footer Is: "+footer);

	}

	public static void readDocumentSummary(HWPFDocument doc) {
		DocumentSummaryInformation summaryInfo = doc.getDocumentSummaryInformation();

		// The summary stream is optional, so guard against documents that lack it
		if (summaryInfo == null) {
			System.out.println("No document summary information found.");
			return;
		}

		String category = summaryInfo.getCategory();
		String company = summaryInfo.getCompany();
		int lineCount = summaryInfo.getLineCount();
		int sectionCount = summaryInfo.getSectionCount();
		int slideCount = summaryInfo.getSlideCount();

		System.out.println("---------------------------");
		System.out.println("Category: " + category);
		System.out.println("Company: " + company);
		System.out.println("Line Count: " + lineCount);
		System.out.println("Section Count: " + sectionCount);
		System.out.println("Slide Count: " + slideCount);
	}

}
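The lengths printed by readParagraphs count every character in each paragraph string, including any trailing line-break characters, and empty paragraphs are counted too. If you only care about paragraphs with real text, a small post-processing step on the String[] returned by getParagraphText() can help. This is a hypothetical sketch in plain Java (no POI needed); I am assuming paragraph strings may end in "\r" and/or "\n", and the sample array in main merely stands in for real extractor output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing for paragraph strings obtained from a Word document
public class ParagraphStats {

    // Removes any trailing carriage returns and line feeds from one paragraph
    public static String stripLineBreaks(String paragraph) {
        int end = paragraph.length();
        while (end > 0) {
            char c = paragraph.charAt(end - 1);
            if (c != '\r' && c != '\n') break;
            end--;
        }
        return paragraph.substring(0, end);
    }

    // Keeps only paragraphs that still contain non-whitespace text after stripping
    public static List<String> nonBlankParagraphs(String[] paragraphs) {
        List<String> result = new ArrayList<>();
        for (String p : paragraphs) {
            String cleaned = stripLineBreaks(p);
            if (!cleaned.trim().isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input standing in for WordExtractor.getParagraphText() output
        String[] paragraphs = {"First paragraph\r\n", "\r\n", "Second one\r\n"};
        List<String> kept = nonBlankParagraphs(paragraphs);
        System.out.println("Non-blank paragraphs: " + kept.size());
        for (int i = 0; i < kept.size(); i++) {
            System.out.println("Length of paragraph " + (i + 1) + ": " + kept.get(i).length());
        }
    }
}
```

In readParagraphs you could pass `we.getParagraphText()` to nonBlankParagraphs and loop over the returned list instead of the raw array.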

