How to Extract Plain Text from an HTML Website Easily in Java

I was looking for a way to crawl websites and extract only their text. My goal was to collect text from various websites to build a text corpus for natural language processing of the Nepali language. There are several solutions on the internet, but none I found was as simple as this one, which uses the jsoup library. In the example below I extract the text of the entire body, but you can just as easily extract the text of a specific node (and its children).

/**
 * 
 * @author Kushal Paudyal
 * Created on: 3/9/2017
 * Last Modified on: 3/9/2017
 *
 */
package com.icodejava.research.nlp;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlTextExtractor {
	
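	public static void main(String[] args) throws IOException {
		// Fetch the page over HTTP and parse it into a DOM tree
		Document doc = Jsoup.connect("http://swasthyakhabar.com/news-details/3356/2017-03-09").get();

		// text() returns the combined, whitespace-normalized text of the body and all its children
		System.out.println(doc.body().text());
	}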

}
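
To illustrate the point about extracting text from a single node, here is a minimal sketch. It parses an in-memory HTML string rather than a live page; the class name NodeTextExtractor, the helper textOf, and the markup with its "menu" and "article" ids are all hypothetical and only stand in for whatever structure the real page has.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class NodeTextExtractor {

	// Returns the text of the first element matching the CSS selector
	// (including its children), or an empty string when nothing matches.
	static String textOf(String html, String cssSelector) {
		Document doc = Jsoup.parse(html);
		Element node = doc.select(cssSelector).first();
		return node == null ? "" : node.text();
	}

	public static void main(String[] args) {
		// Hypothetical markup standing in for a downloaded news page
		String html = "<html><body>"
				+ "<div id=\"menu\">Home | News</div>"
				+ "<div id=\"article\"><h1>Title</h1><p>First <b>paragraph</b>.</p></div>"
				+ "</body></html>";

		// Only the article node is extracted; the menu text is left out
		System.out.println(textOf(html, "#article"));
	}
}
```

For a real site you would inspect the page source to find a selector that wraps the article text, which keeps navigation menus and footers out of the corpus.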

Recursively Finding Greatest Common Divisor (GCD) – Java Implementation

This article shows a Java program that finds the greatest common divisor of two input integers and also prints the intermediate steps of the recursion. Can you work out the complexity of this algorithm? It takes O(log N) recursive steps, where N is the larger of the two numbers: whenever b <= a, the remainder a % b is less than a / 2, so the arguments at least halve every two calls.

package com.icodejava.research.ready;
/**
 * 
 * @author Kushal Paudyal
 * Created on: 2/8/2017
 * Last Modified on: 2/8/2017
 * 
 * Recursive way of finding the GCD (Greatest Common Divisor)
 */
public class GCDRecursive {
	
	public static int gcd(int a, int b) {
		System.out.println("First Number: " + a + " Second Number: " + b);
		if (a < 0 || b < 0) {
			throw new IllegalArgumentException("No GCD of negative integers");
		}
		// Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), and gcd(a, 0) = a
		return b == 0 ? a : gcd(b, a % b);
	}

	public static void main(String[] args) {
		System.out.println("Greatest Common Divisor of 35 and 21 is " + gcd(35, 21));
	}

}

The following is the output of this program.

First Number: 35 Second Number: 21
First Number: 21 Second Number: 14
First Number: 14 Second Number: 7
First Number: 7 Second Number: 0
Greatest Common Divisor of 35 and 21 is 7
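
To get a feel for the logarithmic step count, the sketch below rewrites the same algorithm as a loop and counts how many modulo operations it performs. The class name GcdStepCounter and the method countSteps are hypothetical names introduced only for this illustration, and the methods assume non-negative inputs.

```java
class GcdStepCounter {

	// Iterative form of Euclid's algorithm; returns the same result
	// as the recursive version for non-negative inputs.
	static int gcd(int a, int b) {
		if (a < 0 || b < 0) {
			throw new IllegalArgumentException("No GCD of negative integers");
		}
		while (b != 0) {
			int remainder = a % b;
			a = b;
			b = remainder;
		}
		return a;
	}

	// Counts the modulo operations needed before b reaches zero.
	static int countSteps(int a, int b) {
		int steps = 0;
		while (b != 0) {
			int remainder = a % b;
			a = b;
			b = remainder;
			steps++;
		}
		return steps;
	}

	public static void main(String[] args) {
		System.out.println("gcd(35, 21) = " + gcd(35, 21)
				+ " after " + countSteps(35, 21) + " modulo operations");
		// Consecutive Fibonacci numbers are the slowest inputs for their size
		System.out.println("gcd(89, 55) = " + gcd(89, 55)
				+ " after " + countSteps(89, 55) + " modulo operations");
	}
}
```

For 35 and 21 this performs 3 modulo operations, matching the four calls printed above (the last call only returns). The Fibonacci pair 89 and 55 needs 10, which is still small compared with the size of the inputs.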

Implementing a Simple LIFO Stack in Java using LinkedList

Last In First Out (LIFO) stacks can be implemented easily in Java on top of a linked list. This example shows a very simple implementation of the stack operations backed by java.util.LinkedList (which is a doubly linked list), pushing and popping at the head of the list.

package com.icodejava.blog.published.datastructure;
import java.util.LinkedList;

/**
 * 
 * @author Kushal Paudyal
 * Created on: 2/8/2017
 * Last Modified on: 2/8/2017
 *
 * This class shows a simple implementation of Stack using LinkedList
 */
public class StackUsingLinkedList {

	private LinkedList<Object> list = new LinkedList<Object>();

	public void push(Object item) {
		list.addFirst(item);
		System.out.println("Stacked: " + item);
	}

	public Object pop() {
		System.out.println("Destacked: " + list.getFirst());
		return list.removeFirst();
	}

	public Object peek() {
		return list.getFirst();
	}

	public int size() {
		return list.size();
	}

	public boolean isEmpty() {
		return list.isEmpty();
	}
	
	/**
	 * Testing the stack. The added objects should be returned in reverse order
	 */
	public static void main (String args[]) {
		StackUsingLinkedList stack = new StackUsingLinkedList();
		System.out.println("===STACK-PUSH===");
		stack.push("One");
		stack.push("Two");
		stack.push("Three");
		
		System.out.println("\n===STACK-POP===");
		while (!stack.isEmpty()) {
			stack.pop();
		}
	}
}

Here is the output of this program:

===STACK-PUSH===
Stacked: One
Stacked: Two
Stacked: Three

===STACK-POP===
Destacked: Three
Destacked: Two
Destacked: One
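
The stack above stores Object, so callers have to cast whatever they pop. As a possible variation, here is a minimal generic sketch of the same idea. The class name TypedStack is hypothetical, and the logging from the original is left out to keep the example short.

```java
import java.util.LinkedList;

class TypedStack<T> {

	private final LinkedList<T> list = new LinkedList<>();

	// Adds the item at the head of the list, which is the top of the stack
	public void push(T item) {
		list.addFirst(item);
	}

	// Removes and returns the top item; throws NoSuchElementException when empty
	public T pop() {
		return list.removeFirst();
	}

	// Returns the top item without removing it; throws NoSuchElementException when empty
	public T peek() {
		return list.getFirst();
	}

	public int size() {
		return list.size();
	}

	public boolean isEmpty() {
		return list.isEmpty();
	}

	public static void main(String[] args) {
		TypedStack<String> stack = new TypedStack<>();
		stack.push("One");
		stack.push("Two");
		stack.push("Three");

		// No cast is needed: pop() already returns a String
		while (!stack.isEmpty()) {
			String item = stack.pop();
			System.out.println("Popped: " + item.toUpperCase());
		}
	}
}
```

Both versions throw NoSuchElementException when pop() or peek() is called on an empty stack, so callers should check isEmpty() first, as the loops here do.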