General
10 May 2012 2 Comments

An Introduction To Java Web Applications

Introduction

A Web application is an application that is accessed over a network such as the Internet or an intranet. While the earliest websites served only static web pages, dynamic response generation quickly became possible via CGI scripts, JSPs (JavaServer Pages), servlets, ASPs (Active Server Pages), server-side JavaScript, PHP, or some other server-side technology.

Java has become a popular language for creating dynamic Web applications over the last 15 years, due to the introduction of servlets, JSP, and frameworks such as JSF and Spring. In this post we give an overview of these technologies and explain the major differences between them.

Building Blocks

For the sake of simplicity we distinguish three types of Web application building blocks: servlets, JSPs, and frameworks.

Servlets

In Java, Web applications consist of servlets. A servlet is a small Java program that runs within a Web server. Servlets receive and respond to requests from Web clients, usually across HTTP. An example of a servlet that takes a request and returns a page with the numbers 1 to 10 is given below.

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
 
public

Tags: ear, enterprise edition, framework, jar, jsp, servlet, standard edition, tomcat
Programming
24 April 2012 0 Comments

Parsing Proteins in the GenBank/GenPept Flat File Format with BioJava 1.8.1

This post describes parsing annotated protein sequences from the RefSeq database. I was unable to find any complete examples for parsing RefSeq protein sequences in .gpff.gz files with Java, so here is a quick and dirty one.

The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. After downloading the latest release from the FTP server, you end up with a lot of .gz files. An example of the filenames:

complete.1.1.genomic.fna.gz
complete.1.bna.gz
complete.1.genomic.gbff.gz
complete.10.bna.gz
complete.10.genomic.gbff.gz
complete.100.protein.gpff.gz

The README tells us that the filenames describe the type of information (genomic, protein, DNA, RNA). This information is split up over many numbered files. We are interested in protein information in the GenPept/GenBank Flat File format. Every file with protein information in this format has a name of the form complete.<number>.protein.gpff.gz.
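Before reaching for a full parser, it can help to check that such a file opens and contains what you expect. The sketch below is only an illustration and not the BioJava-based parser this post is about: it assumes the usual GenBank/GenPept convention that each record ends with a line containing only `//`, and the class and method names (`GpffRecordCounter`, `countRecords`) are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of BioJava: counts flat-file records in a .gpff.gz file.
final class GpffRecordCounter {

    // Decompresses the file on the fly and counts the "//" record terminators.
    static int countRecords(Path gzFile) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.US_ASCII))) {
            int records = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: a record ends with a line that holds only "//".
                if (line.trim().equals("//")) {
                    records++;
                }
            }
            return records;
        }
    }
}
```

The same `BufferedReader` built on top of `GZIPInputStream` is what you would normally hand to a real parser, so there is no need to unpack the .gz files to disk first.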

Oh, and the regular expression for these filenames is:

^complete\.[0-9]+\.protein\.gpff\.gz$
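As a rough sketch of how such a pattern could be used from Java, the snippet below compiles it once and tests filenames against it. The dots are escaped here so that they match a literal "." rather than any character. The class and method names (`RefSeqFileNames`, `isProteinGpff`) are invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper for selecting the protein GenPept files from a RefSeq download.
final class RefSeqFileNames {

    // Dots are escaped so they only match a literal '.'.
    private static final Pattern PROTEIN_GPFF =
            Pattern.compile("^complete\\.[0-9]+\\.protein\\.gpff\\.gz$");

    // Returns true for names such as complete.100.protein.gpff.gz.
    static boolean isProteinGpff(String fileName) {
        return PROTEIN_GPFF.matcher(fileName).matches();
    }
}
```

A check like this fits naturally in a `FilenameFilter` or a directory stream filter when walking the download directory.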

Writing a parser…

Tags: biojava, biojava 1.8.1, dna, gbff, genbank, genbankformat, genpept, gpff, gzip, parsing, protein, refseq, release, rna, sequence
Programming
18 April 2012 0 Comments

Fixing SAXParser Error “The system cannot find the file specified” for DTD files

When parsing an XML file with the SAXParser class, you may run into an error related to a .dtd file that cannot be found.

Example: We are parsing the file D:\homologene\build65\homologene.xml.

The first lines of the XML are:

<?xml version="1.0"?>

>
  >
    >
      >3>

We see a DOCTYPE declaration that points to a DTD file. DTD stands for Document Type Definition, and it is used to define the format of the XML file. When the DTD is referenced by a relative path, the SAXParser resolves that path against the location of the XML file, so it looks for the DTD in the same directory as the XML file.
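The DTD lookup can be intercepted, because the parser asks its entity resolver before it touches the file system. The sketch below shows one common workaround, which is not necessarily the fix this post settles on: a `DefaultHandler` whose `resolveEntity` hands back an empty source for any `.dtd` reference. The class name `DtdLookupDemo`, the method `countElements`, and the sample document in the usage note are made up for illustration. The approach assumes a non-validating parser and a document that does not depend on entities declared in the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: parse XML that references a DTD which is not on disk.
final class DtdLookupDemo {

    // Counts the elements in the document while skipping the external DTD.
    static int countElements(String xml) throws Exception {
        final int[] elements = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumption: nothing in the DTD is needed, so supply an empty one.
                if (systemId != null && systemId.endsWith(".dtd")) {
                    return new InputSource(new StringReader(""));
                }
                return null; // fall back to the parser's default resolution
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                elements[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return elements[0];
    }
}
```

For example, calling `countElements` on a small document whose DOCTYPE names a non-existent `Missing.dtd` completes without a `FileNotFoundException`. If the DTD is actually available, the alternative is simply to place it next to the XML file.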

When parsing we get the following error:

java.io.FileNotFoundException: D:\homologene\build65\HomoloGene.dtd (The system cannot find the file specified)
  at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.<init>(Unknown Source)
  at java.io.FileInputStream.<init>(Unknown Source)
  at sun.net.www.protocol.file.FileURLConnection.connect(Unknown Source)
  at sun.net.www.protocol.file.FileURLConnection.getInputStream(Unknown