This document looks at the issues related to developing the client side of a Web Service using Axis.
Axis supports SOAP, which is built on top of HTTP, a protocol built on TCP/IP. To understand what is going on, it is important to understand the levels underneath.
We are not going to explain TCP/IP in any detail, as it is far too complex. Some of the concepts and features of the technology are worth covering.
TCP/IP builds a reliable channel between two computers, hosts. Every computer running TCP can receive messages coming in on any port, from 1-65535. That is, if a program on that machine has created a socket and is listening on that port. If not, you will see the message connection refused.
Before a client connects to a host, it has to find its address. IPv4, the most widely supported version of TCP/IP uses 32 bit addresses, such as 127.0.0.1 (that's a special address, it means the local system). To connect to a host you need either the address, or a name of machine "www.w3.org" that can get mapped to an address. That mapping is provided by DNS, a hierarchical network that is so ubiquitous across the infrastructure that we usually take it for granted. Essentially, DNS servers take a hostname and return an IP address or an error message. How DNS servers work out the address is beyond the scope of this document; just assume that a local DNS server asks other ones if it thinks it needs to.
Not all systems have DNS support. A system can be configured to have its own host table; on Unix systems this lives in /etc/hosts, on windows in c:\windows\system32\drivers\etc\hosts. You can edit this file and bypass DNS completely, creating a maintenance nightmare in the process. If your users use static host tables rather than DNS, you can never change the network address of a host without serious grief.
Machines either have static addresses -the network managers assign them an address and they retain it over time, or dynamic addresses. The latter is common for client systems, especially laptops, desktop PCs and dial-up computers. Broadband computers often have static addresses, though it depends on the ISP.
After looking up the address, the client program creates a socket and tries to connect to the server. At this point the TCP protocol kicks in, the client initiating a conversation by sending a datagram to the destination. This commences the setup of the link, which takes a few packets (three). Once the connection is up and running, the server will be told of the caller (they can get its IP address), and they have a socket which is bound to the client for the duration of the connection. The client and server can send arbitrary binary data between each other, with the guarantees that (a) if the data arrives, it arrives in the order it was written to the socket, and (b) if it doesn't arrive, you will get an error message.
Some special things to know:
When communicating over a network, latency and bandwidth are the big constraints. Latency is how long it takes to communicate -all those bits of hardware like routers and firewalls add to latency, as does the network wiring. Bandwidth is the measure of how much stuff you can send per second. A link with low latency (good) may still have low bandwidth (bad), while a high bandwidth connection might have a high latency, as the remote server is distant.
Firewalls are a critical feature of modern networks. A firewall blocks off ports to incoming calls. A stateful firewall examines every packet and only allows packets of the current TCP conversation in, and is even better. Firewalls are essential for security reasons, so that you can expose more services behind a network than to the outside world. As well as restricting incoming calls, they often restrict outbound connections. For example, outbound connections to port 80 (HTTP) may be blocked to encourage use of an HTTP proxy server (see below), outbound connections to port 143 (IMAP) blocked to stop people connecting to external mail servers. You need to assume that there will be firewalls in the network, between SOAP client and server, and so callbacks from server to client are not going to be possible.
HTTP is at its heart, a very simple protocol. The client opens a TCP connection to a port on the remote system, usually port number 80. It then issues an HTTP request – a verb such as GET, POST, PUT or other standard request, a URL relative to the server and an HTTP version string. The client then sends zero or more headers – name:value pairs on individual lines, and then a blank line marking the end of headers. Those requests which involve data upload -such as POST- then continue with the upload of the data. Then the client waits for the server to respond.
The server can respond with an HTTP error code: a number, some headers of its own, and then usually the body of the request. Usually the headers include a MIME type declaration, and some others that are useful, such as "content-length" and "expires".
To get through firewalls, many organisations run a proxy server. This is a machine that has access to the outside network, while the rest of the Intranet does not. The clients must send a request to a proxy server, which then forward the request to the real server. A caching proxy server may cache request/response pairs, so popular requests do not consume bandwidth. This is very useful, but only works on requests that can be cached -historically only GET requests. Transparent Proxies are a special form of proxy -one that by virtue of the underlying network configuration route HTTP requests (especially those on port 80) through a proxy server -without any application configuration. Usually these are invisible until they go wrong, at which point your favourite web service appears as if is returning HTML destined for humans. This can lead to some interesting support calls.
SOAP over HTTP works above this underlying protocol. A SOAP request is a POST with an XML body; the response is an HTTP status code and ideally an XML message. As with normal HTTP, the 200 status code means all is well. The error code 500 may mean an internal server error, or it can indicate that the SOAP stack and/or service threw a SOAPFault. A SOAPFault is a standardised XML message that contains information the recipients can parse. Although other HTTP response codes may be sent back in some circumstances, the WS-I organisation sets down the rules as to when and how these are allowed.
Because SOAP usually runs on top of HTTP, all the classic HTTP techniques for authentication and session management (i.e. cookies) apply. Note that at some point in the future alternate transports may become more popular, in which case the HTTP techniques will cease to work. This is why so many people are writing SOAP-based replacements, usually built using SOAP headers. Axis has some prototypes of alternate transports in its codebase, though none are (yet) production ready.
The JAX-RPC specification is the base specification that client-side Axis is built upon. If you are writing a client, read it.
There are essentially two ways to use JAX-RPC to invoke a SOAP endpoint -a URL at a server that processes SOAP messages. First, you can use the javax.xml classes to build a SOAP call by hand, and invoke a remote server. This is ugly but gives you an idea of what is going on behind the scenes: an XML message is being built up that is then sent to the remote server, whose response is parsed and deconstructed. Client code written at this level should run against any JAX-RPC implementation.
The other way is to have Axis hide the details of the call and generate a wrapper class for a web service. This is done by taking the WSDL description of the service and generating Java classes that make the low level calls appropriate to building the SOAP requests for each operation, then post-processing the results into the declared return values. Axis also takes note of any URL of the service included in the WSDL and compiles this in to the classes. Thus the client will automatically bind to the URL that the WSDL talks about -which is often the URL of the (development) server that the WSDL was retrieved from.
This automatic generation of proxy classes is convenient, as it makes calling a remote Web Service look almost like calling a local object. However, it has some disadvantages that developers need to be aware of:
Based on personal experience, dynamically generating stub classes is very useful, as it simplifies client side code and helps the client source recognise when a service has changed is operations' signatures a way that is incompatible. If the parameters of an operation changes, the Java method's parameters change, and hence the application no longer builds.
However, it is absolutely critical to always remember: web services are not local objects. The proxy class may appear local, but the server can be a long way away, over a narrow connection.
Never make a blocking call to a Web Service from the GUI thread.
If you want to test an Axis service, Wsdl2Java can be told to create a stub JUnit test class, containing a test case for every single operation that the remote service implements. These stub test cases need to be filled in with valid test data, followed by the relevant assertions to validate the results.
The generated test cases can then be run from the IDE or from an Ant- or Maven-based build process.
When testing the client -or the test cases- experiment with the special failure modes that distributed applications can experience. Unplug the network connector at different points in the program. Set the service up to connect to an invalid URL on the same host, or to a host that doesn't exist. Try going through a proxy server. Try using a slow connection -the TCP Monitor program can simulate this for you.
There are also third party applications that help you test Web Services -by providing SOAP monitors and by giving you forms-based construction of SOAP requests. These are convenient to have, though you do have to pay for them.
Anteater, on sourceforge, is an Ant-based way of testing SOAP calls. You provide the payloads and then use xpath paths to verify the results. This may seem somewhat low-level but, it is very powerful.
TODO
Axis supports both client side JAX-RPC and Axis handlers. These handlers get called before a message is sent, and after it is received, just as for server-side handlers.
To redistribute an application running Axis, you need to redistribute
The Axis JAR is not signed, and so can not be used for auto-download from the Web Start facility in Java.
You do not currently need to include wsdl4j.jar, as the wsdl is not processed at run time. Note that this may change at some point in the future, as more knowledge about the structure of the SOAP message is needed to support doc/lit messages, which means limited runtime processing of WSDL files, or other metadata generated from the WSDL files during compilation.
When Axis generates client proxy classes code from WSDL, it binds the code to the endpoint URL specified in the WSDL -this is usually a URL generated from the URL of the inbound request. Using a http://localhost URL to fetch a WSDL page will result in client code also bound to a service served up on the localhost, which is not what you want in a redistributable. Similarly, even if you use the hostname when fetching the WSDL, you need the fully qualified domain name, not any short name -http://s1.example.org/ and not http://s1/ -otherwise only callers in your own domain or subnet will be able to find the server. Hand-written WSDL does not exhibit this problem; the endpoint in the WSDL is the one the author typed in.
It is almost essential that you provide some way to update the URL on the clients. The simplest is some command line override option, as used in the Axis command line tools. More advanced is a dialog box for entering URLs, and more advanced yet is some automated discovery mechanism.
Axis does not provide any discovery mechanism in the JAR. There is a sibling project, jUDDI, that provides access to UDDI registries. There is also a multicast discovery jar that works with Axis in the Axis CVS tree; this is a proof-of-concept mechanism that uses XML messages but is not compatible with any existing standard. It works OK over LAN networks, but is not designed to be used over wider area.
TODO: how to set the URL in a service
The Call object can be configured before a call can be made. The org.apache.axis.client.Call is Axis's implementation of the javax.xml.rpc.Call interface. The JAX-RPC standard interface defines a setProperty() method that lets the caller set properties; there are both JAX-RPC standard properties and Axis's own properties that you can set.
All properties have a string name, a name defined in a public static final declaration in the class.
Property Name |
Description |
Type |
---|---|---|
USERNAME_PROPERTY |
User name for authentication |
String |
PASSWORD_PROPERTY |
Password for authentication |
String |
SESSION_PROPERTY |
Participate in a session with the endpoint? |
Boolean |
OPERATION_STYLE_PROPERTY |
Type of operation |
String "rpc" or "document" |
SOAPACTION_USE_PROPERTY |
Should SOAPAction be used? |
Boolean |
SOAPACTION_URI_PROPERTY |
If SOAPAction is used, this is that action |
String |
ENCODING_STYLE_PROPERTY |
How to encode data; |
String. "http://schemas.xmlsoap.org/soap/encoding/" |
Property Name |
Description |
Type |
---|---|---|
SEND_TYPE_ATTR |
Should we send the XSI type attributes |
Boolean |
CONNECTION_TIMEOUT_PROPERTY |
Timeout used by transport sender in milliseconds |
Integer |
TRANSPORT_NAME |
Name of transport handler to use |
String |
ATTACHMENT_ENCAPSULATION_FORMAT |
Send attachments as MIME (the default), or DIME. |
String |
You can still set these Axis-specific properties in a portable client -other JAX-RPC implementations will not act on the options, of course.
Axis runs in a JVM, and JVM parameters control the client's behaviour. Here are JVM configuration options that are used.
Name |
Meaning |
Example |
---|---|---|
http.proxyHost |
Hostname of proxy server |
web-proxy |
http.proxyPort |
Port on server of proxy |
8080 |
http.proxyUser |
Optional username for proxy authentication |
someone |
http.proxyPassword |
Optional proxy server password |
secret |
networkaddress.cache.ttl |
Seconds to cache resolved hostnames; |
120 |
networkaddress.cache.negative.ttl |
Seconds to cache unresolved hostnames; |
30 |
Most of these options control proxy server settings; if they are missing and a proxy server is needed for internet access -the client will get connectivity errors of some form or another. If they are present and wrong, the client will also get connectivity errors. Note that even though users can configure the proxy settings for the JVM when hosting Applets, these settings do not propagate to applications -this is one of the many mysteries of Java networking.
Every Web Service client application needs to provide some way to configure the proxy server settings. It is also useful to display these somewhere for support call diagnostics.
The final two properties are troublesome, all the more so because they are so little known about. To find out about them, look at "Address Caching" under java.net.InetAddress. Or just switch your DNS server off for a few minutes and observe how client applications not only fail to connect when the server is missing -servers stay unreachable when the DNS server is turned back on.
What is happening is that the runtime caches the IP addresses of hostnames it resolves using a DNS query. By default, these are cached forever, so that a long running Java application will break if the IP address of the remote server ever changed during the life of the client. Similarly, the runtime caches those hostnames that do not resolve to addresses. On Java1.3, these failed lookups are cached indefinitely -if DNS is down or a laptop off the net, the client will never be able to find them again.
Clearly it is essential for any non-trivial application to set the cache options to give a sensible lifetime for cached hostnames. These values are (in Java1.4) Java Security Properties; you set them using java.security.Security.setProperty(). In Java1.3 and earlier there was some alternate mechanism that mandated properties that could only be done on the command line. We would tell you what the properties are, except we have forgotten.
The classic definition of a distributed system:
“You know you have one when the crash of a computer you've never heard of stops you from getting any work done.” Leslie Lamport
This may seem funny, but it is a depressingly accurate model of the state of distributed systems. Everyone knows that web sites are sometimes off-line, pages sometimes get served up incomplete or with some error trace instead of the results.
Web Services are similar, except instead of a human reading a web-browser-displayed error page, the client software receives the error and has to handle it or report it.
When the Axis client code receives an error, it throws an exception, specifically a subclass of java.rmi.RemoteException. This can be an AxisFault, or it can be something else. Either way, it means trouble. Usually the fault string of the exception provides some error text which is meaningful to the experienced application developer -though less meaningful to either the end user or the support team.
Here is a list of network-related error responses that can be received by a client application. As Axis' adminclient application is a SOAP client, it can see these responses too. The Sitefinder comments are specific only if VeriSign SiteFinder or a successor is subverting the normal behaviour of DNS for their own goals, an action which complicates the normal failure modes of web services.
Connection refused |
The host exists, nothing is listening for connections on that port. Alternatively, a firewall is blocking that port. |
Unknown host |
The hostname component of the URL is invalid, or the client is off-line. |
404: Not Found |
There is a web server there, but nothing at the exact URL. Proxy servers can also generate 404 pages for unknown hosts. |
302: Moved |
The content at the end of the URL has moved, and the client application does not follow the links. Site Finder: the .com or .net address is invalid, the port is explicitly -or defaulting to- port 80 |
Other 3xx response |
The content at the end of the URL has moved, and the client application does not follow the links. |
Wrong content type/MIME type |
The URL may be incorrect, or the server application is not returning XML. |
XML parser error |
This can be caused when the content is not XML, but the client application assumes it is. |
500: Internal Error |
SOAP uses this as a cue that a SOAPFault has been returned, but it can also mean 'the server is not working through some internal fault' |
Connection Timed out/ NoRouteToHost |
The hostname can be resolved, but not reached. Either the host is missing (potentially a transient fault), or network/firewall issues are preventing access. The client may need to be configured for its proxy server. This can also crop up if the caller is completely off-line. |
GUI hangs/ long pauses |
Client application may be timing out on lookups/connects |
The support line's initial response to such messages should all be the same:
When a connectivity problem is suspected, get the URL that is at fault; the caller to view it in their web browser and see if you can view it yourself.
This is where you can take advantage of the fact that Web Service protocols -REST, XML-RPC and SOAP, are all built on top of HTTP, and use the common underlying notion of URLs defining services. Provided those same URLs generate some human-readable content -even if that is an XML message- then the end user and support contact can both bring it up in their web browser. This action is the core technique for diagnosing connectivity problems, primarily because the HTTP infrastructure -servers, proxies and clients- is designed to support this diagnosis process.
Web Service providers can simplify the process by:
Another useful technique is for the service to implement the "Ping" design pattern. The service needs to support a simple "ping" operation, that immediately returns. This operation can be used by clients to probe for the presence of the service, without any other side effects or even placing much load on the server. Client applications should initiate communications with a server -uploads, complex requests, etc- by pinging it first. This detects failure early on, hopefully at a lower cost.
Networks are fundamentally unreliable; laptops move around and go offline, services get switched off.
Your application need to handle the connectivity problems and fail in a way that allows the problem to be diagnosed and corrected. Axis does not do this by itself, you need to help it.