The SQL+PaWS Specification
Date: 2007-10-01T12:01:00 +10:00
By: gosh'at'DigitalFriend.org (Steve Goschnick), for Solid Software Pty Ltd
Any page returned by a web service that doesn't conform to this specification, cannot claim to be an SQL+PaWS web service.
This document aims to be a minimalist while usable specification of SQL+PaWS, to enable people to author web services that conform to an SQL+PaWS web service.
News: 2007-10-22T02:04:00 +10:00
Version 1.0 of this SQL+PaWS Specification was released, along with the JSP (Java Server Pages) version of the software, as Open Source software under the GPL License.
Overview:
SQL+PaWS - a simple technology that allows SQL databases and people to become Web Services. The data delivery mechanism is a simple two dimensional uniform <table> structure in HTML, used to represent sets or a vector of data. SQL+PaWS has two levels of use, both of them are easy and flexible: SQL+PaWS is readable by people and is also machine-readable by client-side programs that consume web services. SQL+PaWS is writable either by people with web-page editors following a simple protocol, or via a small piece of generic code that retrieves data from SQL-oriented database management systems - all that changes on the server-side from web service to web service, is: database-name, user-name, password. The users who wish to use SQL sourced data feeds can prototype and hone their SELECT statements via a single text-area field in a HTML form - almost all web-servers have an SQL DBMS on them of some sort, giving broad availability. Individuals who author web-pages with everyday HTML editors such as DreamWeaver, can become providers of web-services, which are in turn read by people or by software (see figure 1 below). In addition to the ubiquitous web browser, the client-side software may be simple consumers of just the data such as a spreadsheet program, or it may be used to put together mashup applications, or even something as sophisticated as a, a software agent system, a DSS (Decision Support System) or other data warehousing products such as a part of a data warehouse data extraction phase.
Figure 1: Overview of SQL and People as Web Services (SQL+PaWS)
Example SQL+PaWS Web Service Request:
An SQL+PaWS web service is accessed via a URL. For example, there is a Country-Code plus Currency-Code web service available at:
http://idealab.dis.unimelb.edu.au:8080/sqlPaWS/countryCurrencyCodesWS.jsp
which gives access to all the country names and the 2-letter code assigned to them by the United Nations, and the name of the main currency/ies used in each country, and the currencies 3-letter code.
If using a web browser to access the web service (rather than calling the URL via an application program), the user faces a simple form requesting an SQL statement, as in figure 2 below.
Figure 2: The simple HTML input form with the SQL Select statement that produced the table in figure 1, via the example web services URL.
(Note: Even when the consumer of the web service is an application program, this web browser interface is a very useful mechanism to prototype and test the web service - both in the development of the web service, and in the refinement of a user's SQL Select statement)
Example SQL+PaWS Web Service Response:
The formal response from an SQL+PaWS web service is within a <table> ... </table> structure, usually within a HTML file (Note: There is no reason why a web service provider couldn't return the HTML-style <table> embedded within some other file format, if they need it to be so). Figure 3 below is an example response from the SQL select placed in the Request form in figure 2 above.
Figure 3 : HTML table produced with data plus, data item names, types, sizes and other metadata (Note: save this .gif file, if you want a clearer/higher resolution image)
Example HTML <table> Fragment Embedded in returned by the URL:
The HTML table carrying the data payload for SQL+PaWS is always a 2 dimensional uniform grid i.e. set-based data, including a vector (a single row). It may be imbedded any where within any sort of HTML file with any number of other tables. The start of the SQL+PaWS table is marked with an anchor tag - see line 1 in Listing1 below. (Nb. Standard HTML usually uses such anchor tags - a variant on the link tag - to allow browsers to start the display of a page, part way down the file). The opening table tag <table> immediately follows the <a name="START-SQL-PaWS"></a> relative anchor.
Listing 1 . HTML code fragment with essential insertions:
<a name="START-SQL+PaWS"></a>
<table>
<caption align="bottom">Date Created: Sep 28, 2007 1:27:12 +1000 </caption>
<thead valign="top">
<tr>
<th>CountryName</th>
<th>CountryCode</th>
<th>CurrencyCode</th>
<th>CurrencyName</th>
</tr>
<tr>
<th>VARCHAR(60)</th>
<th>CHAR(2)</th>
<th>CHAR(3)</th>
<th>VARCHAR(50)</th>
</tr> ...
</thead>
<tbody>
<tr>
<td>AMERICAN SAMOA</td>
<td>AS</td> ...
</tr> ...
</tbody>
</table>
<a name="END-SQL+PaWS"></a>
Locating Relative Tag in returned HTML page (or other file type page):
The first line in the listing above - <a name="START-SQL+PaWS"></a> - is the standard 'anchor' usage of the HTML link tag <a>. Marking the start of the table with this anchor allows the HTML table carrying the data, to be formatted in whatever way the web service author wants it to be, including the use of any number of other <table>'s in that same HTML file (Note: It also allows the payload carrying <table> to be embedded within files other than HTML files).
Note that there is also an anchor tag immediately after the end of table tag </table> of the table that holds the data payload. e.g.
<a name="END-SQL+PaWS"></a>
Metadata placed in the <THEAD> table Tag:
The metadata (metadata = 'data about data') from the DBMS storing the data, is displayed at the top of the table, within the table header tags <thead valign="top"> ... </thead>. There are then always 4 rows of metadata, each meta-attribute row stored within a <TR> ... </TR> tag pair. The first 3 rows representing the following:
Attribute-name
Standard SQL data type (max-width.places-displayed)
Has NULLS / No NULLS
+ve only / ' '( i.e. blank)
Note 1 regarding row one: The attribute name is the name of the COLUMN within the DBMS table that the SQL select command is reading from, in the order listed following the SELECT command. In the case of people-written tables, these names are of their own choosing, providing they do not include spaces, and use alphanumeric characters (alphabet plus numbers 0 through 9) and the underscore character (_) or a dash (-). E.g. Birthday-list104_A, Note that, as with SQL, names are not sensitive to lowercase/uppercase characters, so that application programs that use these attributes ought to not distinguish between upper and lower case characters, from an attribute name definition point of view. I.e. They may well display them with both upper and lowercase characters.
Note 2 regarding row two: The data type displayed is the 'standard SQL type' rather than a specific DBMS data type (e.g. MySQL or Oracle specific types), when the two vary in naming convention. After the type name (e.g. VARCHAR) the maximum number of characters that a user/application-program can expect to encounter is enclosed in curved brackets e.g. VARCHAR(60) - tells the user that even though the number of characters that may appear in the data columns further down, a single value will never exceed 60 characters in length. In the case of numbers that have floating points, such as DECIMAL, a decimal point is also used to tell the user/application-program how hany decimal places will be displayed in the data. E.g. DECIMAL(10.2) could hold and display dollar amounts up to 99999999.99 In the case of these data types that can have and display decimal places, the specification within the brackets is (max-width.places-displayed) This is also known as (precision.scale) in the formal SQL type definition of such numeric fields.
Note 3 regarding row three: There are only two values which can be displayed here, which are mutually exclusive- 'Has NULLs' or 'No NULLs' - indicating whether the attribute can have an undefined value (i.e. NULL) or not. One of the two values is mandatorially displayed in SQL+PaWS.
Note 4 regarding row four: There are only two values which can be displayed here, which are mutually exclusive- '+ve only' or ' ' - it is left blank for all data types other than UNSIGNED integers. An unsigned integer means that only positive numbers will ever be stored and retrieved from that particular data field. I.e. By discarding negative numbers, twice as many positve numbers may be stored in that same amount of bytes on a hard-disk/storage device. So, fields which never have negative integer values - e.g. 'Student_ID' would have been created by an astute database designer, as an UNSIGNED INTEGER. UNSIGNED can be applied to all of the integer data types, i.e. TINYINT, SMALLINT, INTEGER, BIGINT. In figure 3 above, this fourth row of metadat DOES exist, but it is not visually obvious as all the table cells in that whole row are BLANKs rather than the alternative '+ve only' value.
Each of the data cells in these rows holding the metadata, are wrapped in a <th> ... </th> tag pair.
Date and Time Format in the Table <Caption> Tag
The date and time when the web service returned the data (as per figure 3 above), is held in the CAPTION tag that goes with the <table> holding the data. E.g.
<caption align="bottom">Date Created: Fri, 28 Sep, 2007 1:27:12 +1000 </caption>
People can be sensitive to the Date and Time formats used, so we allow two formats, either RFC822 or ISO 8601. However time is very important when web services are involved, so SQL+PaWS insists on the time down to 'seconds' (i.e. hours, minutes and seconds) when using either format.
The acceptance of the two formats is to 'ease the burden' on the writers/creators of web services. If people are reading the returned data in a web browser, they can readily figure out which sort of date+time format it is. If a client-side application program or mashup is using the web service, then that sort of user is used to/skilled-in dealing with multiple date+time formats anyway.
RFC822:
The Date and Time format may use the RFC822 standard as per the RSS standard - for simplicity and availability. See details here: http://www.w3.org/Protocols/rfc822/#z28
e.g. Thu, 30 Aug 2007 20:12:31 +1000
ISO8601:
Is an international standard for date and time representations issued by the International Organization for Standardization (ISO). The signature feature of the ISO 8601 format for date and time is that the information is ordered from the most to the least significant or, in plain terms, from the largest (the year) to the smallest (the second) E.g. 2007-10-01T20:42:56Z
Ending in 'Z' tells us it is International date format.
For readable details on ISO8601 link to here: http://www.w3.org/TR/NOTE-datetime and http://en.wikipedia.org/wiki/ISO_8601.
In short, the date+time format goes like this:
YYYY:MM:DDThh:mm:ssZ (the T for time) or YYYY:MM:DDThh:mm:ss +hh:mm
where:
- YYYY = four-digit year
- MM = two-digit month (01=January, etc.)
- DD = two-digit day of month (01 through 31)
- hh = two digits of hour (00 through 23) (am/pm NOT allowed)
- mm = two digits of minute (00 through 59)
- ss = two digits of second (00 through 59)
- s = one or more digits representing a decimal fraction of a second
- TZD = time zone designator (Z or +hh:mm or -hh:mm) - where the + or - hours and minutes tells you that the time is in 'local time', but that this local time is that many hour:minutes ahead of (if +) or behind (if -) UTC (Coordinated Universal Time), so that the time given can easily be manipulated to get it into UTC - which is usually more useful in a distributed mashup application.
From the consumer of the data's point of view, if the data+time format ends with either a Z or a +hh:mm or -hh:mm figure, then the date+time given by the SQL+PaWS web service in question, is in international UTC date and time format, otherwise, it is assumed to be in RFC822 format.
Note: the 'align=xxx' parameter in the <caption> tag is inconsequential, as the caption tag always directly follows the <table> tag in a HTML file, regardless of where the caption is actually displayed in a web browser (e.g. top, bottom, left or right).
The SQL Data Types
The data types used in SQL+PaWS - that may be returned from a web service, within the <table> rows - include most of the standard SQL (1992) data types.
The standard SQL types included are:
CHAR
VARCHAR
LONGVARCHAR
TINYINT
SMALLINT
INTEGER
BIGINT
DECIMAL
NUMERIC
REAL
FLOAT
DOUBLE
BIT
BOOLEAN
DATE
TIME
TIMESTAMP
DATALINK
(Should return type as UNKNOWN for any other SQL data types returned by an SQL SELECT command)
Dealing with Complex Data Types
SQL data types supported within SQL+PaWS does NOT include the following five object-like types introduced in so-called SQL-3:
ARRAY
BLOB
CLOB
REF
STRUCT
primarily because these do not translate well into a <table> cell format.
Instead, to deal with complex data types (including images and videos) the method we use in SQL+PaWS is to place a URI link in an SQL field rather than in the database tables themselves, and then place the actual object somewhere nearby the URI (i.e. in the same sub-directory or further down the sub-directory tree).
Such a URI link can be placed in an SQL DATALINK field type if the DBMS being used server-side supports such a type (note: the DATALINK type refers to fields that are stored outside of the DBMS itself, at a URI). Otherwise, in the case that the DBMS does not support the DATALINK type, place the link in a VARCHAR field called 'DATALINKn' - where n is a positive integer number such as 1 or 2 or 3, etc.
Immediately preceding the field holding the URI link, place a field holding the type of object that is at the URI link, where the type value is one of the following MIME (Multipurpose Internet Mail Extensions) object/content types:
application/xml,
application/xml-dtd,
application/powerpoint,
application/postscript,
application/pdf,
application/msword,
application/rtf,
application/zip,
application/x-java-vm,
application/x-java-archive,
audio/basic,
audio/mpeg,
audio/tone,
audio/x-wav,
audio/x-aiff,
audio/x-midi,
image/cgm,
image/gif,
image/jpeg,
image/tiff,
image/x-xbitmap,
model/vrml,
message/news,
message/rfc822,
multipart/form-data,
text/calendar,
text/css,
text/html,
text/richtext,
text/directory,
text/xml,
video/MP4V-ES,
video/quicktime,
video/x-msvideo,
video/mpeg,
video/x-sgi-movie
For more details on MIME see: http://en.wikipedia.org/wiki/MIME or http://www.iana.org/assignments/media-types/
Copyright and Disclaimer:
(c) Copyright 2007 Solid Software Pty Ltd. All Rights Reserved.
This document and translations of it in other human languages, may be copied and given to others, and derivative works that explain, comment on, or otherwise assist in the implementation of SQL+PaWS web services, may be authored, copied, published and distributed, in whole or in part, without restriction, provided that the above copyright notice and these three paragraphs that follow it, are included on all such copies and derivative works.
This document may not be modified in any way, except by the copyright holder listed above. While these copyright restrictions apply to this written SQL+PaWS specification, no claim of ownership is made by Solid Software to the protocol an technique that is SQL+PaWS. This means that any party, for commercial puposes or not, may implement SQL+PaWS web services without royalty fees or license fees and the like to Solid Software. This limited permissions granted will not be revoked by Solid Software, or its successors.
This document and the information contained herein is provided on an 'AS IS' BASIS AND SOLID SOFTWARE PTY LTD DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.