Jeffery C.The implementation of Icon and Unicon.2004
Installing, Configuring, and Compiling the Source Code
Building Unicon for Windows Version 11.0 requires Mingw32 GCC 2.95.2. Newer versions of Windows GCC might be made to work, but thus far they have produced non-working executables. We hope to add Cygwin GCC support in the future. The sources may also build with modest revision under MS Visual C++ 2.0 or newer. I have built earlier versions with MSVC versions 2, 5, and 6. I encourage you to try building with other compilers and to send me your configuration files. You will need a robust Win32 platform to compile these sources; the build scripts and "make" process tend to fail on older versions of Windows.
1. Unpack the sources.
Unpack uni.zip in such a way that it preserves its subdirectory structure. Unzip.exe is recommended rather than WinZip. See Icon Project Document 243 [ipd243] for a picture of the directory hierarchy. In particular, there should be a BIN directory along with the SRC directory under the unicon/ directory.
2. Configure the sources.
Run "make WConfigureGCC" (or "make WConfigure" under MSVC) to configure your sources to build wiconx and wicont, the Unicon virtual machine interpreter, and the Unicon bytecode compiler, with graphics facilities enabled.
3. Compile to make executables.
Run "make Unicon" to build the currently configured binary set.
It is worth discussing why I provide makefiles instead of a project file for use in the Visual C++ IDE. The reason is that the source files for the Unicon virtual machine interpreter (generically called iconx; wiconx.exe in this case) are written in an extended dialect of ANSI C called RTL [ipd261]. Files in this language have the extension .r instead of .c and .ri instead of .h. During compilation, a program called rtt (the run-time translator) translates .r* files into .c files. If someone wants to show me how to insert this step into the Visual C++ IDE build process, I would be happy to use their IDE. You can write project files for the other C programs that make up the Unicon system, but most modifications to the language are changes to the interpreter.
Notes on the MS Windows internal functions
The functions documented here are those most likely to be involved in projects to add features to Windows Unicon.
handle_child(w, UINT msg, WPARAM wp, LPARAM lp)
This procedure handles messages from child window controls such as buttons. In many cases, this enqueues an event on the Unicon window.
int playmedia(w, char *s)
This crude function will call one of several multimedia functions depending on whether s is the name of a multimedia file (.wav, .mid, .rmi are supported) or an MCI command string.
int getselection(w, char *s)
Return the current contents of the clipboard text. The design of this function and setselection() needs to be broadened a bit to support images.
int setselection(w, char *s)
Set the clipboard text to s.
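The dispatch that playmedia() is described as performing can be pictured with a small portable sketch. This is an illustration only, not the Unicon source: the names ends_with_nocase, classify_media, and the media_kind enumeration are invented here, and the actual Windows multimedia calls are deliberately left out.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive test: does s end with the given suffix? */
static int ends_with_nocase(const char* s, const char* suffix)
{
  size_t ls = strlen(s), lx = strlen(suffix), i;
  if (lx > ls) return 0;
  s += ls - lx;
  for (i = 0; i < lx; i++)
    if (tolower((unsigned char)s[i]) != tolower((unsigned char)suffix[i]))
      return 0;
  return 1;
}

/* Hypothetical categories for the argument of playmedia(). */
enum media_kind { MEDIA_WAVE, MEDIA_MIDI, MEDIA_MCI_COMMAND };

/* Hypothetical classifier: a name with a supported extension is
   treated as a media file; anything else as an MCI command string. */
static enum media_kind classify_media(const char* s)
{
  if (ends_with_nocase(s, ".wav")) return MEDIA_WAVE;
  if (ends_with_nocase(s, ".mid") || ends_with_nocase(s, ".rmi"))
    return MEDIA_MIDI;
  return MEDIA_MCI_COMMAND;
}
```

A caller would switch on the result of classify_media(s) and hand s to whichever multimedia routine is appropriate for that kind.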
Chapter 28: Networking, Messaging and the POSIX Interface
Unicon's system interface is greatly enriched compared with Icon, primarily in that it treats Internet connections and Internet-based applications as ubiquitous, and extends the file type with appropriately high-level capabilities. Fundamental TCP and UDP connections are a breeze using the networking facilities, and common application-level protocols are supported via the messaging facilities (see also the X11 graphics facilities and the SQL/ODBC database facilities for examples where application-level networking is provided in Unicon). Portions of this chapter related to the messaging facilities were contributed by their author, Steve Lumos.
28.1 Networking Facilities
...
28.2 Messaging Facilities
The Transfer Protocol Library
All of the messaging facilities are handled by the transfer protocol library (libtp). This library provides an abstraction of the many different protocols (HTTP, SMTP, etc.) into a clear and consistent API. Ease of adding support for new protocols and ease of porting the entire library to new operating system interfaces were primary design goals. These goals are both accomplished by using the AT&T Labs discipline and method (DM) architecture described below.
Libtp Architecture
The key feature of the DM architecture is that it makes explicit two interfaces in the library: disciplines which hold system resources and define routines to acquire and manipulate them, and methods which define the higher-level algorithms used to access these resources. This model fits the problem of Internet transfer protocols nicely; the discipline abstracts the operating system interface to the network, and there is a method for each protocol that defines communication with a server only in terms of the discipline.
This architecture makes porting easy because you need only create a discipline for the new system, which means writing 9 functions. The only discipline that currently exists, which handles both the Berkeley Socket and WINSOCK APIs, is only 400 lines long. Once a discipline exists, the new system immediately gains all of the supported protocols.
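The separation between discipline and method can be sketched in a few lines of C. Everything below is hypothetical and is not libtp code: Disc_t is a cut-down discipline with only two slots, Memdisc_t is an invented in-memory discipline that stands in for a network, and ping_method is a toy "method" that talks to its peer only through the discipline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down discipline: only two function slots. */
typedef struct disc_s Disc_t;
typedef long (*Write_f)(const void* buf, size_t n, Disc_t* disc);
typedef long (*Readln_f)(void* buf, size_t n, Disc_t* disc);

struct disc_s {
  Write_f  writef;    /* write to the connection */
  Readln_f readlnf;   /* read one line from the connection */
};

/* Invented in-memory discipline: records what was written and
   replays a canned reply, so no real socket is needed. */
typedef struct {
  Disc_t      base;       /* first member, so a Disc_t* can be cast back */
  char        sent[64];   /* bytes the method wrote */
  size_t      nsent;
  const char* reply;      /* canned line to hand back */
} Memdisc_t;

static long mem_write(const void* buf, size_t n, Disc_t* disc)
{
  Memdisc_t* m = (Memdisc_t*)disc;
  if (m->nsent + n >= sizeof m->sent) return -1;
  memcpy(m->sent + m->nsent, buf, n);
  m->nsent += n;
  m->sent[m->nsent] = '\0';
  return (long)n;
}

static long mem_readln(void* buf, size_t n, Disc_t* disc)
{
  Memdisc_t* m = (Memdisc_t*)disc;
  size_t len = strlen(m->reply);
  if (len >= n) return -1;
  memcpy(buf, m->reply, len + 1);
  return (long)len;
}

/* Fill in the function slots and the canned reply. */
static void memdisc_init(Memdisc_t* m, const char* reply)
{
  m->base.writef = mem_write;
  m->base.readlnf = mem_readln;
  m->nsent = 0;
  m->sent[0] = '\0';
  m->reply = reply;
}

/* Toy "method": send a request line, then read the reply line.
   It never touches a socket directly, only the discipline. */
static long ping_method(Disc_t* disc, char* reply, size_t n)
{
  static const char req[] = "PING\r\n";
  if (disc->writef(req, sizeof req - 1, disc) < 0) return -1;
  return disc->readlnf(reply, n, disc);
}
```

Because ping_method sees only a Disc_t*, swapping Memdisc_t for a socket-backed discipline would require no change to the method, which is the porting property claimed above.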
The Discipline
The discipline is a C structure whose members are pointers to functions:
typedef struct _tpdisc_s Tpdisc_t;   /* discipline */

typedef int       (*Tpconnect_f)(char* host, u_short port, Tpdisc_t* disc);
typedef int       (*Tpclose_f)(Tpdisc_t* disc);
typedef ssize_t   (*Tpread_f)(void* buf, size_t n, Tpdisc_t* disc);
typedef ssize_t   (*Tpreadln_f)(void* buf, size_t n, Tpdisc_t* disc);
typedef ssize_t   (*Tpwrite_f)(void* buf, size_t n, Tpdisc_t* disc);
typedef void*     (*Tpmem_f)(size_t n, Tpdisc_t* disc);
typedef int       (*Tpfree_f)(void* obj, Tpdisc_t* disc);
typedef int       (*Tpexcept_f)(int type, void* obj, Tpdisc_t* disc);
typedef Tpdisc_t* (*Tpnewdisc_f)(Tpdisc_t* disc);

struct _tpdisc_s
{
  Tpconnect_f connectf;  /* establish a connection */
  Tpclose_f   closef;    /* close the connection */
  Tpread_f    readf;     /* read from the connection */
  Tpreadln_f  readlnf;   /* read a line from the connection */
  Tpwrite_f   writef;    /* write to the connection */
  Tpmem_f     memf;      /* allocate some memory */
  Tpfree_f    freef;     /* free memory */
  Tpexcept_f  exceptf;   /* handle exception */
  Tpnewdisc_f newdiscf;  /* deep copy a discipline */
  int         type;      /* (not used currently) */
};
These functions define a complete API for acquiring and manipulating all of the system resources needed by all of the methods and (it is hoped) any conceivable method. By convention, every discipline function takes a pointer to the current discipline as its last argument. (Every method function takes a library handle which contains a pointer to the current discipline, so the discipline functions are always available when needed.) The Tpdisc_t is an abstract discipline. In practice, a new discipline will extend Tpdisc_t by at minimum adding some system dependent data such as a Unix file descriptor or Windows SOCKET*. Here is the "Unix" discipline (it would be better called the socket discipline since it works for the Berkeley Socket API and WINSOCK on multiple systems):
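typedef struct _tpunixdisc_s Tpunixdisc_t;

struct _tpunixdisc_s
{
  Tpdisc_t tpdisc;  /* the abstract discipline; first, so a Tpdisc_t* can be cast back */
  int      fd;      /* system-dependent data: the socket descriptor */
};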
Exception Handling
The DM architecture defines a very useful convention for exception handling. Exceptions are passed as integers to the exceptf function along with some exception-specific data. The function can do arbitrary processing and then return one of {-1, 0, 1}, which instructs the library to return an error to the caller (-1), take some default action (0), or retry the operation (1). Libtp uses the constants TP_RETURNERROR, TP_DEFAULT, and TP_TRYAGAIN for these values.
Although not as powerful as the true exceptions that some languages provide, DM exception handling definitely makes the code more readable. In the Unix discipline, exceptf is used to aggregate all of the many, sometimes transient, errors that can occur in network programming. For example, the Unix discipline's readf function is:
ssize_t unixread(void* buf, size_t n, Tpdisc_t* tpdisc)
{
  Tpunixdisc_t* disc = (Tpunixdisc_t*)tpdisc;
  size_t  nleft;
  ssize_t nread;
  char*   ptr = buf;

  nleft = n;
  while (nleft > 0) {
    if ((nread = read(disc->fd, ptr, nleft)) <= 0) {
      /* error or end-of-file: ask the discipline what to do */
      int action = tpdisc->exceptf(TP_EREAD, &nread, tpdisc);
      if (action > 0) {          /* TP_TRYAGAIN */
        nread = 0;
        continue;
      }
      else if (action == 0) {    /* TP_DEFAULT: stop reading */
        break;
      }
      else {                     /* TP_RETURNERROR */
        return (-1);
      }
    }
    nleft -= nread;
    ptr += nread;
  }
  return (n - nleft);
}
The Unix read() system call can return a positive number, indicating the number of bytes read; a negative number, indicating an error; or zero, if end-of-file is reached (or a network connection is closed by the remote host). We consider the latter two cases exceptional and ask exceptf what to do. An exceptf function is normally a large switch with one case for each exception. For TP_EREAD, it says:
case TP_EREAD:
  if (errno == EINTR) {
    return TP_TRYAGAIN;
  }
  else {
    ssize_t nread = (*(ssize_t*)obj);
    if (nread == 0) { /* EOF */
      return TP_DEFAULT;
    }
    else {
      return TP_RETURNERROR;
    }
  }
This may not seem very revolutionary; after all, the code that calls exceptf and branches on its result is just as long as the exception handler itself. We aren't even gaining much code reuse over the conventional method, which wraps each system call in another function with a name like Read(). The real win here lies in the ability of the caller to replace or extend exceptf at runtime. You may have noticed that there is no code above to output an error message; unixread() simply returns -1 on errors. In fact, the standard and expected way to output errors is to override exceptf. The wtrace example shown [XXX: at the end somewhere?] uses the following:
Tpexcept_f tpexcept;   /* saved default exception handler */
Tpdisc_t*  disc;       /* copied discipline */

int exception(int e, void* obj, Tpdisc_t* disc)
{
  int rc = tpexcept(e, obj, disc);
  if (rc == TP_RETURNERROR) {
    if (errno != 0) {
      perror(url);
    }
    else {
      switch (e) {
        case TP_HOST:
          fputs(url, stderr);
          fputs(": Unknown host\n", stderr);
          break;
        default:
          fputs(url, stderr);
          fputs(": Error connecting\n", stderr);
      }
    }
    exit(1);
  }
  else {
    return rc;
  }
}
Then instead of the usual:
tp = tp_new(<uri>, <method>, TpdUnix);
wtrace copies TpdUnix, saves and replaces the default exception handler, and then uses the copied discipline:
disc = tp_newdisc(TpdUnix);
tpexcept = disc->exceptf;
disc->exceptf = exception;
tp = tp_new(<uri>, <method>, disc);
In the same way, wtrace also overrides all of the read and write functions to provide a trace log of HTTP communications.
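The same save-and-replace pattern applied to a read function might look roughly like the following. This is a sketch under stated assumptions, not wtrace's code: Disc_t is a one-slot stand-in for the real discipline, fake_read replaces a real network read, and trace_read, install_trace, saved_readf, and traced_bytes are names invented for the illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical one-slot discipline, just enough to show the override. */
typedef struct disc_s Disc_t;
typedef long (*Read_f)(void* buf, size_t n, Disc_t* disc);
struct disc_s { Read_f readf; };

/* Stand-in for a real readf: fills the buffer with 'x'. */
static long fake_read(void* buf, size_t n, Disc_t* disc)
{
  (void)disc;
  memset(buf, 'x', n);
  return (long)n;
}

static Read_f saved_readf;    /* original reader, saved before the override */
static long   traced_bytes;   /* running total of bytes seen by the trace */

/* Wrapper: call the saved reader, then log what came back. */
static long trace_read(void* buf, size_t n, Disc_t* disc)
{
  long got = saved_readf(buf, n, disc);
  if (got > 0) {
    traced_bytes += got;
    fprintf(stderr, "read %ld bytes: %.*s\n", got, (int)got, (char*)buf);
  }
  return got;
}

/* Save the current reader and install the tracing wrapper. */
static void install_trace(Disc_t* disc)
{
  saved_readf = disc->readf;
  disc->readf = trace_read;
  traced_bytes = 0;
}
```

After install_trace(&d), every call through d.readf is logged to stderr while the data still reaches the caller unchanged, so the methods need no knowledge that tracing is active.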
Part IV: Appendixes
Appendix A: Data Structures
This appendix summarizes, for reference purposes, all descriptor and block layouts in Icon.
A.1 Descriptors
Descriptors consist of two words (normally C ints): a dword and a vword. The dword contains flags in its most significant bits and small integers in its least significant bits. The vword contains a value or a pointer. The flags are
n  nonqualifier
p  vword contains a pointer
v  variable
t  trapped variable
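A two-word descriptor with flag bits can be modeled as follows. This is only a sketch: the exact bit positions of the flags, the type code T_INTEGER_CODE, and all function names are assumptions made for illustration, since the text states only that the flags occupy the most significant bits of the dword. A string descriptor (a qualifier) is the case in which the n flag is absent.

```c
#include <stdint.h>
#include <string.h>

typedef uintptr_t word;                  /* one machine word */
#define WORD_BITS (sizeof(word) * 8)

/* Hypothetical bit assignments within the dword. */
#define FLAG_N ((word)1 << (WORD_BITS - 1))  /* nonqualifier */
#define FLAG_P ((word)1 << (WORD_BITS - 2))  /* vword contains a pointer */
#define FLAG_V ((word)1 << (WORD_BITS - 3))  /* variable */
#define FLAG_T ((word)1 << (WORD_BITS - 4))  /* trapped variable */

typedef struct {
  word dword;   /* flags plus a small integer (type code or length) */
  word vword;   /* a value, or a pointer stored as an integer */
} Descrip;

/* String descriptor: no n flag, so the dword is just the length. */
static Descrip make_qualifier(const char* s)
{
  Descrip d;
  d.dword = (word)strlen(s);
  d.vword = (word)(uintptr_t)s;
  return d;
}

#define T_INTEGER_CODE 1   /* hypothetical type code */

/* Self-contained integer: n flag plus a type code, value in the vword. */
static Descrip make_integer(long v)
{
  Descrip d;
  d.dword = FLAG_N | T_INTEGER_CODE;
  d.vword = (word)v;
  return d;
}

/* A descriptor is a string descriptor exactly when the n flag is clear. */
static int is_qualifier(Descrip d)
{
  return (d.dword & FLAG_N) == 0;
}
```

Testing a single high bit is enough to separate strings from every other kind of value, which is why the length can share the dword with the flags.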
A.1.1 Values
There are three significantly different descriptor layouts for values. A qualifier for a string is distinguished from other descriptors by the lack of an n flag in its dword, which contains only the length of the string. For example, a qualifier for the string "hello" is
The null value and integers have type codes in their dwords and are self-contained. Examples are:
For all other data types, a descriptor contains a type code in its dword and a pointer to a block of data in its vword. A record is typical:
A.1.2 Variables
There are two formats for variable descriptors. The vword of an ordinary variable points to the descriptor for the corresponding value:
If the variable points to a descriptor in a block, the offset is the number of words from the top of the block to the value descriptor. If the variable points to a descriptor that corresponds to an identifier, the offset is zero.
The descriptor for a trapped variable contains a type code for the kind of trapped variable in its dword and a pointer to the block for the trapped variable in its vword. The trapped variable for &subject is typical:
A.2 Blocks
With the exception of the null value, integers, and strings, the data for Icon values is kept in blocks. The first word of every block is a title that contains the type code for the corresponding data type. For blocks that vary in size for a particular type, the next word is the size of the block in bytes. The remaining words depend on the block type, except that all non-descriptor data precedes all descriptor data. With the exception of the long integer block, the diagrams that follow correspond to blocks for computers with 32-bit words.
A.2.1 Long Integers
On computers with 16-bit words, integers that are too large to fit in the dword of a descriptor are stored in blocks. For example, the block for the integer 80,000 is
A.2.2 Real Numbers
Real numbers are represented by C doubles. For example, on computers with 32-bit words, the real number 1.0 is represented by
A.2.3 Csets
The block for a cset contains the usual type code, followed by a word that contains the number of characters in the cset. Words totaling 256 bits follow, with a one in a bit position indicating that the corresponding character is in the cset, and a zero indicating that it is not. For example, &ascii is
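As a rough model of this layout on a machine with 32-bit words, the sketch below uses eight words for the 256 bits. The field names, the bit order within each word, and the helper functions are assumptions for illustration and are not taken from the Icon source.

```c
#include <stdint.h>

#define CSET_BITS  256
#define CSET_WORDS (CSET_BITS / 32)   /* 8 words of 32 bits each */

/* Hypothetical cset block; field names are invented. */
typedef struct {
  uint32_t title;              /* type code */
  uint32_t size;               /* number of characters in the cset */
  uint32_t bits[CSET_WORDS];   /* one bit per character code */
} CsetBlock;

/* Set the bit for character c, keeping the size word in step. */
static void cset_add(CsetBlock* cs, unsigned char c)
{
  uint32_t mask = (uint32_t)1 << (c % 32);
  if (!(cs->bits[c / 32] & mask)) {
    cs->bits[c / 32] |= mask;
    cs->size++;
  }
}

/* Return 1 if character c is in the cset, 0 otherwise. */
static int cset_has(const CsetBlock* cs, unsigned char c)
{
  return (int)((cs->bits[c / 32] >> (c % 32)) & 1);
}

/* Build a cset holding the 128 ASCII characters, as &ascii does. */
static CsetBlock make_ascii(void)
{
  CsetBlock cs = {0};
  int c;
  for (c = 0; c < 128; c++)
    cset_add(&cs, (unsigned char)c);
  return cs;
}
```

With this bit order, the ASCII set fills the first four words with ones and leaves the last four zero, and its size word holds 128.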
