What is a Port?

rgchris · May 14, 2018, 3:17am

Basic Concepts

As stated above, a port is used to transfer data. However, the basic port definition is a bit more general than that. A port is actually more like a stream of data that undergoes some type of exchange, transformation, or effect.

For example, a port is often used for I/O functions such as:

console input and output
file reading and writing
directories of files
network transferring of data
event handling, such as mouse clicks or keyboard input
database access

But, a port can also be used for other types of functions:

image conversion - such as encoding or decoding a JPEG file.
sound conversion - such as encoding or decoding an audio file
checksum computation - keeping a running checksum
compression and decompression of data
encryption and decryption of data
other codecs for encoding and decoding data formats

Related to Series

As you know, Rebol is built on the concept of a Series.

A series is a set of values arranged in a specific order. It is a sequence.

A port is a special type of series. Not only is it a sequence, but it can also hold state information like an object, and access external devices for I/O or other high-speed operations, such as image conversion or encryption.

In Rebol 2, ports were built on a pure series model. However, we found this approach to be problematic because ports are not pure series. They also embody state (information).

For example, a file can be thought of as a stream of bytes. But a file also has other important attributes, such as a file name, a location within a directory, creation and modification dates, permissions like read-only or allow execution, and ownership information. These attributes fall outside of a pure series model.
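
As a rough illustration of that state, here is a minimal sketch that asks for a file's attributes instead of its bytes. It assumes R3-Alpha behavior, where query on a file returns an object with name, size, date and type fields (or none if the file does not exist). The file name %todo.dat is only a placeholder.

```rebol
; Sketch only: assumes R3-Alpha's QUERY result for the file scheme.
info: query %todo.dat          ; none when the file is missing
either info [
    print ["name:" info/name]  ; attributes, not the byte stream
    print ["size:" info/size]
    print ["date:" info/date]
    print ["type:" info/type]
][
    print "no such file"
]
```

None of these values is part of the byte sequence that read would return, which is the point the paragraph above makes.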

New Definition

Rebol 3 moves away from the pure series model of Rebol 2 and more toward an I/O stream model. Now it is closer to the concept found in other programming environments and languages.

So, a port can be defined as:

a series of values - such as a sequence of bytes
holds state information - such as file attributes
can access the external world - network communication, for example
can have side effects - internal changes, such as compression

The pure series model is gone. Ports are more pragmatic now, and this has resulted in a port system that is cleaner, smaller, faster, and more extensible than ever before.

Main Components

A port consists of these main components:

A name that specifies the general type of port (scheme)
An object that holds information (state of port)
A set of functions that are applied to that object (actions)

The name of a port is called a 'scheme'. Example schemes are:

console
file
dir - file directory
event - gui events (mainly)
TCP - networking
HTTP - web connections
clipboard - cut and paste
sound - for audio output
system - system state changes

Many other types of schemes can exist, and they are often built on top of lower level schemes. For example, FTP for file transfer is built on the TCP networking scheme.

Here is an example. In these lines:

port: open tcp://www.rebol.net
data: read http://www.rebol.com

the first scheme is TCP; the second is HTTP. (Note that this is consistent with the definition of a URL.)

The object holds information such as:

the type of the port (file, network, database, etc.)
the name and location (path) of a file
the URI of a network connection
a network host name and port number
a buffer of data being transferred
date and time info
structures used by external devices

This object is of a specific Rebol datatype, called a PORT!

Specific action functions can be applied to a port. Some common actions are:

make - create a new port
open - initialize the port
close - finalize the port
read - read data from port
write - write data to port
query - get other information from port
update - detect external changes to the port

But, there are many other actions as well, as generally defined by Rebol datatypes.

Using Ports

Two Basic Methods

There are two basic methods to use a port: implicit and explicit.

When you write code such as:

write %index.html read http://www.rebol.net

you are using implicit ports. This is a shortcut notation to keep simple code simple. You are only using a single port action, such as read or write, and all the other details are hidden behind those functions.

However, if you write code such as:

file: open %data.dat
write file data1
write file data2
...
close file

then you are using explicit ports. Here you specify each action separately. You open the port, then read and write to the port, and then close the port. Each action must be specified.

Fast and Easy

Implicit ports are the fast and easy way to perform various I/O actions in Rebol.

A few examples are:

data: read %todo.dat
write %plans.r data
query %docs.txt
page: read http://www.rebol.net
result: write http://rebol.net/cgi/act.r data
data: read ftp://www.rebol.net/projects.dat
host: read dns://www.rebol.net

This type of usage depends on the type of port (the scheme). The examples above use the file, http, ftp, and dns schemes. Those schemes have been designed to support implicit actions.

Notice that for local files, the file datatype is used to indicate usage of the file scheme. The line:

data: read file://todo.dat

is equally valid. Think of the file datatype as an abbreviation for that. Both methods use the same file scheme to perform the I/O.

Other schemes do not support implicit usage. For example:

>> data: read tcp://www.rebol.com
** Access error: Port is not open: tcp://www.rebol.com
** Where: read
** Near: read tcp://www.rebol.com

This error occurs because TCP does not support an implicit read action. That's because TCP is a lower level scheme that requires a higher level protocol in order to be useful.

Full Control

Explicit ports give you full control over each I/O action.
For example, let's say you want to read a large file in small 20,000-byte chunks. You might use these steps:

file: open %bigdata.dat
while [not empty? data: read/part file 20000] [
    process data
]
close file

This common method will be familiar to most programmers. The file is opened, reads are done, and the file is closed. Each action is done separately.

This type of explicit I/O is common for large files that would consume a lot of memory if you read them with implicit I/O. For example, if the bigdata.dat file is 10 GB, you would not be able to read it all into memory at one time.

Explicit I/O is also used when you need strict control over each action. This is often done if you need to seek to different locations within a file or write your own network protocol.

For example, let's say you need to read data from three different parts of a large file. In that case you would use read/seek to move to each part of the file before reading it:

file: open %bigdata.dat
da-head: read/part file 4000
da-body: read/seek/part file 12000 10000
da-tail: read/seek/part file 56000 4000
close file

Port Details

This section describes some of the important concepts you need to know about ports.

Port Datatype

A port is a Rebol datatype. If you use explicit ports, you will need to use the port datatype as a type of handle to access the port. If you've used handles before in other languages, that concept is probably familiar to you already.

In Rebol a port is very similar to an object because it stores information in named fields. We often call these fields the state of the port. When various actions are performed, the state will change, depending on the action. A port differs from an object in that it responds in a special way to specific datatype actions such as open, read, write, and several others.
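
To make the "named fields" idea concrete, here is a small sketch that creates a port without opening it and looks at some of its fields. It assumes the R3-Alpha port layout (spec, scheme, and so on); field names may differ in other builds, and %todo.dat is just a placeholder name.

```rebol
; Sketch only: assumes R3-Alpha's standard port object layout.
port: make port! %todo.dat   ; creates the port, does not open the file

print port? port             ; the value is a PORT!, used as a handle
print open? port             ; reports whether OPEN has been applied yet
probe port/spec/ref          ; the spec remembers what the port was made from
print port/scheme/name       ; the name of the scheme behind this port
```

The path access (port/spec/ref) is the object-like side; port? and open? are examples of the datatype actions that treat a port specially.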

Port Schemes

A scheme is a type of port.

You will use schemes to identify the type of port access you need as well as the protocol to use.

For example, when you access a local file, you are using the file scheme. When you read a web page, you use the http scheme, which is a higher level protocol built on top of the tcp scheme.

Each scheme has a unique name that is used to identify it. For example, file, http, and tcp are the scheme names shown above. A scheme name can be used as part of a URL, or separately, depending on requirements.

The Rebol system manages a list of available schemes. These schemes can be built-in, can be loaded separately, or can even be user defined within a script.
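
As a sketch of a user-defined scheme, the following registers a toy scheme whose only actor is read. It assumes R3-Alpha, where the scheme constructor is reachable as sys/make-scheme and an actor block is written as a list of name: func [args] [body] entries. The scheme name hello and the URL are made up for illustration.

```rebol
; Sketch only: assumes R3-Alpha's sys/make-scheme and actor-block format.
sys/make-scheme [
    name: 'hello                       ; hypothetical scheme name
    title: "Toy scheme that only answers READ"
    actor: [
        read: func [port] [
            ; PORT is the port READ created from the URL
            rejoin ["read requested for " form port/spec/ref]
        ]
    ]
]

; READ on a URL with this scheme is routed to the actor above.
print read hello://world
```

Only read is defined here, so other actions such as open or write are not expected to work on this scheme.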

A lot more about schemes can be found in the Port Implementation section.

Making Ports

All ports are made from a spec -- a specification of the port's attributes. As you have seen above, the spec can be something quite simple, such as a file name or URL. But, a port spec can also be a block that includes many fields to indicate various options for the port.

All of these can be used as port specs:

%file.txt  ; a file name
tcp://www.rebol.com ; a URL
[scheme: 'tcp host: "www.rebol.net"] ; a block
'tcp  ; just the port's scheme name
object ; an object that specifies the port
port ; an existing port

There are a couple of ways to make a port, depending on your required level of control.

One method is to use the make action, as you would for any datatype. The general form is:

port: make port! spec

Where port! is the port datatype itself, and spec is the specification as described above.

Here are some examples:

port1: make port! %file.txt
port2: make port! tcp://www.rebol.net
port3: make port! [scheme: 'tcp host: "www.rebol.net"]

These examples will create a port object and initialize its various fields.

One of the most common methods to create a port is with the open function. Unlike make, the open function does not require a port! datatype. It knows that it is being provided with a spec. For example:

port: open tcp://www.rebol.net

will create a new port and also perform initializations associated with the open action.

More details about open are discussed later.

Port Actions

Port actions can be thought of as functions that act on ports.

More precisely, port actions are polymorphic datatype actions similar to those used on all other datatypes. If you're not sure what that means, don't worry about it here. Just think of ports like objects that have a well-defined set of methods that act on them.

The actions defined for ports are:

make: make a new port object
to: special (convert an object to a port)
open: initialize external operations
close: conclude external operations
write: transfer data to the port
read: transfer data from the port
query: get information about the port
update: update the port's state
create: create an external object of port type
delete: delete an external object of port type
rename: rename an external object of port type

rgchris · May 14, 2018, 3:27am

This doesn't quite parse. To me it should read: The name of a port corresponds to the name of the scheme on which it is based.

rgchris · May 14, 2018, 3:32am

One slight complication here is where READ/WRITE is handled for both implicit and explicit modes of a scheme:

read %a-file.txt

port: open %a-file.txt
read port
close port

As I understand it, the scheme author is responsible for monitoring whether an instance is implicit or explicit. Related: OPEN?

From a user point of view, it's possible that the simplest way to understand the difference between implicit and explicit is whether one passes a FILE! or URL! value to READ vs. passing a PORT! value.
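
That user-level rule can be written down as a tiny helper. This is only a sketch: io-mode? is a hypothetical name, not part of Rebol, and it says nothing about how a scheme author tracks the two modes internally.

```rebol
; Hypothetical helper: classify a READ/WRITE argument by its datatype.
io-mode?: func [
    "Returns 'explicit for a PORT! value, 'implicit for a FILE! or URL!"
    source [file! url! port!]
][
    either port? source ['explicit] ['implicit]
]

print io-mode? %a-file.txt                 ; a FILE! value
print io-mode? http://www.rebol.com        ; a URL! value
print io-mode? make port! %a-file.txt      ; a PORT! value
```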

rgchris · May 14, 2018, 3:37am

I think this is wrong: a scheme is the prototype for a port. All ports inherit the actions/properties of their parent scheme.

rgchris · May 14, 2018, 3:54am

I don't ever recall seeing examples of how these are supposed to work.

The following is a spitball projection of, say, compression:

big-file: open %a-big-file.bin

compressor: open [scheme: 'zip target: %a-big-file.zip]
insert compressor big-file
close compressor

Oldes · May 14, 2018, 4:54pm

Instead of:

err: try [
    file: open %bigdata.dat
    da-head: read/part file 4000
    da-body: read/seek/part file 12000 10000
    da-tail: read/seek/part file 56000 4000
]
close file

if error? err [
    print ["Port error:" form err]
]

there should be:

err: try [
    file: open %bigdata.dat
    da-head: read/part file 4000
    da-body: read/seek/part file 12000 10000
    da-tail: read/seek/part file 56000 4000
    close file
]

if error? err [
    print ["Port error:" form err]
]

Because if something is going to fail in the try block, it is most likely the file opening. If that fails, file would be none and one would get an uncaught error, because close does not handle a none value.

Or even better:

file: none  ; make sure the word has a value even if OPEN fails
err: try [
    file: open %bigdata.dat
    da-head: read/part file 4000
    da-body: read/seek/part file 12000 10000
    da-tail: read/seek/part file 56000 4000
]
if file [close file]

if error? err [
    print ["Port error:" form err]
]

This closes the file even when one of the reads inside the try block fails.

hostilefork · May 15, 2018, 8:00am

So, a port can be defined as:

a series of values - such as a sequence of bytes

This "such as" bothers me, because there's a fair amount of magic assumed here. How do I know if it's a sequence of bytes, or a pizza?

This puts a lot of pressure on READ and WRITE:

data: read %todo.dat
write %plans.r data

What does this mean? So it knows from the .DAT extension what to do (a table that tells it how to decode things that end in the .DAT extension?) And it gets Rebol-compatible records it can write out to %plans.r?

Let's look at the refinements on READ and WRITE in R3-Alpha:

READ source /part length /seek index /string /lines
WRITE destination data /part length /seek index /append /allow access /lines

Given the way R3-Alpha's not-very-fancy Multiple Dispatch model worked, those are the only refinements you will ever be able to pass to READ and WRITE. And there was no guarantee a port would pay attention to them. Try read/lines http://example.com, for instance. You get back an unprocessed BINARY!.
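
One way to get lines when a port hands back raw bytes is to do the conversion after the read. The sketch below assumes R3-Alpha's deline/lines refinement and uses a literal BINARY! in place of a network read, so the sample data is made up.

```rebol
; Sketch only: post-process a BINARY! result into a block of lines.
; RAW stands in for what a READ that ignored /LINES might return.
raw: to binary! "first line^/second line^/third line"

text: to string! raw          ; decode the bytes as text
lines: deline/lines text      ; assumes DELINE/LINES as in R3-Alpha

probe lines
```

This converts the whole result at once, so it does not give the streaming behavior a port that produced lines as it went could offer.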

There are two basic methods to use a port: implicit and explicit.

If I feel there's anything to the Rebol I/O model, it is mostly centering around being able to write one kind of PORT! object for the explicit behavior, and then get--somewhat "for free"--the implicit.

So that seems to be the thing to focus on...defining it, defining its limits, and showing realistic scenarios of what it might be used to accomplish--in a way that adds benefit over just making a bunch of disparate functions which can have their own pertinent refinements, like READ-CLIPBOARD, READ-HTTP, etc.

rgchris · May 15, 2018, 3:05pm

This is scheme-dependent. The only place where the FILE scheme would vary on READ (processes a stream of bytes) is if a file is a directory (returns block of contained files) or non-existent (returns error).

READ is just a conduit for the READ actor within the scheme.

That overlooks the efficiency that can be gained by the explicit model. Take HTTP as an example: you can process multiple requests with one port (and thus a single persistent TCP connection).

rebol-site: port: open http://www.rebol.com/
result: read port

result2: write port [get %file1.html]
result3: write port [get %file2.html]

result4: write port [post %target-file [Header: "Value"] {Request contents}]
close port

hostilefork · May 15, 2018, 3:08pm

I'm not overlooking any efficiency. I'm just saying that if I am a port author, the concept is that I could theoretically just implement the port as the explicit version. The shorthand would then be available to users, because that is a feature of the port model which I get by following the rules of implementing ports.

And thus far, it's the only "feature" that I see involved. Everything else is something I could do more conveniently and clearly by making an OBJECT! with methods called "read" and "write" and "open"...or whatever methods I wanted.

rgchris · May 15, 2018, 3:12pm

My bad, I see what you mean.

Perhaps this is the case (effectively this is what you're doing anyway). However, it does offer some consistency and access to native verbs.

for-each resource [
    %a-file.dat
    http://some.place/foo
    ftp://some.place/bar.r
][
    probe read resource
]

It also gives you a framework for building in related state, metadata and documentation. Also auto-breakdown of URLs.

It also offers a best-practice model for implementing such things. There are a few examples (from memory) in Rebol 2 where someone has gone the alternate verb route or the object route. What you end up with is an interface that is less than easy or intuitive to use or maintain.

gchiu · May 15, 2018, 9:06pm

Er, isn't that because someone removed the lines processing in the scheme?

hostilefork · May 15, 2018, 9:12pm

I guess it's up to you to find a place where it ever was for R3-Alpha. As far as I can tell it was never implemented.

It's difficult to implement in a generic way. In Ren-C, just because someone asked about it (I think), I added a bit of a hack in READ that if it sees you get a /LINES refinement and have a port produce a BINARY!, it converts it to text and then turns it into lines. Or if it's a TEXT! then it will break it into lines. But this is relatively inefficient when compared with the idea of a port that did the conversion to lines as it went.

My point is just about the very complex set of concerns. What qualifies /LINES as a refinement in the finite universe of "the only choices READ has"? And if it was qualified, how was it justified that it was skipped over?