About publishing files to a third-party server
You can publish a document to another server using the 'send
elsewhere' tab in the form.
Publication to another web site (that is, when its URL starts with
http:// or https://; the other possibility is ftp://) simulates
filling in a form on the other web site which includes a file
upload. Technically speaking, this is a multipart-encoded HTTP POST,
which a web browser would do when presented with
<form enctype="multipart/form-data" method="post" ... >
Ensembling acts as a robot to
fill in the form (do the POST operation) whenever a document is
published in this way or when a new version of a published document is
added (a version is superseded). The form may already exist out there
on the internet, or the site may not provide a form at all but simply
the script to receive a file and any other fields. (A script used in
this way is often loosely described as a RESTful API.) An
example PHP server script is
available to illustrate this. Similar scripts for Perl, Python, Ruby or ASP
etc. are straightforward to write.
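As a rough illustration of what a multipart-encoded POST contains, the following Python sketch builds the request body by hand using only the standard library. It is not Ensembling's own code: the function name encode_multipart, the field name 'author' and the file name 'report.txt' are invented for the example.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Build a multipart/form-data body.

    Returns (body, content_type_header). Field names and the file name
    are assumed to contain no quotes or line breaks.
    """
    boundary = uuid.uuid4().hex  # random separator between the parts
    lines = []
    # Ordinary text fields: one part each.
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode("ascii"))
        lines.append(
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        lines.append(b"")
        lines.append(str(value).encode("utf-8"))
    # The file part carries a filename and its own Content-Type.
    lines.append(f"--{boundary}".encode("ascii"))
    lines.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode("utf-8"))
    lines.append(f"Content-Type: {content_type}".encode("ascii"))
    lines.append(b"")
    lines.append(file_bytes)
    # Closing boundary, followed by a final line break.
    lines.append(f"--{boundary}--".encode("ascii"))
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"


body, header = encode_multipart(
    {"author": "Alice"}, "file", "report.txt", b"hello")
```

The body could then be sent with urllib.request.Request(url, data=body, headers={"Content-Type": header}), where url is the address of the receiving form handler.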
Either way, some technical knowledge is required. However, once a
document has been published in this way by someone who understands
what is needed, any other document can be published to the same site
just by choosing the name the first document was published with.
Publishing to an existing form
Publishing to an existing form requires knowledge of how the form
is structured. In particular, you need the names by which the input
fields are identified. These can usually be determined by inspecting
the HTML which describes the form.
The primary field is the file. If a user were filling in the form
manually, it would show as a button which opens a box to select a
file; it is represented in HTML by (for example):
<input type="file" name="file" />
The form handler may also require other fields to be completed:
for example the name of the person submitting the form. When automated
by Ensembling the file field is, of course, filled in as the most
recent version of the published document. Other fields can be filled
in with text you provide or with certain information about the
document (for example the name of the person who added it).
Some web forms require a site log in to access them. These
credentials can be provided as part of the publication
information. However, it is not possible to do this with forms that
require a log in on a separate page. In technical terms, we support
Basic Auth and Digest Auth, but not session-based log ins. Forms are
also problematic when they require a field whose value is supplied by
the web page which displays the form: this is sometimes done with a
hidden field, "captcha" or "nonce" which changes every time,
specifically to make it hard for robots to fill in the form.
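To show why Basic Auth can be automated while a session-based log in cannot, here is a minimal Python sketch of how Basic Auth credentials travel: they are sent as a header with the upload request itself, so no separate log in page is involved. The function name basic_auth_header is invented for the example, and the user name and password are the sample values from the HTTP Basic Auth specification.

```python
import base64


def basic_auth_header(username, password):
    """Return the HTTP header that carries Basic Auth credentials."""
    # The credentials are joined with a colon and base64-encoded.
    # This is an encoding, not encryption, so https:// is advisable.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("Aladdin", "open sesame")
# headers == {"Authorization": "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="}
```

Digest Auth works on the same principle, but it needs a challenge from the server first and is more involved, so it is not sketched here.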
Publishing to a custom script
Where the server is under your control, things are simpler. A
small script modelled on the
PHP example provided
can receive a file and put it where required.
When this is set up specifically to communicate with Ensembling,
there are some checks that can be made to avoid intruders uploading
arbitrary files:
- You can protect the form with a log in using Basic Auth or Digest
Auth (see the Apache configuration details).
- You can easily check that the source of the file is
Ensembling's IP address.
- You can provide Ensembling with a 'shared secret'. Ensembling uses
this as a key to encrypt a random
number. The random number in clear text and its encryption are both provided, so the
recipient can encrypt the plain number using the secret key and
compare it with the encrypted text to validate the transaction.
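The last two checks could be sketched in Python as follows. This page does not say which encryption Ensembling applies to the random number, so the sketch substitutes HMAC-SHA256 purely as a stand-in; a real receiving script must use whatever algorithm Ensembling actually uses. The function names, the secret and the IP addresses (taken from the ranges reserved for documentation) are likewise invented for the example.

```python
import hashlib
import hmac
import ipaddress


def is_allowed_source(remote_addr, allowed_network):
    """True if the caller's IP address lies inside the allowed network."""
    return (ipaddress.ip_address(remote_addr)
            in ipaddress.ip_network(allowed_network))


def is_valid_token(shared_secret, clear_number, received_digest):
    """Recompute the keyed digest of the clear number and compare it.

    ASSUMPTION: HMAC-SHA256 stands in for Ensembling's real algorithm,
    which is not specified on this page.
    """
    expected = hmac.new(shared_secret.encode("utf-8"),
                        clear_number.encode("utf-8"),
                        hashlib.sha256).hexdigest()
    # compare_digest takes the same time whether or not the values
    # match, which avoids leaking information through timing.
    return hmac.compare_digest(expected.encode("ascii"),
                               received_digest.encode("utf-8"))


# Example values only; none of these come from Ensembling.
ok_ip = is_allowed_source("203.0.113.7", "203.0.113.0/24")
```

The idea is the same whatever the algorithm: the secret itself is never transmitted, only a value that can be reproduced by someone who already holds it.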
Sending field values to the target form
If no fields are specified explicitly, the file is provided in a
field called 'file' and the security information described above in a
field called 'security'. As these are what the example script expects,
setting up publication to a site using that script needs only the URL
of the site, any login credentials if the form is protected by a log
in, and the shared secret (if using one).
To provide additional fields, name them in the boxes provided, choose
what their value should be, and in the case of verbatim text what that
text should be. Each time you add a field a new set of boxes is added
for another.
In the receiving script, the form fields are distributed into
variables and arrays appropriate to the scripting language being used
which the receiving script can then read. In PHP, file fields are
separated out from the other fields when received on the target
server: the file field will be in the $_FILES array while other
fields will be in $_POST. In Python, both kinds of field are in the
dictionary-like object returned by cgi.FieldStorage(), indexed by the
field name.
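The cgi module was removed from the Python standard library in version 3.13, so a receiving script on a recent Python needs another way to separate the fields. The sketch below is one possibility, using the standard email package to split a multipart body into text fields and file fields, much as PHP does with $_POST and $_FILES. The function name parse_multipart is invented for the example, and a web framework's own upload handling would normally be preferable to this in production.

```python
from email.parser import BytesParser
from email.policy import default


def parse_multipart(body, content_type):
    """Split a multipart/form-data body into (fields, files).

    body is the raw request body as bytes; content_type is the value
    of the request's Content-Type header, including the boundary.
    """
    # The email parser expects headers first, so prepend the one it
    # needs in order to find the boundary.
    prefix = (f"Content-Type: {content_type}\r\n"
              "MIME-Version: 1.0\r\n\r\n").encode("utf-8")
    message = BytesParser(policy=default).parsebytes(prefix + body)

    fields, files = {}, {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        payload = part.get_payload(decode=True)  # raw bytes of the part
        if filename is not None:
            files[name] = (filename, payload)       # like PHP's $_FILES
        else:
            fields[name] = payload.decode("utf-8")  # like PHP's $_POST
    return fields, files
```

With the default field names, the uploaded document would then be found under files['file'] and the security information under fields['security'].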
Checkboxes are generally only supplied by the user agent if
checked. To simulate a check box that is turned off, just don't
include it in the Ensembling form; to turn one on, use a text field
with the value 'on'. There is usually no need to include the field for
any Submit button, unless there are several ways to submit a form
which would do different things.
Asynchronous transfer
Because file uploads are potentially slow, we don't wait for an
upload to complete. Project leaders will be sent an email when
something detectably wrong happens. A custom script can send
appropriate error codes. However, most web forms that fail do so simply
by showing a message to the user; as we cannot automate reading such a
message, we cannot detect when that happens.
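As an illustration of how a custom script might signal failure, the Python sketch below chooses an HTTP status code from the outcome of a few checks. This page does not list which codes Ensembling treats as errors, so the sketch assumes that any status outside the 2xx range counts as a detectable failure. The function name upload_status and its three parameters are invented for the example.

```python
from http import HTTPStatus


def upload_status(token_ok, has_file, saved_ok):
    """Pick the HTTP status a receiving script could respond with.

    ASSUMPTION: a non-2xx status is what the sender detects as a
    failure; this page does not spell out the exact rule.
    """
    if not token_ok:
        return HTTPStatus.FORBIDDEN              # 403: security check failed
    if not has_file:
        return HTTPStatus.BAD_REQUEST            # 400: no file field received
    if not saved_ok:
        return HTTPStatus.INTERNAL_SERVER_ERROR  # 500: could not store file
    return HTTPStatus.OK                         # 200: upload accepted


status = upload_status(token_ok=True, has_file=True, saved_ok=False)
# status is HTTPStatus.INTERNAL_SERVER_ERROR, i.e. 500
```

A script that always answers 200 and reports problems only in the page text would behave like the web forms described above, whose failures cannot be detected.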