| P2pImplementation |
Notes on the current (July 2005) implementation of the Peer to Peer exchange mechanism.
This is an attempt to capture the current architecture, with an eye toward cleaning it up.
See the P2pMap page for a cartoon of the pieces, and P2pOverview for a textual description of the processes depicted in the image.
The coordination of the peer to peer tools is accomplished by rendezvous through a SQL database. It has the following tables.
Maintains information about the SEED instances that are registered with the P2P system.
CREATE TABLE seed_registration (
    seed_id text,
    display_name text,
    url text,
    last_active int(11) default NULL
)
Holds P2P requests. requestor_seed_id is the SEED id of the system requesting a response from the responder, responder_seed_id.
the_query is a pickled Python data structure containing the actual query. The P2P system neither knows nor cares about the internal structure of the request.
file_name is the file on the clearinghouse filesystem that contains the result of the request. It is written by the upload_fulfilled CGI on the clearinghouse.
CREATE TABLE seed_query (
    id int(11) NOT NULL auto_increment,
    requestor_seed_id text,
    responder_seed_id text,
    the_query blob,
    request_time int(11) default NULL,
    query_status text,
    file_name text,
    message text,
    PRIMARY KEY (id)
)
CREATE TABLE news (
    news_id int(11) NOT NULL auto_increment,
    add_date int(11) default NULL,
    target text,
    teaser text,
    news_item text,
    PRIMARY KEY (news_id)
)
Presents the toplevel client interface to the P2P system.
The user is presented with a list of SEED instances and a set of available operations.
The seed instance list is created by invoking SEED_registration_db.cgi with arguments
function: "get_list"
group: group name
It returns a list of tab-delimited lines, one per SEED instance, each containing:
seed_id
display name
seed URL
last activity time
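The tab-delimited format above can be parsed with a few lines of Python. This is a minimal sketch, not the actual client code: the SeedRegistration type, the parse_registration_list function, and the sample data are hypothetical, and it assumes the last activity time is always an integer number of seconds since the epoch.

```python
from typing import NamedTuple


class SeedRegistration(NamedTuple):
    """One line of the get_list response (type name is hypothetical)."""
    seed_id: str
    display_name: str
    url: str
    last_active: int  # assumed: integer seconds since the epoch


def parse_registration_list(text: str) -> list[SeedRegistration]:
    """Parse tab-delimited lines: seed_id, display name, URL, last activity."""
    registrations = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        seed_id, display_name, url, last_active = line.split("\t")
        registrations.append(
            SeedRegistration(seed_id, display_name, url, int(last_active))
        )
    return registrations


# Made-up sample response with a single registered SEED.
sample = "abc-123\tExample SEED\thttp://seed.example.org/FIG\t1120000000\n"
for reg in parse_registration_list(sample):
    print(reg.seed_id, reg.display_name, reg.url, reg.last_active)
```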
The generated form invokes new_p2p_packager.cgi with the following arguments.
package_thing: Type of request being made. Currently one of
Lightweight_Code, Translation_Rules, Annotations, Assignments
source: seed ID of the SEED instance to be queried.
Process the following arguments:
install: a string containing "Install query %s" where %s is query ID of package to install
(this comes from a submit button where the query id is embedded in the label)
Invokes download_a_query.cgi on the clearinghouse with the following arguments:
query_id: query id to download
The download_a_query.cgi script returns a URL, which is then retrieved. The contents of that URL are unpickled and expected to be a list of dictionaries, of which only the first is examined. The following keys are retrieved from the request dict:
file_name: Name of the file on the clearinghouse for this package.
name: Type of package this is (translation_rules, annotation, assignments, etc).
A file url is constructed using the basename of the file_name value from the request. That file is retrieved, and the SEED routine install_<package type> is invoked. If the install succeeds, we update the query status to be "installed", otherwise it is updated as "install failed". The status is written back to the clearinghouse using the set_status.cgi script on the clearinghouse with the following arguments:
query_id: query id to be updated
query_status: new status for the query
message: message to be written with the status
The result of the install is presented to the user, and the script exits.
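The small decisions in the install flow can be sketched as follows. This is an illustration under assumptions, not the script's real code: the three helper names and the base_url parameter (where package files are served from on the clearinghouse) are hypothetical, and the query id is assumed to be numeric because seed_query.id is an integer column.

```python
import os
import re


def parse_install_label(label: str) -> int:
    """Extract the query id from a submit label of the form 'Install query %s'."""
    match = re.fullmatch(r"Install query (\d+)", label.strip())
    if match is None:
        raise ValueError(f"unexpected install label: {label!r}")
    return int(match.group(1))


def package_file_url(base_url: str, request: dict) -> str:
    """Build the file URL from the basename of the request's file_name."""
    return base_url.rstrip("/") + "/" + os.path.basename(request["file_name"])


def status_after_install(succeeded: bool) -> str:
    """Status value written back through set_status.cgi."""
    return "installed" if succeeded else "install failed"


# Made-up request dict, shaped like the first entry of the unpickled list.
request = {"file_name": "/var/tmp/p2p/result_42.pkl", "name": "annotations"}
print(parse_install_label("Install query 42"))
print(package_file_url("http://clearinghouse.example.org/p2p/", request))
print(status_after_install(True), "/", status_after_install(False))
```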
The value of the delete argument is used as a query id, and the delete_pending.cgi script is invoked on the clearinghouse with the following arguments set:
query_id: query to delete
The browser is redirected to the new_seed_update_page page and the script exits.
We process the following arguments:
source: SEED id to take update from (actually seed_id <tab> name)
package_thing: type of package we are requesting
Based on value of package_thing, we process differently.
organisms: List of organisms to pull assignments from
user: User to pull assignments from
who: User to ???
date: Date after which assignments are taken
If any of organisms, user, or date is missing, make_assignments_page() is invoked to return a form that allows the user to choose these items. An argument list is constructed with the following values:
user: username
who: who name
date: date to pull assignments from
organisms: list of organisms
organisms: List of organisms to pull assignments from
who: User to ???
date: Date after which assignments are taken
If any of organisms, who, or date is missing, make_annotations_page() is invoked to return a form that allows the user to choose these items. An argument list is constructed with the following values:
who: who name
date: date to pull assignments from
organisms: list of organisms
In any case, we add the following arguments:
requestor_seed_id: This SEED's id
responder_seed_id: seed id we are requesting data from
source: seed id <tab> seed name
package_thing: type of package requested
The upload_request.cgi script is then invoked on the clearinghouse, and the status printed to the user.
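The argument handling described above can be sketched as a single function. This is a hedged illustration: build_request_args and the two lookup tables are hypothetical names, only the Assignments and Annotations cases are covered, and deriving responder_seed_id from the part of source before the tab is an assumption based on the "seed_id <tab> name" description.

```python
# Fields that must be present before a request can be built, per package type.
REQUIRED = {
    "Assignments": ("organisms", "user", "date"),
    "Annotations": ("organisms", "who", "date"),
}
# Fields copied into the outgoing argument list, per package type.
FORWARDED = {
    "Assignments": ("user", "who", "date", "organisms"),
    "Annotations": ("who", "date", "organisms"),
}


def build_request_args(params: dict, this_seed_id: str):
    """Return the argument dict for upload_request.cgi, or None when a
    required field is missing and the selection form should be shown."""
    package_thing = params["package_thing"]
    if any(not params.get(key) for key in REQUIRED.get(package_thing, ())):
        return None
    args = {k: params[k] for k in FORWARDED.get(package_thing, ()) if k in params}
    # Assumption: the responder id is the part of source before the tab.
    responder_seed_id = params["source"].split("\t")[0]
    args.update(
        requestor_seed_id=this_seed_id,
        responder_seed_id=responder_seed_id,
        source=params["source"],
        package_thing=package_thing,
    )
    return args


# Made-up form values for an Assignments request.
params = {
    "source": "seed-B\tExample SEED",
    "package_thing": "Assignments",
    "organisms": ["83333.1"],
    "user": "master:BobO",
    "date": "01/01/2005",
}
print(build_request_args(params, this_seed_id="seed-A"))
```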
Clearinghouse CGI. Takes the following arguments:
package_thing: installation type
responder_seed_id:
requestor_seed_id:
source:
user:
who:
date:
organisms:
Each argument's value above is stored in a Python dict (the "request"). The request is written to the seed_query table in the database with an initial status of "pending" and an empty filename.
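Storing a request can be sketched with Python's built-in sqlite3 module standing in for the clearinghouse database. This is an assumption for the sake of a runnable example: the schema above uses MySQL-style column types, so the table definition here is an SQLite approximation, and store_request is a hypothetical name. The initial message value is not specified in these notes and is left NULL.

```python
import pickle
import sqlite3
import time

# In-memory SQLite stand-in for the clearinghouse database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE seed_query (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           requestor_seed_id TEXT,
           responder_seed_id TEXT,
           the_query BLOB,
           request_time INTEGER,
           query_status TEXT,
           file_name TEXT,
           message TEXT)"""
)


def store_request(conn: sqlite3.Connection, request: dict) -> int:
    """Insert a request as 'pending' with an empty file name; return its id."""
    cursor = conn.execute(
        "INSERT INTO seed_query (requestor_seed_id, responder_seed_id,"
        " the_query, request_time, query_status, file_name)"
        " VALUES (?, ?, ?, ?, 'pending', '')",
        (
            request["requestor_seed_id"],
            request["responder_seed_id"],
            pickle.dumps(request),  # the query is stored as an opaque blob
            int(time.time()),
        ),
    )
    conn.commit()
    return cursor.lastrowid


# Made-up request values.
request = {
    "package_thing": "Annotations",
    "requestor_seed_id": "seed-A",
    "responder_seed_id": "seed-B",
    "who": "master:BobO",
    "date": "01/01/2005",
}
query_id = store_request(conn, request)
print(query_id)
```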
Clearinghouse CGI. Accepts arguments
query_id: query id to update
query_status: new status value
message: message to tag query with
and updates the status of the specified query.
Clearinghouse CGI. Accepts arguments
query_id: query to download
Writes a temp file with a pickled list of queries, where each query is a dict with keys
query_id: the query id
query_status: current status
file_name: result filename
message: output message
Returns the URL to the temp file.
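The exchange format is a pickled list of dicts written to a temporary file. A minimal round-trip sketch follows; write_query_list and read_first_query are hypothetical names, and the sketch returns a local file path where the real CGI returns a URL to the file.

```python
import pickle
import tempfile


def write_query_list(queries: list) -> str:
    """Pickle a list of query dicts into a temp file and return its path."""
    with tempfile.NamedTemporaryFile(mode="wb", suffix=".pkl", delete=False) as fh:
        pickle.dump(queries, fh)
        return fh.name


def read_first_query(path: str) -> dict:
    """Unpickle the list and return its first entry, as the installer does."""
    with open(path, "rb") as fh:
        queries = pickle.load(fh)
    return queries[0]


# Made-up query entry with the four documented keys.
queries = [
    {
        "query_id": 42,
        "query_status": "fulfilled",
        "file_name": "/var/tmp/p2p/result_42.pkl",
        "message": "",
    }
]
path = write_query_list(queries)
print(read_first_query(path))
```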
Deletes a query. Arguments:
query_id: query to delete
Periodically invokes SEED_registration.cgi on the clearinghouse with arguments
function: register
group: peer group (if found in seed_peer_group.cfg, otherwise None)
name: hostname
url: local FIG url
Updates the SEED.reg flatfile (python pickle) with information.
Periodically invokes do_register, process_updates, and do_get_news.
do_register() invokes SEED_registration_db_news.cgi on the clearinghouse, with arguments
seed_id = this seed's ID (a uuid)
function = register
group = peer group (if found in seed_peer_group.cfg, otherwise None)
name = hostname
url = local FIG url
process_updates() invokes download_for_responder.cgi on the clearinghouse, passing arguments
seed_id = this seed's ID
status = "pending"
If there are updates to process, download_for_responder.cgi returns a URL to a file containing a set of queries formatted as a pickled Python data structure.
We download the query url, and extract the list of queries from the pickle.
An argument list is constructed from the request:
package_thing: value of the `name` entry
user: value of the `user` entry if present
who: value of the `who` entry if present
date: value of the `date` entry if present
organisms: value of the `organisms` entry if present
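The mapping from a request dict to the getfilename.cgi argument list can be sketched as below. The function name args_from_request and the sample request are hypothetical; the key names follow the list above.

```python
def args_from_request(request: dict) -> dict:
    """Build the getfilename.cgi arguments from one unpickled request."""
    args = {"package_thing": request["name"]}
    # The remaining entries are optional and copied only when present.
    for key in ("user", "who", "date", "organisms"):
        if key in request:
            args[key] = request[key]
    return args


# Made-up request without a 'user' or 'organisms' entry.
request = {"name": "annotations", "who": "master:BobO", "date": "01/01/2005"}
print(args_from_request(request))
```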
The getfilename.cgi script is invoked locally with these arguments. It returns either an error code or a URL. If it returns a URL, the contents of the URL are loaded, and that output uploaded to the clearinghouse using the upload_fulfilled CGI.
We determine the last news data item by reading the last_news file. Construct an argument list:
function: "get_news"
last_news: last news value
seed_id: local SEED id
Invoke the SEED_registration_db_news CGI on the clearinghouse. This returns a list of news items newer than last_news. Each news item is written to a set of files: FIG/var/News/<id>.teaser, FIG/var/News/<id>.date, and FIG/var/News/<id>.news. The last_news file is updated with the latest news item returned.
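Writing the news files might look like the sketch below. Several things here are assumptions: write_news is a hypothetical name, the news items are taken to be already-parsed dicts, the location of the last_news file is passed in because these notes do not give it, and "latest" is interpreted as the highest news id.

```python
import tempfile
from pathlib import Path


def write_news(news_dir: Path, last_news_file: Path, items: list) -> None:
    """Write <id>.teaser, <id>.date, <id>.news per item; update last_news."""
    news_dir.mkdir(parents=True, exist_ok=True)
    for item in items:
        news_id = item["news_id"]
        (news_dir / f"{news_id}.teaser").write_text(item["teaser"])
        (news_dir / f"{news_id}.date").write_text(str(item["add_date"]))
        (news_dir / f"{news_id}.news").write_text(item["news_item"])
    if items:
        # Assumption: the latest item is the one with the highest id.
        latest = max(item["news_id"] for item in items)
        last_news_file.write_text(str(latest))


# Demonstration in a throwaway directory instead of FIG/var/News.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    items = [
        {"news_id": 7, "add_date": 1120000000, "teaser": "New release",
         "news_item": "See the wiki for details."},
    ]
    write_news(root / "News", root / "last_news", items)
    print(sorted(p.name for p in (root / "News").iterdir()))
    print((root / "last_news").read_text())
```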
Accepts the following arguments:
function: "register", "get_list", "get_news"
Accepts the following arguments:
seed_id: ID of registering SEED
group:
url:
name:
Updates seed_registration table with information for this SEED.
Returns a set of tab-delimited lines of data, one for each SEED in the seed_registration table:
seed_id
display name
SEED url
last-active time (integer seconds since the epoch)
Accepts the following arguments:
seed_id: SEED id we are retrieving news for
last_news: id of last news item retrieved
Queries the news table for all items with an ID greater than last_news and where the news target is either "ALL" or seed_id. It returns a list of tab-delimited news items, each containing:
news_id
date added to news
teaser line (URL quoted)
news item text (URL quoted)
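On the receiving side, the news lines can be split on tabs and the quoted fields decoded. This is a sketch under assumptions: parse_news is a hypothetical name, "URL quoted" is taken to mean percent-encoding (so urllib.parse.unquote applies), and both the id and the date are assumed to be integers.

```python
from urllib.parse import unquote


def parse_news(text: str) -> list:
    """Parse tab-delimited news lines into dicts, decoding quoted fields."""
    items = []
    for line in text.splitlines():
        if not line:
            continue  # ignore blank lines
        news_id, add_date, teaser, news_item = line.split("\t")
        items.append(
            {
                "news_id": int(news_id),
                "add_date": int(add_date),
                "teaser": unquote(teaser),
                "news_item": unquote(news_item),
            }
        )
    return items


# Made-up response with one news item.
sample = "7\t1120000000\tNew%20release\tSee%20the%20wiki%0Afor%20details\n"
print(parse_news(sample))
```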
Handles the uploading of a fulfilled request.
Accepts the following arguments:
file: CGI file upload protocol file contents
query_id: query id we are uploading results for
Writes the uploaded file contents to a file, and updates the seed_query table entry for query_id to set the status to 'fulfilled' and the filename to the new filename.
This clearinghouse CGI is responsible for forwarding any pending requests for a given SEED to that SEED. It accepts the following arguments:
seed_id: seed id of requesting SEED
status: status of records we are interested in
Executes the following query in the clearinghouse database:
SELECT *
FROM seed_query
WHERE responder_seed_id = <seed_id> and query_status = <status>
if a status is passed, or
SELECT *
FROM seed_query
WHERE responder_seed_id = <seed_id>
otherwise.
Each row of the query result is used to construct a request dict. The request dict is initialized by unpickling the pickled query from the the_query column of the table. The query id, status, file_name, and message columns are inserted into the result dict.
The list of request dicts is pickled into a temporary file, and the URL of that file is returned to the client.
??Why is the pickled dict not just returned?
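The query selection and the construction of the request dicts can be sketched as follows. As before, sqlite3 is only a stand-in for the clearinghouse database, and requests_for_responder and add_row are hypothetical names. The key names query_id and query_status mirror the ones documented for download_a_query; whether download_for_responder uses the same names is an assumption.

```python
import pickle
import sqlite3


def requests_for_responder(conn, seed_id, status=None):
    """Return request dicts for a responder, optionally filtered by status."""
    sql = ("SELECT id, query_status, file_name, message, the_query"
           " FROM seed_query WHERE responder_seed_id = ?")
    params = [seed_id]
    if status is not None:
        sql += " AND query_status = ?"
        params.append(status)
    requests = []
    for query_id, query_status, file_name, message, blob in conn.execute(sql, params):
        request = pickle.loads(blob)  # start from the original pickled query
        request.update(query_id=query_id, query_status=query_status,
                       file_name=file_name, message=message)
        requests.append(request)
    return requests


# In-memory stand-in table with two made-up rows for the same responder.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seed_query (id INTEGER PRIMARY KEY, requestor_seed_id TEXT,"
    " responder_seed_id TEXT, the_query BLOB, request_time INTEGER,"
    " query_status TEXT, file_name TEXT, message TEXT)"
)


def add_row(status):
    conn.execute(
        "INSERT INTO seed_query (requestor_seed_id, responder_seed_id,"
        " the_query, query_status, file_name) VALUES (?, ?, ?, ?, '')",
        ("seed-A", "seed-B", pickle.dumps({"name": "annotations"}), status),
    )


add_row("pending")
add_row("fulfilled")
print(requests_for_responder(conn, "seed-B", "pending"))
print(len(requests_for_responder(conn, "seed-B")))
```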
SEED CGI. Accepts the following arguments
package_thing: type of package to process
user:
who:
date:
organisms: organism list
Execute a packaging command on the SEED. The command is constructed as
FIG/bin/package_<package type> <output_file>
Additional arguments are appended for user, who, and date; these take the form
argname=argvalue
For example, user=master:BobO date=01/01/2005
The organism list is passed as the final set of arguments to the script. If all organisms are desired, none are passed.
The command is run, writing its output to the specified output file. If an error occurred, output of "Error: <error message>" is written. Otherwise the URL to the output file is returned.
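The command construction described above can be sketched as a function that returns an argument list. The name build_package_command and the sample values are hypothetical; returning a list (suitable for subprocess.run) rather than a shell string is a design choice of this sketch, made so that values such as master:BobO need no shell quoting.

```python
def build_package_command(package_type, output_file, user=None, who=None,
                          date=None, organisms=()):
    """Build FIG/bin/package_<package type> <output_file> plus arguments."""
    command = [f"FIG/bin/package_{package_type}", output_file]
    # Optional argname=argvalue pairs, in the order user, who, date.
    for name, value in (("user", user), ("who", who), ("date", date)):
        if value:
            command.append(f"{name}={value}")
    # Organisms come last; an empty list means all organisms are desired.
    command.extend(organisms)
    return command


# Made-up output file and organism id.
print(build_package_command(
    "assignments", "/tmp/assignments.out",
    user="master:BobO", date="01/01/2005", organisms=["83333.1"],
))
```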