File:
[LON-CAPA] /
doc /
gutshtml /
SessionFou1.html
Revision
1.1:
Fri Jun 28 20:30:29 2002 UTC (22 years, 5 months ago) by
www
Branches:
MAIN
CVS tags:
version_0_99_3,
version_0_99_2,
version_0_99_1,
version_0_99_0,
version_0_6_2,
version_0_6,
version_0_5_1,
version_0_5,
version_0_4,
stable_2002_july,
conference_2003,
STABLE,
HEAD
HTML version of GUTS manual. Individual files will still need cleanup.
<html>
<head>
<meta name=Title
content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">
<meta http-equiv=Content-Type content="text/html; charset=macintosh">
<link rel=Edit-Time-Data href="Session%20Fou1_files/editdata.mso">
<title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
Files) (Guy)</title>
<style><!--
.MsoHeader
{tab-stops:center 3.0in right 6.0in;
font-size:10.0pt;
font-family:"Times New Roman";}
.MsoPlainText
{font-size:10.0pt;
font-family:"Courier New";}
.Section1
{page:Section1;}
.Section2
{page:Section2;}
-->
</style>
</head>
<body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>
<div class=Section1>
<h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
Files) (Guy)</h2>
<h3><a name="_Toc421867121">XML Files</a></h3>
<p><span style='color:black'>All HTML / XML files are run through the lonxml
handler before being served to a user. This allows us to rewrite many portions
of a document and to support server-side tags. There are two ways to add new
tags to the XML parsing engine: through LON-CAPA style files, or by
writing Perl tag handlers for the desired tags. </span></p>
<p><span style='color:black'><b>Global Variables</b></span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::debug</i></span><span
style='color:black'> - debugging control </span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::pwd</i></span><span
style='color:black'> - path to the directory containing the file currently being
processed </span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::outputstack</i></span><span
style='color:black'> </span></p>
<p><span style='color:black'><i>$Apache::lonxml::redirection</i></span><span
style='color:black'> - these two are used for capturing a subset of the output
for later processing. Do not touch them directly; use &amp;startredirection
and &amp;endredirection instead. </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::import</i></span><span
style='color:black'> - controls whether the &lt;import&gt; tag actually does anything
</span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::extlinks</i></span><span
style='color:black'> - a list of URLs that the user is allowed to look at because
of the current resource (images and links) </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::metamode</i></span><span
style='color:black'> - when set, some output is turned off because the meta target
wants only a specific subset; use &lt;output&gt; to guarantee that the contained data
will be in the parsing output </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::evaluate</i></span><span
style='color:black'> - controls whether run::evaluate actually dereferences variable
references </span></p>
<p><span style='color:black'>*
<i>%Apache::lonxml::insertlist</i></span><span
style='color:black'> - data structure for edit mode, determines what tags can
go into what other tags </span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::namespace</i></span><span
style='color:black'> - stores the list of tag namespaces used in the insertlist.tab
file that are currently active, used only in edit mode. </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::registered</i></span><span
style='color:black'> - set to 1 once the remote has been updated to know what
resource we are looking at. </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::request</i></span><span
style='color:black'> - current Apache request object, or undef </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::curdepth</i></span><span
style='color:black'> - current position in the overall parse, as a string
like 2_3_1 (the first tag inside the third second-level tag inside the second
top-level tag). It is set by callsub and can be used in Perl tag implementations.
It relies upon the internal globals: <i>@Apache::lonxml::depthcounter</i></span><span
style='color:black'>, <i>$Apache::lonxml::depth</i></span><span
style='color:black'>, <i>$Apache::lonxml::olddepth</i></span><span
style='color:black'> </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::prevent_entity_encode</i></span><span
style='color:black'> - by default the XML parser tries to re-encode any 8-bit
characters as HTML entity codes. Setting this to a true value prevents
that. </span></p>
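<p><span style='color:black'>As a rough illustration of how a depth string such
as 2_3_1 can be derived, the following self-contained Perl sketch keeps one counter
per nesting level. The names open_tag, close_tag, @counter and $depth are hypothetical
and are not the lonxml internals; the real bookkeeping is done by callsub and the
internal globals listed above, and may differ in detail. </span></p>

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the internal depth bookkeeping.
my @counter;      # $counter[$d] = tags opened so far at nesting level $d
my $depth = 0;    # number of tags currently open

# Called for a start tag; returns the depth string for that tag.
sub open_tag {
    $counter[$depth]++;
    my $curdepth = join('_', @counter[0 .. $depth]);
    $depth++;
    return $curdepth;
}

# Called for an end tag; forgets the counters of the closed tag's children.
sub close_tag {
    $#counter = $depth - 1;
    $depth--;
    return;
}

# Walk a small document: one top-level tag, then a second one whose
# third child contains a single tag.
my @seen;
push @seen, open_tag(); close_tag();    # 1
push @seen, open_tag();                 # 2
push @seen, open_tag(); close_tag();    # 2_1
push @seen, open_tag(); close_tag();    # 2_2
push @seen, open_tag();                 # 2_3
push @seen, open_tag(); close_tag();    # 2_3_1
close_tag(); close_tag();
print join(' ', @seen), "\n";
```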
<p><span style='color:black'>In common usage, <i>$Apache::lonxml::prevent_entity_encode</i></span><span
style='color:black'>, <i>$Apache::lonxml::evaluate</i></span><span
style='color:black'>, <i>$Apache::lonxml::metamode</i></span><span
style='color:black'>, <i>$Apache::lonxml::import</i></span><span
style='color:black'> should never be set to a value directly. Increment them
when you want the effect on, and decrement them when you want the effect off.
</span></p>
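<p><span style='color:black'>The reason for incrementing rather than assigning
is that handlers can nest. The sketch below uses a hypothetical variable $metamode
(a stand-in for globals such as $Apache::lonxml::metamode) and two made-up handlers
to show that the outer request survives the inner one. It illustrates the idiom
only and is not taken from lonxml. </span></p>

```perl
use strict;
use warnings;

our $metamode = 0;    # hypothetical counter-style flag: true while anyone wants it on

# An inner handler that needs the effect only briefly.
sub inner_handler {
    $metamode++;
    my $seen = $metamode;
    $metamode--;
    return $seen;
}

# An outer handler that needs the effect for its whole body.
sub outer_handler {
    $metamode++;                     # turn the effect on
    my $inside   = inner_handler();  # nested code toggles it as well
    my $still_on = $metamode;        # inner only undid its own increment
    $metamode--;                     # withdraw our own request
    return ($inside, $still_on);
}

my ($inside, $still_on) = outer_handler();
print "inside=$inside still_on=$still_on after=$metamode\n";
```

<p><span style='color:black'>Had inner_handler assigned 0 instead of decrementing,
the effect would have been switched off for the remainder of outer_handler as well.
</span></p>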
<p><span style='color:black'><b>Notable Perl subroutines</b></span></p>
<p><span style='color:black'>If not specified these functions are in Apache::lonxml
</span></p>
<p><span style='color:black'>*
<i>xmlparse</i></span><span
style='color:black'> - see the XMLPARSE figure. It is not callable from inside
a tag; if one needs to restart parsing, either add a new LCParser to
the parser stack using the newparser function, or call inner_xmlparse
(see the xmlparse function in scripttag.pm). </span></p>
<p><span style='color:black'>*
<i>recurse</i></span><span
style='color:black'> - acts just like <i>xmlparse</i></span><span
style='color:black'>, except that it does not do the style definition check; it always
calls <i>callsub</i></span><span style='color:black'> </span></p>
<p><span style='color:black'>*
<i>callsub</i></span><span
style='color:black'> - checks whether a Perl subroutine is defined for the current
tag and, if so, calls it. Otherwise it just returns the tag as it was read in. It
also adds a default editing interface unless the tag has a defined subroutine
that either returns something or requests that callsub not add the editing
interface. </span></p>
<p><span style='color:black'>*
<i>afterburn</i></span><span
style='color:black'> - called on the output of xmlparse, it can add highlights,
anchors, and links to regular expression matches to the output. </span></p>
<p><span style='color:black'>*
<i>register_insert</i></span><span
style='color:black'> - builds the %Apache::lonxml::insertlist structure of what
tags can have what other tags inside. </span></p>
<p><span style='color:black'>*
<i>whichuser</i></span><span
style='color:black'> - returns a list of $symb, $courseid, $domain, $name that
is correct for calls to lonnet functions for this setup. It uses the form.grade_
parameters if the user is allowed to mgr in the course. </span></p>
<p><span style='color:black'>*
<i>setup_globals</i></span><span
style='color:black'> - initializes all lonxml globals when xmlparse is called.
If you intend to create a new target, you will likely need to tweak how the
globals are set up at startup. </span></p>
<p><span style='color:black'>*
<i>init_safespace</i></span><span
style='color:black'> - creates Holes to external functions, creates some global
variables, and sets the permitted operators of the global Safe space interpreter.
</span></p>
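<p><span style='color:black'>The dispatch step that callsub performs (call a
handler if one is defined, otherwise pass the tag through unchanged) can be sketched
as follows. The package My::Tags, the handler name start_shout and the function
dispatch_start are hypothetical, and the sketch leaves out the editing interface
and the real argument list; it only shows the lookup-with-fallback idea. </span></p>

```perl
use strict;
use warnings;

package My::Tags;    # hypothetical handler package

# Handler for a hypothetical "shout" tag.
sub start_shout {
    my ($tag, $raw) = @_;
    return '[shout start]';
}

package main;

# Look for a start_ handler in $package; fall back to the raw tag text.
sub dispatch_start {
    my ($package, $tag, $raw) = @_;
    my $handler = $package->can("start_$tag");
    return $handler ? $handler->($tag, $raw) : $raw;
}

print dispatch_start('My::Tags', 'shout', 'raw text of the shout tag'), "\n";
print dispatch_start('My::Tags', 'b',     'raw text of the b tag'),     "\n";
```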
<p><span style='color:black'><b>Functions Tag Handlers can use</b></span></p>
<p><span style='color:black'>If not specified these functions are in Apache::lonxml
</span></p>
<p><span style='color:black'>*
<i>debug</i></span><span
style='color:black'> - a function to call to print out debugging messages. Will
only print when $Apache::lonxml::debug is set to 1 </span></p>
<p><span style='color:black'>*
<i>warning</i></span><span
style='color:black'> - a function to use for warning messages. The message will
appear at the top of a resource when it is viewed in construction space only.
</span></p>
<p><span style='color:black'>*
<i>error</i></span><span
style='color:black'> - a function to use for error messages. The message will
appear at the top of a resource when it is viewed in construction space. Otherwise
it will be sent to the resource author and course instructor, and the
student will be informed that an error has occurred. </span></p>
<p><span style='color:black'>*
<i>get_all_text</i></span><span
style='color:black'> - 2 args: the tag to look for (use /tag to look for an
end tag) and an HTML::TokeParser reference. It repeatedly gets text from
the TokeParser until the requested tag is found, and returns all of the
document it pulled from the TokeParser. (See Apache::scripttag::start_script
for an example of usage.) </span></p>
<p><span style='color:black'>*
<i>get_param</i></span><span
style='color:black'> - 4 arguments: first is a scalar string naming the argument needed,
second is a reference to the parser arguments stack, third is a reference
to the Safe space, and fourth is an optional "context" value. This
subroutine allows a tag to get a tag argument after it has been interpolated inside
the Safe space. This should be used if the tag might use a Safe space variable
reference for the tag argument. (See Apache::scripttag::start_script for an
example.) This version only handles scalar variables. </span></p>
<p><span style='color:black'>*
<i>get_param_var</i></span><span
style='color:black'> - 4 arguments: first is a scalar string naming the argument needed,
second is a reference to the parser arguments stack, third is a reference
to the Safe space, and fourth is an optional "context" value. This
subroutine allows a tag to get a tag argument after it has been interpolated inside
the Safe space. This should be used if the tag might use a Safe space variable
reference for the tag argument. (See Apache::scripttag::start_script for an
example.) This version can handle list or hash variables properly. </span></p>
<p><span style='color:black'>*
<i>description</i></span><span
style='color:black'> - 1 argument, the token object. This will return the textual
description of the current tag from the insertlist.tab file. </span></p>
<p><span style='color:black'>*
<i>whichuser</i></span><span
style='color:black'> - 0 arguments. This looks at the current environment
settings and returns the current $symb, $courseid, $udom, $uname. You should
always use this function if you want to determine who the current user is,
since an instructor might be trying to view a student's version of a resource.
</span></p>
<p><span style='color:black'>*
<i>inner_xmlparse</i></span><span
style='color:black'> - 6 arguments: the target, an array pointer to the current
stack of tags, an array pointer to the current stack of tag arguments, an
array pointer to the current stack of LCParsers, a pointer to the current
Safe space, and a pointer to the hash of current style definitions </span></p>
<p><span style='color:black'>*
<i>newparser</i></span><span
style='color:black'> - 3 args: first is a reference to the parser stack, second
should be a reference to a scalar string containing the text the new parser should
run over, third should be a scalar holding the directory path of the file the parser
is parsing. (See Apache::scripttag::start_import for an example.) </span></p>
<p><span style='color:black'>*
<i>register</i></span><span
style='color:black'> - should be called in a file's BEGIN block. 2 arguments:
a scalar string and a list of strings. This allows a file to register which
tags it handles and what the namespace of those tags is. Example: </span></p>
<p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>
<p><span style='font-family:"Courier New";color:black'> &amp;Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>
<p><span style='font-family:"Courier New";color:black'>}</span></p>
<p><span style='color:black'>This tells xmlparse that it can find handlers for
&lt;script&gt; and &lt;display&gt; in Apache::scripttag. If one registers
a tag that was already registered, the previous registration is remembered and will
be restored on a deregister. </span></p>
<p><span style='color:black'>*
<i>deregister</i></span><span
style='color:black'> - used to remove a previously registered tag implementation.
It will restore the previous registration if there was one. </span></p>
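<p><span style='color:black'>One way to picture the remember-and-restore behaviour
of register and deregister is a stack of packages per tag name. The sketch below is
self-contained and hypothetical throughout: %registry, register_tags, deregister_tags
and handler_for are made-up names, and the real lonxml data structures and argument
lists may differ. </span></p>

```perl
use strict;
use warnings;

# Hypothetical registry: tag name -> stack of packages, newest last.
my %registry;

sub register_tags {
    my ($package, @tags) = @_;
    push @{ $registry{$_} }, $package for @tags;
    return;
}

sub deregister_tags {
    my ($package, @tags) = @_;
    for my $tag (@tags) {
        my $stack = $registry{$tag} or next;
        # Drop the newest registration made by this package, if any.
        for my $i (reverse 0 .. $#$stack) {
            next unless $stack->[$i] eq $package;
            splice(@$stack, $i, 1);
            last;
        }
        delete $registry{$tag} unless @$stack;
    }
    return;
}

# The package currently responsible for a tag, or undef.
sub handler_for {
    my ($tag) = @_;
    my $stack = $registry{$tag} or return undef;
    return $stack->[-1];
}

register_tags('My::Base',  'script', 'display');
register_tags('My::Extra', 'script');
print handler_for('script'), "\n";    # My::Extra
deregister_tags('My::Extra', 'script');
print handler_for('script'), "\n";    # My::Base
```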
<p><span style='color:black'>*
<i>startredirection</i></span><span
style='color:black'> - used when a tag wants to save a portion of the document
for its end tag to use, but wants the intervening document to be normally
processed. (See Apache::scripttag::start_window for an example.) </span></p>
<p><span style='color:black'>*
<i>endredirection</i></span><span
style='color:black'> - ends the capture begun by startredirection, so that xmlparse
stops hiding output. The return value is everything that xmlparse has processed
since the corresponding startredirection. (See Apache::scripttag::end_window for an example.) </span></p>
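<p><span style='color:black'>The capture behaviour of startredirection and
endredirection can be modelled with a stack of output buffers. This is a minimal
sketch under that assumption: emit, start_capture and end_capture are hypothetical
names, and the two variables only imitate the roles of @Apache::lonxml::outputstack
and $Apache::lonxml::redirection rather than reproducing them. </span></p>

```perl
use strict;
use warnings;

my @outputstack = ('');   # hypothetical: innermost capture buffer is last
my $redirection = 0;      # hypothetical: number of captures in progress

# Everything the parser produces goes through here.
sub emit {
    my ($text) = @_;
    $outputstack[-1] .= $text;
    return;
}

sub start_capture {
    $redirection++;
    push @outputstack, '';
    return;
}

# Returns what was emitted since the matching start_capture.
sub end_capture {
    die "end_capture without start_capture\n" unless $redirection;
    $redirection--;
    return pop @outputstack;
}

emit('before ');
start_capture();
emit('inside');            # processed normally, but held back from the output
my $captured = end_capture();
emit('after');
print "captured=[$captured] output=[$outputstack[0]]\n";
```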
<p><span style='color:black'>*
<i>Apache::run::evaluate</i></span><span
style='color:black'> - 3 args: first a string, second a reference to the Safe
space, third a string to be evaluated before the first arg. This subroutine will
do variable interpolation and simple function interpolations on the first
argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>
<p><span style='color:black'>*
<i>Apache::run::run</i></span><span
style='color:black'> - 2 args: first a string, second a reference to the Safe
space. This passes the string into the Safe space for evaluation
and then returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
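<p><span style='color:black'>To make the idea of evaluating and interpolating
inside a Safe space concrete, here is a small sketch built directly on Perl's core
Safe module. The functions run_in_safe and interpolate_in_safe are hypothetical and
much simpler than Apache::run::run and Apache::run::evaluate; in particular the
interpolation handles plain scalar names only. </span></p>

```perl
use strict;
use warnings;
use Safe;

# Run a code string inside the compartment and return its result.
sub run_in_safe {
    my ($code, $safe) = @_;
    my $result = $safe->reval($code);
    die "evaluation failed: $@" if $@;
    return $result;
}

# Replace each $name in $text with the value it has inside the compartment.
# Scalars only; anything more elaborate is out of scope for this sketch.
sub interpolate_in_safe {
    my ($text, $safe) = @_;
    my @parts = split /(\$\w+)/, $text;
    for my $part (@parts) {
        next unless $part =~ /^\$\w+$/;
        my $value = $safe->reval($part);
        $part = defined $value ? $value : '';
    }
    return join '', @parts;
}

my $safe = Safe->new;
run_in_safe('$answer = 6 * 7;', $safe);
print interpolate_in_safe('The answer is $answer.', $safe), "\n";
```

<p><span style='color:black'>Variables set by one call stay in the compartment, which
is why a later call can still see $answer. </span></p>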
<h3><a name="_Toc421867122">Style Files</a></h3>
<p><span style='color:black'> <img width=432 height=255
src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>
<p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span
style='font-size:14.0pt;color:black'> - Using a style file</span></p>
<p><span style='color:black'><b>Style File specific tags</b></span></p>
<p><span style='color:black'><b>&lt;definetag&gt;</b></span><span
style='color:black'> - 2 arguments: <i>name</i></span><span style='color:black'>,
the name of the new tag being defined (required; if preceded by a /, an end tag
is being defined), and <i>parms</i></span><span style='color:black'>, the parameters of the
new tag. The value of these parameters can be accessed by $parametername. </span></p>
<p><span style='color:black'>*
<b>&lt;render&gt;</b></span><span
style='color:black'> - define what the new tag does for a non-meta target </span></p>
<p><span style='color:black'>*
<b>&lt;meta&gt;</b></span><span
style='color:black'> - define what the new tag does for a meta target </span></p>
<p><span style='color:black'>*
<b>&lt;tex&gt; / &lt;web&gt; / &lt;latexsource&gt;</b></span><span style='color:black'>
- define what a new tag does for a specific non-meta target. All data inside
a &lt;render&gt; is rendered to all targets except when surrounded by one of these
target-specific tags.</span><span style='font-size:16.0pt;color:black'> </span></p>
<p class=MsoHeader> <img width=432 height=243
src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>
<p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span
style='font-size:14.0pt'> - The parser</span></p>
<h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
<p class=MsoPlainText>SYNOPSIS</p>
<p class=MsoPlainText> require HTML::LCParser;</p>
<p class=MsoPlainText> $p = HTML::LCParser->new("index.html")
|| die "Can't open: $!";</p>
<p class=MsoPlainText> while (my $token = $p->get_token) {</p>
<p class=MsoPlainText> #...</p>
<p class=MsoPlainText> }</p>
<p class=MsoPlainText>DESCRIPTION</p>
<p class=MsoPlainText>HTML::LCParser is an alternative interface
to the</p>
<p class=MsoPlainText>HTML::Parser class. It is an HTML::PullParser
subclass.</p>
<p class=MsoPlainText>The following methods are available:</p>
<p class=MsoPlainText>* $p = HTML::LCParser->new( $file_or_doc );</p>
<p class=MsoPlainText>The object constructor argument is either a file name,
a file handle</p>
<p class=MsoPlainText>object, or the complete document to be parsed.</p>
<p class=MsoPlainText>If the argument is a plain scalar, then it is taken as
the name of a</p>
<p class=MsoPlainText>file to be opened and parsed. If the file can't
be opened for</p>
<p class=MsoPlainText>reading, then the constructor will return an undefined
value and $!</p>
<p class=MsoPlainText>will tell you why it failed.</p>
<p class=MsoPlainText>If the argument is a reference to a plain scalar, then
this scalar is</p>
<p class=MsoPlainText>taken to be the literal document to parse. The value
of this</p>
<p class=MsoPlainText>scalar should not be changed before all tokens have been
extracted.</p>
<p class=MsoPlainText>Otherwise the argument is taken to be some object that
the</p>
<p class=MsoPlainText>HTML::LCParser can read() from when it needs
more data. Typically</p>
<p class=MsoPlainText>it will be a filehandle of some kind. The stream
will be read() until</p>
<p class=MsoPlainText>EOF, but not closed.</p>
<p class=MsoPlainText>It also will turn attr_encoded on by default.</p>
<p class=MsoPlainText>* $p->get_token</p>
<p class=MsoPlainText>This method will return the next token found
in the HTML document,</p>
<p class=MsoPlainText>or undef at the end of the document. The
token is returned as an</p>
<p class=MsoPlainText>array reference. The first element of the array
will be a (mostly)</p>
<p class=MsoPlainText>single character string denoting the type of this token:
"S" for start</p>
<p class=MsoPlainText>tag, "E" for end tag, "T" for text,
"C" for comment, "D" for</p>
<p class=MsoPlainText>declaration, and "PI" for processing instructions.
The rest of the array</p>
<p class=MsoPlainText>is the same as the arguments passed to the corresponding
HTML::Parser</p>
<p class=MsoPlainText>v2 compatible callbacks (see HTML::Parser).
In summary, returned</p>
<p class=MsoPlainText>tokens look like this:</p>
<p class=MsoPlainText> ["S", $tag, $attr, $attrseq, $text,
$line]</p>
<p class=MsoPlainText> ["E", $tag, $text, $line]</p>
<p class=MsoPlainText> ["T", $text, $is_data, $line]</p>
<p class=MsoPlainText> ["C", $text, $line]</p>
<p class=MsoPlainText> ["D", $text, $line]</p>
<p class=MsoPlainText> ["PI", $token0, $text, $line]</p>
<p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array
reference and</p>
<p class=MsoPlainText>the rest are plain scalars.</p>
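<p class=MsoPlainText>A token loop usually branches on the type code in the first
element. The sketch below does this for hand-written sample tokens in the shapes
listed above, so it runs without the parser installed; describe_token is a
hypothetical helper, not part of HTML::LCParser.</p>

```perl
use strict;
use warnings;

# Summarize one token given in the array-reference form listed above.
sub describe_token {
    my ($token) = @_;
    my ($type, @rest) = @$token;
    if ($type eq 'S') {
        my ($tag, $attr, $attrseq) = @rest;
        my @pairs = map { "$_=$attr->{$_}" } @$attrseq;
        return join(' ', "start:$tag", @pairs);
    }
    return "end:$rest[0]"  if $type eq 'E';
    return "text:$rest[0]" if $type eq 'T';
    return 'comment'       if $type eq 'C';
    return 'declaration'   if $type eq 'D';
    return 'instruction'   if $type eq 'PI';
    return 'unknown';
}

# Sample tokens; a real loop would obtain them from $p->get_token instead.
my @tokens = (
    ['S', 'a', { href => 'index.html' }, ['href'], 'raw start tag', 1],
    ['T', 'Home', 0, 1],
    ['E', 'a', 'raw end tag', 1],
);
print describe_token($_), "\n" for @tokens;
```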
<p class=MsoPlainText>* $p->unget_token($token,...)</p>
<p class=MsoPlainText>If you find out you have read too many tokens you can
push them back,</p>
<p class=MsoPlainText>so that they are returned the next time $p->get_token
is called.</p>
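<p class=MsoPlainText>The push-back behaviour can be pictured as a queue of pending
tokens. The class My::TokenQueue below is a hypothetical stand-in that only
demonstrates the get_token / unget_token contract; it does not parse anything and
is not how HTML::LCParser is implemented.</p>

```perl
use strict;
use warnings;

package My::TokenQueue;    # hypothetical class, not HTML::LCParser itself

sub new {
    my ($class, @tokens) = @_;
    return bless { pending => [@tokens] }, $class;
}

# Next token, or undef when none are left.
sub get_token {
    my ($self) = @_;
    return shift @{ $self->{pending} };
}

# Put tokens back so they are returned again, in the order given.
sub unget_token {
    my ($self, @tokens) = @_;
    unshift @{ $self->{pending} }, @tokens;
    return;
}

package main;

my $queue  = My::TokenQueue->new(['S', 'p'], ['T', 'hello'], ['E', 'p']);
my $first  = $queue->get_token;     # the start tag
my $second = $queue->get_token;     # the text token
$queue->unget_token($second);       # read one too many: push it back
my $again  = $queue->get_token;     # the same text token again
print $again->[1], "\n";
```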
<p class=MsoPlainText>* $p->get_tag( [$tag, ...] )</p>
<p class=MsoPlainText>This method returns the next start or end tag (skipping
any other</p>
<p class=MsoPlainText>tokens), or undef if there are no more tags in
the document. If</p>
<p class=MsoPlainText>one or more arguments are given, then we skip tokens until
one of the</p>
<p class=MsoPlainText>specified tag types is found. For example:</p>
<p class=MsoPlainText> $p->get_tag("font", "/font");</p>
<p class=MsoPlainText>will find the next start or end tag for a font-element.</p>
<p class=MsoPlainText>The tag information is returned as an array reference
in the same form</p>
<p class=MsoPlainText>as for $p->get_token above, but the type code (first
element) is</p>
<p class=MsoPlainText>missing. A start tag will be returned like this:</p>
<p class=MsoPlainText> [$tag, $attr, $attrseq, $text]</p>
<p class=MsoPlainText>The tag names of end tags are prefixed with "/",
i.e. an end tag is</p>
<p class=MsoPlainText>returned like this:</p>
<p class=MsoPlainText> ["/$tag", $text]</p>
<p class=MsoPlainText>* $p->get_text( [$endtag] )</p>
<p class=MsoPlainText>This method returns all text found at the current position.
It will</p>
<p class=MsoPlainText>return a zero length string if the next token is not text.
The</p>
<p class=MsoPlainText>optional $endtag argument specifies that any text occurring
before the</p>
<p class=MsoPlainText>given tag is to be returned. All entities are unmodified.</p>
<p class=MsoPlainText>The $p->{textify} attribute is a hash that defines
how certain tags can</p>
<p class=MsoPlainText>be treated as text. If the name of a start tag matches
a key in this</p>
<p class=MsoPlainText>hash then this tag is converted to text. The hash
value is used to</p>
<p class=MsoPlainText>specify which tag attribute to obtain the text from.
If this tag</p>
<p class=MsoPlainText>attribute is missing, then the upper case name of the
tag enclosed in</p>
<p class=MsoPlainText>brackets is returned, e.g. "[IMG]". The
hash value can also be a</p>
<p class=MsoPlainText>subroutine reference. In this case the routine is
called with the</p>
<p class=MsoPlainText>start tag token content as its argument and the return
value is treated</p>
<p class=MsoPlainText>as the text.</p>
<p class=MsoPlainText>The default $p->{textify} value is:</p>
<p class=MsoPlainText> {img => "alt", applet => "alt"}</p>
<p class=MsoPlainText>This means that &lt;IMG&gt; and &lt;APPLET&gt; tags are
treated as text, and that</p>
<p class=MsoPlainText>the text to substitute can be found in the ALT attribute.</p>
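<p class=MsoPlainText>The textify rules described above can be summarized in a few
lines. This sketch applies them to a start tag in get_tag form; tag_as_text is a
hypothetical helper written for illustration and is not the parser's own code.</p>

```perl
use strict;
use warnings;

# Default mapping described above: tag name -> attribute holding the text.
my %textify = (img => 'alt', applet => 'alt');

# $tag_token is a start tag in get_tag form: [$tag, $attr, $attrseq, $text].
# Returns the replacement text, or undef if the tag is not treated as text.
sub tag_as_text {
    my ($tag_token, $map) = @_;
    my ($tag, $attr) = @$tag_token;
    return undef unless exists $map->{$tag};
    my $how = $map->{$tag};
    # A subroutine reference is called with the token content.
    return $how->(@$tag_token) if ref $how eq 'CODE';
    # Otherwise use the named attribute, or the bracketed upper-case tag name.
    return defined $attr->{$how} ? $attr->{$how} : '[' . uc($tag) . ']';
}

print tag_as_text(['img', { alt => 'logo' }, ['alt'], 'raw'], \%textify), "\n";
print tag_as_text(['img', {}, [], 'raw'], \%textify), "\n";
```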
<p class=MsoPlainText>* $p->get_trimmed_text( [$endtag] )</p>
<p class=MsoPlainText>Same as $p->get_text above, but will collapse any sequences
of white</p>
<p class=MsoPlainText>space to a single space character. Leading and trailing
white space is</p>
<p class=MsoPlainText>removed.</p>
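<p class=MsoPlainText>The white space handling amounts to three substitutions. The
helper trim_text below is a hypothetical illustration of that normalization on a
plain string, independent of the parser.</p>

```perl
use strict;
use warnings;

# Collapse runs of white space to one space and strip it from both ends.
sub trim_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text;
}

print '[', trim_text("  Session\n  Four:\tXML   Handler "), "]\n";
```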
<p class=MsoPlainText>EXAMPLES</p>
<p class=MsoPlainText>This example extracts all links from a document.
It will print one</p>
<p class=MsoPlainText>line for each link, containing the URL and the textual
description</p>
<p class=MsoPlainText>between the &lt;A&gt;...&lt;/A&gt; tags:</p>
<p class=MsoPlainText> use HTML::LCParser;</p>
<p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
<p class=MsoPlainText> while (my $token = $p->get_tag("a"))
{</p>
<p class=MsoPlainText> my $url = $token->[1]{href}
|| "-";</p>
<p class=MsoPlainText> my $text = $p->get_trimmed_text("/a");</p>
<p class=MsoPlainText> print "$url\t$text\n";</p>
<p class=MsoPlainText> }</p>
<p class=MsoPlainText>This example extracts the &lt;TITLE&gt; from the document:</p>
<p class=MsoPlainText> use HTML::LCParser;</p>
<p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
<p class=MsoPlainText> if ($p->get_tag("title")) {</p>
<p class=MsoPlainText> my $title = $p->get_trimmed_text;</p>
<p class=MsoPlainText> print "Title: $title\n";</p>
<p class=MsoPlainText> }</p>
</div>
<br
clear=ALL style='page-break-before:always;'>
<div class=Section2> </div>
</body>
</html>