version 1.1, 2002/06/28 20:30:29
version 1.2, 2003/07/22 14:47:00
<html>
<html> |
<head>
<meta name=Title
content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">
<meta http-equiv=Content-Type content="text/html; charset=macintosh">
<link rel=Edit-Time-Data href="Session%20Fou1_files/editdata.mso">
<title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
Files) (Guy)</title>
<style><!--
.MsoHeader
{tab-stops:center 3.0in right 6.0in;
font-size:10.0pt;
font-family:"Times New Roman";}
.MsoPlainText
{font-size:10.0pt;
font-family:"Courier New";}
.Section1
{page:Section1;}
.Section2
{page:Section2;}
-->
</style>
</head>
<body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>
<div class=Section1>
<h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
Files) (Guy)</h2>
<h3><a name="_Toc421867121">XML Files</a></h3>
<p><span style='color:black'>All HTML / XML files are run through the lonxml
handler before being served to a user. This allows us to rewrite many portions
of a document and to support server-side tags. There are two ways to add new
tags to the XML parsing engine: through LON-CAPA style files, or by
writing Perl tag handlers for the desired tags. </span></p>
<p><span style='color:black'><b>Global Variables</b></span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::debug</i></span><span
style='color:black'> - debugging control </span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::pwd</i></span><span
style='color:black'> - path to the directory containing the file currently being
processed </span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::outputstack</i></span><span
style='color:black'>, <i>$Apache::lonxml::redirection</i></span><span
style='color:black'> - these two are used for capturing a subset of the output
for later processing; don't touch them directly, use &amp;startredirection
and &amp;endredirection instead. </span></p>
<body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US> |
<div class=Section1> |
<p><span style='color:black'>*
<i>$Apache::lonxml::import</i></span><span
style='color:black'> - controls whether the &lt;import&gt; tag actually does anything
</span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::extlinks</i></span><span
style='color:black'> - a list of URLs that the user is allowed to look at because
of the current resource (images and links) </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::metamode</i></span><span
style='color:black'> - when set, some output is turned off because the meta target wants only a specific
subset; use &lt;output&gt; to guarantee that the contained data will be in
the parsing output </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::evaluate</i></span><span
style='color:black'> - controls whether run::evaluate actually dereferences variable
references </span></p>
<p><span style='color:black'>*
<i>%Apache::lonxml::insertlist</i></span><span
style='color:black'> - data structure for edit mode; determines which tags can
go inside which other tags </span></p>
<p><span style='color:black'>*
<i>@Apache::lonxml::namespace</i></span><span
style='color:black'> - stores the list of tag namespaces used in the insertlist.tab
file that are currently active; used only in edit mode. </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::registered</i></span><span
style='color:black'> - set to 1 once the remote has been updated to know which
resource we are looking at. </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::request</i></span><span
style='color:black'> - current Apache request object, or undef </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::curdepth</i></span><span
style='color:black'> - the current depth in the overall parse. It will be a string
like 2_3_1 (the first tag inside the third second-level tag inside the second top-level
tag). It is set by callsub and can be used in Perl tag implementations.
It relies upon the internal globals <i>@Apache::lonxml::depthcounter</i></span><span
style='color:black'>, <i>$Apache::lonxml::depth</i></span><span
style='color:black'> and <i>$Apache::lonxml::olddepth</i></span><span
style='color:black'>. </span></p>
<p><span style='color:black'>*
<i>$Apache::lonxml::prevent_entity_encode</i></span><span
style='color:black'> - by default the XML parser will try to re-encode any 8-bit
characters as HTML entities; if this is set to a true value, that re-encoding is
prevented. </span></p>
<p><span style='color:black'>In common usage, <i>$Apache::lonxml::prevent_entity_encode</i></span><span
style='color:black'>, <i>$Apache::lonxml::evaluate</i></span><span
style='color:black'>, <i>$Apache::lonxml::metamode</i></span><span
style='color:black'> and <i>$Apache::lonxml::import</i></span><span
style='color:black'> should never be set to a value directly; instead, increment them
when you want the effect on, and decrement them when you want the effect off.
</span></p>
<p><span style='color:black'><b>Notable Perl subroutines</b></span></p>
<p><span style='color:black'>Unless otherwise specified, these functions are in Apache::lonxml.
</span></p>
<p><span style='color:black'>*
<i>xmlparse</i></span><span
style='color:black'> - see the XMLPARSE figure. It is not callable from inside
a tag; if a tag needs to restart parsing, either add a new LCParser to
the parser stack using the newparser function, or call inner_xmlparse
(see the xmlparse function in scripttag.pm). </span></p>
<p><span style='color:black'>*
<i>recurse</i></span><span
style='color:black'> - acts just like <i>xmlparse</i></span><span
style='color:black'>, except that it doesn't do the style definition check; it always
calls <i>callsub</i></span><span style='color:black'>. </span></p>
<p><span style='color:black'>*
<i>callsub</i></span><span
style='color:black'> - checks whether a Perl subroutine is defined for the current
tag and, if so, calls it; otherwise it just returns the tag as it was read in. It also
adds a default editing interface, unless the tag has a defined subroutine
that either returns something or requests that callsub not add the editing
interface. </span></p>
<p><span style='color:black'>*
<i>afterburn</i></span><span
style='color:black'> - called on the output of xmlparse; it can add highlights,
anchors, and links to regular expression matches in the output. </span></p>
<p><span style='color:black'>*
<i>register_insert</i></span><span
style='color:black'> - builds the %Apache::lonxml::insertlist structure describing which
tags can have which other tags inside them. </span></p>
<p><span style='color:black'>*
<i>whichuser</i></span><span
style='color:black'> - returns a list of $symb, $courseid, $domain, $name that
is correct for calls to lonnet functions for this setup. It uses the form.grade_
parameters if the user has the mgr privilege in the course. </span></p>
<p><span style='color:black'>*
<i>setup_globals</i></span><span
style='color:black'> - initializes all lonxml globals when xmlparse is called.
If you intend to create a new target you will likely need to tweak how the
globals are set up at startup. </span></p>
<p><span style='color:black'>*
<i>init_safespace</i></span><span
style='color:black'> - creates Holes to external functions, creates some global
variables, and sets the permitted operators of the global Safe space interpreter.
</span></p>
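<p><span style='color:black'>The lookup that <i>callsub</i> performs can be pictured with a short Perl sketch. It is not LON-CAPA's code. It assumes handlers are named start_<i>tag</i> and end_<i>tag</i> in the registered namespace, as the Apache::scripttag::start_script and Apache::scripttag::end_window examples in this document suggest. The My::Tags package, the %handler_ns map, the dispatch_tag function and the two-argument handler signature are hypothetical simplifications. The real handlers presumably receive more context, such as the parser and the Safe space.</span></p>

```perl
use strict;
use warnings;

# Hypothetical handlers for a "note" tag, named start_TAG / end_TAG.
package My::Tags;

sub start_note {
    my ($target, $attr) = @_;
    return $target eq 'tex' ? '\begin{quote}' : '[note: ' . ($attr->{title} // '') . ']';
}

sub end_note {
    my ($target) = @_;
    return $target eq 'tex' ? '\end{quote}' : '[/note]';
}

package main;

# Hypothetical registry: tag name => namespace that holds its handlers.
my %handler_ns = (note => 'My::Tags');

# Call NAMESPACE::start_TAG or NAMESPACE::end_TAG if it exists.
# Returning nothing tells the caller to emit the tag as it was read in.
sub dispatch_tag {
    my ($target, $type, $tag, $attr) = @_;    # $type is 'S' (start) or 'E' (end)
    my $ns = $handler_ns{$tag} or return;
    my $method = ($type eq 'S' ? 'start_' : 'end_') . $tag;
    my $code = $ns->can($method) or return;
    return $code->($target, $attr);
}

print dispatch_tag('web', 'S', 'note', { title => 'Hint' }), "\n";    # [note: Hint]
print dispatch_tag('tex', 'E', 'note', {}), "\n";                     # \end{quote}
```

<p><span style='color:black'>Passing the target to every handler is what lets one tag produce different output for different targets. Returning nothing for an unregistered tag leaves the caller free to fall back to the raw tag text.</span></p>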
<p><span style='color:black'><b>Functions Tag Handlers can use</b></span></p>
<p><span style='color:black'>Unless otherwise specified, these functions are in Apache::lonxml.
</span></p>
<p><span style='color:black'>*
<i>debug</i></span><span
style='color:black'> - a function to call to print debugging messages. It will
only print when $Apache::lonxml::debug is set to 1. </span></p>
<p><span style='color:black'>*
<i>warning</i></span><span
style='color:black'> - a function to use for warning messages. The message will
appear at the top of a resource, but only when it is viewed in construction space.
</span></p>
<p><span style='color:black'>*
<i>error</i></span><span
style='color:black'> - a function to use for error messages. The message will
appear at the top of a resource when it is viewed in construction space; otherwise
it is sent to the resource author and course instructor, and the
student is informed that an error has occurred. </span></p>
<p><span style='color:black'>*
<i>get_all_text</i></span><span
style='color:black'> - 2 arguments: the tag to look for (use /tag to look for an
end tag) and an HTML::TokeParser reference. It repeatedly gets text from
the TokeParser until the requested tag is found, and returns all of the
document it pulled from the TokeParser. (See Apache::scripttag::start_script
for an example of usage.) </span></p>
<p><span style='color:black'>*
<i>get_param</i></span><span
style='color:black'> - 4 arguments: the first is a scalar string naming the argument needed,
the second is a reference to the parser arguments stack, the third is a reference
to the Safe space, and the fourth is an optional "context" value. This
subroutine allows a tag to get a tag argument after it has been interpolated inside
the Safe space. It should be used if the tag might use a Safe space variable
reference for the tag argument. (See Apache::scripttag::start_script for an
example.) This version only handles scalar variables. </span></p>
<p><span style='color:black'>*
<i>get_param_var</i></span><span
style='color:black'> - 4 arguments: the first is a scalar string naming the argument needed,
the second is a reference to the parser arguments stack, the third is a reference
to the Safe space, and the fourth is an optional "context" value. This
subroutine allows a tag to get a tag argument after it has been interpolated inside
the Safe space. It should be used if the tag might use a Safe space variable
reference for the tag argument. (See Apache::scripttag::start_script for an
example.) This version can handle list or hash variables properly. </span></p>
<p><span style='color:black'>*
<i>description</i></span><span
style='color:black'> - 1 argument, the token object. It returns the textual
description of the current tag from the insertlist.tab file. </span></p>
<p><span style='color:black'>*
<i>whichuser</i></span><span
style='color:black'> - 0 arguments. It looks at the current environment
settings and returns the current $symb, $courseid, $udom, $uname. You should
always use this function if you want to determine who the current user is
(since an instructor might be trying to view a student's version of a resource).
</span></p>
<p><span style='color:black'>*
<i>inner_xmlparse</i></span><span
style='color:black'> - 6 arguments: the target, a reference to the array holding the current
stack of tags, a reference to the array holding the current stack of tag arguments, a
reference to the array holding the current stack of LCParsers, a reference to the current
Safe space, and a reference to the hash of current style definitions. </span></p>
<p><span style='color:black'>*
<i>newparser</i></span><span
style='color:black'> - 3 arguments: the first is a reference to the parser stack, the second
should be a reference to a scalar containing the text the new parser should
run over, and the third should be a scalar holding the path of the directory that contains
the file being parsed. (See Apache::scripttag::start_import for an example.) </span></p>
<p><span style='color:black'>*
<i>register</i></span><span
style='color:black'> - should be called in a file's BEGIN block. 2 arguments:
a scalar string and a list of strings. This allows a file to register which
tags it handles and the namespace those tags belong to. Example: </span></p>
<p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>
<p><span style='font-family:"Courier New";color:black'>&nbsp;&nbsp;&nbsp;&nbsp;&amp;Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>
<p><span style='font-family:"Courier New";color:black'>}</span></p>
<p><span style='color:black'>This tells xmlparse that it can find handlers for
&lt;script&gt; and &lt;display&gt; in Apache::scripttag. If one registers
a tag that was already registered, the previous registration is remembered and will
be restored on a deregister. </span></p>
<p><span style='color:black'>*
<i>deregister</i></span><span
style='color:black'> - used to remove a previously registered tag implementation.
It will restore the previous registration if there was one. </span></p>
<p><span style='color:black'>*
<i>startredirection</i></span><span
style='color:black'> - used when a tag wants to save a portion of the document
for its end tag to use, but wants the intervening document to be processed
normally. (See Apache::scripttag::start_window for an example.) </span></p>
<p><span style='color:black'>*
<i>endredirection</i></span><span
style='color:black'> - ends the redirection begun by <i>startredirection</i></span><span
style='color:black'>, so that xmlparse stops hiding output. The
return value is everything that xmlparse has processed since the corresponding
startredirection. (See Apache::scripttag::end_window for an example.) </span></p>
<p><span style='color:black'>*
<i>Apache::run::evaluate</i></span><span
style='color:black'> - 3 arguments: a string, a reference to the Safe
space, and a string to be evaluated before the first argument. This subroutine
performs variable interpolation and simple function interpolation on the first
argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>
<p><span style='color:black'>*
<i>Apache::run::run</i></span><span
style='color:black'> - 2 arguments: a string and a reference to the Safe
space. It passes the string into the Safe space for evaluation
and then returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
<h3><a name="_Toc421867122">Style Files</a></h3>
<p><span style='color:black'> <img width=432 height=255
src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>
<p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span
style='font-size:14.0pt;color:black'> - Using a style file</span></p>
<p><span style='color:black'><b>Style File specific tags</b></span></p>
<p><span style='color:black'><b>&lt;definetag&gt;</b></span><span
style='color:black'> - 2 arguments: <i>name</i></span><span style='color:black'>,
the name of the new tag being defined (if preceded by a /, it defines an end tag),
required; <i>parms</i></span><span style='color:black'>, the parameters of the
new tag, whose values can be accessed as $parametername. </span></p>
<p><span style='color:black'>*
<b>&lt;render&gt;</b></span><span
style='color:black'> - defines what the new tag does for a non-meta target </span></p>
<p><span style='color:black'>*
<b>&lt;meta&gt;</b></span><span
style='color:black'> - defines what the new tag does for a meta target </span></p>
<p><span style='color:black'>*
<b>&lt;tex&gt; / &lt;web&gt; / &lt;latexsource&gt;</b></span><span style='color:black'>
- define what a new tag does for a specific non-meta target. All data inside
a &lt;render&gt; is rendered for all targets, except when surrounded by a specific
target tag.</span><span style='font-size:16.0pt;color:black'> </span></p>
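<p><span style='color:black'>To make the $parametername rule concrete, the following Perl sketch fills in the declared parameters of a defined tag inside the text of its &lt;render&gt; block. The expand_render function is hypothetical and uses plain textual substitution for illustration. The actual style-file handling may instead interpolate these as variables. The sketch also ignores the choice between &lt;render&gt;, &lt;meta&gt; and the target-specific blocks.</span></p>

```perl
use strict;
use warnings;

# Expand $name references in a render-block body for one use of a defined tag.
#   $body  - the text inside the render block
#   $parms - array ref of parameter names declared in the definetag's parms
#   $args  - hash ref of attribute values supplied where the tag is used
sub expand_render {
    my ($body, $parms, $args) = @_;
    my %declared;
    $declared{$_} = 1 for @$parms;
    # Substitute declared parameters (a missing value becomes ''),
    # and leave any other $word exactly as written.
    $body =~ s{\$(\w+)}{ $declared{$1} ? ($args->{$1} // '') : "\$$1" }ge;
    return $body;
}

print expand_render('Hint: $title by $author ($other)',
                    ['title', 'author'],
                    { title => 'Hi', author => 'Guy' }), "\n";
# prints: Hint: Hi by Guy ($other)
```

<p><span style='color:black'>The sketch leaves undeclared names untouched. That choice keeps unrelated dollar signs in the body from being silently blanked.</span></p>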
<p class=MsoHeader> <img width=432 height=243
src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>
<p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span
style='font-size:14.0pt'> - The parser</span></p>
<h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
<p class=MsoPlainText>SYNOPSIS</p>
<p class=MsoPlainText>&nbsp;&nbsp;require HTML::LCParser;</p>
<p class=MsoPlainText>&nbsp;&nbsp;$p = HTML::LCParser->new("index.html") || die "Can't open: $!";</p>
<p class=MsoPlainText>&nbsp;&nbsp;while (my $token = $p->get_token) {</p>
<p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;#...</p>
<p class=MsoPlainText>&nbsp;&nbsp;}</p>
<p class=MsoPlainText>DESCRIPTION</p>
<p class=MsoPlainText>HTML::LCParser is an alternative interface to the HTML::Parser class. It is an HTML::PullParser subclass.</p>
<p class=MsoPlainText>The following methods are available:</p>
<p class=MsoPlainText>* $p = HTML::LCParser->new( $file_or_doc );</p>
<p class=MsoPlainText>The object constructor argument is either a file name, a file handle object, or the complete document to be parsed.</p>
<p class=MsoPlainText>If the argument is a plain scalar, then it is taken as the name of a file to be opened and parsed. If the file can't be opened for reading, then the constructor will return an undefined value and $! will tell you why it failed.</p>
<p class=MsoPlainText>If the argument is a reference to a plain scalar, then this scalar is taken to be the literal document to parse. The value of this scalar should not be changed before all tokens have been extracted.</p>
<p class=MsoPlainText>Otherwise the argument is taken to be some object that the HTML::LCParser can read() from when it needs more data. Typically it will be a filehandle of some kind. The stream will be read() until EOF, but not closed.</p>
<p class=MsoPlainText>It also turns attr_encoded on by default.</p>
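<p class=MsoPlainText>The three argument forms can be told apart with Perl's ref. The hypothetical slurp_source helper below follows the rules just described, but it simply returns the whole document as one string. The parser itself reads from a stream only when it needs more data. Treat the helper as an illustration of the argument handling, not as HTML::LCParser's implementation.</p>

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the three argument forms accepted by
# HTML::LCParser->new, but returning the whole document as one string.
sub slurp_source {
    my ($file_or_doc) = @_;
    if (!ref $file_or_doc) {
        # Plain scalar: a file name. Return undef on failure; $! says why.
        open(my $fh, '<', $file_or_doc) or return undef;
        local $/;                         # slurp mode: read the whole file
        my $doc = <$fh>;
        close $fh;
        return defined $doc ? $doc : '';
    }
    if (ref $file_or_doc eq 'SCALAR') {
        # Reference to a plain scalar: the literal document.
        return $$file_or_doc;
    }
    # Anything else: treat it as a filehandle and read() until EOF,
    # leaving it open as the description says.
    my ($doc, $buf) = ('', '');
    while (read($file_or_doc, $buf, 4096)) {
        $doc .= $buf;
    }
    return $doc;
}

my $html = 'hello world';
print slurp_source(\$html), "\n";                       # hello world
open(my $in, '<', \$html) or die "in-memory open failed: $!";
print slurp_source($in), "\n";                          # hello world
```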
<p class=MsoPlainText>* $p->get_token</p>
<p class=MsoPlainText>This method will return the next <i>token</i> found in the HTML document, or undef at the end of the document. The token is returned as an array reference. The first element of the array will be a (mostly) single character string denoting the type of this token: "S" for start tag, "E" for end tag, "T" for text, "C" for comment, "D" for declaration, and "PI" for processing instructions. The rest of the array is the same as the arguments passed to the corresponding HTML::Parser v2 compatible callbacks (see HTML::Parser). In summary, returned tokens look like this:</p>
<p class=MsoPlainText>&nbsp;&nbsp;["S", $tag, $attr, $attrseq, $text, $line]</p>
<p class=MsoPlainText>&nbsp;&nbsp;["E", $tag, $text, $line]</p>
<p class=MsoPlainText>&nbsp;&nbsp;["T", $text, $is_data, $line]</p>
<p class=MsoPlainText>&nbsp;&nbsp;["C", $text, $line]</p>
<p class=MsoPlainText>&nbsp;&nbsp;["D", $text, $line]</p>
<p class=MsoPlainText>&nbsp;&nbsp;["PI", $token0, $text, $line]</p>
<p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array reference and the rest are plain scalars.</p>
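<p class=MsoPlainText>Because every token carries its type code in the first element, code that consumes tokens usually branches on that element. The hypothetical summarize_tokens below does this for hand-built tokens in the shapes listed above, with their raw-text fields left empty. It therefore runs without HTML::LCParser. With a real parser the tokens would come from repeated $p->get_token calls, as in the SYNOPSIS loop.</p>

```perl
use strict;
use warnings;

# Walk tokens shaped like those returned by $p->get_token, collecting the
# text content and the names of start tags; other token types are skipped.
sub summarize_tokens {
    my @tokens = @_;
    my $text   = '';
    my @starts;
    for my $tok (@tokens) {
        my $type = $tok->[0];
        if ($type eq 'S') {
            push @starts, $tok->[1];    # ["S", $tag, $attr, $attrseq, $text, $line]
        }
        elsif ($type eq 'T') {
            $text .= $tok->[1];         # ["T", $text, $is_data, $line]
        }
        # "E", "C", "D" and "PI" tokens are ignored here.
    }
    return ($text, \@starts);
}

# Hand-built tokens in the documented shapes (raw tag text left empty).
my @tokens = (
    ['S', 'p', {}, [], '', 1],
    ['T', 'Hello ', 0, 1],
    ['S', 'b', {}, [], '', 1],
    ['T', 'world', 0, 1],
    ['E', 'b', '', 1],
    ['C', '', 1],
    ['E', 'p', '', 1],
);
my ($text, $starts) = summarize_tokens(@tokens);
print "$text | @$starts\n";    # prints: Hello world | p b
```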
<p><span style='color:black'>* |
<p class=MsoPlainText>* $p->unget_token($token,...)</p>
|
|
<p class=MsoPlainText>If you find out you have read too many tokens you can
|
<i>get_param</i></span><span |
push them back,</p>
|
|
<p class=MsoPlainText>so that they are returned the next time $p->get_token
|
style='color:black'> - 4 arguments, first is a scaler sting of the argument needed, |
is called.</p>
|
|
<p class=MsoPlainText>* $p->get_tag( [$tag, ...] )</p>
|
second is a reference to the parser arguments stack, third is a reference |
<p class=MsoPlainText>This method returns the next start or end tag (skipping
|
|
any other</p>
|
to the Safe space, and fourth is an optional "context" value. This |
<p class=MsoPlainText>tokens), or C<undef> if there are no more tags in
|
|
the document. If</p>
|
subroutine allows a tag to get a tag argument, after being interpolated inside |
<p class=MsoPlainText>one or more arguments are given, then we skip tokens until
|
|
one of the</p>
|
the Safe space. This should be used if the tag might use a safe space variable |
<p class=MsoPlainText>specified tag types is found. For example:</p>
|
|
<p class=MsoPlainText> $p->get_tag("font", "/font");</p>
|
reference for the tag argument. (See Apache::scripttag::start_script for an |
<p class=MsoPlainText>will find the next start or end tag for a font-element.</p>
|
|
<p class=MsoPlainText>The tag information is returned as an array reference
|
example.) This version only handles scalar variables. </span></p> |
in the same form</p>
|
|
<p class=MsoPlainText>as for $p->get_token above, but the type code (first
|
<p><span style='color:black'>* |
element) is</p>
|
|
<p class=MsoPlainText>missing. A start tag will be returned like this:</p>
|
<i>get_param_var</i></span><span |
<p class=MsoPlainText> [$tag, $attr, $attrseq, $text]</p>
|
|
<p class=MsoPlainText>The tagname of end tags are prefixed with "/",
|
style='color:black'> - 4 arguments, first is a scaler sting of the argument needed, |
i.e. end tag is</p>
|
|
<p class=MsoPlainText>returned like this:</p>
|
second is a reference to the parser arguments stack, third is a reference |
<p class=MsoPlainText> ["/$tag", $text]</p>
|
|
<p class=MsoPlainText>* $p->get_text( [$endtag] )</p>
|
to the Safe space, and fourth is an optional "context" value. This |
<p class=MsoPlainText>This method returns all text found at the current position.
|
|
It will</p>
|
subroutine allows a tag to get a tag argument, after being interpolated inside |
<p class=MsoPlainText>return a zero length string if the next token is not text.
|
|
The</p>
|
the Safe space. This should be used if the tag might use a safe space variable |
<p class=MsoPlainText>optional $endtag argument specifies that any text occurring
|
|
before the</p>
|
reference for the tag argument. (See Apache::scripttag::start_script for an |
<p class=MsoPlainText>given tag is to be returned. All entities are unmodified.</p>
|
|
<p class=MsoPlainText>The $p->{textify} attribute is a hash that defines
|
example.) This version can handle list or hash variables properly. </span></p> |
how certain tags can</p>
|
|
<p><span style='color:black'>*
<i>description</i></span><span
style='color:black'> - 1 argument, the token object. Returns the textual
description of the current tag from the insertlist.tab file. </span></p>
<p><span style='color:black'>*
<i>whichuser</i></span><span
style='color:black'> - 0 arguments. Examines the current environment
settings and returns the current $symb, $courseid, $udom and $uname. Always
use this function to determine who the current user is, since an instructor
might be viewing a student's version of a resource.
</span></p>
<p><span style='color:black'>*
<i>inner_xmlparse</i></span><span
style='color:black'> - 6 arguments: the target, an array reference to the current
stack of tags, an array reference to the current stack of tag arguments, an
array reference to the current stack of LCParsers, a reference to the current
Safe space, and a reference to the hash of current style definitions. </span></p>
<p><span style='color:black'>*
<i>newparser</i></span><span
style='color:black'> - 3 arguments: first, a reference to the parser stack; second,
a reference to a scalar string containing the text the new parser should
run over; third, a scalar holding the directory path of the file being
parsed. (See Apache::scripttag::start_import for an example.) </span></p>
</div>
<br clear=ALL style='page-break-before:always;'>
<div class=Section2> </div>
</body>
</html>
<p><span style='color:black'>*
<i>register</i></span><span
style='color:black'> - should be called in a file's BEGIN block. Takes 2 arguments:
a scalar string and a list of strings. This allows a file to register which
tags it handles and the namespace those tags belong to. Example: </span></p>
<p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>
<p><span style='font-family:"Courier New";color:black'> &Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>
<p><span style='font-family:"Courier New";color:black'>}</span></p>
<p><span style='color:black'>This tells xmlparse that it
can find handlers for <script> and <display> in Apache::scripttag. If a tag
that is already registered is registered again, the previous registration is
remembered and will be restored on a deregister. </span></p>
<p><span style='color:black'>*
<i>deregister</i></span><span
style='color:black'> - used to remove a previously registered tag implementation.
It will restore the previous registration if there was one. </span></p>
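<p><span style='color:black'>The remember-and-restore behaviour of register and deregister can be pictured as a stack of registrations per tag. The following is a minimal sketch under that assumption only; %handler_stack and handler_for are hypothetical names, this is not the actual Apache::lonxml code, and requiring deregister to name the namespace currently on top is also an assumption.</span></p>

```perl
use strict;
use warnings;

# Hypothetical registry: tag name => stack of namespaces, most recent last.
# Illustrates remember-and-restore only; not Apache::lonxml's data structure.
my %handler_stack;

# register($namespace, @tags): make $namespace the active handler for each
# tag, remembering whatever was registered before.
sub register {
    my ($namespace, @tags) = @_;
    push @{ $handler_stack{$_} }, $namespace for @tags;
    return;
}

# deregister($namespace, @tags): remove the most recent registration of each
# tag (assumed to be $namespace), so any earlier registration is active again.
sub deregister {
    my ($namespace, @tags) = @_;
    for my $tag (@tags) {
        my $stack = $handler_stack{$tag} or next;
        pop @$stack if @$stack && $stack->[-1] eq $namespace;
        delete $handler_stack{$tag} unless @$stack;
    }
    return;
}

# handler_for($tag): namespace currently handling $tag, or undef if none.
sub handler_for {
    my ($tag) = @_;
    my $stack = $handler_stack{$tag};
    return $stack ? $stack->[-1] : undef;
}

register('Apache::scripttag', 'script', 'display');
register('My::Override', 'display');      # hypothetical second module
print handler_for('display'), "\n";       # My::Override
deregister('My::Override', 'display');
print handler_for('display'), "\n";       # Apache::scripttag
```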
|
|
|
<p><span style='color:black'>*
<i>startredirection</i></span><span
style='color:black'> - used when a tag wants to save a portion of the document
for its end tag to use, but wants the intervening document to be processed
normally. (See Apache::scripttag::start_window for an example.) </span></p>
<p><span style='color:black'>*
<i>endredirection</i></span><span
style='color:black'> - ends the redirection, so that xmlparse stops hiding output. The
return value is everything that xmlparse has processed since the corresponding
startredirection. (See Apache::scripttag::end_window for an example.) </span></p>
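<p><span style='color:black'>One way to picture startredirection and endredirection is as a stack of capture buffers: while a buffer is on the stack, processed output is appended to it instead of being sent to the user, and endredirection pops the buffer and returns what it captured. The sketch below assumes exactly that model; emit, @redirect_stack and $output are hypothetical names, not xmlparse internals.</span></p>

```perl
use strict;
use warnings;

# Hypothetical model: a stack of capture buffers. While any buffer is on the
# stack, emitted text goes into the newest buffer instead of the real output.
my @redirect_stack;   # references to capture buffers, newest last
my $output = '';      # stands in for the text actually sent to the user

sub emit {
    my ($text) = @_;
    if (@redirect_stack) {
        ${ $redirect_stack[-1] } .= $text;   # hidden: kept for the end tag
    } else {
        $output .= $text;                     # visible output
    }
    return;
}

sub startredirection {
    my $buffer = '';
    push @redirect_stack, \$buffer;
    return;
}

# Stop hiding output and return everything captured since the matching start.
sub endredirection {
    my $buffer = pop @redirect_stack;
    return defined $buffer ? $$buffer : '';
}

# A start tag begins capturing, the body is processed normally, and the end
# tag receives the processed body and decides what to emit.
emit('before ');
startredirection();
emit('body');
my $saved = endredirection();
emit("wrapped($saved)");
print "$output\n";   # before wrapped(body)
```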
|
|
|
<p><span style='color:black'>*
<i>Apache::run::evaluate</i></span><span
style='color:black'> - 3 arguments: first, a string; second, a reference to the Safe
space; third, a string to be evaluated before the first argument. This subroutine
performs variable interpolation and simple function interpolation on the first
argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>
<p><span style='color:black'>*
<i>Apache::run::run</i></span><span
style='color:black'> - 2 arguments: first, a string; second, a reference to the Safe
space. Passes the string into the Safe space for evaluation
and returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
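<p><span style='color:black'>Both helpers rest on Perl's core Safe module, which compiles and runs code inside a restricted compartment. The minimal sketch below illustrates the idea with Safe's new and reval methods; run_sketch and evaluate_sketch are hypothetical stand-ins rather than the real Apache::run functions, and wrapping the text in a double-quoted here-document is just one assumed way to obtain variable interpolation (it does not reproduce the function interpolation mentioned above).</span></p>

```perl
use strict;
use warnings;
use Safe;

# Hypothetical stand-ins for Apache::run::run and Apache::run::evaluate,
# built on the core Safe module; not the LON-CAPA implementations.

# Evaluate $code inside the compartment and return the result.
sub run_sketch {
    my ($code, $safe) = @_;
    my $result = $safe->reval($code);
    die "Safe evaluation failed: $@" if $@;
    return $result;
}

# Optionally evaluate $prefix first (e.g. to set variables), then interpolate
# the variables in $text inside the compartment. The text is wrapped in a
# double-quoted here-document, so quote characters need no escaping; this
# assumes $text never contains a line consisting only of END_OF_TEXT.
sub evaluate_sketch {
    my ($text, $safe, $prefix) = @_;
    run_sketch($prefix, $safe) if defined $prefix && length $prefix;
    my $result = run_sketch("<<\"END_OF_TEXT\";\n$text\nEND_OF_TEXT\n", $safe);
    chomp $result;
    return $result;
}

my $safe = Safe->new;
print run_sketch('2 + 3', $safe), "\n";                      # 5
print evaluate_sketch('x is $x', $safe, '$x = 42;'), "\n";   # x is 42
```

Because the variables live in the compartment, a value set by one call stays visible to later calls on the same Safe object.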
|
|
|
<h3><a name="_Toc421867122">Style Files</a></h3>
<p><span style='color:black'> <img width=432 height=255
src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>
<p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span
style='font-size:14.0pt;color:black'> - Using a style file</span></p>
<p><span style='color:black'><b>Style File specific tags</b></span></p>
<p><span style='color:black'><b><definetag></b></span><span
style='color:black'> - 2 arguments: <i>name</i></span><span style='color:black'>,
the name of the new tag being defined (required; if the name is preceded by a /, an end tag is being defined);
<i>parms</i></span><span style='color:black'>, the parameters of the
new tag, whose values can be accessed as $parametername. </span></p>
<p><span style='color:black'>*
<b><render></b></span><span
style='color:black'> - defines what the new tag does for a non-meta target </span></p>
<p><span style='color:black'>*
<b><meta></b></span><span
style='color:black'> - defines what the new tag does for a meta target </span></p>
<p><span style='color:black'>*
<b><tex> / <web> / <latexsource></b></span><span style='color:black'>
- define what a new tag does for one specific non-meta target. All data inside
a <render> is rendered for all targets, except where it is enclosed in one of these
target-specific tags.</span><span style='font-size:16.0pt;color:black'> </span></p>
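<p><span style='color:black'>To make the target-selection rules concrete, here is a minimal sketch that models one tag definition as a Perl data structure instead of style-file markup. The layout of %definition, the target names web, tex and meta, and the expand_tag helper are all assumptions for illustration; the sketch only mirrors the rules listed above: render content goes to every non-meta target unless wrapped in a target-specific tag, meta content goes to the meta target, and $parametername is replaced by the parameter's value.</span></p>

```perl
use strict;
use warnings;

# Hypothetical in-memory form of one style-file tag definition. Segments of
# the render body without a target apply to every non-meta target; segments
# with a target stand for text wrapped in a target-specific tag.
my %definition = (
    parms  => ['who'],
    render => [
        { text => 'Hello, ' },
        { target => 'web', text => '[$who]' },
        { target => 'tex', text => '\textbf{$who}' },
        { text => '!' },
    ],
    meta => 'greeting for $who',
);

# expand_tag(\%def, $target, %args): text the tag produces for $target, with
# each $parametername replaced by the value passed for that parameter.
sub expand_tag {
    my ($def, $target, %args) = @_;
    my $out;
    if ($target eq 'meta') {
        $out = $def->{meta} // '';
    } else {
        $out = join '', map { $_->{text} }
                        grep { !defined $_->{target} || $_->{target} eq $target }
                        @{ $def->{render} };
    }
    for my $name (@{ $def->{parms} }) {
        my $value = $args{$name} // '';
        $out =~ s/\$\Q$name\E\b/$value/g;
    }
    return $out;
}

print expand_tag(\%definition, 'web',  who => 'World'), "\n";   # Hello, [World]!
print expand_tag(\%definition, 'tex',  who => 'World'), "\n";   # Hello, \textbf{World}!
print expand_tag(\%definition, 'meta', who => 'World'), "\n";   # greeting for World
```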
|
|
|
<p class=MsoHeader> <img width=432 height=243
src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>
<p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span
style='font-size:14.0pt'> - The parser</span></p>
<h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
<p class=MsoPlainText>SYNOPSIS</p>
<p class=MsoPlainText> require HTML::LCParser;</p>
<p class=MsoPlainText> $p = HTML::LCParser->new("index.html") || die "Can't open: $!";</p>
<p class=MsoPlainText> while (my $token = $p->get_token) {</p>
<p class=MsoPlainText> #...</p>
<p class=MsoPlainText> }</p>
<p class=MsoPlainText>DESCRIPTION</p>
<p class=MsoPlainText>HTML::LCParser is an alternative interface to the HTML::Parser class. It is an HTML::PullParser subclass.</p>
<p class=MsoPlainText>The following methods are available:</p>
<p class=MsoPlainText>* $p = HTML::LCParser->new( $file_or_doc );</p>
<p class=MsoPlainText>The constructor argument is either a file name, a file handle object, or the complete document to be parsed.</p>
<p class=MsoPlainText>If the argument is a plain scalar, it is taken as the name of a file to be opened and parsed. If the file can't be opened for reading, the constructor returns an undefined value and $! tells you why it failed.</p>
<p class=MsoPlainText>If the argument is a reference to a plain scalar, this scalar is taken to be the literal document to parse. The value of this scalar should not be changed before all tokens have been extracted.</p>
<p class=MsoPlainText>Otherwise the argument is taken to be some object that HTML::LCParser can read() from when it needs more data. Typically it will be a filehandle of some kind. The stream will be read() until EOF, but not closed.</p>
<p class=MsoPlainText>The constructor also turns the HTML::Parser attr_encoded option on by default, so entities in attribute values are left undecoded.</p>
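<p class=MsoPlainText>The three argument forms can be told apart with ref(), and Perl's three-argument open already accepts both a file name and a reference to a scalar. The sketch below shows that dispatch; open_document is a hypothetical helper for illustration, not HTML::LCParser's constructor code.</p>

```perl
use strict;
use warnings;

# Hypothetical helper showing how the three argument forms described above can
# be told apart; it is not HTML::LCParser's code. Returns something readable,
# or undef with $! set when a named file cannot be opened.
sub open_document {
    my ($file_or_doc) = @_;
    # A plain scalar is a file name; a reference to a scalar is the literal
    # document. Three-argument open accepts both: given a scalar reference it
    # creates an in-memory handle that reads that scalar in place, which is why
    # the scalar must not change until all tokens have been extracted.
    if (!ref $file_or_doc || ref $file_or_doc eq 'SCALAR') {
        open(my $fh, '<', $file_or_doc) or return undef;
        return $fh;
    }
    # Anything else is assumed to be a handle or object with a read() method;
    # it is returned unchanged and never closed here.
    return $file_or_doc;
}

my $doc = "Hello, world\n";
my $in_memory = open_document(\$doc);
print scalar readline($in_memory);                 # Hello, world
my $missing = open_document('/no/such/file.html');
print defined $missing ? "opened\n" : "failed: $!\n";
```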
|
|
|
<p class=MsoPlainText>* $p->get_token</p>
<p class=MsoPlainText>This method will return the next <i>token</i> found in the HTML document, or undef at the end of the document. The token is returned as an array reference. The first element of the array will be a (mostly) single-character string denoting the type of this token: "S" for start tag, "E" for end tag, "T" for text, "C" for comment, "D" for declaration, and "PI" for processing instructions. The rest of the array is the same as the arguments passed to the corresponding HTML::Parser v2 compatible callbacks (see HTML::Parser). In summary, returned tokens look like this:</p>
<p class=MsoPlainText> ["S", $tag, $attr, $attrseq, $text, $line]</p>
<p class=MsoPlainText> ["E", $tag, $text, $line]</p>
<p class=MsoPlainText> ["T", $text, $is_data, $line]</p>
<p class=MsoPlainText> ["C", $text, $line]</p>
<p class=MsoPlainText> ["D", $text, $line]</p>
<p class=MsoPlainText> ["PI", $token0, $text, $line]</p>
<p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array reference and the rest are plain scalars.</p>
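<p class=MsoPlainText>Because every token is a plain array reference whose first element is the type code, consuming tokens is mostly a dispatch on that code. The minimal sketch below uses hand-built tokens in the layout listed above, so it runs without HTML::LCParser installed; describe_token is a hypothetical helper.</p>

```perl
use strict;
use warnings;

# Tokens written by hand in the get_token layout listed above, so the
# dispatch on the type code can be shown without running a parser.
my @tokens = (
    ['S', 'a', { href => 'x.html' }, ['href'], '<a href="x.html">', 2],
    ['T', 'link text', 0, 2],
    ['E', 'a', '</a>', 2],
);

# describe_token($token): one-line summary based on the type code.
sub describe_token {
    my ($token) = @_;
    my $type = $token->[0];
    my $line = $token->[-1];   # every layout above ends with the line number
    if ($type eq 'S') {
        my (undef, $tag, $attr, $attrseq) = @$token;
        # $attrseq keeps the source attribute order; $attr maps names to values
        my $attrs = join ',', map { "$_=$attr->{$_}" } @$attrseq;
        return "start $tag [$attrs] at line $line";
    }
    return "end $token->[1] at line $line"    if $type eq 'E';
    return "text '$token->[1]' at line $line" if $type eq 'T';
    return "comment at line $line"            if $type eq 'C';
    return "declaration at line $line"        if $type eq 'D';
    return "processing instruction at line $line";   # 'PI'
}

print describe_token($_), "\n" for @tokens;
```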
|
|
|
<p class=MsoPlainText>* $p->unget_token($token,...)</p>
<p class=MsoPlainText>If you find out you have read too many tokens you can push them back, so that they are returned the next time $p->get_token is called.</p>
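<p class=MsoPlainText>Pushing tokens back behaves like a small look-ahead buffer in front of the token stream. The sketch below models that with a hypothetical TokenStreamSketch class over an in-memory list; it is not HTML::LCParser's implementation, and the ordering (the first token passed to unget_token comes back first) is an assumption that matches an unshift-based buffer.</p>

```perl
use strict;
use warnings;

package TokenStreamSketch;   # hypothetical class, for illustration only

sub new {
    my ($class, @tokens) = @_;
    return bless { tokens => [@tokens] }, $class;
}

# Next token, or undef when the stream is exhausted.
sub get_token {
    my ($self) = @_;
    return shift @{ $self->{tokens} };
}

# Push tokens back onto the front; the first argument is returned first.
sub unget_token {
    my ($self, @tokens) = @_;
    unshift @{ $self->{tokens} }, @tokens;
    return $self;
}

package main;

my $p = TokenStreamSketch->new(['T', 'one', 0, 1], ['T', 'two', 0, 1]);
my $first  = $p->get_token;        # 'one'
my $second = $p->get_token;        # 'two' turns out to be one token too many
$p->unget_token($second);          # push it back
print $p->get_token->[1], "\n";    # two
```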
|
|
|
<p class=MsoPlainText>* $p->get_tag( [$tag, ...] )</p>
<p class=MsoPlainText>This method returns the next start or end tag (skipping any other tokens), or undef if there are no more tags in the document. If one or more arguments are given, tokens are skipped until one of the specified tag types is found. For example:</p>
<p class=MsoPlainText> $p->get_tag("font", "/font");</p>
<p class=MsoPlainText>will find the next start or end tag for a font element.</p>
<p class=MsoPlainText>The tag information is returned as an array reference in the same form as for $p->get_token above, but the type code (first element) is missing. A start tag will be returned like this:</p>
<p class=MsoPlainText> [$tag, $attr, $attrseq, $text]</p>
<p class=MsoPlainText>The names of end tags are prefixed with "/", i.e. an end tag is returned like this:</p>
<p class=MsoPlainText> ["/$tag", $text]</p>
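<p class=MsoPlainText>The filtering and reshaping that get_tag performs can be expressed over a plain list of get_token-style tokens. In this minimal sketch next_tag is a hypothetical helper working on an array rather than a live parser; it keeps any trailing elements such as $line, on the assumption that only the type code is dropped.</p>

```perl
use strict;
use warnings;

# next_tag(\@tokens, @wanted): hypothetical helper that applies get_tag's
# rules to an array of get_token-style tokens, consuming it from the front.
sub next_tag {
    my ($tokens, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    while (my $token = shift @$tokens) {
        my ($type, @rest) = @$token;
        next unless $type eq 'S' || $type eq 'E';   # skip text, comments, ...
        $rest[0] = "/$rest[0]" if $type eq 'E';     # end tags get a "/" prefix
        next if @wanted && !$want{ $rest[0] };      # not a requested tag
        return \@rest;                              # type code dropped
    }
    return undef;                                   # no more (matching) tags
}

my @tokens = (
    ['S', 'p', {}, [], '<p>', 1],
    ['S', 'font', { color => 'red' }, ['color'], '<font color="red">', 1],
    ['T', 'hot', 0, 1],
    ['E', 'font', '</font>', 1],
    ['E', 'p', '</p>', 1],
);
my $open  = next_tag(\@tokens, 'font', '/font');
my $close = next_tag(\@tokens, 'font', '/font');
print "$open->[0] $open->[1]{color}\n";   # font red
print "$close->[0]\n";                     # /font
```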
|
|
|
<p class=MsoPlainText>* $p->get_text( [$endtag] )</p>
<p class=MsoPlainText>This method returns all text found at the current position. It will return a zero-length string if the next token is not text. The optional $endtag argument specifies that any text occurring before the given tag is to be returned. All entities are left unmodified.</p>
<p class=MsoPlainText>The $p->{textify} attribute is a hash that defines how certain tags can be treated as text. If the name of a start tag matches a key in this hash, then this tag is converted to text. The hash value is used to specify which tag attribute to obtain the text from. If this tag attribute is missing, then the upper case name of the tag enclosed in brackets is returned, e.g. "[IMG]". The hash value can also be a subroutine reference. In this case the routine is called with the start tag token content as its argument and the return value is treated as the text.</p>
<p class=MsoPlainText>The default $p->{textify} value is:</p>
<p class=MsoPlainText> {img => "alt", applet => "alt"}</p>
<p class=MsoPlainText>This means that <IMG> and <APPLET> tags are treated as text, and that the text to substitute can be found in the ALT attribute.</p>
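<p class=MsoPlainText>The textify rules can be applied to a single start-tag token in a few lines. In the sketch below textify_token is a hypothetical helper, not an HTML::LCParser method, and passing the whole token (type code included) to a code reference is an assumption about what "the start tag token content" means.</p>

```perl
use strict;
use warnings;

# textify_token(\%textify, $token): hypothetical helper applying the textify
# rules above to one start-tag token ["S", $tag, $attr, $attrseq, $text, $line].
# Returns undef when the tag is not listed in %textify.
sub textify_token {
    my ($textify, $token) = @_;
    my (undef, $tag, $attr) = @$token;
    return undef unless exists $textify->{$tag};
    my $rule = $textify->{$tag};
    # A code reference receives the token's elements and returns the text.
    return $rule->(@$token) if ref $rule eq 'CODE';
    # Otherwise the value names the attribute holding the text.
    my $text = $attr->{$rule};
    return defined $text ? $text : '[' . uc($tag) . ']';
}

my %textify = (img => 'alt', applet => 'alt');   # the default shown above
$textify{input} = sub { my (undef, undef, $attr) = @_; return "[input $attr->{name}]" };

my $with_alt = ['S', 'img', { src => 'a.png', alt => 'Logo' }, ['src', 'alt'], '<img src="a.png" alt="Logo">', 4];
my $no_alt   = ['S', 'img', { src => 'b.png' }, ['src'], '<img src="b.png">', 5];
my $field    = ['S', 'input', { name => 'q' }, ['name'], '<input name="q">', 6];

print textify_token(\%textify, $with_alt), "\n";   # Logo
print textify_token(\%textify, $no_alt), "\n";     # [IMG]
print textify_token(\%textify, $field), "\n";      # [input q]
```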
|
|
|
<p class=MsoPlainText>* $p->get_trimmed_text( [$endtag] )</p>
<p class=MsoPlainText>Same as $p->get_text above, but will collapse any sequences of white space to a single space character. Leading and trailing white space is removed.</p>
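<p class=MsoPlainText>The white-space handling that distinguishes get_trimmed_text from get_text amounts to three substitutions. The sketch applies them to a plain string; trim_text is a hypothetical helper, not part of HTML::LCParser.</p>

```perl
use strict;
use warnings;

# The white-space normalisation get_trimmed_text applies, shown on a plain
# string; trim_text is a hypothetical helper, not part of HTML::LCParser.
sub trim_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;   # collapse every run of white space to one space
    $text =~ s/^ //;      # remove a leading space
    $text =~ s/ $//;      # remove a trailing space
    return $text;
}

print '[', trim_text("  Hello,\n\t  world  "), "]\n";   # [Hello, world]
```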
|
|
|
<p class=MsoPlainText>EXAMPLES</p>
<p class=MsoPlainText>This example extracts all links from a document. It will print one line for each link, containing the URL and the textual description between the <A>...</A> tags:</p>
<p class=MsoPlainText> use HTML::LCParser;</p>
<p class=MsoPlainText> my $p = HTML::LCParser->new(shift || "index.html") or die "Can't open: $!";</p>
<p class=MsoPlainText> while (my $token = $p->get_tag("a")) {</p>
<p class=MsoPlainText> my $url = $token->[1]{href} || "-";</p>
<p class=MsoPlainText> my $text = $p->get_trimmed_text("/a");</p>
<p class=MsoPlainText> print "$url\t$text\n";</p>
<p class=MsoPlainText> }</p>
<p class=MsoPlainText>This example extracts the <TITLE> from the document:</p>
<p class=MsoPlainText> use HTML::LCParser;</p>
<p class=MsoPlainText> my $p = HTML::LCParser->new(shift || "index.html") or die "Can't open: $!";</p>
<p class=MsoPlainText> if ($p->get_tag("title")) {</p>
<p class=MsoPlainText> my $title = $p->get_trimmed_text;</p>
<p class=MsoPlainText> print "Title: $title\n";</p>
<p class=MsoPlainText> }</p>
</div>
<br clear=ALL style='page-break-before:always;'>
<div class=Section2> </div>
</body>
</html>