File:  [LON-CAPA] / doc / gutshtml / SessionFou1.html
Fri Jun 28 20:30:29 2002 UTC (22 years, 5 months ago) by www
Branches: MAIN
CVS tags: version_0_99_3, version_0_99_2, version_0_99_1, version_0_99_0, version_0_6_2, version_0_6, version_0_5_1, version_0_5, version_0_4, stable_2002_july, conference_2003, STABLE, HEAD
HTML version of GUTS manual. Individual files will still need cleanup.

<html>
<head>
<meta name=Title
content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">
<meta http-equiv=Content-Type content="text/html; charset=macintosh">
<title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style 
Files) (Guy)</title>
<style><!--
.MsoHeader
	{tab-stops:center 3.0in right 6.0in;
	font-size:10.0pt;
	font-family:"Times New Roman";}
.MsoPlainText
	{font-size:10.0pt;
	font-family:"Courier New";}
.Section1
	{page:Section1;}
.Section2
	{page:Section2;}
-->
</style>
</head>
<body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>
<div class=Section1> 
  <h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style 
    Files) (Guy)</h2>
  <h3><a name="_Toc421867121">XML Files</a></h3>
  <p><span style='color:black'>All HTML / XML files are run through the lonxml 
    handler before being served to a user. This allows us to rewrite many portions 
    of a document and to support server-side tags. There are two ways to add new 
    tags to the XML parsing engine: through LON-CAPA style files, or by 
    writing Perl tag handlers for the desired tags. </span></p>
  <p><span style='color:black'><b>Global Variables</b></span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::debug</i></span><span
style='color:black'> - debugging control </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>@Apache::lonxml::pwd</i></span><span
style='color:black'> - path to the directory containing the file currently being 
    processed </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>@Apache::lonxml::outputstack</i></span><span
style='color:black'> </span></p>
  <p><span style='color:black'><i>$Apache::lonxml::redirection</i></span><span
style='color:black'> - these two are used for capturing a subset of the output 
    for later processing. Do not touch them directly; use &amp;startredirection 
    and &amp;endredirection instead. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::import</i></span><span
style='color:black'> - controls whether the &lt;import&gt; tag actually does anything 
    </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>@Apache::lonxml::extlinks</i></span><span
style='color:black'> - a list of URLs that the user is allowed to look at because 
    of the current resource (images, and links) </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::metamode</i></span><span
style='color:black'> - some output is turned off; the meta target wants a specific 
    subset. Use &lt;output&gt; to guarantee that the contained data will be in 
    the parsing output </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::evaluate</i></span><span
style='color:black'> - controls whether run::evaluate actually dereferences variable 
    references </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>%Apache::lonxml::insertlist</i></span><span
style='color:black'> - data structure for edit mode, determines what tags can 
    go into what other tags </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>@Apache::lonxml::namespace</i></span><span
style='color:black'> - stores the list of tag namespaces used in the insertlist.tab 
    file that are currently active, used only in edit mode. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::registered</i></span><span
style='color:black'> - set to 1 once the remote has been updated to know what 
    resource we are looking at. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::request</i></span><span
style='color:black'> - current Apache request object, or undef </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::curdepth</i></span><span
style='color:black'> - current position in the overall parse. Will be a string 
    like: 2_3_1 (first tag in the third second-level tag in the second top-level 
    tag). It gets set by callsub, and can be used in Perl tag implementations. 
    It relies upon the internal globals: <i>@Apache::lonxml::depthcounter</i></span><span
style='color:black'>, <i>$Apache::lonxml::depth</i></span><span
style='color:black'>, <i>$Apache::lonxml::olddepth</i></span><span
style='color:black'> </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>$Apache::lonxml::prevent_entity_encode</i></span><span
style='color:black'> - By default the XML parser will try to re-encode any 8-bit 
    characters into HTML entity codes. If this is set to a true value, the re-encoding 
    is prevented. </span></p>
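  <p><span style='color:black'>The position string described for 
    <i>$Apache::lonxml::curdepth</i> in the list above can be modelled with a per-level 
    counter. The following is only a simplified sketch: the names open_tag, close_tag 
    and @counter are hypothetical, and the real bookkeeping in lonxml (which uses 
    @depthcounter, $depth and $olddepth) may differ. </span></p>

```perl
use strict;
use warnings;

# Simplified, hypothetical model of the depth bookkeeping; the real
# lonxml code uses @depthcounter, $depth and $olddepth and may differ.
my @counter;      # $counter[$i] = number of tags opened so far at level $i
my $depth = 0;    # number of currently open tags

# Call on every start tag; returns the position string for that tag.
sub open_tag {
    $counter[$depth]++;
    $depth++;
    $#counter = $depth - 1;          # forget counters of deeper levels
    return join '_', @counter;
}

# Call on every end tag.
sub close_tag { $depth--; return; }

open_tag(); close_tag();             # first top-level tag
open_tag();                          # second top-level tag
for my $sibling (1 .. 2) {           # two second-level tags, opened and closed
    open_tag();
    close_tag();
}
open_tag();                          # third second-level tag
my $curdepth = open_tag();           # first tag inside it

print "$curdepth\n";                 # prints 2_3_1
```

  <p><span style='color:black'>Truncating the counter array on every start tag 
    is one way to make numbering restart inside each new parent tag. </span></p>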
  <p><span style='color:black'>In common usage, <i>$Apache::lonxml::prevent_entity_encode</i></span><span
style='color:black'>, <i>$Apache::lonxml::evaluate</i></span><span
style='color:black'>, <i>$Apache::lonxml::metamode</i></span><span
style='color:black'>, <i>$Apache::lonxml::import</i></span><span
style='color:black'> should never be set to a value directly, but rather incremented 
    when you want the effect on, and decremented when you want the effect off. 
    </span></p>
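  <p><span style='color:black'>A minimal sketch of why these globals are treated 
    as counters: with nested tags, a plain assignment in the inner end tag would 
    switch the effect off too early. The variable $metamode and the handlers 
    start_quiet and end_quiet below are hypothetical stand-ins, not the real lonxml 
    code. </span></p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for a counter-style global such as
# $Apache::lonxml::metamode; not the real LON-CAPA variable.
our $metamode = 0;

# Hypothetical tag handlers: the start tag switches the effect on,
# the end tag switches it off again.
sub start_quiet { $metamode++; return; }
sub end_quiet   { $metamode--; return; }

start_quiet();                       # outer tag turns the effect on   (1)
start_quiet();                       # nested tag keeps it on          (2)
end_quiet();                         # inner end tag: still on         (1)
my $still_on = $metamode ? 1 : 0;    # a check made inside the outer tag
end_quiet();                         # outer end tag: off again        (0)

print "still on inside outer tag: $still_on, final value: $metamode\n";
```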
  <p><span style='color:black'><b>Notable Perl subroutines</b></span></p>
  <p><span style='color:black'>If not specified these functions are in Apache::lonxml 
    </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>xmlparse</i></span><span
style='color:black'> - see the XMLPARSE figure. It is not callable from inside 
    a tag; if one needs to restart parsing, either add a new LCParser to 
    the parser stack using the newparser function, or call inner_xmlparse 
    (see the xmlparse function in scripttag.pm). </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>recurse</i></span><span
style='color:black'> - acts just like <i>xmlparse</i></span><span
style='color:black'>, except that it doesn't do the style definition check; it always 
    calls <i>callsub</i></span><span style='color:black'> </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>callsub</i></span><span
style='color:black'> - checks whether a Perl subroutine is defined for the current 
    tag and calls it. Otherwise it just returns the tag as it was read in. It also 
    adds a default editing interface unless the tag has a defined subroutine 
    that either returns something or requests that callsub not add the editing 
    interface. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>afterburn</i></span><span
style='color:black'> - called on the output of xmlparse; it can add highlights, 
    anchors, and links for regular expression matches to the output. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>register_insert</i></span><span
style='color:black'> - builds the %Apache::lonxml::insertlist structure of what 
    tags can have what other tags inside. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>whichuser</i></span><span
style='color:black'> - returns a list of $symb, $courseid, $domain, $name that 
    is correct for calls to lonnet functions for this setup. Uses form.grade_ 
    parameters, if the user is allowed to mgr in the course </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>setup_globals</i></span><span
style='color:black'> - initializes all lonxml globals when xmlparse is called. 
    If you intend to create a new target you will likely need to tweak how the 
    globals are set up at startup. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>init_safespace</i></span><span
style='color:black'> - creates holes to external functions, creates some global 
    variables, and sets the permitted operators of the global Safe space interpreter. 
    </span></p>
  <p><span style='color:black'><b>Functions Tag Handlers can use</b></span></p>
  <p><span style='color:black'>If not specified these functions are in Apache::lonxml 
    </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>debug</i></span><span
style='color:black'> - a function to call to print out debugging messages. Will 
    only print when $Apache::lonxml::debug is set to 1 </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>warning</i></span><span
style='color:black'> - a function to use for warning messages. The message will 
    appear at the top of a resource when it is viewed in construction space only. 
    </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>error</i></span><span
style='color:black'> - a function to use for error messages. When the resource 
    is viewed in construction space, the message appears at the top of the resource; 
    otherwise the resource author and course instructor are messaged and the 
    student is informed that an error has occurred. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>get_all_text</i></span><span
style='color:black'> - 2 args: the tag to look for (use /tag to look for an 
    end tag) and an HTML::TokeParser reference. It will repeatedly get text from 
    the TokeParser until the requested tag is found, and return all of the 
    document it pulled from the TokeParser. (See Apache::scripttag::start_script 
    for an example of usage.) </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>get_param</i></span><span
style='color:black'> - 4 arguments, first is a scalar string naming the argument needed, 
    second is a reference to the parser arguments stack, third is a reference 
    to the Safe space, and fourth is an optional &quot;context&quot; value. This 
    subroutine allows a tag to get a tag argument, after being interpolated inside 
    the Safe space. This should be used if the tag might use a safe space variable 
    reference for the tag argument. (See Apache::scripttag::start_script for an 
    example.) This version only handles scalar variables. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>get_param_var</i></span><span
style='color:black'> - 4 arguments, first is a scalar string naming the argument needed, 
    second is a reference to the parser arguments stack, third is a reference 
    to the Safe space, and fourth is an optional &quot;context&quot; value. This 
    subroutine allows a tag to get a tag argument, after being interpolated inside 
    the Safe space. This should be used if the tag might use a safe space variable 
    reference for the tag argument. (See Apache::scripttag::start_script for an 
    example.) This version can handle list or hash variables properly. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>description</i></span><span
style='color:black'> - 1 argument, the token object. This will return the textual 
    description of the current tag from the insertlist.tab file. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>whichuser</i></span><span
style='color:black'> - 0 arguments. This will take a look at the current environment 
    setting and return the current $symb, $courseid, $udom, $uname. You should 
    always use this function if you want to determine who the current user is 
    (since an instructor might be trying to view a student's version of a resource). 
    </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>inner_xmlparse</i></span><span
style='color:black'> - 6 arguments: the target, an array pointer to the current 
    stack of tags, an array pointer to the current stack of tag arguments, an 
    array pointer to the current stack of LCParsers, a pointer to the current 
    Safe space, and a pointer to the hash of current style definitions </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>newparser</i></span><span
style='color:black'> - 3 args, first is a reference to the parser stack, second 
    should be a reference to a scalar string containing the text the new parser should 
    run over, third should be a scalar holding the directory path of the file the parser 
    is parsing. (See Apache::scripttag::start_import for an example.) </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>register</i></span><span
style='color:black'> - should be called in a file's BEGIN block. 2 arguments, 
    a scalar string, and a list of strings. This allows a file to register what 
    tags it handles, and what the namespace of those tags is. Example: </span></p>
  <p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>
  <p><span style='font-family:"Courier New";color:black'>&nbsp; &amp;Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>
  <p><span style='font-family:"Courier New";color:black'>}</span></p>
  <p><span style='color:black'>Would tell xmlparse that in Apache::scripttag it 
    can find handlers for &lt;script&gt; and &lt;display&gt;. If one registers 
    a tag that was already registered, the previous registration is remembered and will 
    be restored on a deregister. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>deregister</i></span><span
style='color:black'> - used to remove a previously registered tag implementation. 
    It will restore the previous registration if there was one. </span></p>
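  <p><span style='color:black'>The remember-and-restore behaviour of register 
    and deregister can be pictured as a stack of handler packages per tag. The 
    sketch below is an assumption about one way to get that behaviour, not the 
    Apache::lonxml implementation; register_tags, deregister_tags, handler_for, 
    and the package names My::scripttag and My::override are all hypothetical. 
    </span></p>

```perl
use strict;
use warnings;

# Hypothetical registry: tag name => stack of handler packages.
# This only illustrates the "remember and restore" behaviour described
# above; it is not the Apache::lonxml implementation.
my %registry;

sub register_tags {
    my ($package, @tags) = @_;
    push @{ $registry{$_} }, $package for @tags;
    return;
}

sub deregister_tags {
    my ($package, @tags) = @_;
    for my $tag (@tags) {
        my $stack = $registry{$tag} or next;
        # Drop the most recent registration made by this package.
        for my $i (reverse 0 .. $#$stack) {
            next unless $stack->[$i] eq $package;
            splice @$stack, $i, 1;
            last;
        }
        delete $registry{$tag} unless @$stack;
    }
    return;
}

# Package currently responsible for a tag, or undef if there is none.
sub handler_for {
    my ($tag) = @_;
    my $stack = $registry{$tag} or return;
    return $stack->[-1];
}

register_tags('My::scripttag', 'script', 'display');
register_tags('My::override',  'script');            # shadows the first one
my $during = handler_for('script');
deregister_tags('My::override', 'script');           # previous one is restored
my $after = handler_for('script');

print "$during, then $after\n";
```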
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>startredirection</i></span><span
style='color:black'> - used when a tag wants to save a portion of the document 
    for its end tag to use, but wants the intervening document to be normally 
    processed. (See Apache::scripttag::start_window for an example.) </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>endredirection</i></span><span
style='color:black'> - used to stop xmlparse from hiding output. The 
    return value is everything that xmlparse has processed since the corresponding 
    startredirection. (See Apache::scripttag::end_window for an example.) </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>Apache::run::evaluate</i></span><span
style='color:black'> - 3 args, first a string, second a reference to the Safe 
    space, third a string to be evaluated before the first arg. This subroutine will 
    do variable interpolation and simple function interpolations on the first 
    argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <i>Apache::run::run</i></span><span
style='color:black'> - 2 args, first a string, second a reference to the Safe 
    space. This handles passing the passed string into the Safe space for evaluation 
    and then returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
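  <p><span style='color:black'>The capture mechanism behind startredirection 
    and endredirection, described in the list above, can be sketched as a stack 
    of buffers. This is a hypothetical model only: emit, start_capture and end_capture 
    are invented names, and the real code works on @Apache::lonxml::outputstack 
    and $Apache::lonxml::redirection in ways not shown here. </span></p>

```perl
use strict;
use warnings;

# Hypothetical model of output capture: a stack of buffers. While a
# capture is active, emitted text goes to the innermost buffer
# instead of the page. Not the real Apache::lonxml code.
my @outputstack;
my $page = '';

sub emit {
    my ($text) = @_;
    if (@outputstack) { $outputstack[-1] .= $text; }
    else              { $page .= $text; }
    return;
}

sub start_capture { push @outputstack, ''; return; }
sub end_capture   { return pop @outputstack; }

emit('before ');
start_capture();                     # e.g. in a start tag handler
emit('inner text');                  # processed normally, but captured
my $captured = end_capture();        # e.g. in the matching end tag handler
emit('[' . uc($captured) . ']');     # the end tag decides what to output

print "$page\n";                     # prints: before [INNER TEXT]
```

  <p><span style='color:black'>Using a stack rather than a single buffer lets 
    captures nest, so a tag that redirects output can appear inside another tag 
    that does the same. </span></p>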
  <h3><a name="_Toc421867122">Style Files</a></h3>
  <p><span style='color:black'> <img width=432 height=255
src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>
  <p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span
style='font-size:14.0pt;color:black'> &ndash; Using a style file</span></p>
  <p><span style='color:black'><b>Style File specific tags</b></span></p>
  <p><span style='color:black'><b>&lt;definetag&gt;</b></span><span
style='color:black'> - 2 arguments: <i>name</i></span><span style='color:black'> 
    name of the new tag being defined (if preceded by a /, an end tag is being defined), 
    required; <i>parms</i></span><span style='color:black'> parameters of the 
    new tag; the value of these parameters can be accessed by $parametername. </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <b>&lt;render&gt;</b></span><span
style='color:black'> - define what the new tag does for a non meta target </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <b>&lt;meta&gt;</b></span><span
style='color:black'> - define what the new tag does for a meta target </span></p>
  <p><span style='color:black'>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
    <b>&lt;tex&gt; / &lt;web&gt; / &lt;latexsource&gt;</b></span><span style='color:black'> 
    - define what a new tag does for a specific non-meta target. All data inside 
    a &lt;render&gt; is rendered to all targets except when surrounded by a specific 
    target tag.</span><span style='font-size:16.0pt;color:black'> </span></p>
  <p class=MsoHeader> <img width=432 height=243
src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>
  <p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span
style='font-size:14.0pt'> &ndash; The parser</span></p>
  <h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
  <p class=MsoPlainText>SYNOPSIS</p>
  <p class=MsoPlainText>&nbsp;require HTML::LCParser;</p>
  <p class=MsoPlainText>&nbsp;$p = HTML::LCParser-&gt;new(&quot;index.html&quot;) 
    || die &quot;Can't open: $!&quot;;</p>
  <p class=MsoPlainText>&nbsp;while (my $token = $p-&gt;get_token) {</p>
  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp; #...</p>
  <p class=MsoPlainText>&nbsp;}</p>
  <p class=MsoPlainText>DESCRIPTION</p>
  <p class=MsoPlainText>HTML::LCParser is an alternative interface 
    to the</p>
  <p class=MsoPlainText>HTML::Parser class.&nbsp; It is an HTML::PullParser 
    subclass.</p>
  <p class=MsoPlainText>The following methods are available:</p>
  <p class=MsoPlainText>* $p = HTML::LCParser-&gt;new( $file_or_doc );</p>
  <p class=MsoPlainText>The object constructor argument is either a file name, 
    a file handle</p>
  <p class=MsoPlainText>object, or the complete document to be parsed.</p>
  <p class=MsoPlainText>If the argument is a plain scalar, then it is taken as 
    the name of a</p>
  <p class=MsoPlainText>file to be opened and parsed.&nbsp; If the file can't 
    be opened for</p>
  <p class=MsoPlainText>reading, then the constructor will return an undefined 
    value and $!</p>
  <p class=MsoPlainText>will tell you why it failed.</p>
  <p class=MsoPlainText>If the argument is a reference to a plain scalar, then 
    this scalar is</p>
  <p class=MsoPlainText>taken to be the literal document to parse.&nbsp; The value 
    of this</p>
  <p class=MsoPlainText>scalar should not be changed before all tokens have been 
    extracted.</p>
  <p class=MsoPlainText>Otherwise the argument is taken to be some object that 
    the</p>
  <p class=MsoPlainText>HTML::LCParser can read() from when it needs 
    more data.&nbsp; Typically</p>
  <p class=MsoPlainText>it will be a filehandle of some kind.&nbsp; The stream 
    will be read() until</p>
  <p class=MsoPlainText>EOF, but not closed.</p>
  <p class=MsoPlainText>It also will turn attr_encoded on by default.</p>
  <p class=MsoPlainText>* $p-&gt;get_token</p>
  <p class=MsoPlainText>This method will return the next token found 
    in the HTML document,</p>
  <p class=MsoPlainText>or undef at the end of the document.&nbsp; The 
    token is returned as an</p>
  <p class=MsoPlainText>array reference.&nbsp; The first element of the array 
    will be a (mostly)</p>
  <p class=MsoPlainText>single character string denoting the type of this token: 
    &quot;S&quot; for start</p>
  <p class=MsoPlainText>tag, &quot;E&quot; for end tag, &quot;T&quot; for text, 
    &quot;C&quot; for comment, &quot;D&quot; for</p>
  <p class=MsoPlainText>declaration, and &quot;PI&quot; for processing instructions.&nbsp; 
    The rest of the array</p>
  <p class=MsoPlainText>is the same as the arguments passed to the corresponding 
    HTML::Parser</p>
  <p class=MsoPlainText>v2 compatible callbacks (see HTML::Parser).&nbsp; 
    In summary, returned</p>
  <p class=MsoPlainText>tokens look like this:</p>
  <p class=MsoPlainText>&nbsp; [&quot;S&quot;,&nbsp; $tag, $attr, $attrseq, $text, 
    $line]</p>
  <p class=MsoPlainText>&nbsp; [&quot;E&quot;,&nbsp; $tag, $text, $line]</p>
  <p class=MsoPlainText>&nbsp; [&quot;T&quot;,&nbsp; $text, $is_data, $line]</p>
  <p class=MsoPlainText>&nbsp; [&quot;C&quot;,&nbsp; $text, $line]</p>
  <p class=MsoPlainText>&nbsp; [&quot;D&quot;,&nbsp; $text, $line]</p>
  <p class=MsoPlainText>&nbsp; [&quot;PI&quot;, $token0, $text, $line]</p>
  <p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array 
    reference and</p>
  <p class=MsoPlainText>the rest are plain scalars.</p>
  <p class=MsoPlainText>* $p-&gt;unget_token($token,...)</p>
  <p class=MsoPlainText>If you find out you have read too many tokens you can 
    push them back,</p>
  <p class=MsoPlainText>so that they are returned the next time $p-&gt;get_token 
    is called.</p>
  <p class=MsoPlainText>* $p-&gt;get_tag( [$tag, ...] )</p>
  <p class=MsoPlainText>This method returns the next start or end tag (skipping 
    any other</p>
  <p class=MsoPlainText>tokens), or undef if there are no more tags in 
    the document.&nbsp; If</p>
  <p class=MsoPlainText>one or more arguments are given, then we skip tokens until 
    one of the</p>
  <p class=MsoPlainText>specified tag types is found.&nbsp; For example:</p>
  <p class=MsoPlainText>&nbsp;&nbsp; $p-&gt;get_tag(&quot;font&quot;, &quot;/font&quot;);</p>
  <p class=MsoPlainText>will find the next start or end tag for a font-element.</p>
  <p class=MsoPlainText>The tag information is returned as an array reference 
    in the same form</p>
  <p class=MsoPlainText>as for $p-&gt;get_token above, but the type code (first 
    element) is</p>
  <p class=MsoPlainText>missing. A start tag will be returned like this:</p>
  <p class=MsoPlainText>&nbsp; [$tag, $attr, $attrseq, $text]</p>
  <p class=MsoPlainText>The tag names of end tags are prefixed with &quot;/&quot;, 
    i.e. an end tag is</p>
  <p class=MsoPlainText>returned like this:</p>
  <p class=MsoPlainText>&nbsp; [&quot;/$tag&quot;, $text]</p>
  <p class=MsoPlainText>* $p-&gt;get_text( [$endtag] )</p>
  <p class=MsoPlainText>This method returns all text found at the current position. 
    It will</p>
  <p class=MsoPlainText>return a zero length string if the next token is not text.&nbsp; 
    The</p>
  <p class=MsoPlainText>optional $endtag argument specifies that any text occurring 
    before the</p>
  <p class=MsoPlainText>given tag is to be returned. All entities are unmodified.</p>
  <p class=MsoPlainText>The $p-&gt;{textify} attribute is a hash that defines 
    how certain tags can</p>
  <p class=MsoPlainText>be treated as text.&nbsp; If the name of a start tag matches 
    a key in this</p>
  <p class=MsoPlainText>hash then this tag is converted to text.&nbsp; The hash 
    value is used to</p>
  <p class=MsoPlainText>specify which tag attribute to obtain the text from.&nbsp; 
    If this tag</p>
  <p class=MsoPlainText>attribute is missing, then the upper case name of the 
    tag enclosed in</p>
  <p class=MsoPlainText>brackets is returned, e.g. &quot;[IMG]&quot;.&nbsp; The 
    hash value can also be a</p>
  <p class=MsoPlainText>subroutine reference.&nbsp; In this case the routine is 
    called with the</p>
  <p class=MsoPlainText>start tag token content as its argument and the return 
    value is treated</p>
  <p class=MsoPlainText>as the text.</p>
  <p class=MsoPlainText>The default $p-&gt;{textify} value is:</p>
  <p class=MsoPlainText>&nbsp; {img =&gt; &quot;alt&quot;, applet =&gt; &quot;alt&quot;}</p>
  <p class=MsoPlainText>This means that &lt;IMG&gt; and &lt;APPLET&gt; tags are 
    treated as text, and that</p>
  <p class=MsoPlainText>the text to substitute can be found in the ALT attribute.</p>
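  <p class=MsoPlainText>The textify lookup described above can be sketched as 
    follows. The helper tag_as_text and its two-argument calling convention are 
    hypothetical and simplified; they are not part of HTML::LCParser, and the real 
    method passes the start tag token content to a subroutine value.</p>

```perl
use strict;
use warnings;

# Same default as described above; a value may also be a subroutine
# reference instead of an attribute name.
my %textify = (img => 'alt', applet => 'alt');

# Hypothetical helper (not part of HTML::LCParser): returns the text a
# start tag stands for, or undef if the tag is not treated as text.
sub tag_as_text {
    my ($tag, $attr) = @_;
    return undef unless exists $textify{$tag};
    my $how = $textify{$tag};
    return $how->($tag, $attr) if ref $how eq 'CODE';
    my $text = $attr->{$how};
    # Missing attribute: fall back to the upper-case tag name in brackets.
    return defined $text ? $text : '[' . uc($tag) . ']';
}

my $with_alt    = tag_as_text('img', { src => 'logo.png', alt => 'LON-CAPA logo' });
my $without_alt = tag_as_text('img', { src => 'logo.png' });

print "$with_alt / $without_alt\n";   # prints: LON-CAPA logo / [IMG]
```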
  <p class=MsoPlainText>* $p-&gt;get_trimmed_text( [$endtag] )</p>
  <p class=MsoPlainText>Same as $p-&gt;get_text above, but will collapse any sequences 
    of white</p>
  <p class=MsoPlainText>space to a single space character.&nbsp; Leading and trailing 
    white space is</p>
  <p class=MsoPlainText>removed.</p>
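  <p class=MsoPlainText>The white space handling of get_trimmed_text can be illustrated 
    on a plain string. The function trim_text below is a hypothetical stand-in 
    that mirrors the description above; it is not the parser's own code.</p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for the trimming step: collapse every run of
# white space to one space, then drop a leading and a trailing space.
sub trim_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text;
}

my $trimmed = trim_text("  LON-CAPA \n\t manual  ");
print "[$trimmed]\n";                # prints: [LON-CAPA manual]
```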
  <p class=MsoPlainText>EXAMPLES</p>
  <p class=MsoPlainText>This example extracts all links from a document.&nbsp; 
    It will print one</p>
  <p class=MsoPlainText>line for each link, containing the URL and the textual 
    description</p>
  <p class=MsoPlainText>between the &lt;A&gt;...&lt;/A&gt; tags:</p>
  <p class=MsoPlainText>&nbsp; use HTML::LCParser;</p>
  <p class=MsoPlainText>&nbsp; $p = HTML::LCParser-&gt;new(shift||&quot;index.html&quot;);</p>
  <p class=MsoPlainText>&nbsp; while (my $token = $p-&gt;get_tag(&quot;a&quot;)) 
    {</p>
  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $url = $token-&gt;[1]{href} 
    || &quot;-&quot;;</p>
  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $text = $p-&gt;get_trimmed_text(&quot;/a&quot;);</p>
  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print &quot;$url\t$text\n&quot;;</p>
  <p class=MsoPlainText>&nbsp; }</p>
  <p class=MsoPlainText>This example extracts the &lt;TITLE&gt; from the document:</p>
  <p class=MsoPlainText>&nbsp; use HTML::LCParser;</p>
  <p class=MsoPlainText>&nbsp; $p = HTML::LCParser-&gt;new(shift||&quot;index.html&quot;);</p>
  <p class=MsoPlainText>&nbsp; if ($p-&gt;get_tag(&quot;title&quot;)) {</p>
  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; my $title = $p-&gt;get_trimmed_text;</p>
  <p class=MsoPlainText>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print &quot;Title: $title\n&quot;;</p>
  <p class=MsoPlainText>&nbsp; }</p>
</div>
<br
clear=ALL style='page-break-before:always;'>
<div class=Section2> </div>
</body>
</html>
