Annotation of doc/gutshtml/SessionFou1.html, revision 1.2
1.2 ! bowersj2 1: <html>
! 2:
! 3: <head>
! 4:
! 5: <meta name=Title
! 6:
! 7: content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">
! 8:
! 9: <meta http-equiv=Content-Type content="text/html; charset=macintosh">
! 10:
! 11: <link rel=Edit-Time-Data href="Session%20Fou1_files/editdata.mso">
! 12:
! 13: <title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
! 14:
! 15: Files) (Guy)</title>
! 16:
! 17: <style><!--
! 18:
! 19: .MsoHeader
! 20:
! 21: {tab-stops:center 3.0in right 6.0in;
! 22:
! 23: font-size:10.0pt;
! 24:
! 25: font-family:"Times New Roman";}
! 26:
! 27: .MsoPlainText
! 28:
! 29: {font-size:10.0pt;
! 30:
! 31: font-family:"Courier New";}
! 32:
! 33: .Section1
! 34:
! 35: {page:Section1;}
! 36:
! 37: .Section2
! 38:
! 39: {page:Section2;}
! 40:
! 41: -->
! 42:
! 43: </style>
! 44:
! 45: </head>
! 46:
! 47: <body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>
! 48:
! 49: <div class=Section1>
! 50:
! 51: <h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
! 52:
! 53: Files) (Guy)</h2>
! 54:
! 55: <h3><a name="_Toc421867121">XML Files</a></h3>
! 56:
! 57: <p><span style='color:black'>All HTML / XML files are run through the lonxml
! 58:
! 59: handler before being served to a user. This allows us to rewrite many portions
! 60:
! 61: of a document and to support server-side tags. There are two ways to add new
! 62:
! 63: tags to the xml parsing engine, either through LON-CAPA style files or by
! 64:
! 65: writing Perl tag handlers for the desired tags. </span></p>
! 66:
! 67: <p><span style='color:black'><b>Global Variables</b></span></p>
! 68:
! 69: <p><span style='color:black'>*
! 70:
! 71: <i>$Apache::lonxml::debug</i></span><span
! 72:
! 73: style='color:black'> - debugging control </span></p>
! 74:
! 75: <p><span style='color:black'>*
! 76:
! 77: <i>@Apache::lonxml::pwd</i></span><span
! 78:
! 79: style='color:black'> - path to the directory containing the file currently being
! 80:
! 81: processed </span></p>
! 82:
! 83: <p><span style='color:black'>*
! 84:
! 85: <i>@Apache::lonxml::outputstack</i></span><span
! 86:
! 87: style='color:black'> </span></p>
! 88:
! 89: <p><span style='color:black'><i>$Apache::lonxml::redirection</i></span><span
! 90:
! 91: style='color:black'> - these two are used for capturing a subset of the output
! 92:
! 93: for later processing; don't touch them directly, use &startredirection
! 94:
! 95: and &endredirection </span></p>
! 96:
! 97: <p><span style='color:black'>*
! 98:
! 99: <i>$Apache::lonxml::import</i></span><span
! 100:
! 101: style='color:black'> - controls whether the <import> tag actually does anything
! 102:
! 103: </span></p>
! 104:
! 105: <p><span style='color:black'>*
! 106:
! 107: <i>@Apache::lonxml::extlinks</i></span><span
! 108:
! 109: style='color:black'> - a list of URLs that the user is allowed to look at because
! 110:
! 111: of the current resource (images, and links) </span></p>
! 112:
! 113: <p><span style='color:black'>*
! 114:
! 115: <i>$Apache::lonxml::metamode</i></span><span
! 116:
! 117: style='color:black'> - some output is turned off, the meta target wants a specific
! 118:
! 119: subset, use <output> to guarantee that the contained data will be in
! 120:
! 121: the parsing output </span></p>
! 122:
! 123: <p><span style='color:black'>*
! 124:
! 125: <i>$Apache::lonxml::evaluate</i></span><span
! 126:
! 127: style='color:black'> - controls whether run::evaluate actually dereferences variable
! 128:
! 129: references </span></p>
! 130:
! 131: <p><span style='color:black'>*
! 132:
! 133: <i>%Apache::lonxml::insertlist</i></span><span
! 134:
! 135: style='color:black'> - data structure for edit mode, determines what tags can
! 136:
! 137: go into what other tags </span></p>
! 138:
! 139: <p><span style='color:black'>*
! 140:
! 141: <i>@Apache::lonxml::namespace</i></span><span
! 142:
! 143: style='color:black'> - stores the list of tag namespaces used in the insertlist.tab
! 144:
! 145: file that are currently active, used only in edit mode. </span></p>
! 146:
! 147: <p><span style='color:black'>*
! 148:
! 149: <i>$Apache::lonxml::registered</i></span><span
! 150:
! 151: style='color:black'> - set to 1 once the remote has been updated to know what
! 152:
! 153: resource we are looking at. </span></p>
! 154:
! 155: <p><span style='color:black'>*
! 156:
! 157: <i>$Apache::lonxml::request</i></span><span
! 158:
! 159: style='color:black'> - current Apache request object, or undef </span></p>
! 160:
! 161: <p><span style='color:black'>*
! 162:
! 163: <i>$Apache::lonxml::curdepth</i></span><span
! 164:
! 165: style='color:black'> - current position in the overall parse. Will be a string
! 166:
! 167: like: 2_3_1 (first tag in the third second-level tag in the second top-level
! 168:
! 169: tag). It gets set by callsub, and can be used in Perl tag implementations.
! 170:
! 171: It relies upon the internal globals: <i>@Apache::lonxml::depthcounter</i></span><span
! 172:
! 173: style='color:black'>, <i>$Apache::lonxml::depth</i></span><span
! 174:
! 175: style='color:black'>, <i>$Apache::lonxml::olddepth</i></span><span
! 176:
! 177: style='color:black'> </span></p>
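<p><span style='color:black'>To make the 2_3_1 notation concrete, here is a minimal sketch of how such a depth string could be maintained. It is an assumption for illustration only, not the lonxml implementation: the names enter_tag and leave_tag are hypothetical, and the lexical variables merely mirror the internal globals named above.</span></p>

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the internal depth-tracking globals.
my @depthcounter;   # how many tags have been seen at each nesting level
my $depth = 0;      # current nesting level (0 = outside every tag)

# Called when a start tag is seen; returns the depth string for that tag.
sub enter_tag {
    $depthcounter[$depth]++;     # one more tag at this level
    $#depthcounter = $depth;     # forget counts left over from deeper levels
    $depth++;
    return join('_', @depthcounter);
}

# Called when the matching end tag is seen.
sub leave_tag { $depth-- }

my @trace;
push @trace, enter_tag();                    # first top-level tag
leave_tag();
push @trace, enter_tag();                    # second top-level tag
for (1 .. 2) { enter_tag(); leave_tag(); }   # two second-level tags
push @trace, enter_tag();                    # third second-level tag
push @trace, enter_tag();                    # first tag inside it
print "@trace\n";                            # prints: 1 2 2_3 2_3_1
```

<p><span style='color:black'>A tag handler that needs a stable identifier for "this occurrence of the tag" can use a string of this shape as a key.</span></p>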
! 178:
! 179: <p><span style='color:black'>*
! 180:
! 181: <i>$Apache::lonxml::prevent_entity_encode</i></span><span
! 182:
! 183: style='color:black'> - By default the XML parser will try to re-encode any 8-bit
! 184:
! 185: characters into HTML entity codes. If this is set to a true value it will be
! 186:
! 187: prevented. </span></p>
! 188:
! 189: <p><span style='color:black'>In common usage, <i>$Apache::lonxml::prevent_entity_encode</i></span><span
! 190:
! 191: style='color:black'>, <i>$Apache::lonxml::evaluate</i></span><span
! 192:
! 193: style='color:black'>, <i>$Apache::lonxml::metamode</i></span><span
! 194:
! 195: style='color:black'>, <i>$Apache::lonxml::import</i></span><span
! 196:
! 197: style='color:black'>, should never be set to a value directly, but rather incremented
! 198:
! 199: when you want the effect on, and decremented when you want the effect off.
! 200:
! 201: </span></p>
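<p><span style='color:black'>The increment/decrement convention matters when tag handlers nest. The following is a minimal sketch, assuming a counter-style flag; the variable $metamode here is a local stand-in and with_metamode is a hypothetical helper, neither is part of lonxml.</span></p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for a counter-style global such as
# $Apache::lonxml::metamode.
our $metamode = 0;

# Run a block of code with the effect switched on, then switch it off
# again for this caller only.
sub with_metamode {
    my ($code) = @_;
    $metamode++;                 # effect on
    my @result = $code->();
    $metamode--;                 # effect off (for this caller)
    return @result;
}

# Nested users do not clobber each other: the inner call returns the
# counter to 1, not 0, so the outer caller still sees the effect on.
my @seen;
with_metamode(sub {
    push @seen, $metamode;                           # 1
    with_metamode(sub { push @seen, $metamode });    # 2
    push @seen, $metamode;                           # 1 again
});
push @seen, $metamode;                               # back to 0

print join(',', @seen), "\n";    # prints: 1,2,1,0
```

<p><span style='color:black'>Had the inner handler assigned 0 on exit instead of decrementing, the outer handler would have lost the effect halfway through its own processing.</span></p>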
! 202:
! 203: <p><span style='color:black'><b>Notable Perl subroutines</b></span></p>
! 204:
! 205: <p><span style='color:black'>If not specified these functions are in Apache::lonxml
! 206:
! 207: </span></p>
! 208:
! 209: <p><span style='color:black'>*
! 210:
! 211: <i>xmlparse</i></span><span
! 212:
! 213: style='color:black'> - see the XMLPARSE figure. It is not callable from inside
! 214:
! 215: a tag; if one needs to restart parsing, either add a new LCParser to
! 216:
! 217: the parser stack using the newparser function, or call inner_xmlparse
! 218:
! 219: (see the xmlparse function in scripttag.pm). </span></p>
! 220:
! 221: <p><span style='color:black'>*
! 222:
! 223: <i>recurse</i></span><span
! 224:
! 225: style='color:black'> - acts just like <i>xmlparse</i></span><span
! 226:
! 227: style='color:black'>, except it doesn't do the style definition check; it always
! 228:
! 229: calls <i>callsub</i></span><span style='color:black'> </span></p>
! 230:
! 231: <p><span style='color:black'>*
! 232:
! 233: <i>callsub</i></span><span
! 234:
! 235: style='color:black'> - callsub checks whether a Perl subroutine is defined for the current
! 236:
! 237: tag and calls it. Otherwise it just returns the tag as it was read in. It also
! 238:
! 239: adds a default editing interface unless the tag has a defined subroutine
! 240:
! 241: that either returns something or requests that callsub not add the editing
! 242:
! 243: interface. </span></p>
! 244:
! 245: <p><span style='color:black'>*
! 246:
! 247: <i>afterburn</i></span><span
! 248:
! 249: style='color:black'> - called on the output of xmlparse, it can add highlights,
! 250:
! 251: anchors, and links to regular expression matches to the output. </span></p>
! 252:
! 253: <p><span style='color:black'>*
! 254:
! 255: <i>register_insert</i></span><span
! 256:
! 257: style='color:black'> - builds the %Apache::lonxml::insertlist structure of what
! 258:
! 259: tags can have what other tags inside. </span></p>
! 260:
! 261: <p><span style='color:black'>*
! 262:
! 263: <i>whichuser</i></span><span
! 264:
! 265: style='color:black'> - returns a list of $symb, $courseid, $domain, $name that
! 266:
! 267: is correct for calls to lonnet functions for this setup. Uses form.grade_
! 268:
! 269: parameters, if the user is allowed to mgr in the course </span></p>
! 270:
! 271: <p><span style='color:black'>*
! 272:
! 273: <i>setup_globals</i></span><span
! 274:
! 275: style='color:black'> - initializes all lonxml globals when xmlparse is called.
! 276:
! 277: If you intend to create a new target you will likely need to tweak how the
! 278:
! 279: globals are set up at startup. </span></p>
! 280:
! 281: <p><span style='color:black'>*
! 282:
! 283: <i>init_safespace</i></span><span
! 284:
! 285: style='color:black'> - creates Holes to external functions, creates some global
! 286:
! 287: variables, and sets the permitted operators of the global Safe space interpreter.
! 288:
! 289: </span></p>
! 290:
! 291: <p><span style='color:black'><b>Functions Tag Handlers can use</b></span></p>
! 292:
! 293: <p><span style='color:black'>If not specified these functions are in Apache::lonxml
! 294:
! 295: </span></p>
! 296:
! 297: <p><span style='color:black'>*
! 298:
! 299: <i>debug</i></span><span
! 300:
! 301: style='color:black'> - a function to call to print out debugging messages. Will
! 302:
! 303: only print when Apache::lonxml::debug is set to 1 </span></p>
! 304:
! 305: <p><span style='color:black'>*
! 306:
! 307: <i>warning</i></span><span
! 308:
! 309: style='color:black'> - a function to use for warning messages. The message will
! 310:
! 311: appear at the top of a resource when it is viewed in construction space only.
! 312:
! 313: </span></p>
! 314:
! 315: <p><span style='color:black'>*
! 316:
! 317: <i>error</i></span><span
! 318:
! 319: style='color:black'> - a function to use for error messages. The message will
! 320:
! 321: appear at the top of a resource when it is viewed in construction space, and
! 322:
! 323: will message the resource author and course instructor, while informing the
! 324:
! 325: student that an error has occurred otherwise. </span></p>
! 326:
! 327: <p><span style='color:black'>*
! 328:
! 329: <i>get_all_text</i></span><span
! 330:
! 331: style='color:black'> - 2 args, tag to look for (need to use /tag to look for an
! 332:
! 333: end tag) and an HTML::TokeParser reference; it will repeatedly get text from
! 334:
! 335: the TokeParser until the requested tag is found. It will return all of the
! 336:
! 337: document it pulled from the TokeParser. (See Apache::scripttag::start_script
! 338:
! 339: for an example of usage.) </span></p>
! 340:
! 341: <p><span style='color:black'>*
! 342:
! 343: <i>get_param</i></span><span
! 344:
! 345: style='color:black'> - 4 arguments, first is a scalar string of the argument needed,
! 346:
! 347: second is a reference to the parser arguments stack, third is a reference
! 348:
! 349: to the Safe space, and fourth is an optional "context" value. This
! 350:
! 351: subroutine allows a tag to get a tag argument, after being interpolated inside
! 352:
! 353: the Safe space. This should be used if the tag might use a safe space variable
! 354:
! 355: reference for the tag argument. (See Apache::scripttag::start_script for an
! 356:
! 357: example.) This version only handles scalar variables. </span></p>
! 358:
! 359: <p><span style='color:black'>*
! 360:
! 361: <i>get_param_var</i></span><span
! 362:
! 363: style='color:black'> - 4 arguments, first is a scalar string of the argument needed,
! 364:
! 365: second is a reference to the parser arguments stack, third is a reference
! 366:
! 367: to the Safe space, and fourth is an optional "context" value. This
! 368:
! 369: subroutine allows a tag to get a tag argument, after being interpolated inside
! 370:
! 371: the Safe space. This should be used if the tag might use a safe space variable
! 372:
! 373: reference for the tag argument. (See Apache::scripttag::start_script for an
! 374:
! 375: example.) This version can handle list or hash variables properly. </span></p>
! 376:
! 377: <p><span style='color:black'>*
! 378:
! 379: <i>description</i></span><span
! 380:
! 381: style='color:black'> - 1 argument, the token object. This will return the textual
! 382:
! 383: description of the current tag from the insertlist.tab file. </span></p>
! 384:
! 385: <p><span style='color:black'>*
! 386:
! 387: <i>whichuser</i></span><span
! 388:
! 389: style='color:black'> - 0 arguments. This will take a look at the current environment
! 390:
! 391: setting and return the current $symb, $courseid, $udom, $uname. You should
! 392:
! 393: always use this function if you want to determine who the current user is.
! 394:
! 395: (Since an instructor might be trying to view a student's version of a resource.)
! 396:
! 397: </span></p>
! 398:
! 399: <p><span style='color:black'>*
! 400:
! 401: <i>inner_xmlparse</i></span><span
! 402:
! 403: style='color:black'> - 6 arguments, the target, an array pointer to the current
! 404:
! 405: stack of tags, an array pointer to the current stack of tag arguments, an
! 406:
! 407: array pointer to the current stack of LCParsers, a pointer to the current
! 408:
! 409: Safe space, a pointer to the hash of current style definitions </span></p>
! 410:
! 411: <p><span style='color:black'>*
! 412:
! 413: <i>newparser</i></span><span
! 414:
! 415: style='color:black'> - 3 args, first is a reference to the parser stack, second
! 416:
! 417: should be a reference to a string scalar containing the text the new parser should
! 418:
! 419: run over, third should be a scalar of the directory path the file the parser
! 420:
! 421: is parsing was in. (See Apache::scripttag::start_import for an example.) </span></p>
! 422:
! 423: <p><span style='color:black'>*
! 424:
! 425: <i>register</i></span><span
! 426:
! 427: style='color:black'> - should be called in a file's BEGIN block. 2 arguments,
! 428:
! 429: a scalar string, and a list of strings. This allows a file to register what
! 430:
! 431: tags it handles, and what the namespace of those tags are. Example: </span></p>
! 432:
! 433: <p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>
! 434:
! 435: <p><span style='font-family:"Courier New";color:black'> &Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>
! 436:
! 437: <p><span style='font-family:"Courier New";color:black'>}</span></p>
! 438:
! 439: <p><span style='color:black'>Would tell xmlparse that in Apache::scripttag it
! 440:
! 441: can find handlers for <script> and <display>. If one registers
! 442:
! 443: a tag that was already registered, the previous one is remembered and will
! 444:
! 445: be restored on a deregister. </span></p>
! 446:
! 447: <p><span style='color:black'>*
! 448:
! 449: <i>deregister</i></span><span
! 450:
! 451: style='color:black'> - used to remove a previously registered tag implementation.
! 452:
! 453: It will restore the previous registration if there was one. </span></p>
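<p><span style='color:black'>The "previous registration is restored" behaviour can be pictured as a stack of handler packages per tag. The sketch below is an assumption about the idea, not the lonxml code: the %handlers hash, the handler_for helper, the package name My::override, and the argument list given to deregister are all hypothetical.</span></p>

```perl
use strict;
use warnings;

my %handlers;   # tag name => stack of packages, newest last

# Record that $package handles each of @tags, on top of any earlier handler.
sub register {
    my ($package, @tags) = @_;
    push @{ $handlers{$_} }, $package for @tags;
}

# Remove the most recent registration of $package for each of @tags;
# whatever was registered before it becomes active again.
sub deregister {
    my ($package, @tags) = @_;
    for my $tag (@tags) {
        my $stack = $handlers{$tag} or next;
        for my $i (reverse 0 .. $#$stack) {
            if ($stack->[$i] eq $package) { splice(@$stack, $i, 1); last; }
        }
        delete $handlers{$tag} unless @$stack;
    }
}

# The package currently responsible for $tag, or undef if none.
sub handler_for {
    my ($tag) = @_;
    return $handlers{$tag} ? $handlers{$tag}[-1] : undef;
}

register('Apache::scripttag', 'script', 'display');
register('My::override', 'script');            # hypothetical second handler
my $during = handler_for('script');            # My::override
deregister('My::override', 'script');
my $after = handler_for('script');             # Apache::scripttag again
print "$during -> $after\n";
```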
! 454:
! 455: <p><span style='color:black'>*
! 456:
! 457: <i>startredirection</i></span><span
! 458:
! 459: style='color:black'> - used when a tag wants to save a portion of the document
! 460:
! 461: for its end tag to use, but wants the intervening document to be normally
! 462:
! 463: processed. (See Apache::scripttag::start_window for an example.) </span></p>
! 464:
! 465: <p><span style='color:black'>*
! 466:
! 467: <i>endredirection</i></span><span
! 468:
! 469: style='color:black'> - used to stop xmlparse from hiding output. The
! 470:
! 471: return value is everything that xmlparse has processed since the corresponding
! 472:
! 473: startredirection. (See Apache::scripttag::end_window for an example.) </span></p>
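<p><span style='color:black'>The capture performed by startredirection and endredirection can be thought of as pushing and popping output buffers. The sketch below is a simplified model under that assumption; the emit helper is hypothetical and the two lexical variables only imitate the outputstack and redirection globals described earlier.</span></p>

```perl
use strict;
use warnings;

my @outputstack = ('');   # element 0 is the normal output
my $redirection = 0;      # number of captures currently open

# Append processed text to whichever buffer is currently on top.
sub emit { $outputstack[-1] .= $_[0] }

# Begin capturing: everything emitted from now on goes to a fresh buffer.
sub startredirection { $redirection++; push @outputstack, ''; }

# Stop capturing and hand back what was collected.
sub endredirection {
    die "endredirection without startredirection\n" unless $redirection;
    $redirection--;
    return pop @outputstack;
}

emit('before ');
startredirection();              # what a start tag handler would do
emit('captured text');           # intervening document, processed normally
my $captured = endredirection(); # what the end tag handler would do
emit('after [' . uc($captured) . ']');

print "$outputstack[0]\n";       # prints: before after [CAPTURED TEXT]
```

<p><span style='color:black'>Because the buffers form a stack, a capture opened inside another capture is closed first, which is what lets nested tags each collect their own content.</span></p>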
! 474:
! 475: <p><span style='color:black'>*
! 476:
! 477: <i>Apache::run::evaluate</i></span><span
! 478:
! 479: style='color:black'> - 3 args, first a string, second a reference to the Safe
! 480:
! 481: space, third a string to be evaluated before the first arg. This subroutine will
! 482:
! 483: do variable interpolation and simple function interpolations on the first
! 484:
! 485: argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>
! 486:
! 487: <p><span style='color:black'>*
! 488:
! 489: <i>Apache::run::run</i></span><span
! 490:
! 491: style='color:black'> - 2 args, first a string, second a reference to the Safe
! 492:
! 493: space. This handles passing the passed string into the Safe space for evaluation
! 494:
! 495: and then returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
! 496:
! 497: <h3><a name="_Toc421867122">Style Files</a></h3>
! 498:
! 499: <p><span style='color:black'> <img width=432 height=255
! 500:
! 501: src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>
! 502:
! 503: <p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span
! 504:
! 505: style='font-size:14.0pt;color:black'> - Using a style file</span></p>
! 506:
! 507: <p><span style='color:black'><b>Style File specific tags</b></span></p>
! 508:
! 509: <p><span style='color:black'><b><definetag></b></span><span
! 510:
! 511: style='color:black'> - 2 arguments, <i>name</i></span><span style='color:black'>
! 512:
! 513: name of new tag being defined, if preceded with a / defining an end tag,
! 514:
! 515: required; <i>parms</i></span><span style='color:black'> parameters of the
! 516:
! 517: new tag, the value of these parameters can be accessed by $parametername. </span></p>
! 518:
! 519: <p><span style='color:black'>*
! 520:
! 521: <b><render></b></span><span
! 522:
! 523: style='color:black'> - define what the new tag does for a non meta target </span></p>
! 524:
! 525: <p><span style='color:black'>*
! 526:
! 527: <b><meta></b></span><span
! 528:
! 529: style='color:black'> - define what the new tag does for a meta target </span></p>
! 530:
! 531: <p><span style='color:black'>*
! 532:
! 533: <b><tex> / <web> / <latexsource></b></span><span style='color:black'>
! 534:
! 535: - define what a new tag does for a specific non-meta target; all data inside
! 536:
! 537: a <render> is rendered to all targets except when surrounded by a specific
! 538:
! 539: target tag.</span><span style='font-size:16.0pt;color:black'> </span></p>
! 540:
! 541: <p class=MsoHeader> <img width=432 height=243
! 542:
! 543: src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>
! 544:
! 545: <p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span
! 546:
! 547: style='font-size:14.0pt'> - The parser</span></p>
! 548:
! 549: <h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
! 550:
! 551: <p class=MsoPlainText>SYNOPSIS</p>
! 552:
! 553: <p class=MsoPlainText> require HTML::LCParser;</p>
! 554:
! 555: <p class=MsoPlainText> $p = HTML::LCParser->new("index.html")
! 556:
! 557: || die "Can't open: $!";</p>
! 558:
! 559: <p class=MsoPlainText> while (my $token = $p->get_token) {</p>
! 560:
! 561: <p class=MsoPlainText> #...</p>
! 562:
! 563: <p class=MsoPlainText> }</p>
! 564:
! 565: <p class=MsoPlainText>DESCRIPTION</p>
! 566:
! 567: <p class=MsoPlainText>The C<HTML::LCParser> is an alternative interface
! 568:
! 569: to the</p>
! 570:
! 571: <p class=MsoPlainText>C<HTML::Parser> class. It is an C<HTML::PullParser>
! 572:
! 573: subclass.</p>
! 574:
! 575: <p class=MsoPlainText>The following methods are available:</p>
! 576:
! 577: <p class=MsoPlainText>* $p = HTML::LCParser->new( $file_or_doc );</p>
! 578:
! 579: <p class=MsoPlainText>The object constructor argument is either a file name,
! 580:
! 581: a file handle</p>
! 582:
! 583: <p class=MsoPlainText>object, or the complete document to be parsed.</p>
! 584:
! 585: <p class=MsoPlainText>If the argument is a plain scalar, then it is taken as
! 586:
! 587: the name of a</p>
! 588:
! 589: <p class=MsoPlainText>file to be opened and parsed. If the file can't
! 590:
! 591: be opened for</p>
! 592:
! 593: <p class=MsoPlainText>reading, then the constructor will return an undefined
! 594:
! 595: value and $!</p>
! 596:
! 597: <p class=MsoPlainText>will tell you why it failed.</p>
! 598:
! 599: <p class=MsoPlainText>If the argument is a reference to a plain scalar, then
! 600:
! 601: this scalar is</p>
! 602:
! 603: <p class=MsoPlainText>taken to be the literal document to parse. The value
! 604:
! 605: of this</p>
! 606:
! 607: <p class=MsoPlainText>scalar should not be changed before all tokens have been
! 608:
! 609: extracted.</p>
! 610:
! 611: <p class=MsoPlainText>Otherwise the argument is taken to be some object that
! 612:
! 613: the</p>
! 614:
! 615: <p class=MsoPlainText>C<HTML::LCParser> can read() from when it needs
! 616:
! 617: more data. Typically</p>
! 618:
! 619: <p class=MsoPlainText>it will be a filehandle of some kind. The stream
! 620:
! 621: will be read() until</p>
! 622:
! 623: <p class=MsoPlainText>EOF, but not closed.</p>
! 624:
! 625: <p class=MsoPlainText>It also will turn attr_encoded on by default.</p>
! 626:
! 627: <p class=MsoPlainText>* $p->get_token</p>
! 628:
! 629: <p class=MsoPlainText>This method will return the next I<token> found
! 630:
! 631: in the HTML document,</p>
! 632:
! 633: <p class=MsoPlainText>or C<undef> at the end of the document. The
! 634:
! 635: token is returned as an</p>
! 636:
! 637: <p class=MsoPlainText>array reference. The first element of the array
! 638:
! 639: will be a (mostly)</p>
! 640:
! 641: <p class=MsoPlainText>single character string denoting the type of this token:
! 642:
! 643: "S" for start</p>
! 644:
! 645: <p class=MsoPlainText>tag, "E" for end tag, "T" for text,
! 646:
! 647: "C" for comment, "D" for</p>
! 648:
! 649: <p class=MsoPlainText>declaration, and "PI" for process instructions.
! 650:
! 651: The rest of the array</p>
! 652:
! 653: <p class=MsoPlainText>is the same as the arguments passed to the corresponding
! 654:
! 655: HTML::Parser</p>
! 656:
! 657: <p class=MsoPlainText>v2 compatible callbacks (see L<HTML::Parser>).
! 658:
! 659: In summary, returned</p>
! 660:
! 661: <p class=MsoPlainText>tokens look like this:</p>
! 662:
! 663: <p class=MsoPlainText> ["S", $tag, $attr, $attrseq, $text,
! 664:
! 665: $line]</p>
! 666:
! 667: <p class=MsoPlainText> ["E", $tag, $text, $line]</p>
! 668:
! 669: <p class=MsoPlainText> ["T", $text, $is_data, $line]</p>
! 670:
! 671: <p class=MsoPlainText> ["C", $text, $line]</p>
! 672:
! 673: <p class=MsoPlainText> ["D", $text, $line]</p>
! 674:
! 675: <p class=MsoPlainText> ["PI", $token0, $text, $line]</p>
! 676:
! 677: <p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array
! 678:
! 679: reference and</p>
! 680:
! 681: <p class=MsoPlainText>the rest are plain scalars.</p>
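<p class=MsoPlainText>As an illustration of the token shapes listed above, the following sketch dispatches on the type code in the first element. The tokens are built by hand so that the example runs without HTML::LCParser installed; the describe function and the sample data are hypothetical.</p>

```perl
use strict;
use warnings;

# Hand-built tokens in the shapes listed above.
my @tokens = (
    ['S', 'a', { href => 'index.html' }, ['href'], '<a href="index.html">', 1],
    ['T', 'Home', '', 1],
    ['E', 'a', '</a>', 1],
    ['C', ' a comment ', 2],
);

# Turn one token into a short human-readable description.
sub describe {
    my ($token) = @_;
    my ($type, @rest) = @$token;
    if ($type eq 'S') {
        my ($tag, $attr, $attrseq) = @rest;
        # $attrseq preserves the order in which attributes appeared.
        my $attrs = join ' ', map { "$_=$attr->{$_}" } @$attrseq;
        return "start $tag" . (length $attrs ? " ($attrs)" : '');
    }
    return "end $rest[0]"    if $type eq 'E';
    return "text '$rest[0]'" if $type eq 'T';
    return "other ($type)";   # comments, declarations, PIs
}

my @lines = map { describe($_) } @tokens;
print "$_\n" for @lines;
```

<p class=MsoPlainText>In a real handler the same dispatch would sit inside a while loop over $p->get_token, as in the SYNOPSIS.</p>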
! 682:
! 683: <p class=MsoPlainText>* $p->unget_token($token,...)</p>
! 684:
! 685: <p class=MsoPlainText>If you find out you have read too many tokens you can
! 686:
! 687: push them back,</p>
! 688:
! 689: <p class=MsoPlainText>so that they are returned the next time $p->get_token
! 690:
! 691: is called.</p>
! 692:
! 693: <p class=MsoPlainText>* $p->get_tag( [$tag, ...] )</p>
! 694:
! 695: <p class=MsoPlainText>This method returns the next start or end tag (skipping
! 696:
! 697: any other</p>
! 698:
! 699: <p class=MsoPlainText>tokens), or C<undef> if there are no more tags in
! 700:
! 701: the document. If</p>
! 702:
! 703: <p class=MsoPlainText>one or more arguments are given, then we skip tokens until
! 704:
! 705: one of the</p>
! 706:
! 707: <p class=MsoPlainText>specified tag types is found. For example:</p>
! 708:
! 709: <p class=MsoPlainText> $p->get_tag("font", "/font");</p>
! 710:
! 711: <p class=MsoPlainText>will find the next start or end tag for a font-element.</p>
! 712:
! 713: <p class=MsoPlainText>The tag information is returned as an array reference
! 714:
! 715: in the same form</p>
! 716:
! 717: <p class=MsoPlainText>as for $p->get_token above, but the type code (first
! 718:
! 719: element) is</p>
! 720:
! 721: <p class=MsoPlainText>missing. A start tag will be returned like this:</p>
! 722:
! 723: <p class=MsoPlainText> [$tag, $attr, $attrseq, $text]</p>
! 724:
! 725: <p class=MsoPlainText>The tag names of end tags are prefixed with "/",
! 726:
! 727: i.e. end tag is</p>
! 728:
! 729: <p class=MsoPlainText>returned like this:</p>
! 730:
! 731: <p class=MsoPlainText> ["/$tag", $text]</p>
! 732:
! 733: <p class=MsoPlainText>* $p->get_text( [$endtag] )</p>
! 734:
! 735: <p class=MsoPlainText>This method returns all text found at the current position.
! 736:
! 737: It will</p>
! 738:
! 739: <p class=MsoPlainText>return a zero length string if the next token is not text.
! 740:
! 741: The</p>
! 742:
! 743: <p class=MsoPlainText>optional $endtag argument specifies that any text occurring
! 744:
! 745: before the</p>
! 746:
! 747: <p class=MsoPlainText>given tag is to be returned. All entities are unmodified.</p>
! 748:
! 749: <p class=MsoPlainText>The $p->{textify} attribute is a hash that defines
! 750:
! 751: how certain tags can</p>
! 752:
! 753: <p class=MsoPlainText>be treated as text. If the name of a start tag matches
! 754:
! 755: a key in this</p>
! 756:
! 757: <p class=MsoPlainText>hash then this tag is converted to text. The hash
! 758:
! 759: value is used to</p>
! 760:
! 761: <p class=MsoPlainText>specify which tag attribute to obtain the text from.
! 762:
! 763: If this tag</p>
! 764:
! 765: <p class=MsoPlainText>attribute is missing, then the upper case name of the
! 766:
! 767: tag enclosed in</p>
! 768:
! 769: <p class=MsoPlainText>brackets is returned, e.g. "[IMG]". The
! 770:
! 771: hash value can also be a</p>
! 772:
! 773: <p class=MsoPlainText>subroutine reference. In this case the routine is
! 774:
! 775: called with the</p>
! 776:
! 777: <p class=MsoPlainText>start tag token content as its argument and the return
! 778:
! 779: value is treated</p>
! 780:
! 781: <p class=MsoPlainText>as the text.</p>
! 782:
! 783: <p class=MsoPlainText>The default $p->{textify} value is:</p>
! 784:
! 785: <p class=MsoPlainText> {img => "alt", applet => "alt"}</p>
! 786:
! 787: <p class=MsoPlainText>This means that <IMG> and <APPLET> tags are
! 788:
! 789: treated as text, and that</p>
! 790:
! 791: <p class=MsoPlainText>the text to substitute can be found in the ALT attribute.</p>
! 792:
! 793: <p class=MsoPlainText>* $p->get_trimmed_text( [$endtag] )</p>
! 794:
! 795: <p class=MsoPlainText>Same as $p->get_text above, but will collapse any sequences
! 796:
! 797: of white</p>
! 798:
! 799: <p class=MsoPlainText>space to a single space character. Leading and trailing
! 800:
! 801: white space is</p>
! 802:
! 803: <p class=MsoPlainText>removed.</p>
! 804:
! 805: <p class=MsoPlainText>EXAMPLES</p>
! 806:
! 807: <p class=MsoPlainText>This example extracts all links from a document.
! 808:
! 809: It will print one</p>
! 810:
! 811: <p class=MsoPlainText>line for each link, containing the URL and the textual
! 812:
! 813: description</p>
! 814:
! 815: <p class=MsoPlainText>between the <A>...</A> tags:</p>
! 816:
! 817: <p class=MsoPlainText> use HTML::LCParser;</p>
! 818:
! 819: <p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
! 820:
! 821: <p class=MsoPlainText> while (my $token = $p->get_tag("a"))
! 822:
! 823: {</p>
! 824:
! 825: <p class=MsoPlainText> my $url = $token->[1]{href}
! 826:
! 827: || "-";</p>
! 828:
! 829: <p class=MsoPlainText> my $text = $p->get_trimmed_text("/a");</p>
! 830:
! 831: <p class=MsoPlainText> print "$url\t$text\n";</p>
! 832:
! 833: <p class=MsoPlainText> }</p>
! 834:
! 835: <p class=MsoPlainText>This example extracts the <TITLE> from the document:</p>
! 836:
! 837: <p class=MsoPlainText> use HTML::LCParser;</p>
! 838:
! 839: <p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
! 840:
! 841: <p class=MsoPlainText> if ($p->get_tag("title")) {</p>
! 842:
! 843: <p class=MsoPlainText> my $title = $p->get_trimmed_text;</p>
! 844:
! 845: <p class=MsoPlainText> print "Title: $title\n";</p>
! 846:
! 847: <p class=MsoPlainText> }</p>
! 848:
! 849: </div>
! 850:
! 851: <br
! 852:
! 853: clear=ALL style='page-break-before:always;'>
! 854:
! 855: <div class=Section2> </div>
! 856:
! 857: </body>
! 858:
! 859: </html>
! 860: