File:
[LON-CAPA] /
doc /
gutshtml /
SessionFou1.html
Revision
1.2:
Tue Jul 22 14:47:00 2003 UTC (21 years, 7 months ago) by
bowersj2
Branches:
MAIN
Convert GUTs HTML to PROPER line endings.
1: <html>
2:
3: <head>
4:
5: <meta name=Title
6:
7: content="Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style Files) (Guy)">
8:
9: <meta http-equiv=Content-Type content="text/html; charset=macintosh">
10:
11: <link rel=Edit-Time-Data href="Session%20Fou1_files/editdata.mso">
12:
13: <title>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
14:
15: Files) (Guy)</title>
16:
17: <style><!--
18:
19: .MsoHeader
20:
21: {tab-stops:center 3.0in right 6.0in;
22:
23: font-size:10.0pt;
24:
25: font-family:"Times New Roman";}
26:
27: .MsoPlainText
28:
29: {font-size:10.0pt;
30:
31: font-family:"Courier New";}
32:
33: .Section1
34:
35: {page:Section1;}
36:
37: .Section2
38:
39: {page:Section2;}
40:
41: -->
42:
43: </style>
44:
45: </head>
46:
47: <body bgcolor=#FFFFFF link=blue vlink=purple class="Normal" lang=EN-US>
48:
49: <div class=Section1>
50:
51: <h2>Session Four: XML Handler (Simple tags, Globals, Multiple Targets, Style
52:
53: Files) (Guy)</h2>
54:
55: <h3><a name="_Toc421867121">XML Files</a></h3>
56:
57: <p><span style='color:black'>All HTML / XML files are run through the lonxml
58:
59: handler before being served to a user. This allows us to rewrite many portions
60:
61: of a document and to support server-side tags. There are two ways to add new
62:
63: tags to the xml parsing engine, either through LON-CAPA style files or by
64:
65: writing Perl tag handlers for the desired tags. </span></p>
66:
67: <p><span style='color:black'><b>Global Variables</b></span></p>
68:
69: <p><span style='color:black'>*
70:
71: <i>$Apache::lonxml::debug</i></span><span
72:
73: style='color:black'> - debugging control </span></p>
74:
75: <p><span style='color:black'>*
76:
77: <i>@Apache::lonxml::pwd</i></span><span
78:
79: style='color:black'> - path to the directory containing the file currently being
80:
81: processed </span></p>
82:
83: <p><span style='color:black'>*
84:
85: <i>@Apache::lonxml::outputstack</i></span><span
86:
87: style='color:black'> </span></p>
88:
89: <p><span style='color:black'><i>$Apache::lonxml::redirection</i></span><span
90:
91: style='color:black'> - these two are used for capturing a subset of the output
92:
93: for later processing; don't touch them directly, use &startredirection
94:
95: and &endredirection </span></p>
96:
97: <p><span style='color:black'>*
98:
99: <i>$Apache::lonxml::import</i></span><span
100:
101: style='color:black'> - controls whether the <import> tag actually does anything
102:
103: </span></p>
104:
105: <p><span style='color:black'>*
106:
107: <i>@Apache::lonxml::extlinks</i></span><span
108:
109: style='color:black'> - a list of URLs that the user is allowed to look at because
110:
111: of the current resource (images, and links) </span></p>
112:
113: <p><span style='color:black'>*
114:
115: <i>$Apache::lonxml::metamode</i></span><span
116:
117: style='color:black'> - some output is turned off, the meta target wants a specific
118:
119: subset, use <output> to guarantee that the contained data will be in
120:
121: the parsing output </span></p>
122:
123: <p><span style='color:black'>*
124:
125: <i>$Apache::lonxml::evaluate</i></span><span
126:
127: style='color:black'> - controls whether run::evaluate actually dereferences variable
128:
129: references </span></p>
130:
131: <p><span style='color:black'>*
132:
133: <i>%Apache::lonxml::insertlist</i></span><span
134:
135: style='color:black'> - data structure for edit mode, determines what tags can
136:
137: go into what other tags </span></p>
138:
139: <p><span style='color:black'>*
140:
141: <i>@Apache::lonxml::namespace</i></span><span
142:
143: style='color:black'> - stores the list of tag namespaces used in the insertlist.tab
144:
145: file that are currently active, used only in edit mode. </span></p>
146:
147: <p><span style='color:black'>*
148:
149: <i>$Apache::lonxml::registered</i></span><span
150:
151: style='color:black'> - set to 1 once the remote has been updated to know what
152:
153: resource we are looking at. </span></p>
154:
155: <p><span style='color:black'>*
156:
157: <i>$Apache::lonxml::request</i></span><span
158:
159: style='color:black'> - current Apache request object, or undef </span></p>
160:
161: <p><span style='color:black'>*
162:
163: <i>$Apache::lonxml::curdepth</i></span><span
164:
165: style='color:black'> - current position in the overall parse, as a string
166:
167: like: 2_3_1 (first tag in the third second-level tag in the second top-level
168:
169: tag). It gets set by callsub, and can be used in Perl tag implementations.
170:
171: It relies upon the internal globals: <i>@Apache::lonxml::depthcounter</i></span><span
172:
173: style='color:black'>, <i>$Apache::lonxml::depth</i></span><span
174:
175: style='color:black'>, <i>$Apache::lonxml::olddepth</i></span><span
176:
177: style='color:black'> </span></p>
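<p><span style='color:black'>The position string described above can be pictured with a small stand-alone sketch. This is not the code used by callsub; the names <i>enter_tag</i> and <i>leave_tag</i> and the two variables are hypothetical stand-ins that only illustrate how a per-level counter could produce a value like 2_3_1.</span></p>

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the internal depth bookkeeping; the real
# lonxml globals and their update rules may differ.
my @depthcounter = ();   # one counter per nesting level
my $depth        = 0;    # current nesting level

# Called on a start tag: count the tag at its level, forget the counters
# left over from deeper levels, and return the position string.
sub enter_tag {
    $depthcounter[$depth]++;
    $#depthcounter = $depth;
    $depth++;
    return join('_', @depthcounter);
}

# Called on an end tag: step back up one level.
sub leave_tag { $depth--; }

# A tiny event stream: names are start tags, /names are end tags.
my @events = qw(a /a b x /x y /y z q);
my @positions;
for my $ev (@events) {
    if ($ev =~ m{^/}) { leave_tag(); }
    else              { push @positions, enter_tag(); }
}
print "$positions[-1]\n";   # prints 2_3_1
```

<p><span style='color:black'>Here the tag q is the first tag inside z, which is the third tag inside b, which is the second top-level tag.</span></p>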
178:
179: <p><span style='color:black'>*
180:
181: <i>$Apache::lonxml::prevent_entity_encode</i></span><span
182:
183: style='color:black'> - By default the XML parser will try to re-encode any 8-bit
184:
185: characters into HTML entity codes. If this is set to a true value it will be
186:
187: prevented. </span></p>
188:
189: <p><span style='color:black'>In common usage, <i>$Apache::lonxml::prevent_entity_encode</i></span><span
190:
191: style='color:black'>, <i>$Apache::lonxml::evaluate</i></span><span
192:
193: style='color:black'>, <i>$Apache::lonxml::metamode</i></span><span
194:
195: style='color:black'>, <i>$Apache::lonxml::import</i></span><span
196:
197: style='color:black'>, should never be set to a value directly, but rather incremented
198:
199: when you want the effect on, and decremented when you want the effect off.
200:
201: </span></p>
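<p><span style='color:black'>A minimal sketch of the increment/decrement convention, assuming a counter-style flag. The variable <i>$metamode</i> below is a local stand-in, and the helper <i>with_flag</i> is hypothetical; neither is part of Apache::lonxml.</span></p>

```perl
use strict;
use warnings;

# Stand-in for a counter-style global such as $Apache::lonxml::metamode.
our $metamode = 0;

# Hypothetical helper: run a code block with the effect switched on.
# Incrementing instead of assigning 1 means a nested caller cannot
# switch the effect off for the caller that enclosed it.
sub with_flag {
    my ($code) = @_;
    $metamode++;
    my @result = $code->();
    $metamode--;
    return @result;
}

my @seen;
with_flag(sub {
    push @seen, $metamode;                        # effect on
    with_flag(sub { push @seen, $metamode });     # nested, still on
    push @seen, $metamode;                        # still on after the inner block
});
push @seen, $metamode;                            # effect off again
print join(',', @seen), "\n";                     # prints 1,2,1,0
```

<p><span style='color:black'>Had the inner block assigned 0 on exit, the outer block would have lost the effect halfway through.</span></p>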
202:
203: <p><span style='color:black'><b>Notable Perl subroutines</b></span></p>
204:
205: <p><span style='color:black'>If not specified these functions are in Apache::lonxml
206:
207: </span></p>
208:
209: <p><span style='color:black'>*
210:
211: <i>xmlparse</i></span><span
212:
213: style='color:black'> - see the XMLPARSE figure - also not callable from inside
214:
215: a tag; if one needs to restart parsing, either add a new LCParser to
216:
217: the parser stack using the newparser function, or call inner_xmlparse;
218:
219: see the xmlparse function in scripttag.pm </span></p>
220:
221: <p><span style='color:black'>*
222:
223: <i>recurse</i></span><span
224:
225: style='color:black'> - acts just like <i>xmlparse</i></span><span
226:
227: style='color:black'>, except it doesn't do the style definition check; it always
228:
229: calls <i>callsub</i></span><span style='color:black'> </span></p>
230:
231: <p><span style='color:black'>*
232:
233: <i>callsub</i></span><span
234:
235: style='color:black'> - callsub checks whether a Perl subroutine is defined for the current
236:
237: tag and calls it. Otherwise it just returns the tag as it was read in. It will
238:
239: also add a default editing interface unless the tag has a defined subroutine
240:
241: that either returns something or requests that callsub not add the editing
242:
243: interface. </span></p>
244:
245: <p><span style='color:black'>*
246:
247: <i>afterburn</i></span><span
248:
249: style='color:black'> - called on the output of xmlparse, it can add highlights,
250:
251: anchors, and links at regular expression matches in the output. </span></p>
252:
253: <p><span style='color:black'>*
254:
255: <i>register_insert</i></span><span
256:
257: style='color:black'> - builds the %Apache::lonxml::insertlist structure of what
258:
259: tags can have what other tags inside. </span></p>
260:
261: <p><span style='color:black'>*
262:
263: <i>whichuser</i></span><span
264:
265: style='color:black'> - returns a list of $symb, $courseid, $domain, $name that
266:
267: is correct for calls to lonnet functions for this setup. Uses form.grade_
268:
269: parameters, if the user is allowed to mgr in the course </span></p>
270:
271: <p><span style='color:black'>*
272:
273: <i>setup_globals</i></span><span
274:
275: style='color:black'> - initializes all lonxml globals when xmlparse is called.
276:
277: If you intend to create a new target you will likely need to tweak how the
278:
279: globals are set up at startup. </span></p>
280:
281: <p><span style='color:black'>*
282:
283: <i>init_safespace</i></span><span
284:
285: style='color:black'> - creates Holes to external functions, creates some global
286:
287: variables, and sets the permitted operators of the global Safespace interpreter.
288:
289: </span></p>
290:
291: <p><span style='color:black'><b>Functions Tag Handlers can use</b></span></p>
292:
293: <p><span style='color:black'>If not specified these functions are in Apache::lonxml
294:
295: </span></p>
296:
297: <p><span style='color:black'>*
298:
299: <i>debug</i></span><span
300:
301: style='color:black'> - a function to call to print out debugging messages. Will
302:
303: only print when Apache::lonxml::debug is set to 1 </span></p>
304:
305: <p><span style='color:black'>*
306:
307: <i>warning</i></span><span
308:
309: style='color:black'> - a function to use for warning messages. The message will
310:
311: appear at the top of a resource when it is viewed in construction space only.
312:
313: </span></p>
314:
315: <p><span style='color:black'>*
316:
317: <i>error</i></span><span
318:
319: style='color:black'> - a function to use for error messages. The message will
320:
321: appear at the top of a resource when it is viewed in construction space, and
322:
323: will message the resource author and course instructor, while informing the
324:
325: student that an error has occurred otherwise. </span></p>
326:
327: <p><span style='color:black'>*
328:
329: <i>get_all_text</i></span><span
330:
331: style='color:black'> - 2 args, tag to look for (need to use /tag to look for an
332:
333: end tag) and an HTML::TokeParser reference; it will repeatedly get text from
334:
335: the TokeParser until the requested tag is found. It will return all of the
336:
337: document it pulled from the TokeParser. (See Apache::scripttag::start_script
338:
339: for an example of usage.) </span></p>
340:
341: <p><span style='color:black'>*
342:
343: <i>get_param</i></span><span
344:
345: style='color:black'> - 4 arguments, first is a scalar string of the argument needed,
346:
347: second is a reference to the parser arguments stack, third is a reference
348:
349: to the Safe space, and fourth is an optional "context" value. This
350:
351: subroutine allows a tag to get a tag argument, after being interpolated inside
352:
353: the Safe space. This should be used if the tag might use a safe space variable
354:
355: reference for the tag argument. (See Apache::scripttag::start_script for an
356:
357: example.) This version only handles scalar variables. </span></p>
358:
359: <p><span style='color:black'>*
360:
361: <i>get_param_var</i></span><span
362:
363: style='color:black'> - 4 arguments, first is a scalar string of the argument needed,
364:
365: second is a reference to the parser arguments stack, third is a reference
366:
367: to the Safe space, and fourth is an optional "context" value. This
368:
369: subroutine allows a tag to get a tag argument, after being interpolated inside
370:
371: the Safe space. This should be used if the tag might use a safe space variable
372:
373: reference for the tag argument. (See Apache::scripttag::start_script for an
374:
375: example.) This version can handle list or hash variables properly. </span></p>
376:
377: <p><span style='color:black'>*
378:
379: <i>description</i></span><span
380:
381: style='color:black'> - 1 argument, the token object. This will return the textual
382:
383: description of the current tag from the insertlist.tab file. </span></p>
384:
385: <p><span style='color:black'>*
386:
387: <i>whichuser</i></span><span
388:
389: style='color:black'> - 0 arguments. This will take a look at the current environment
390:
391: setting and return the current $symb, $courseid, $udom, $uname. You should
392:
393: always use this function if you want to determine who the current user is.
394:
395: (Since an instructor might be trying to view a student's version of a resource.)
396:
397: </span></p>
398:
399: <p><span style='color:black'>*
400:
401: <i>inner_xmlparse</i></span><span
402:
403: style='color:black'> - 6 arguments, the target, an array pointer to the current
404:
405: stack of tags, an array pointer to the current stack of tag arguments, an
406:
407: array pointer to the current stack of LCParsers, a pointer to the current
408:
409: Safe space, a pointer to the hash of current style definitions </span></p>
410:
411: <p><span style='color:black'>*
412:
413: <i>newparser</i></span><span
414:
415: style='color:black'> - 3 args, first is a reference to the parser stack, second
416:
417: should be a reference to a string scalar containing the text the new parser should
418:
419: run over, third should be a scalar with the directory path that the file the parser
420:
421: is parsing was in. (See Apache::scripttag::start_import for an example.) </span></p>
422:
423: <p><span style='color:black'>*
424:
425: <i>register</i></span><span
426:
427: style='color:black'> - should be called in a file's BEGIN block. 2 arguments,
428:
429: a scalar string, and a list of strings. This allows a file to register what
430:
431: tags it handles, and what the namespace of those tags is. Example: </span></p>
432:
433: <p><span style='font-family:"Courier New";color:black'>sub BEGIN {</span></p>
434:
435: <p><span style='font-family:"Courier New";color:black'> &Apache::lonxml::register('Apache::scripttag',('script','display'));</span></p>
436:
437: <p><span style='font-family:"Courier New";color:black'>}</span></p>
438:
439: <p><span style='color:black'>This tells xmlparse that in Apache::scripttag it
440:
441: can find handlers for <script> and <display>. If one registers
442:
443: a tag that was already registered, the previous one is remembered and will
444:
445: be restored on a deregister. </span></p>
446:
447: <p><span style='color:black'>*
448:
449: <i>deregister</i></span><span
450:
451: style='color:black'> - used to remove a previously registered tag implementation.
452:
453: It will restore the previous registration if there was one. </span></p>
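<p><span style='color:black'>The "previous registration is remembered" behaviour can be sketched with one stack of handler packages per tag. This is an illustration under stated assumptions, not the lonxml implementation: <i>register_tags</i>, <i>deregister_tags</i>, <i>active_handler</i> and the package <i>My::Override</i> are all hypothetical names.</span></p>

```perl
use strict;
use warnings;

# tag name => stack of handler package names; the top of the stack is active.
my %handlers;

# Hypothetical counterpart of register: push the package onto each tag's stack.
sub register_tags {
    my ($package, @tags) = @_;
    push @{ $handlers{$_} }, $package for @tags;
}

# Hypothetical counterpart of deregister: remove the most recent registration
# made by this package, which re-exposes whatever was registered before it.
sub deregister_tags {
    my ($package, @tags) = @_;
    for my $tag (@tags) {
        my $stack = $handlers{$tag} or next;
        for my $i (reverse 0 .. $#$stack) {
            if ($stack->[$i] eq $package) { splice(@$stack, $i, 1); last; }
        }
        delete $handlers{$tag} unless @$stack;
    }
}

sub active_handler {
    my ($tag) = @_;
    my $stack = $handlers{$tag};
    return $stack ? $stack->[-1] : undef;
}

register_tags('Apache::scripttag', 'script', 'display');
register_tags('My::Override', 'display');        # hypothetical second package
my $during = active_handler('display');          # My::Override
deregister_tags('My::Override', 'display');
my $after  = active_handler('display');          # Apache::scripttag
print "$during, then $after\n";
```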
454:
455: <p><span style='color:black'>*
456:
457: <i>startredirection</i></span><span
458:
459: style='color:black'> - used when a tag wants to save a portion of the document
460:
461: for its end tag to use, but wants the intervening document to be normally
462:
463: processed. (See Apache::scripttag::start_window for an example.) </span></p>
464:
465: <p><span style='color:black'>*
466:
467: <i>endredirection</i></span><span
468:
469: style='color:black'> - used to stop xmlparse from hiding output. The
470:
471: return value is everything that xmlparse has processed since the corresponding
472:
473: startredirection. (See Apache::scripttag::end_window for an example.) </span></p>
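<p><span style='color:black'>A minimal sketch of the capture idea behind startredirection and endredirection, assuming a stack of output buffers. The subroutines <i>emit</i>, <i>start_capture</i> and <i>end_capture</i> are hypothetical, and the two variables are local stand-ins for the lonxml globals of the same name.</span></p>

```perl
use strict;
use warnings;

# Local stand-ins, not the real @Apache::lonxml::outputstack and
# $Apache::lonxml::redirection.
my @outputstack = ('');   # element 0 is the normal document output
my $redirection = 0;      # number of captures currently open

# Append processed text to whichever buffer is current.
sub emit { $outputstack[-1] .= $_[0]; }

# Begin capturing: later output goes to a fresh buffer instead of the document.
sub start_capture { $redirection++; push @outputstack, ''; }

# Stop capturing and hand back everything collected since start_capture.
sub end_capture {
    return '' unless $redirection > 0;
    $redirection--;
    return pop @outputstack;
}

emit('before ');
start_capture();
emit('inside');                       # processed normally, but kept aside
my $captured = end_capture();
emit('after');
print "captured=[$captured] output=[$outputstack[0]]\n";
# prints captured=[inside] output=[before after]
```

<p><span style='color:black'>A start tag handler would open the capture and the matching end tag handler would collect the result and decide what to do with it.</span></p>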
474:
475: <p><span style='color:black'>*
476:
477: <i>Apache::run::evaluate</i></span><span
478:
479: style='color:black'> - 3 args, first a string, second a reference to the Safe
480:
481: space, third a string to be evaluated before the first arg. This subroutine will
482:
483: do variable interpolation and simple function interpolations on the first
484:
485: argument. (See Apache::lonxml::inner_xmlparse for an example.) </span></p>
486:
487: <p><span style='color:black'>*
488:
489: <i>Apache::run::run</i></span><span
490:
491: style='color:black'> - 2 args, first a string, second a reference to the Safe
492:
493: space. This handles passing the passed string into the Safe space for evaluation
494:
495: and then returns the result. (See Apache::scripttag::start_script for an example.)</span></p>
496:
497: <h3><a name="_Toc421867122">Style Files</a></h3>
498:
499: <p><span style='color:black'> <img width=432 height=255
500:
501: src="Session%20Fou1_files/image002.jpg" v:shapes="_x0000_i1025"> </span></p>
502:
503: <p><span style='font-size:14.0pt;color:black'><b>Fig. 2.4.1</b></span><span
504:
505: style='font-size:14.0pt;color:black'> - Using a style file</span></p>
506:
507: <p><span style='color:black'><b>Style File specific tags</b></span></p>
508:
509: <p><span style='color:black'><b><definetag></b></span><span
510:
511: style='color:black'> - 2 arguments, <i>name</i></span><span style='color:black'>
512:
513: name of new tag being defined, if preceded by a / it defines an end tag,
514:
515: required; <i>parms</i></span><span style='color:black'> parameters of the
516:
517: new tag, the value of these parameters can be accessed by $parametername. </span></p>
518:
519: <p><span style='color:black'>*
520:
521: <b><render></b></span><span
522:
523: style='color:black'> - define what the new tag does for a non meta target </span></p>
524:
525: <p><span style='color:black'>*
526:
527: <b><meta></b></span><span
528:
529: style='color:black'> - define what the new tag does for a meta target </span></p>
530:
531: <p><span style='color:black'>*
532:
533: <b><tex> / <web> / <latexsource></b></span><span style='color:black'>
534:
535: - define what a new tag does for a specific non-meta target; all data inside
536:
537: a <render> is rendered to all targets except when surrounded by a specific
538:
539: target tag.</span><span style='font-size:16.0pt;color:black'> </span></p>
540:
541: <p class=MsoHeader> <img width=432 height=243
542:
543: src="Session%20Fou1_files/image005.png" v:shapes="_x0000_i1026"> </p>
544:
545: <p><span style='font-size:14.0pt'><b>Fig. 2.4.2</b></span><span
546:
547: style='font-size:14.0pt'> - The parser</span></p>
548:
549: <h3><a name="_Toc421867123">HTML::LCParser - Alternative HTML::Parser interface</a></h3>
550:
551: <p class=MsoPlainText>SYNOPSIS</p>
552:
553: <p class=MsoPlainText> require HTML::LCParser;</p>
554:
555: <p class=MsoPlainText> $p = HTML::LCParser->new("index.html")
556:
557: || die "Can't open: $!";</p>
558:
559: <p class=MsoPlainText> while (my $token = $p->get_token) {</p>
560:
561: <p class=MsoPlainText> #...</p>
562:
563: <p class=MsoPlainText> }</p>
564:
565: <p class=MsoPlainText>DESCRIPTION</p>
566:
567: <p class=MsoPlainText>The C<HTML::LCParser> is an alternative interface
568:
569: to the</p>
570:
571: <p class=MsoPlainText>C<HTML::Parser> class. It is an C<HTML::PullParser>
572:
573: subclass.</p>
574:
575: <p class=MsoPlainText>The following methods are available:</p>
576:
577: <p class=MsoPlainText>* $p = HTML::LCParser->new( $file_or_doc );</p>
578:
579: <p class=MsoPlainText>The object constructor argument is either a file name,
580:
581: a file handle</p>
582:
583: <p class=MsoPlainText>object, or the complete document to be parsed.</p>
584:
585: <p class=MsoPlainText>If the argument is a plain scalar, then it is taken as
586:
587: the name of a</p>
588:
589: <p class=MsoPlainText>file to be opened and parsed. If the file can't
590:
591: be opened for</p>
592:
593: <p class=MsoPlainText>reading, then the constructor will return an undefined
594:
595: value and $!</p>
596:
597: <p class=MsoPlainText>will tell you why it failed.</p>
598:
599: <p class=MsoPlainText>If the argument is a reference to a plain scalar, then
600:
601: this scalar is</p>
602:
603: <p class=MsoPlainText>taken to be the literal document to parse. The value
604:
605: of this</p>
606:
607: <p class=MsoPlainText>scalar should not be changed before all tokens have been
608:
609: extracted.</p>
610:
611: <p class=MsoPlainText>Otherwise the argument is taken to be some object that
612:
613: the</p>
614:
615: <p class=MsoPlainText>C<HTML::LCParser> can read() from when it needs
616:
617: more data. Typically</p>
618:
619: <p class=MsoPlainText>it will be a filehandle of some kind. The stream
620:
621: will be read() until</p>
622:
623: <p class=MsoPlainText>EOF, but not closed.</p>
624:
625: <p class=MsoPlainText>It also will turn attr_encoded on by default.</p>
626:
627: <p class=MsoPlainText>* $p->get_token</p>
628:
629: <p class=MsoPlainText>This method will return the next I<token> found
630:
631: in the HTML document,</p>
632:
633: <p class=MsoPlainText>or C<undef> at the end of the document. The
634:
635: token is returned as an</p>
636:
637: <p class=MsoPlainText>array reference. The first element of the array
638:
639: will be a (mostly)</p>
640:
641: <p class=MsoPlainText>single character string denoting the type of this token:
642:
643: "S" for start</p>
644:
645: <p class=MsoPlainText>tag, "E" for end tag, "T" for text,
646:
647: "C" for comment, "D" for</p>
648:
649: <p class=MsoPlainText>declaration, and "PI" for process instructions.
650:
651: The rest of the array</p>
652:
653: <p class=MsoPlainText>is the same as the arguments passed to the corresponding
654:
655: HTML::Parser</p>
656:
657: <p class=MsoPlainText>v2 compatible callbacks (see L<HTML::Parser>).
658:
659: In summary, returned</p>
660:
661: <p class=MsoPlainText>tokens look like this:</p>
662:
663: <p class=MsoPlainText> ["S", $tag, $attr, $attrseq, $text,
664:
665: $line]</p>
666:
667: <p class=MsoPlainText> ["E", $tag, $text, $line]</p>
668:
669: <p class=MsoPlainText> ["T", $text, $is_data, $line]</p>
670:
671: <p class=MsoPlainText> ["C", $text, $line]</p>
672:
673: <p class=MsoPlainText> ["D", $text, $line]</p>
674:
675: <p class=MsoPlainText> ["PI", $token0, $text, $line]</p>
676:
677: <p class=MsoPlainText>where $attr is a hash reference, $attrseq is an array
678:
679: reference and</p>
680:
681: <p class=MsoPlainText>the rest are plain scalars.</p>
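<p class=MsoPlainText>A short sketch of how code might branch on the token shapes listed above. The tokens are built by hand so the snippet runs without a parser; in real use they would come from $p->get_token. The sample values are made up for illustration.</p>

```perl
use strict;
use warnings;

# Hand-built tokens in the documented shapes (illustrative values only).
my @tokens = (
    ['S', 'a', { href => 'index.html' }, ['href'], '<a href="index.html">', 1],
    ['T', 'Home', 0, 1],
    ['E', 'a', '</a>', 1],
);

my @summary;
for my $token (@tokens) {
    my ($type, @rest) = @$token;    # first element is the type code
    if ($type eq 'S') {
        # $attr is a hash reference, $attrseq lists attribute names in order
        my ($tag, $attr, $attrseq) = @rest;
        my $attrs = join(',', map { "$_=$attr->{$_}" } @$attrseq);
        push @summary, "start $tag ($attrs)";
    } elsif ($type eq 'E') {
        push @summary, "end $rest[0]";
    } elsif ($type eq 'T') {
        push @summary, "text '$rest[0]'";
    } else {
        push @summary, "other $type";   # C, D and PI tokens
    }
}
print "$_\n" for @summary;
```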
682:
683: <p class=MsoPlainText>* $p->unget_token($token,...)</p>
684:
685: <p class=MsoPlainText>If you find out you have read too many tokens you can
686:
687: push them back,</p>
688:
689: <p class=MsoPlainText>so that they are returned the next time $p->get_token
690:
691: is called.</p>
692:
693: <p class=MsoPlainText>* $p->get_tag( [$tag, ...] )</p>
694:
695: <p class=MsoPlainText>This method returns the next start or end tag (skipping
696:
697: any other</p>
698:
699: <p class=MsoPlainText>tokens), or C<undef> if there are no more tags in
700:
701: the document. If</p>
702:
703: <p class=MsoPlainText>one or more arguments are given, then we skip tokens until
704:
705: one of the</p>
706:
707: <p class=MsoPlainText>specified tag types is found. For example:</p>
708:
709: <p class=MsoPlainText> $p->get_tag("font", "/font");</p>
710:
711: <p class=MsoPlainText>will find the next start or end tag for a font-element.</p>
712:
713: <p class=MsoPlainText>The tag information is returned as an array reference
714:
715: in the same form</p>
716:
717: <p class=MsoPlainText>as for $p->get_token above, but the type code (first
718:
719: element) is</p>
720:
721: <p class=MsoPlainText>missing. A start tag will be returned like this:</p>
722:
723: <p class=MsoPlainText> [$tag, $attr, $attrseq, $text]</p>
724:
725: <p class=MsoPlainText>The tagname of end tags are prefixed with "/",
726:
727: i.e. end tag is</p>
728:
729: <p class=MsoPlainText>returned like this:</p>
730:
731: <p class=MsoPlainText> ["/$tag", $text]</p>
732:
733: <p class=MsoPlainText>* $p->get_text( [$endtag] )</p>
734:
735: <p class=MsoPlainText>This method returns all text found at the current position.
736:
737: It will</p>
738:
739: <p class=MsoPlainText>return a zero length string if the next token is not text.
740:
741: The</p>
742:
743: <p class=MsoPlainText>optional $endtag argument specifies that any text occurring
744:
745: before the</p>
746:
747: <p class=MsoPlainText>given tag is to be returned. All entities are unmodified.</p>
748:
749: <p class=MsoPlainText>The $p->{textify} attribute is a hash that defines
750:
751: how certain tags can</p>
752:
753: <p class=MsoPlainText>be treated as text. If the name of a start tag matches
754:
755: a key in this</p>
756:
757: <p class=MsoPlainText>hash then this tag is converted to text. The hash
758:
759: value is used to</p>
760:
761: <p class=MsoPlainText>specify which tag attribute to obtain the text from.
762:
763: If this tag</p>
764:
765: <p class=MsoPlainText>attribute is missing, then the upper case name of the
766:
767: tag enclosed in</p>
768:
769: <p class=MsoPlainText>brackets is returned, e.g. "[IMG]". The
770:
771: hash value can also be a</p>
772:
773: <p class=MsoPlainText>subroutine reference. In this case the routine is
774:
775: called with the</p>
776:
777: <p class=MsoPlainText>start tag token content as its argument and the return
778:
779: value is treated</p>
780:
781: <p class=MsoPlainText>as the text.</p>
782:
783: <p class=MsoPlainText>The default $p->{textify} value is:</p>
784:
785: <p class=MsoPlainText> {img => "alt", applet => "alt"}</p>
786:
787: <p class=MsoPlainText>This means that <IMG> and <APPLET> tags are
788:
789: treated as text, and that</p>
790:
791: <p class=MsoPlainText>the text to substitute can be found in the ALT attribute.</p>
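<p class=MsoPlainText>The textify rules can be sketched as a plain lookup. This is a stand-alone illustration of the rules as described, not the parser's own code; the subroutine <i>tag_as_text</i> is hypothetical, and passing the token contents to the callback as a list is an assumption.</p>

```perl
use strict;
use warnings;

my %textify = (img => 'alt', applet => 'alt');

# $tag_token uses the get_tag shape for a start tag:
# [$tag, $attr, $attrseq, $text].
sub tag_as_text {
    my ($tag_token, $rules) = @_;
    my ($tag, $attr) = @$tag_token;
    my $rule = $rules->{$tag};
    return undef unless defined $rule;                      # not treated as text
    return $rule->(@$tag_token) if ref($rule) eq 'CODE';    # callback decides
    return defined($attr->{$rule}) ? $attr->{$rule}         # named attribute
                                   : '[' . uc($tag) . ']';  # fallback, e.g. [IMG]
}

my $with_alt = tag_as_text(['img', { alt => 'Logo' }, ['alt'], ''], \%textify);
my $no_alt   = tag_as_text(['img', {}, [], ''], \%textify);

# A subroutine reference as the hash value (hypothetical rule for <a>).
$textify{a} = sub { my ($tag, $attr) = @_; return "link:$attr->{href}"; };
my $custom  = tag_as_text(['a', { href => 'x.html' }, ['href'], ''], \%textify);

print "$with_alt $no_alt $custom\n";   # prints Logo [IMG] link:x.html
```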
792:
793: <p class=MsoPlainText>* $p->get_trimmed_text( [$endtag] )</p>
794:
795: <p class=MsoPlainText>Same as $p->get_text above, but will collapse any sequences
796:
797: of white</p>
798:
799: <p class=MsoPlainText>space to a single space character. Leading and trailing
800:
801: white space is</p>
802:
803: <p class=MsoPlainText>removed.</p>
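<p class=MsoPlainText>The white space handling described for get_trimmed_text amounts to the following, shown on a plain string. The helper <i>trim_text</i> is hypothetical and only mirrors the description above.</p>

```perl
use strict;
use warnings;

# Remove leading and trailing white space, then collapse inner runs
# of white space to a single space character.
sub trim_text {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    $text =~ s/\s+/ /g;
    return $text;
}

my $trimmed = trim_text("  Session   Four:\n\tXML Handler  ");
print "[$trimmed]\n";   # prints [Session Four: XML Handler]
```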
804:
805: <p class=MsoPlainText>EXAMPLES</p>
806:
807: <p class=MsoPlainText>This example extracts all links from a document.
808:
809: It will print one</p>
810:
811: <p class=MsoPlainText>line for each link, containing the URL and the textual
812:
813: description</p>
814:
815: <p class=MsoPlainText>between the <A>...</A> tags:</p>
816:
817: <p class=MsoPlainText> use HTML::LCParser;</p>
818:
819: <p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
820:
821: <p class=MsoPlainText> while (my $token = $p->get_tag("a"))
822:
823: {</p>
824:
825: <p class=MsoPlainText> my $url = $token->[1]{href}
826:
827: || "-";</p>
828:
829: <p class=MsoPlainText> my $text = $p->get_trimmed_text("/a");</p>
830:
831: <p class=MsoPlainText> print "$url\t$text\n";</p>
832:
833: <p class=MsoPlainText> }</p>
834:
835: <p class=MsoPlainText>This example extracts the <TITLE> from the document:</p>
836:
837: <p class=MsoPlainText> use HTML::LCParser;</p>
838:
839: <p class=MsoPlainText> $p = HTML::LCParser->new(shift||"index.html");</p>
840:
841: <p class=MsoPlainText> if ($p->get_tag("title")) {</p>
842:
843: <p class=MsoPlainText> my $title = $p->get_trimmed_text;</p>
844:
845: <p class=MsoPlainText> print "Title: $title\n";</p>
846:
847: <p class=MsoPlainText> }</p>
848:
849: </div>
850:
851: <br
852:
853: clear=ALL style='page-break-before:always;'>
854:
855: <div class=Section2> </div>
856:
857: </body>
858:
859: </html>
860: