Comment detail

A filter that lets some HTML tags through (Nested Flatten)
In Squeak Smalltalk.

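As usual, regular expressions are not available, so this is written procedurally. At this rate, I dread the sequel, which is bound to demand something more complex... (^_^;)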
| string in tag out save rest |
string := '<a href=''www.google.com''>link</a> <blink>and</blink> <strong onClick=''alert("NG")''>click<br/>me!</strong>'.

in := string readStream.
out := String new writeStream.
"Main loop: copy plain text up to the next $<, then handle the tag that follows."
[in atEnd] whileFalse: [
	out nextPutAll: (in upTo: $<).
	in back.
	save := in position.
	tag := in upTo: Character space.
	(tag includes: $/) ifTrue: [in position: save. tag := in upTo: $>. in back].
	out nextPutAll: ((#('<a' '<br/' '<strong' '</a' '</strong') includes: tag asLowercase)
		ifTrue: [tag] ifFalse: [tag := '&lt;', tag allButFirst]).
	tag := tag asLowercase.
	"Attribute loop: runs while the text before the next $> still contains an $=."
	[save := in position. (rest := in upTo: $>) includes: $=] whileTrue: [
		| attr quote data |
		in position: save.
		attr := in upTo: $=.
		quote := (#($' $") includes: in peek) ifTrue: [in next] ifFalse: [Character space].
		data := in upTo: quote.
		quote := quote = Character space ifTrue: [''] ifFalse: [quote asString].
		data := attr, '=', quote, data, quote.
		in skipSeparators.
		(tag = '<a' and: [#(href name) includes: attr]) ifTrue: [out space; nextPutAll: data].
		(#('<a' '<br/' '<strong') includes: tag) ifFalse: [
			out space.
			data do: [:chr | chr = $< ifTrue: [out nextPutAll: '&lt;'] ifFalse: [out nextPut: chr]]]].
	out nextPutAll: rest, '>'].
^out contents

"=> '<a href=''www.google.com''>link</a> &lt;blink>and&lt;/blink> <strong>click<br/>me!</strong>' "
Without regular expressions this looks pretty tough...

<br> gets escaped, so I think '<br' should be added to the list. Also, attr does not seem to be passed through asLowercase, so an uppercase HREF gets stripped.
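The two points above could be handled in the first version roughly as follows. This is only a sketch: the names accepted, isAccepted and attrAllowed are hypothetical helpers that do not appear in the original code, and the attribute names are written as String literals (an assumption made here so that the comparison does not depend on how a given Squeak version compares a Symbol with a String).

```smalltalk
| accepted isAccepted attrAllowed |

"Tag prefixes that pass through unescaped; '<br' is added so that a plain <br> is accepted too."
accepted := #('<a' '<br' '<br/' '<strong' '</a' '</strong').

"Compare case-insensitively by lowercasing the tag before the lookup."
isAccepted := [:tag | accepted includes: tag asLowercase].

"Lowercase the attribute name as well, so that HREF and href are treated alike."
attrAllowed := [:attr | #('href' 'name') includes: attr asLowercase].
```

In the first version, the test on the tag list would then become isAccepted value: tag, and the check on #(href name) would become attrAllowed value: attr.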

I tidied it up and rewrote it with the next problem in mind.
| string accepts in out upToAnyOf letters separators |
string := '<a title="(>_<;)" href=''www.google.com'' name=''hoge'' target=_blank>link</a> <blink>and</blink> <strong onClick=''alert("NG")''>click<br/>me!</strong>'.

"Whitelist: accepted tag name -> attribute names allowed on that tag."
accepts := {#a->#(name href). #strong->#(). #br->#()} as: Dictionary.
string := string copyReplaceAll: '<br>' with: '<br/>'.
in := string readStream.
out := String new writeStream.
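"Read from in up to, but not including, the first character found in arr (nil stands for end of stream)."
upToAnyOf := [:arr | String streamContents: [:ss |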
    arr := arr copyWith: nil.
    [arr includes: in peek] whileFalse: [ss nextPut: in next]]].
letters := Character alphabet asArray, Character alphabet asUppercase.
separators := Character separators, #($/ $>).

[out nextPutAll: (in upTo: $<) escapeEntities. in atEnd] whileFalse: [
    | tag lt isClose isAccepted blank rest |
    (isClose := in peek == $/) ifTrue: [in next].
    tag := upToAnyOf value: separators.
    lt := '<', (isClose ifTrue: ['/'] ifFalse: ['']).
    (isAccepted := accepts keys includes: tag asLowercase) ifFalse: [lt := lt escapeEntities].
    out nextPutAll: lt, tag.
    [blank := upToAnyOf value: letters, '>'. {nil. $>} includes: in peek] whileFalse: [
        | attr equal value quote |
        attr := upToAnyOf value: #($= $>).
        equal := in peek == $= ifTrue: [in next asString] ifFalse: [''].
        value := (#($' $") includes: (quote := in peek))
            ifTrue: [quote asString, (in next; upTo: quote), quote asString]
            ifFalse: [upToAnyOf value: #($  $>)].
        out nextPutAll: (isAccepted
            ifFalse: [blank, attr, equal, value escapeEntities]
            ifTrue: [((accepts at: tag) includes: attr)
                ifTrue: [blank, attr, equal, value] ifFalse: ['']])].
    rest := blank, (in peek == $> ifTrue: [in next asString] ifFalse: ['']).
    out nextPutAll: (isAccepted ifTrue: [rest] ifFalse: [rest escapeEntities])].
World findATranscript: nil.
Transcript cr; show: out contents

"=> <a href='www.google.com' name='hoge'>link</a> &lt;blink&gt;and&lt;/blink&gt; <strong>click<br/>me!</strong> "
