A filter that lets some HTML tags through
Posted feedbacks - Smalltalk
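In Squeak Smalltalk.
As usual, regular expressions are not available, so this is written procedurally. At this rate I dread the sequels, which are going to demand more complex handling... (^_^;)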
| string in tag out save rest |
string := '<a href=''www.google.com''>link</a> <blink>and</blink> <strong onClick=''alert("NG")''>click<br/>me!</strong>'.
in := string readStream.
out := String new writeStream.
[in atEnd] whileFalse: [
    "Copy plain text up to the next tag, then step back onto the '<'."
    out nextPutAll: (in upTo: $<).
    in back.
    save := in position.
    "Read a token up to the next space. If it contains a '/', rewind and re-read up to the '>', leaving the '>' unread."
    tag := in upTo: Character space.
    (tag includes: $/) ifTrue: [in position: save. tag := in upTo: $>. in back].
    "Accepted tags pass through; any other tag gets its leading '<' escaped."
    out nextPutAll: ((#('<a' '<br/' '<strong' '</a' '</strong') includes: tag asLowercase)
        ifTrue: [tag] ifFalse: [tag := '&lt;', tag allButFirst]).
    tag := tag asLowercase.
    "Process attributes for as long as the remainder of the tag contains an '='."
    [save := in position. (rest := in upTo: $>) includes: $=] whileTrue: [
        | attr quote data |
        in position: save.
        attr := in upTo: $=.
        quote := (#($' $") includes: in peek) ifTrue: [in next] ifFalse: [Character space].
        data := in upTo: quote.
        quote := quote = Character space ifTrue: [''] ifFalse: [quote asString].
        data := attr, '=', quote, data, quote.
        in skipSeparators.
        "On <a>, only href and name survive; other accepted tags drop every attribute."
        (tag = '<a' and: [#(href name) includes: attr]) ifTrue: [out space; nextPutAll: data].
        "Tags outside the accepted opening tags keep their attributes, with any '<' escaped."
        (#('<a' '<br/' '<strong') includes: tag) ifFalse: [
            out space.
            data do: [:chr | chr = $< ifTrue: [out nextPutAll: '&lt;'] ifFalse: [out nextPut: chr]]]].
    out nextPutAll: rest, '>'].
^out contents
"=> '<a href=''www.google.com''>link</a> &lt;blink>and&lt;/blink> <strong>click<br/>me!</strong>' "
I tidied this up and rewrote it with the next challenge in mind.
| string accepts in out upToAnyOf letters separators |
string := '<a title="(>_<;)" href=''www.google.com'' name=''hoge'' target=_blank>link</a> <blink>and</blink> <strong onClick=''alert("NG")''>click<br/>me!</strong>'.
"Accepted tags, each mapped to the attributes it may keep."
accepts := {#a->#(name href). #strong->#(). #br->#()} as: Dictionary.
string := string copyReplaceAll: '<br>' with: '<br/>'.
in := string readStream.
out := String new writeStream.
"Read from in until the next character is one of arr, or the stream ends."
upToAnyOf := [:arr | String streamContents: [:ss |
    arr := arr copyWith: nil.
    [arr includes: in peek] whileFalse: [ss nextPut: in next]]].
letters := Character alphabet asArray, Character alphabet asUppercase.
separators := Character separators, #($/ $>).
"Emit escaped text up to each '<', then handle the tag that follows."
[out nextPutAll: (in upTo: $<) escapeEntities. in atEnd] whileFalse: [
    | tag lt isClose isAccepted blank rest |
    (isClose := in peek == $/) ifTrue: [in next].
    tag := upToAnyOf value: separators.
    lt := '<', (isClose ifTrue: ['/'] ifFalse: ['']).
    (isAccepted := accepts keys includes: tag asLowercase) ifFalse: [lt := lt escapeEntities].
    out nextPutAll: lt, tag.
    "Walk the attributes until the closing '>' or the end of input."
    [blank := upToAnyOf value: letters, '>'. {nil. $>} includes: in peek] whileFalse: [
        | attr equal value quote |
        attr := upToAnyOf value: #($= $>).
        equal := in peek == $= ifTrue: [in next asString] ifFalse: [''].
        "A quoted value is read up to its closing quote; an unquoted one up to a space or '>'."
        value := (#($' $") includes: (quote := in peek))
            ifTrue: [quote asString, (in next; upTo: quote), quote asString]
            ifFalse: [upToAnyOf value: #($ $>)].
        "Rejected tags keep every attribute with the value escaped; accepted tags keep only listed attributes."
        out nextPutAll: (isAccepted
            ifFalse: [blank, attr, equal, value escapeEntities]
            ifTrue: [((accepts at: tag asLowercase) includes: attr)
                ifTrue: [blank, attr, equal, value] ifFalse: ['']])].
    rest := blank, (in peek == $> ifTrue: [in next asString] ifFalse: ['']).
    out nextPutAll: (isAccepted ifTrue: [rest] ifFalse: [rest escapeEntities])].
World findATranscript: nil.
Transcript cr; show: out contents
"=> <a href='www.google.com' name='hoge'>link</a> &lt;blink&gt;and&lt;/blink&gt; <strong>click<br/>me!</strong> "
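The whitelist lookup at the heart of the rewritten version can be looked at on its own. The following is only a minimal sketch, not the code above: the block name keep is hypothetical, and String keys are used instead of Symbols on the assumption that this avoids relying on String/Symbol comparison, which can behave differently across Smalltalk dialects.

```smalltalk
| accepts keep |
"Accepted tags mapped to the attributes they may keep (String keys are an assumption of this sketch)."
accepts := Dictionary new.
accepts
    at: 'a' put: #('name' 'href');
    at: 'strong' put: #();
    at: 'br' put: #().
"keep is a hypothetical helper: true only when the tag is accepted and lists the attribute."
keep := [:tag :attr |
    (accepts at: tag asLowercase ifAbsent: [#()]) includes: attr].
keep value: 'a' value: 'href'.          "=> true"
keep value: 'A' value: 'name'.          "=> true"
keep value: 'a' value: 'title'.         "=> false"
keep value: 'strong' value: 'onClick'.  "=> false"
keep value: 'blink' value: 'href'.      "=> false"
```

Using at:ifAbsent: means an unknown tag simply keeps no attributes, so a lookup never fails on a tag that is missing from the dictionary.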




にしお
#3410()
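This challenge is based on a proposal by perezvon. Thank you. I thought it would be too hard to tackle all at once, though, so I posted everything except the core part first. The sequels to this challenge will get progressively harder.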
Addendum: many answers seem to miss the case where an attribute contains < or >, so I am adding test cases. These show output that would be sufficient; your output does not have to match them exactly.
<script foo="<script>alert('bar')</script>">alert('foo')</script>
<script foo="<script>alert('bar')</script>">alert('foo')</script>
<script foo="<a href='link'>link</a>">alert('foo')</script>
<script foo="<a href='link'>link</a>">alert('foo')</script>
<a href='www.g>oogle.com'>link</a>
<a href="./www.g%3Eoogle.com">link</a>
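For a tag that is not accepted, such as script, one way to cover attribute values that contain < or > is to escape every angle bracket in the tag, including those inside the quoted value. The following is a minimal sketch under that assumption; the block name escape is hypothetical, and it is written by hand here rather than relying on a library method.

```smalltalk
| escape input expected |
"escape is a hypothetical helper: replace every < and > with its entity."
escape := [:s | String streamContents: [:ss |
    s do: [:c |
        c = $<
            ifTrue: [ss nextPutAll: '&lt;']
            ifFalse: [c = $>
                ifTrue: [ss nextPutAll: '&gt;']
                ifFalse: [ss nextPut: c]]]]].
input := '<script foo="<script>alert(''bar'')</script>">alert(''foo'')</script>'.
expected := '&lt;script foo="&lt;script&gt;alert(''bar'')&lt;/script&gt;"&gt;alert(''foo'')&lt;/script&gt;'.
(escape value: input) = expected  "=> true"
```

Because nothing inside the rejected tag is left as a raw angle bracket, a < or > hidden in an attribute value cannot reopen or close a tag in the output.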