algorithm - How can I parse this plaintext RP-style string into a more generic XML-style one?


I'm making an app that will translate roleplaying-style messages into something more generic. The user has the ability to specify their preferences, like:

Moves
 - /me <move>
 - *<move>*
Speech
 - <speech>
 - "<speech>"
Out-of-character
 - [<ooc>]
 - ((ooc))
 - //ooc

I need to parse a message like this:

/me eats food "this *munch* good!" [you're @ this] 

or like this:

*eats food* this *munch* good! ((you're @ this))

into a more generic, XML-like string like this:

<move>eats food <speech>this <move>munch</move> good!</speech> <ooc>you're @ this</ooc></move> 

but with regard to which segment is inside which. For example:

*eats food "this munch* good" // you're @ this

should be parsed as:

<move>eats food "this munch</move><speech> good" </speech><ooc> you're @ this</ooc> 

even if that's not what the user intended. Note that the quotes in this last example weren't parsed because they didn't wrap a complete segment: the current move segment had not finished by the time the first quote was encountered, speech had already started when the second one was, and the second one had no further quote after it to surround a separate speech segment.

I've tried doing this iteratively, recursively, with trees, and even with regexes, but I haven't found a solution that works the way I want it to. How do I parse the above RP-style messages into the above generic XML-style messages?

It's also important that spacing is preserved.

Here are some other examples using the above-listed preferences:

i roller coasters.
[what like?]
/me eats hamburger // wanna grab lunch after this?
*jumps , down* ((the party)) great!
/me performs *an action* within action "and that's fine [as *an action* in ooc in speech]"
and messages /me can change contexts // @ point
[but ill-formatted ones *must parsed] according "to* rules"
-and text formatted in <non-specified ways> &not treated; specially-

become:

<speech>i roller coasters.</speech>
<ooc>what like?</ooc>
<move>eats hamburger <ooc> wanna grab lunch after this?</ooc></move>
<move>jumps , down</move><speech> <ooc>the party</ooc> great!</speech>
<move>performs <move>an action</move> within action <speech>and that's fine <ooc>as <move>an action</move> in ooc in speech</ooc></speech></move>
<speech>and messages <move>can change contexts <ooc> @ point</ooc></move></speech>
<ooc>but ill-formatted ones *must parsed</ooc><speech> according <speech>to* rules</speech></speech>
<speech>-and text formatted in &lt;non-specified ways&gt; &amp;not treated; specially-</speech>
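The last example implies that any text that is not a recognized delimiter has to be XML-escaped on the way out. A minimal sketch of that step, assuming only `&`, `<` and `>` need escaping (the class name `XmlText` is hypothetical, not part of the question):

```java
/** Hypothetical helper: escapes plain message text before it is placed inside a tag. */
final class XmlText {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);   // every other character, including spaces, is kept as-is
            }
        }
        return out.toString();
    }
}
```

Because characters are copied one at a time, spacing is preserved exactly.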

What you have is a bunch of tokens that should each trigger an XML tag. That is fairly straightforward to implement using one function for each tag.

    void move() {
        xmlPrintWriter.println("<move>");
        parse();
        xmlPrintWriter.println(content);
        xmlPrintWriter.println("</move>");
    }

where parse() consumes and classifies the input text.

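    void parse() {
        // Classify the upcoming text by its leading token.
        if (text.startsWith("*")) action = MOVE;
        ... other cases

        // Dispatch to the method that emits the matching tag.
        if (action == MOVE) {
            move();
        }
        ... other actions.
    }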

The parse method has to check for all possible state-changers: "*" -> move, "((" -> ooc, "\"" -> speech, and so on.
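The check for state-changers could be sketched as a table lookup at the current position. This is only an illustration: the class name `StateChangers`, the method `tagAt`, and the choice to treat `/me ` (with its trailing space) as one opener are assumptions, and the table simply mirrors the example preferences from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup of the tag that starts at a given position. */
final class StateChangers {
    // Opener -> tag name. Longer openers are listed first so that a multi-character
    // opener is never shadowed by a shorter one that happens to be its prefix.
    private static final Map<String, String> OPENERS = new LinkedHashMap<>();
    static {
        OPENERS.put("/me ", "move");
        OPENERS.put("((", "ooc");
        OPENERS.put("//", "ooc");
        OPENERS.put("*", "move");
        OPENERS.put("\"", "speech");
        OPENERS.put("[", "ooc");
    }

    /** Returns the tag opened at position pos, or null if the text there is plain content. */
    static String tagAt(String text, int pos) {
        for (Map.Entry<String, String> e : OPENERS.entrySet()) {
            if (text.startsWith(e.getKey(), pos)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

Keeping the openers in a table rather than in a chain of `if` statements makes it easier to load them from the user's preferences.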

Here MOVE is a class constant and action is a state variable, along with text and xmlPrintWriter. move and parse are both methods.

This approach will not work, though, if you allow your last example. Then the situation becomes extremely hairy and would need to be decided on a case-by-case basis.
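One possible way to tame the ill-formed case, offered only as a sketch and not as the approach described above, is to look ahead before opening a tag: a delimiter counts as an opener only if its closer lies inside the segment currently being parsed, and otherwise it is emitted as literal text. The class name `RpToXml` and the rule table are assumptions. The sketch matches each opener with the nearest closer, and it does not wrap bare text in an implicit `<speech>` tag, so it covers only the explicitly delimited part of the expected output.

```java
/** Hypothetical recursive converter with closer look-ahead. */
final class RpToXml {
    // {opener, closer, tag}; an empty closer means "runs to the end of the enclosing segment".
    // Longer openers come first so they are tried before one-character openers.
    private static final String[][] RULES = {
        {"/me ", "", "move"},
        {"((", "))", "ooc"},
        {"//", "", "ooc"},
        {"*", "*", "move"},
        {"\"", "\"", "speech"},
        {"[", "]", "ooc"},
    };

    static String convert(String text) {
        StringBuilder out = new StringBuilder();
        parse(text, 0, text.length(), out);
        return out.toString();
    }

    /** Converts text[start, end) and appends the result to out. */
    private static void parse(String text, int start, int end, StringBuilder out) {
        int i = start;
        while (i < end) {
            boolean opened = false;
            for (String[] rule : RULES) {
                String open = rule[0], close = rule[1], tag = rule[2];
                if (!text.startsWith(open, i)) continue;
                int innerStart = i + open.length();
                if (innerStart > end) continue;  // opener would straddle the segment end
                int innerEnd = close.isEmpty() ? end : text.indexOf(close, innerStart);
                // The opener only counts if its closer lies inside the current segment.
                if (innerEnd < 0 || innerEnd + close.length() > end) continue;
                out.append('<').append(tag).append('>');
                parse(text, innerStart, innerEnd, out);
                out.append("</").append(tag).append('>');
                i = innerEnd + close.length();
                opened = true;
                break;
            }
            if (!opened) {
                appendEscaped(text.charAt(i), out);  // plain character, spacing included
                i++;
            }
        }
    }

    private static void appendEscaped(char c, StringBuilder out) {
        switch (c) {
            case '<': out.append("&lt;"); break;
            case '>': out.append("&gt;"); break;
            case '&': out.append("&amp;"); break;
            default:  out.append(c);
        }
    }
}
```

Under these assumptions, `RpToXml.convert` applied to the first `/me eats food ...` message yields the nested string shown in the question, and for `*eats food "this munch* good"` both quotes are left as literal text because neither has a closer inside its own segment.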

