\section{Methodology}
BioDiv2Go's Geogame2 (FindeVielfalt Simulation) served as the base case during the development of the analysis stack.
It was chosen for its well-defined REST API, which includes log retrieval and user authentication.
This section shows how the framework copes with the integration of another game with a completely different architecture and log style.
\subsection{Choosing an additional game}
\autoref{tab:logs2} and \autoref{tab:logs3} give an overview of the log files available for the different games.
Apart from the base case BioDiv2Go, the game with the largest number of available log files is Neocartographer.
Neocartographer saves its log files as GPX tracks.
Additional game states are embedded into the event tag of some of the GPX track points.
A first inspection reveals some GPX files of only a few bytes, consisting of just a GPX header with a few track points and no game actions at all.
However, compared to the other games, it has a comprehensible log structure, and even with some empty logs there should be a reasonable number of usable game logs.
\begin{longtable}[H]{ccp{0.6\textwidth}}
Geogame & Log files & Notes \\
\hline
BioDiv2Go & $\approx430$ & SQLite database with JSON log entries, references to game config; import base case\\
GeoTicTacToe & $\approx13$ & CSV with pipes; no temporal data; events + tracks\\
\caption{Geogame clients log data}
\label{tab:logs2}
\end{longtable}
\begin{longtable}[H]{ccp{0.6\textwidth}}
Geogame & Log files & Notes \\
\hline
GeoTicTacToe & $\approx2$ & intermediate log format\\
GeoTTT & $\approx130$ & fragmented structure: incomplete or fragmented\\
Neocartographer\furl{http://www.geogames-team.org/?p=23} & $\approx400$ & Partly broken GPX: missing description information; one GPX file per player\\
MissingLink & $\approx6$ & Partly broken GPX: missing spatial information; one GPX file per player\\
Equilibrium\furl{http://www.geogames-team.org/?p=148} & $\approx40$ & GPX with missing end tag\\
\caption{Geogame servers log data}
\label{tab:logs3}
\end{longtable}
\autoref{sec:neocart} describes the integration effort for Neocartographer.
\section{Integration of Neocartographer}\label{sec:neocart}
\subsection{Neocartographer Game Log Files}
The log files are grouped into folders, which contain the GPX tracks and media, mainly photos.%TODO
Many Neocartographer GPX files have invalid XML markup, as \autoref{tab:xml} shows.
\begin{longtable}[H]{rl}
Error type & Example \\
\hline
missing attribute space & \verb|<desc><event message="leaveObject"geoid="9"/></desc>|\\
unclosed tag & \verb|<desc><event </desc>|\\
missing attribute name & \verb|<trkpt lat="48.3689110.897709">|\\
invalid attribute values & \verb|<trkpt lat="UNKNOWN" lon="UNKNOWN">|\\
\caption{Neocartographer GPX log error types}
\label{tab:xml}
\end{longtable}
The first two error types (missing separation between two attributes and unclosed tags) are syntactic XML errors.
With the lxml\furl{http://lxml.de/} recovery parser\footnote{\texttt{lxml.etree.XMLParser(recover=True)}}, the unclosed tag no longer aborts parsing and causes no further data loss\footnote{With an empty event tag, the event data itself is of course still missing.}.
In the missing attribute separation case, the recovery parser parses only the first attribute properly.
Any additional attributes are stored in the \texttt{tail} field of the XML element's object as a raw string.
With string manipulation, the \texttt{geoid} attribute can be restored\footnote{In the data sample, this error occurred only with the \texttt{geoid} attribute.}.
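
The restoration of the \texttt{geoid} attribute can be sketched as follows.
This is a minimal illustration, not the framework's actual parser: the function name \texttt{restore\_geoid} is hypothetical, and the raw string left behind by the recovery parser is only assumed to still contain the text \texttt{geoid="9"}.

```python
import re
from typing import Optional

# Matches geoid="<value>" anywhere in a leftover raw string.
# The exact shape of that string is an assumption for this sketch.
_GEOID_PATTERN = re.compile(r'geoid\s*=\s*"([^"]*)"')


def restore_geoid(raw: Optional[str]) -> Optional[str]:
    """Return the geoid value found in the leftover raw string, or None."""
    if not raw:
        return None
    match = _GEOID_PATTERN.search(raw)
    return match.group(1) if match else None
```

Returning \texttt{None} instead of raising keeps the decision of how to treat unrecoverable entries with the caller.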
The other two error types lead to data corruption, as neither case yields a valid latitude/longitude pair.
For the missing attribute name, assuming a two-digit longitude\footnote{The names and other valid longitudes suggest that the game field is located in the eastern part of Bavaria.}, the two values can be restored through string parsing: the longitude starts two characters before the second decimal separator.%TODO
Because this restoration rests on an assumption, good practice requires the parser to issue a loud warning to indicate possible errors here.
The last error type (invalid attribute values) occurs in nearly all first and second entries of a log.
These entries contain the players' \emph{join} and \emph{start} events, which are recorded before a position fix is available.
Currently, these log entries are discarded with an accompanying log message.
A possible improvement would be to keep a reference to these entries and assign them the first valid location that follows.
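
The suggested improvement can be sketched as follows.
This is an illustration only, not part of the current parser: the function names, the representation of log entries as dictionaries with raw \texttt{lat}/\texttt{lon} strings, and the marker key \texttt{location\_inferred} are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def parse_coordinate(raw: Optional[str]) -> Optional[float]:
    """Return the coordinate as float, or None for values such as "UNKNOWN"."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def backfill_locations(entries: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Give entries without a position fix the first valid location after them."""
    result: List[Dict[str, object]] = []
    pending: List[Dict[str, object]] = []  # entries still waiting for a fix
    for entry in entries:
        lat = parse_coordinate(entry.get("lat"))
        lon = parse_coordinate(entry.get("lon"))
        item = dict(entry, lat=lat, lon=lon)
        result.append(item)
        if lat is None or lon is None:
            pending.append(item)
            continue
        # First valid fix: hand it to every entry that was waiting for one.
        for waiting in pending:
            waiting.update(lat=lat, lon=lon, location_inferred=True)
        pending.clear()
    return result
```

Entries that are never followed by a valid fix keep \texttt{None} as their location, so they can still be discarded or reported afterwards.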
\subsection{Log Retrieval}
As Neocartographer only has a playtime server, its log files are stored on that server's file system.
Therefore, an Nginx HTTP server was configured to serve folder indices formatted as JSON (see \autoref{sec:ggt-server}).
This allows the framework's loaders to retrieve the log files in a clean manner.
An additional client implementation in the framework (see \autoref{sec:source}) converts the JSON index to the structure used internally and reuses the existing functionality to handle file downloads.
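
The conversion of such a JSON index can be sketched as follows.
The sketch assumes that the index has the layout produced by Nginx's \texttt{autoindex\_format json} setting, i.e. an array of objects with a \texttt{name} and a \texttt{type} of either \texttt{file} or \texttt{directory}; the actual server configuration is described in \autoref{sec:ggt-server}.
The function name \texttt{list\_gpx\_urls} is hypothetical and does not refer to the framework's client.

```python
import json
from typing import List
from urllib.parse import quote, urljoin


def list_gpx_urls(index_json: str, folder_url: str) -> List[str]:
    """Return absolute URLs of all GPX files named in a JSON folder index."""
    if not folder_url.endswith("/"):
        folder_url += "/"  # urljoin would otherwise replace the last path segment
    urls = []
    for item in json.loads(index_json):
        name = item.get("name", "")
        # Skip sub-folders and non-GPX files such as photos.
        if item.get("type") == "file" and name.lower().endswith(".gpx"):
            urls.append(urljoin(folder_url, quote(name)))
    return urls
```

Sub-folders are skipped here; a full client would presumably descend into them, since the log files are grouped into folders.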
\subsection{Analysis Functionality}
Using the \texttt{LocationAnalyzer} in combination with a \texttt{KMLRender} renderer, the analysis of log files was successful on the first run.
\section{Conclusion}
While the implementation of a new client to download log files was straightforward, the parsing of these files proved quite difficult.
However, the difficulty lay not in the integration into the framework but in the syntactic errors in the log files.
While the BioDiv2Go parser requires fewer than 20 lines of code, the newly written parser approaches the 60-line mark with all the error handling code (see \autoref{code:bd2l} and \autoref{code:ncl}).
Once this obstacle is overcome, the integration is nearly seamless.
%TODO: webclient
A further challenge, as with BioDiv2Go, was understanding the structure of the logs, i.e., deriving the game's internal state machine.