The following \autoref{sec:logproc} dives into the world of log processing frameworks and evaluates the feasibility of two such systems within the scope of this thesis.
|
|
Based on the findings, an alternative approach is then outlined in \autoref{sec:alternative-design}.
|
|
|
|
|
|
\section{Evaluating log processing solutions}\label{sec:logproc}
|
|
This chapter examines existing log processing solutions and their suitability for the given requirements.
|
|
As examples, Kibana with an Elasticsearch backend and Grafana with an InfluxDB backend are evaluated.
|
|
|
|
\subsection{Evaluating Kibana}
|
|
To evaluate whether Kibana is a viable approach for the given requirements, a test environment was built.
|
|
The Docker-based setup, defined with Docker Compose, is documented in \autoref{app:kibana}.
|
|
Two sample data sets were loaded into the Elasticsearch container through HTTP POST requests. %: \texttt{curl -H 'Content-Type: application/x-ndjson' -XPOST 'elastic:9200/\_bulk?pretty' --data-binary @gamelog.json}.
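For illustration, a minimal Python sketch of such a bulk import is shown below. The hostname \texttt{elastic}, port 9200, and the file name \texttt{gamelog.json} reflect the test setup; using the \texttt{requests} library instead of \texttt{curl} is an assumption of this sketch, not the exact command issued during the evaluation.

\begin{verbatim}
import requests  # third-party HTTP client

# Bulk-import NDJSON sample data into the Elasticsearch container.
with open("gamelog.json", "rb") as ndjson:
    response = requests.post(
        "http://elastic:9200/_bulk?pretty",
        headers={"Content-Type": "application/x-ndjson"},
        data=ndjson,
    )

response.raise_for_status()
report = response.json()
print("took", report["took"], "ms, errors:", report["errors"])
\end{verbatim}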
|
|
Once Kibana was configured with the fields holding the spatial information, a first visualization could be produced on the workbench.
|
|
However, this view is optimized for the context of web log processing and therefore offers a rather low spatial resolution, as shown in \autoref{img:kibana} and \autoref{img:kibana2}.
|
|
Since such deployments deal mostly with imprecise locations from GeoIP lookups, and with respect to the web users' privacy, this choice avoids false conclusions\footnote{GeoIP database providers cannot always return precise resolutions and instead fall back to default locations, leading to bizarre incidents such as \url{https://splinternews.com/how-an-internet-mapping-glitch-turned-a-random-kansas-f-1793856052}} and enforces privacy by default.
|
|
|
|
As an additional constraint for the geogame context, the query language limits the research questions the solution can resolve: only questions expressible in the query language can be answered.
|
|
Additionally, users have to master the query language before any meaningful conclusions can be drawn.
|
|
|
|
By building a custom plugin, extension, or modified version, it is possible to circumvent this obstacle.
|
|
However, the fast-paced environment of the industry either demands a constant effort to keep up, or results in an outdated system rather quickly. (E.g.\ the next major release, Kibana v6.0.0\footnote{\url{https://github.com/elastic/kibana/releases/tag/v6.0.0}}, was published about a year after Kibana v5.0.0\footnote{\url{https://github.com/elastic/kibana/releases/tag/v5.0.0}}; however, the previous major version seems to receive updates for about a year, too.)
|
|
\image{\textwidth}{../../PresTeX/images/kibana}{Game trace in Kibana}{img:kibana}
|
|
\image{\textwidth}{../../PresTeX/images/kibana2}{Game trace in Kibana}{img:kibana2}
|
|
|
|
\subsection{Evaluating Grafana}
|
|
Grafana is a solution to analyze, explore, and visualize various sources of time series data.
|
|
Plugins exist for nearly any metrics storage and collection backend\furl{https://grafana.com/plugins?type=datasource}.
|
|
The different backends are available through a unified user interface shown in \autoref{img:grafana}.
|
|
|
|
Spatial resolution suffers from similar limitations as in Kibana.
|
|
\autoref{img:grafana} illustrates the restrictions imposed by the query language and the query editing interface, using the domain of weather stations as an example.
|
|
|
|
\image{\textwidth}{grafana-metrics}{Configuring a graph in Grafana}{img:grafana}
|
|
|
|
|
|
\subsection{Conclusion}
|
|
This chapter once again confirms the phrase ``spatial is special'' \cite{spatialspecial}:
|
|
After all, the evaluated monitoring solutions are no perfect match for this special, spatial, use case.
|
|
The privacy concerns vital in web monitoring prohibit detailed spatial analyses, the query languages restrict the questions that can be asked, and custom extensions require constant integration effort.
|
|
|
|
Regarding the specified use cases, non-expert users in particular benefit from a simple-to-use interface.
|
|
The default Kibana workbench does not qualify for this; a custom interface could improve the situation.
|
|
Grafana does support shared dashboards with a fixed set of data; however, precise spatial support is still lacking.
|
|
A third-party plugin has recently added such support\furl{https://github.com/CitiLogics/citilogics-geoloop-panel}; unfortunately, it appeared too late to be included in the evaluation of Grafana for this thesis.
|
|
Such a plugin would still be a potentially fragile component, given the fast pace of web development exhibited by such projects.
|
|
|
|
\section{Developing a modular architectural design}\label{sec:alternative-design}
|
|
While the development of a custom stack requires considerable infrastructural work before the project is running, the findings above suggest that a custom solution is a feasible alternative:
|
|
\begin{itemize}
|
|
\item Developing from the bottom up takes less time than diving into complex turn-key monitoring solutions.
|
|
\item With rather limited amounts of data\footnote{From a sample of 436 game logs from BioDiv2go, an average log file is 800 kB in size, with a median of 702 kB.}, scalable solutions are not a hard requirement.
|
|
\item No core dependencies on fast-paced projects.
|
|
\item Interfaces tailored to the requirements: a simple web interface for non-expert users, and a CLI and API with unrestricted possibilities for researchers.
|
|
\item A focus on key points allows simple, easily extendable interfaces and implementations.
|
|
\item By reducing the complexity to a manageable level, the processes and results can be verified for accuracy and reliability.
|
|
\end{itemize}
|
|
With the requirements from \autoref{sec:require} and the findings from the log processing evaluations in mind, the modular processing pipeline depicted in \autoref{img:flowchart} allows for a configurable solution.
|
|
It comprises the stages of input, analysis and rendering.
|
|
With interfaces defined between the stages, this approach allows the exchange of single modules without affecting the remaining pipeline.
|
|
\image{.75\textwidth}{flowchart.pdf}{Modular processing pipeline}{img:flowchart}
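A minimal Python sketch of how such interfaces between the stages could be declared is given below; the class and method names are illustrative assumptions, not part of a fixed specification.

\begin{verbatim}
from abc import ABC, abstractmethod
from typing import Any, Iterable


class InputModule(ABC):
    """Input stage: delivers game logs as sequences of entries
    in a common format."""

    @abstractmethod
    def game_logs(self) -> Iterable[Iterable[dict]]: ...


class AnalysisModule(ABC):
    """Analysis stage: turns log entries into collected results."""

    @abstractmethod
    def analyze(self, game_logs: Iterable[Iterable[dict]]) -> Any: ...


class RenderModule(ABC):
    """Render stage: turns collected results into an output artifact."""

    @abstractmethod
    def render(self, results: Any) -> bytes: ...
\end{verbatim}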
|
|
|
|
\subsection{Overview}
|
|
An architectural approach surrounding the processing pipeline is visualized in \autoref{img:solution}.
|
|
It outlines the three main components of the project: two user-facing services (web and CLI/API) and the analysis framework.
|
|
The interfaces (web and CLI/API) for both target groups (see \autoref{sec:require}) depend entirely on the analysis framework at the core.
|
|
\image{.75\textwidth}{solution.pdf}{Architecture approach}{img:solution}
|
|
|
|
The following sections describe each of those components.
|
|
\subsection{Analysis Framework}
|
|
|
|
The analysis framework takes game logs, processes their entries, collects results, and renders them to an output.
|
|
With a map-reduce pattern as the basic structure of the data flow, an analysis run is defined by an ordered collection of analysis operations together with matching postprocessing and render operations.
|
|
\autoref{img:flow} shows the data flows through the framework.
|
|
Every processed log file has its own chain of analyzer instances.
|
|
The log entries are fed sequentially into the analysis chain.
|
|
\image{\textwidth}{map-reduce.pdf}{Data flows}{img:flow}
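The following Python sketch illustrates this data flow: every log file is processed by its own, freshly created chain of analyzer instances, entries are fed in sequentially, and one result per analyzer is collected afterwards. It assumes analyzer objects offering \texttt{process()} and \texttt{result()} methods, as sketched in the next subsection.

\begin{verbatim}
def analyze_log(entries, analyzers):
    """Feed log entries sequentially through a chain of analyzer instances."""
    for entry in entries:
        for analyzer in analyzers:
            entry = analyzer.process(entry)
            if entry is None:      # entry was consumed: stop the chain here
                break
    # Collect one result per analyzer, keyed by the analyzer class name.
    return {type(a).__name__: a.result() for a in analyzers}


def run(game_logs, analyzer_factory):
    """Map step: every log file gets its own, fresh chain of analyzers."""
    return [analyze_log(entries, analyzer_factory()) for entries in game_logs]
\end{verbatim}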
|
|
|
|
\subsubsection{Analyzer}
|
|
An Analyzer takes one log entry at a time and processes it.
|
|
|
|
With dynamic selectors stored in settings, Analyzers can be used on multiple game types.
|
|
For specific needs, Analyzers can also be tailored to a specific game.
|
|
|
|
While processing, the Analyzer can choose to read, manipulate, or consume the log entry.
|
|
\paragraph{Reading a log entry}
|
|
Every Analyzer can read all of the log entry's contents.
|
|
This is obviously the core of the whole framework, as it is the only way to gain knowledge from the log.
|
|
Information can be stored in the Analyzer's instance until the log file has been processed completely.
|
|
\paragraph{Manipulating a log entry}
|
|
Every Analyzer can manipulate a log entry.
|
|
This can be adding new information, modifying existing information, or deleting information.
|
|
\paragraph{Consuming a log entry}
|
|
Every Analyzer can consume a log entry.
|
|
A consumed log entry is not passed down the analysis chain anymore.
|
|
This can be useful to filter verbose logs before computationally expensive operations.
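A minimal Python sketch of an Analyzer interface covering these three behaviours is shown below. The convention of returning the (possibly modified) entry, or \texttt{None} to consume it, as well as all class names, are assumptions of this sketch.

\begin{verbatim}
class Analyzer:
    """Base class: processes one log entry at a time."""

    def process(self, entry):
        # Default behaviour: read only, pass the entry on unchanged.
        return entry

    def result(self):
        # Called once the whole log file has been processed.
        return None


class PositionCollector(Analyzer):
    """Reading: accumulates state from spatial entries."""

    def __init__(self):
        self.positions = []

    def process(self, entry):
        if "lat" in entry and "lon" in entry:
            self.positions.append((entry["lat"], entry["lon"]))
        return entry                      # passed on unchanged

    def result(self):
        return self.positions


class Anonymizer(Analyzer):
    """Manipulating: removes personal information from the entry."""

    def process(self, entry):
        entry.pop("player_name", None)    # delete information
        return entry


class DebugFilter(Analyzer):
    """Consuming: verbose entries never reach later Analyzers."""

    def process(self, entry):
        return None if entry.get("level") == "debug" else entry
\end{verbatim}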
|
|
|
|
\subsubsection{Result}
|
|
When all entries of a game log have been processed, the results of each analyzer are collected.
|
|
Each result is linked to the analyzer which has produced this artifact to avoid ambiguous data sets.
|
|
|
|
The results are stored in a ResultStore.
|
|
To support arbitrary structures, a category factory can be specified.
|
|
In this case, special analyzers can introduce categories as needed before storing their result.
|
|
The newly created category is then used to store subsequent Analyzer results until another category is introduced.
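A possible Python sketch of such a ResultStore with a category factory is given below; the method names and the use of plain dictionaries are assumptions of this sketch.

\begin{verbatim}
class ResultStore:
    """Collects analyzer results, optionally grouped into categories."""

    def __init__(self, category_factory=dict):
        self._category_factory = category_factory
        self._categories = {None: category_factory()}   # default category
        self._current = None

    def new_category(self, name):
        # Introduced by special analyzers; subsequent results are stored
        # here until another category is opened.
        self._categories[name] = self._category_factory()
        self._current = name

    def add(self, analyzer_name, result):
        # Each result stays linked to the analyzer that produced it.
        self._categories[self._current][analyzer_name] = result

    def all(self):
        return self._categories
\end{verbatim}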
|
|
|
|
\subsubsection{Postprocessing \& Render}
|
|
When all game logs are processed, the whole result store is passed into the postprocessing step.
|
|
This is the first step in which multiple game logs, i.e.\ the results of the analyzed game logs, can be compared directly.
|
|
|
|
Postprocessing is a hard requirement for rendering the results, as at least a transformation into the desired output format is necessary.
|
|
Rendering is not restricted to visualizations; artifacts of all kinds can be produced.
|
|
A whole range from static plots and CSV exports to structured JSON data for interactive map visualizations or text generation is possible.
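As an illustration, two hedged Python sketches of such render operations are given below, producing a CSV export and structured JSON from a collected result dictionary; the flat result layout is an assumption of this example.

\begin{verbatim}
import csv
import io
import json


def render_csv(results):
    """Render collected results as a flat CSV table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["analyzer", "result"])
    for analyzer_name, result in results.items():
        writer.writerow([analyzer_name, json.dumps(result)])
    return buffer.getvalue().encode("utf-8")


def render_json(results):
    """Render the same results as structured JSON,
    e.g. for an interactive map visualization."""
    return json.dumps(results, indent=2).encode("utf-8")
\end{verbatim}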
|
|
|
|
\subsubsection{Log parser}
|
|
Key to the framework described above is a component that imports game log data, parses it, and converts it into a common format for processing.
|
|
|
|
This needs to be adapted for each supported game.
|
|
It has to know where game logs are stored and how they can be accessed.
|
|
Configurable items like URLs and user credentials allow, for example, connecting to different game servers.
|
|
The crucial step is parsing the game logs from the formats used by the games (e.g.\ JSON, XML, plain text, databases, \dots) into a common format used internally.
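A hedged Python sketch of such a game-specific parser is shown below; the configuration items and the field names of the game's log format are assumptions chosen for illustration.

\begin{verbatim}
import json


class GameLogParser:
    """Game-specific importer: knows where logs live and how to access them."""

    def __init__(self, url, user, password):
        # Configurable items allow pointing the parser at different servers.
        self.url, self.user, self.password = url, user, password

    def parse(self, raw):
        """Translate the game's own JSON structure into the common format."""
        for record in json.loads(raw):
            yield {
                "timestamp": record.get("time"),
                "lat": record.get("position", {}).get("latitude"),
                "lon": record.get("position", {}).get("longitude"),
                "event": record.get("type"),
                "payload": record,     # keep the original record for reference
            }
\end{verbatim}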
|
|
|
|
|
|
\subsection{Web Interface}
|
|
The web interface is rather straightforward:
|
|
Expert users prepare a set of analysis methods and bundle them with suitable rendering targets into an analysis suite.
|
|
Non-expert users select game logs for processing, choose a prepared analysis suite, and receive a rendered result once the analysis process has finished.
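Conceptually, an analysis suite can be little more than a named bundle of analyzer and renderer identifiers, as the following purely illustrative Python sketch shows; all names are assumptions.

\begin{verbatim}
# Hypothetical analysis suite prepared by an expert user; non-expert users
# only pick a suite name and a set of game logs in the web interface.
ANALYSIS_SUITES = {
    "movement-overview": {
        "analyzers": ["DebugFilter", "PositionCollector"],
        "renderers": ["render_json", "render_csv"],
    },
}
\end{verbatim}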
|
|
|
|
\subsection{CLI/API Interface}
|
|
Providing direct access to analysis and render classes, the CLI/API interface offers the most powerful way to explore the log data.
|
|
By implementing custom algorithms, expert users can cope with difficult input formats and special requirements.
|
|
By splitting frequently used analysis functionality into small, universal Analyzers, composing Analyzers into a queue may already satisfy many information needs.
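A hedged sketch of such an exploratory session is shown below, reusing the illustrative \texttt{analyze\_log} helper and Analyzer classes from the previous sections; the import helper is an assumed, game-specific component.

\begin{verbatim}
for entries in load_game_logs():                  # assumed import helper
    # Fresh, small, universal Analyzer instances per log file.
    queue = [DebugFilter(), PositionCollector()]
    print(analyze_log(entries, queue))
\end{verbatim}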
|
|
|
|
|
|
\subsection{Architecture}
|
|
|
|
The API is designed to be standalone, i.e. it is independent of both game servers and user interfaces.
|
|
|
|
Separation from game servers narrows the scope and allows usage with any kind of game.
|
|
Games without a central server can provide a mocked server to supply logged data, while games with a server can, for example, expose API endpoints with authentication and user management.
|
|
By acting like any normal client, the framework avoids obstacles such as CORS and XSS prevention mechanisms.
|
|
|
|
|
|
The independence from user interfaces, mainly the web interface, allows scaling through load balancing with multiple API workers.
|
|
Expert users with special requirements can embed the framework in their own projects without pulling in a large number of dependencies for user interfaces or games/game servers.
|