From a system administration perspective, the first approach to log processing that comes to mind is the use of one of the various log processing frameworks.
\autoref{sec:logproctheo} shows the current state of tools and processes for managing large volumes of log and time series data.
An overview of the field of pedestrian track analysis is given in \autoref{sec:pedest}.
Finally, \autoref{sec:gametheo} showcases the connection between spatial analyses and the optimization of digital games.

\section{Log processing}\label{sec:logproctheo}
System administrators and developers face a daily surge of log files from applications, systems, and servers.
For knowledge extraction in such environments, a wide range of tools is in constant development.
Currently, an architectural approach with three main components is most frequently applied.
These components are divided into aggregation \& creation, storage, and analysis \& frontend.
A popular example is the ELK stack consisting of Elasticsearch, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}. \nomenclature{\m{E}lasticsearch, \m{L}ogstash, and \m{K}ibana}{ELK}
In \autoref{tab:logs}, some implementations of these components are listed according to their main focus.
Cloud-based services were not taken into account for this list.
A clear classification is not always possible, as some modules integrate virtually all necessary features, as is the case with the Graphite tool set.

\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
Collection & Database & Frontend\\
\hline
Logstash\furl{https://www.elastic.co/de/products/logstash} & Elasticsearch\furl{https://www.elastic.co/de/products/elasticsearch} & Kibana\furl{https://www.elastic.co/de/products/kibana}\\
Collectd\furl{https://collectd.org/} & InfluxDB\furl{https://www.influxdata.com/} & Grafana\furl{https://grafana.com}\\
Icinga\furl{https://www.icinga.com/products/icinga-2/} & Whisper\furl{https://github.com/graphite-project/whisper} & Graphite\furl{https://graphiteapp.org/}\\
StatsD\furl{https://github.com/etsy/statsd} & Prometheus\furl{https://prometheus.io/} & \\
%\furl{} & \furl{} & \furl{}\\

\caption{Log processing components}
\label{tab:logs}
\end{longtable}

\subsection{Collection}
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\furl{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS} \nomenclature{\m{A}pplication \m{P}rogramming \m{I}nterface}{API}\nomenclature{\m{H}yper\m{t}ext \m{T}ransfer \m{P}rotocol}{HTTP}

Aside from aggregation, the topic of log creation is covered by solutions ranging from host-based monitoring like Icinga to application-centric approaches such as StatsD, which is embedded in the application source code\furl{https://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/}.

\subsection{Databases}
The key component of a log processing system is the storage.
While relational database management systems (RDBMS) \nomenclature{\m{R}elational \m{D}ata\m{b}ase \m{M}anagement \m{S}ystem}{RDBMS} can be suitable for small-scale solutions, the temporal order of events imposes many pitfalls.
For instance, django-monit-collector\furl{https://github.com/nleng/django-monit-collector}, an open alternative to the proprietary MMonit cloud service\furl{https://mmonit.com/monit/\#mmonit}, assures temporal coherence through lists of timestamps and measurement values stored as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{S}cript \m{O}bject \m{N}otation}{JSON}
This strategy forces both the RDBMS and the application to deal with growing amounts of data, as no temporal selection can be performed by the RDBMS itself.
During the evaluation in \cite{grossmann2017monitoring}, this phenomenon rendered the browser-based visualization basically useless and significantly impeded access with statistical tools.
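
The effect can be illustrated with a rough cost model. This is a simplified sketch under stated assumptions and not a measurement from \cite{grossmann2017monitoring}; the symbols $n$ and $k$ are introduced here for illustration only. Let a series consist of $n$ samples, of which $k$ fall into the requested time window. If all samples are stored in a single JSON string, every request has to transfer and parse the complete string before the window can be selected in the application:
\[
  C_{\mathrm{JSON}}(n, k) \in O(n), \quad \mbox{independent of } k .
\]
If each sample were instead stored as a separate row with a tree-based index on the timestamp column, the database could perform the selection itself:
\[
  C_{\mathrm{indexed}}(n, k) \in O(\log n + k) .
\]
For a dashboard that displays only the most recent values, $k$ stays roughly constant while $n$ grows with every measurement, which is consistent with the degradation described above.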

Time Series Databases (TSDB) specialize in chronologically ordered events.
One typical use is monitoring, e.g. of server health and usage statistics or of weather stations, as the example in \autoref{img:rdd} shows.
This example utilizes one of the early TSDB systems, RRDtool\furl{https://oss.oetiker.ch/rrdtool/index.en.html}.
More recently, alternatives written in modern languages have become popular, like InfluxDB\furl{https://www.influxdata.com/}, written in Go\furl{https://golang.org/}, or Whisper, written in Python (from the Graphite software package).
\image{\textwidth}{mgroth}{Weather station plot with RRDtool \cite{RDD}}{img:rdd}
\nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}

\subsection{Frontend}

Frontends utilize the powerful query languages of the TSDB systems backing them.
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
Additional functionality can be added with plugins, e.g. for new data sources or dashboard panels with visualizations.
The query languages of the data sources are abstracted by a common user interface.

\section{Pedestrian traces}\label{sec:pedest}
Analyzing pedestrian movement based on GPS logs is an established technique.
In the following sections, \autoref{sssec:gps} provides an overview of GPS as a data basis, \autoref{sssec:act} highlights some approaches to activity mining, and \autoref{sssec:vis} showcases popular visualizations of spatio-temporal data.
\nomenclature{\m{G}lobal \m{P}ositioning \m{S}ystem}{GPS}

\subsection{Data basis: GPS}\label{sssec:gps}
Global navigation satellite systems (GNSS) like GPS, Galileo, GLONASS, or BeiDou are a source of positioning data for mobile users.
\nomenclature{\m{G}lobal \m{N}avigation \m{S}atellite \m{S}ystems}{GNSS}
\cite{van_der_Spek_2009} has shown that such signals provide a reliable service in many situations.
Additionally, tracks of these signals are an invaluable source of information for researching movements and movement patterns \cite{Modsching:2008:1098-3058:31,nielsen2004gps,millonig2007monitoring}.
Therefore, GNSS are suitable instruments for acquiring spatio-temporal data \cite{van_der_Spek_2009}.

However, \cite{Ranacher_2015} points out systematic overestimates by GPS due to interpolation errors.
To eliminate such biases of a single system, \cite{Li2015} describes the combination of multiple GNSS for improved accuracy and reduced convergence time.

\subsection{Activity Mining}\label{sssec:act}
GPS (or GNSS) tracks generally contain only the raw spatio-temporal data (possibly accompanied by metadata like accuracy, visible satellites, etc.).
Any additional information needs to be either logged separately or derived from the track data itself.
Such activity mining allows, for example, the modes of transport used while recording the track to be determined \cite{Gong_2014}.
\cite{Gong_2015} shows the extraction of activity stop locations, i.e. locations where locomotion is suspended for an activity, in contrast to stops without activities.
Information of this kind is relevant, e.g., for improving tourist management in popular destinations \cite{tourist_analysis2012,koshak2008analyzing,Modsching:2008:1098-3058:31}.

Besides points of interest (POIs), individual behaviour patterns can be mined from tracks, as described in \cite{ren2015mining}.
Post-processing of these patterns with machine learning enables predictions of future trajectories \cite{10.1007/978-3-642-23199-5_37}.
%TODO more??

\subsection{Visualization}\label{sssec:vis}
Visualizations help to understand data sets, especially in the case of spatial data.

\subsubsection{Heatmap}
One of the most basic visualizations of large amounts of spatial data is the heatmap.
As the example in \autoref{img:strava} shows, it allows areas with high densities of data points to be identified very quickly.
However, this comes with the loss of nearly all context information.
For example, the temporal information (both the time slice and the relative order of the data points) is completely absent.
A workaround is an external control element that selects the underlying dataset according to such information.
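
The loss of temporal context can be made explicit with a minimal formalization. It is an illustrative assumption and not taken from a specific heatmap implementation; all symbols are introduced here for illustration only. Given data points $p_k = (x_k, y_k, t_k)$ and a regular grid with origin $(x_0, y_0)$ and cell size $c$, the heatmap value of cell $(i, j)$ is the number of points falling into that cell:
\[
  H(i, j) = \left| \left\{ k :
    \left\lfloor \frac{x_k - x_0}{c} \right\rfloor = i ,\;
    \left\lfloor \frac{y_k - y_0}{c} \right\rfloor = j
  \right\} \right| .
\]
The timestamp $t_k$ does not appear on the right-hand side, so any reordering or shifting of the timestamps yields the same heatmap. Restricting the index set to points with $t_a \le t_k \le t_b$ corresponds to an external time filter with the hypothetical bounds $t_a$ and $t_b$.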

\image{\textwidth}{../../PresTeX/images/strava}{Heatmap of fitness tracker data \cite{strava}}{img:strava}

\subsubsection{Track attributes}
As an example of a rendering methodology that includes more attributes, \cite{stopher2002gps} details the possibilities of cartographic signatures, as seen in \autoref{img:track-attr}.
When track lines are used, there are also several options to indicate attributes of the track.
Besides the color, the width and stroke type of the line can indicate certain attributes.
A combination of these allows the visualization of multiple attributes at once.
However, such views are limited in the number of tracks and attributes they can display before becoming confusing and ambiguous.

\image{\textwidth}{track-attributes}{Track rendering with acceleration attributes \cite{stopher2002gps}}{img:track-attr}

\subsubsection{Space-time cube}
One way to address the lack of temporal context is the space-time cube concept reviewed in \cite{kraak2003space}.
By mapping an additional temporal axis as a third dimension onto a two-dimensional map, tracks can be rendered in a three-dimensional context.
The example in \autoref{img:spacetime} shows how such a rendering allows individual movement patterns and the locations of activity in between to be identified.
However, it also demonstrates how difficult the 3D map is to interpret, especially with overlapping tracks.
Apart from overcrowded areas, many people have difficulty interpreting the 3D movements correctly.
The space-flattened alternative on the right tries to reduce this problem with a spatial abstraction.
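
The mapping of the temporal axis described above can be sketched as follows. The symbols $t_0$ and $s$ are assumptions introduced for illustration and do not stem from \cite{kraak2003space}. A track point recorded at map position $(x, y)$ and time $t$ is placed at
\[
  (x, y, t) \mapsto (x,\; y,\; z) \quad \mbox{with} \quad z = s \cdot (t - t_0) ,
\]
where $t_0$ is the start of the displayed time span and $s$ is a scale factor converting time into map units. In this representation, a stationary phase appears as a vertical line segment, while faster movement results in a flatter segment.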

\image{\textwidth}{../../PresTeX/images/space-time}{Space-time cube examples \cite{bach2014review}}{img:spacetime}

An approach for a time-aware heatmap utilizing space-time cubes is shown in \autoref{img:spacetime2}.
It highlights hotspots of activity along a temporal axis.
\image{\textwidth}{space-time-density}{Space-time cube density examples \cite{demvsar2015analysis}}{img:spacetime2}

\subsubsection{Trajectory patterns and generalizations}
To simplify the visualization of large numbers of individual tracks, patterns can be derived from the tracks in order to highlight key areas.
\autoref{img:traj-pattern} shows two examples of such patterns: flock, where a group of tracks is aligned for some time, and meet, which defines an area of shared presence.
Such patterns can be applied in a time-aware or time-agnostic manner, i.e. with or without taking the simultaneous appearance into account \cite{jeung2011trajectory}.
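
One common way to formalize these two patterns is sketched below. The parameters $m$ (minimum group size), $r$ (disk radius) and $k$ (minimum duration) as well as the notation $\tau(t)$ for the position of track $\tau$ at time $t$ are assumptions chosen for illustration, and the exact definitions in \cite{jeung2011trajectory} may differ in detail. A set $F$ of at least $m$ tracks forms a flock during a time interval $I$ of length at least $k$ if, at every point in time, all tracks fit into a disk of radius $r$ whose center $c_t$ may move:
\[
  \forall t \in I \;\; \exists c_t \;\; \forall \tau \in F : \; \| \tau(t) - c_t \| \le r .
\]
A time-aware meet requires the same with a stationary center $c$:
\[
  \exists c \;\; \forall t \in I \;\; \forall \tau \in F : \; \| \tau(t) - c \| \le r .
\]
In the time-agnostic variant, each track only has to visit the disk at some time $t_\tau$ of its own:
\[
  \exists c \;\; \forall \tau \in F \;\; \exists t_\tau : \; \| \tau(t_\tau) - c \| \le r .
\]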

\image{\textwidth}{../../PresTeX/images/traj-pattern}{Flock and meet trajectory pattern \cite{jeung2011trajectory}}{img:traj-pattern}

An approach for addressing the generalization aspects necessary to visualize massive movement data is described in \cite{adrienko2011spatial}.
The authors work on traffic data, as shown in \autoref{img:generalization}.
With an increasing generalization parameter, the flows are reduced to increasingly abstract representations of travel.

\image{\textwidth}{../../PresTeX/images/generalization}{Trajectories and generalizations with varying radius parameter \cite{adrienko2011spatial}}{img:generalization}

\section{Analyzing games}\label{sec:gametheo}
Modern video games with always-on copy protection or online master servers allow game studios to collect metrics about players' performance.
In \cite{Drachen2013}, the authors describe the use of GIS technologies for such environments.
For example, \autoref{img:chatlogs} shows a correlation between the frequency of certain keywords in the chat messages and the players' current location.
Such a correlation indicates a possible bug in the game that should be investigated.

Not only technical problems but also design errors or bad balancing can be visualized.
\autoref{img:ac3death} uses a heatmap to highlight areas with high failure rates during playtesting.
These failure hotspots can then be addressed to achieve a convenient game flow.

\image{\textwidth}{../../PresTeX/images/game-an}{Chat logs with players' locations \cite{Drachen2013}}{img:chatlogs}
\image{\textwidth}{../../PresTeX/images/ac3-death}{Identification of critical sections \cite{Drachen2013}}{img:ac3death}
%\twofigures{0.5}{../../PresTeX/images/game-an}{Chat logs with players location}{img:chatlogs}{../../PresTeX/images/ac3-death}{Identify critical sections}{img:ac3death}{Game analytics \cite{Drachen2013}}{fig:gameanal}

In contrast to the completely virtual games above, \cite{AHLQVIST20181} describes the mining of the spatial behaviour of players through an online game based on the real world.
With a focus on replicating the real world, players have to align social and natural resources.
The results of these simulations can then be used to build agent-based simulations with realistic behaviour.