\section{State of research}
%TODO
\subsection{Log processing}
System administrators and developers face a daily flood of log files from applications, systems, and servers.
A wide range of tools for extracting knowledge from these logs is under constant development.
Currently, the most frequently applied architecture consists of three main components.
These components are aggregation \& creation, storage, and analysis \& frontend.
A popular example is the ELK stack, consisting of Elasticsearch, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}.
\autoref{tab:logs} lists some implementations of these components according to their main focus.
Cloud-based services were not taken into account for this list.
A clear classification is not always possible, as some tools integrate virtually all necessary features, as is the case with the Graphite tool set.

\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
Collection & Database & Frontend\\
\hline
Logstash\footnote{\url{https://www.elastic.co/de/products/logstash}} & Elasticsearch\footnote{\url{https://www.elastic.co/de/products/elasticsearch}} & Kibana\footnote{\url{https://www.elastic.co/de/products/kibana}}\\
Collectd\footnote{\url{https://collectd.org/}} & InfluxDB\footnote{\url{https://www.influxdata.com/}} & Grafana\footnote{\url{https://grafana.com}}\\
Icinga\footnote{\url{https://www.icinga.com/products/icinga-2/}} & Whisper\footnote{\url{https://github.com/graphite-project/whisper}} & Graphite\footnote{\url{https://graphiteapp.org/}}\\
StatsD\footnote{\url{https://github.com/etsy/statsd}} & Prometheus\footnote{\url{https://prometheus.io/}} & \\
%\footnote{\url{}} & \footnote{\url{}} & \footnote{\url{}}\\
\caption{Log processing components}
\label{tab:logs}
\end{longtable}

\subsubsection{Collection}
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
Logstash, for instance, features a long list of input plugins, ranging from file streaming and an HTTP API to proprietary vendor sources such as Amazon Web Services (AWS)\footnote{\url{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS}

Aside from aggregation, log creation is covered by approaches ranging from host-based monitoring solutions like Icinga to application-centric ones such as StatsD, which is embedded in the application source code.
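The application-centric approach can be illustrated with a minimal Python sketch, assuming the plain-text StatsD datagram format \texttt{<name>:<value>|<type>} sent via UDP to the default port 8125.
The metric name \texttt{app.logins} and the helper functions \texttt{format\_metric} and \texttt{send\_metric} are hypothetical and only serve the illustration; production code would typically rely on an existing client library instead.
\begin{verbatim}
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Build a StatsD datagram of the form <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1",
                port: int = 8125) -> None:
    """Send one datagram via UDP; fire-and-forget, no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical application event: count one user login ("c" = counter).
login_event = format_metric("app.logins", 1, "c")
\end{verbatim}
Calling \texttt{send\_metric(login\_event)} at the relevant place in the application code would then emit the counter increment to a locally running StatsD daemon.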

\subsubsection{Databases}
The key component of a log processing system is its storage.
While relational database management systems (RDBMS) \nomenclature{\m{R}elational \m{D}ata\m{b}ase \m{M}anagement \m{S}ystem}{RDBMS} can be suitable for small-scale solutions, the temporal order of events poses many pitfalls.
For instance, django-monit-collector\footnote{\url{https://github.com/nleng/django-monit-collector}}, an open alternative to the proprietary MMonit cloud service\footnote{\url{https://mmonit.com/monit/\#mmonit}}, ensures temporal coherence by storing lists of timestamps and measurement values as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{S}cript \m{O}bject \m{N}otation}{JSON}
This strategy forces both the RDBMS and the application to handle the complete, ever-growing data set, as the RDBMS itself cannot perform any temporal selection.
During the evaluation in \cite{grossmann2017monitoring}, this phenomenon rendered the browser-based visualization practically useless and significantly impeded access with statistical tools.
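The difference can be sketched with Python's built-in \texttt{sqlite3} module.
The sketch below is an illustration under stated assumptions, not the actual schema of django-monit-collector: the table names, column names, host name, and sample values are all hypothetical.
Variant A stores the series as JSON strings, so the application has to load and filter the whole series; variant B stores one row per measurement, so the database can select a time window itself.
\begin{verbatim}
import json
import sqlite3

conn = sqlite3.connect(":memory:")
timestamps = list(range(0, 1000, 10))    # hypothetical UNIX timestamps
cpu_values = [t % 7 for t in timestamps] # hypothetical measurements

# Variant A: one row per host, series stored as JSON strings.
conn.execute(
    "CREATE TABLE series_json (host TEXT, timestamps TEXT, cpu_values TEXT)"
)
conn.execute(
    "INSERT INTO series_json VALUES (?, ?, ?)",
    ("web01", json.dumps(timestamps), json.dumps(cpu_values)),
)

# Variant B: one row per measurement with an indexed timestamp column.
conn.execute("CREATE TABLE series_rows (host TEXT, ts INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_ts ON series_rows (ts)")
conn.executemany(
    "INSERT INTO series_rows VALUES (?, ?, ?)",
    [("web01", t, v) for t, v in zip(timestamps, cpu_values)],
)


def window_from_json(start, end):
    """The database returns the full series; the application filters."""
    ts_json, val_json = conn.execute(
        "SELECT timestamps, cpu_values FROM series_json WHERE host = ?",
        ("web01",),
    ).fetchone()
    pairs = zip(json.loads(ts_json), json.loads(val_json))
    return [(t, v) for t, v in pairs if start <= t < end]


def window_from_rows(start, end):
    """The database performs the temporal selection itself."""
    return conn.execute(
        "SELECT ts, value FROM series_rows "
        "WHERE host = ? AND ts >= ? AND ts < ? ORDER BY ts",
        ("web01", start, end),
    ).fetchall()
\end{verbatim}
Both functions return the same measurements for a given window, but only the second one lets the amount of transferred data scale with the size of the window rather than with the size of the whole series.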

Time series databases (TSDB) specialize in chronologically ordered events and are therefore commonly used in these scenarios. \nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
%TODO
%TODO RRD

\subsubsection{Frontend}
Frontends utilize the powerful query languages of the TSDB systems backing them.
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
Additional functionality can be added through plugins.
%TODO

%TODO: weather station screenshot

%%%
\begin{itemize}
\item ELK (Elasticsearch, Logstash, Kibana) \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}
\item Collectd, InfluxDB, Grafana \cite{komarek2017metric}
\item …
\end{itemize}
\begin{itemize}
\item[+] widely deployed
\item[+] powerful query languages %TODO example
\item mainly web/container/hardware monitoring
\item[-] spatial analysis: heavily anonymized
\item[-] fast-paced environment
\end{itemize}

\subsection{Pedestrian traces}
Analyzing pedestrian movement … based on GPS logs

\subsubsection{Data basis: GPS}
\subsubsection{Activity Mining}
\subsubsection{Visualization}
\begin{itemize}
\item GPS systematically overestimates distances \cite{Ranacher_2015}
\item GPS is a suitable instrument for collecting spatio-temporal data \cite{van_der_Spek_2009}
\item Activity mining \cite{Gong_2014}
\begin{itemize}
\item Speed-based clustering \cite{ren2015mining}
%\item \cite{Ferrante_2016} % closed access
\item Machine learning \cite{pattern_recog} %TODO
\end{itemize}
\item E.g., improving tourist management \cite{tourist_analysis2012}
\end{itemize}
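
To give an impression of how speed can serve as a feature for activity mining, the following Python sketch labels the segments between consecutive GPS fixes as \emph{stop} or \emph{move} using a fixed speed threshold.
This is a deliberately simple stand-in and not the clustering method of \cite{ren2015mining}; the function names, the track format, and the threshold of 0.5\,m/s are assumptions made for the illustration.
\begin{verbatim}
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def label_segments(track, stop_below=0.5):
    """Label each pair of consecutive fixes as 'stop' or 'move'.

    track: list of (unix_time_s, lat, lon) tuples, ordered by time.
    stop_below: assumed speed threshold in m/s.
    """
    labels = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        speed = haversine_m(lat0, lon0, lat1, lon1) / dt
        labels.append("stop" if speed < stop_below else "move")
    return labels
\end{verbatim}
Since distances derived from GPS fixes are affected by measurement noise, a fixed threshold like this one would need to be tuned to the data in practice.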

\image{.81\textwidth}{../../PresTeX/images/strava}{Heatmap: fitness tracker \cite{strava}}{img:strava}

\image{.72\textwidth}{../../PresTeX/images/space-time}{Space-time cube examples \cite{bach2014review}}{img:spacetime}

\image{\textwidth}{../../PresTeX/images/traj-pattern}{Flock and meet trajectory patterns \cite{jeung2011trajectory}}{img:traj-pattern}

\image{\textwidth}{../../PresTeX/images/generalization}{Trajectories and generalizations with varying radius parameter \cite{adrienko2011spatial}}{img:generalization}

\subsection{Analyzing games}
\begin{itemize}
\item there is more than heatmaps
\item combine position with game actions
\item identify patterns, balancing issues
\item manual processes %\citetitle{Drachen2013}\citetitle{AHLQVIST20181}
\end{itemize}
%\image{.5\textwidth}{game-an}{chat logs with players location \cite{Drachen2013}}{img:chatlogs}
%\image{.5\textwidth}{ac3-death}{identify critical sections \cite{Drachen2013}}{img:ac3death}
\twofigures{0.5}{../../PresTeX/images/game-an}{Chat logs with player locations}{img:chatlogs}{../../PresTeX/images/ac3-death}{Identifying critical sections}{img:ac3death}{Game analytics \cite{Drachen2013}}{fig:gameanal}

\subsection{Summary}
\begin{itemize}
\item Log processing: powerful stacks
\item Movement analysis: large field already explored (GPS influence, patterns, behavior recognition, …)
\item Track rendering: track (with attributes), space-time cube, heatmap, …
\item Spatial analysis of digital games with GIS
\item Analysis of location-based games: laborious manual process
\end{itemize}