\section{State of research}
%TODO
\subsection{Log processing}
System administrators and developers face a daily surge of log files from applications, systems, and servers.
A wide range of tools for extracting knowledge from these logs is under constant development.
Currently, an architectural approach with three main components is most frequently applied.
These components are aggregation \& creation, storage, and analysis \& frontend.
A popular example is the ELK stack consisting of Elasticsearch, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}.
\autoref{tab:logs} lists some implementations of these components according to their main focus.
Cloud-based services were not taken into account for this list.
A clear classification is not always possible because some modules, such as the Graphite tool set, integrate virtually all necessary features.
\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
Collection & Database & Frontend\\
\hline
Logstash\footnote{\url{https://www.elastic.co/de/products/logstash}} & Elasticsearch\footnote{\url{https://www.elastic.co/de/products/elasticsearch}} & Kibana\footnote{\url{https://www.elastic.co/de/products/kibana}}\\
Collectd\footnote{\url{https://collectd.org/}} & InfluxDB\footnote{\url{https://www.influxdata.com/}} & Grafana\footnote{\url{https://grafana.com}}\\
Icinga\footnote{\url{https://www.icinga.com/products/icinga-2/}} & Whisper\footnote{\url{https://github.com/graphite-project/whisper}} & Graphite\footnote{\url{https://graphiteapp.org/}}\\
StatsD\footnote{\url{https://github.com/etsy/statsd}} & Prometheus\footnote{\url{https://prometheus.io/}} & \\
%\footnote{\url{}} & \footnote{\url{}} & \footnote{\url{}}\\
\caption{Log processing components}
\label{tab:logs}
\end{longtable}
\subsubsection{Collection}
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
By way of illustration, Logstash features a long list of input plugins, ranging from streaming files via an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\footnote{\url{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS}
Aside from aggregation, log creation ranges from host-based monitoring solutions such as Icinga to application-centric approaches in which, e.g., StatsD is embedded in the application source code.
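
To illustrate the application-centric approach, the following minimal Python sketch emits a metric in the StatsD plain-text line format (\texttt{name:value|type}) over UDP.
The metric name \texttt{app.login.count}, the helper names \texttt{format\_metric} and \texttt{send\_metric}, and the target host are assumptions made for this example; port 8125 is the StatsD default.
It is a sketch of the wire format only and does not replace a full client library.

```python
import socket


def format_metric(name: str, value: int, metric_type: str = "c") -> bytes:
    """Encode one metric in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one UDP datagram; delivery is not confirmed (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric: count one login event ("c" marks a counter).
    send_metric(format_metric("app.login.count", 1))
```

Because the datagram is sent without waiting for a reply, the instrumented application is not blocked when no collector is listening.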
\subsubsection{Databases}
The key component of a log processing system is its storage.
While relational database management systems (RDBMS) \nomenclature{\m{R}elational \m{D}ata\m{b}ase \m{M}anagement \m{S}ystem}{RDBMS} can be suitable for small-scale solutions, the temporal order of events poses many pitfalls.
For instance, django-monit-collector\footnote{\url{https://github.com/nleng/django-monit-collector}}, an open alternative to the proprietary MMonit cloud service\footnote{\url{https://mmonit.com/monit/\#mmonit}}, assures temporal coherence through lists of timestamps and measurement values stored as JSON strings in an RDBMS. \nomenclature{\m{J}ava\m{S}cript \m{O}bject \m{N}otation}{JSON}
This strategy forces the RDBMS and the application to deal with growing amounts of data, as no temporal selection can be performed by the RDBMS itself.
During the evaluation in \cite{grossmann2017monitoring}, this phenomenon rendered the browser-based visualization practically unusable and significantly impeded access with statistical tools.
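
The difference between the two storage strategies can be sketched with Python's built-in \texttt{sqlite3} module.
The table layouts, column names, and sample data below are simplified assumptions for illustration and do not reproduce the actual schema of django-monit-collector.
Layout A stores a whole series as one JSON string, so the application has to decode everything before filtering; layout B stores one row per measurement, so the database can select the time window itself.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout A: one row per series, all samples serialized into a JSON string.
conn.execute("CREATE TABLE series_json (name TEXT PRIMARY KEY, samples TEXT)")
# Layout B (assumed alternative): one row per measurement, indexed by time.
conn.execute("CREATE TABLE samples (name TEXT, ts INTEGER, value REAL)")
conn.execute("CREATE INDEX samples_by_time ON samples (name, ts)")

# Hypothetical data: one reading per minute, timestamps in seconds.
readings = [(60 * i, 0.5 + 0.01 * i) for i in range(1000)]
conn.execute("INSERT INTO series_json VALUES (?, ?)", ("cpu", json.dumps(readings)))
conn.executemany("INSERT INTO samples VALUES ('cpu', ?, ?)", readings)


def window_from_json(start: int, end: int) -> list:
    """Layout A: fetch and decode the full series, then filter in the application."""
    (blob,) = conn.execute(
        "SELECT samples FROM series_json WHERE name = 'cpu'"
    ).fetchone()
    return [(ts, value) for ts, value in json.loads(blob) if start <= ts < end]


def window_from_rows(start: int, end: int) -> list:
    """Layout B: the database restricts the result to the requested time window."""
    return conn.execute(
        "SELECT ts, value FROM samples "
        "WHERE name = 'cpu' AND ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    ).fetchall()
```

Both functions return the same samples for a given window, but only layout B keeps the transferred and decoded data proportional to the window size rather than to the full history.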
Time series databases (TSDB) specialize in chronologically ordered events and are therefore commonly used in these scenarios. \nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
%TODO
%TODO RRD
\subsubsection{Frontend}
Frontends utilize the powerful query languages of the TSDB systems backing them.
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
Additional functionality can be added with plugins.
%TODO
%TODO: weather station screenshot
%%%
\begin{itemize}
\item ELK (Elasticsearch, Logstash, Kibana) \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}
\item Collectd, InfluxDB, Grafana \cite{komarek2017metric}
\item
\end{itemize}
\begin{itemize}
\item[+] widely deployed
\item[+] powerful query languages %TODO example
\item mainly web/container/hardware monitoring
\item[-] spatial analysis: heavily anonymized
\item[-] fast-paced environment
\end{itemize}
\subsection{Pedestrian traces}
Analyzing pedestrian movement … based on GPS logs
\subsubsection{Data basis: GPS}
\subsubsection{Activity Mining}
\subsubsection{Visualization}
\begin{itemize}
\item GPS systematically overestimates travelled distances \cite{Ranacher_2015}
\item GPS is a suitable instrument for collecting spatio-temporal data \cite{van_der_Spek_2009}
\item Activity mining \cite{Gong_2014}
\begin{itemize}
\item Speed-based Clustering \cite{ren2015mining}
%\item \cite{Ferrante_2016} % closed access
\item Machine Learning \cite{pattern_recog} %TODO
\end{itemize}
\item E.g.: Improve tourist management \cite{tourist_analysis2012}
\end{itemize}
\image{.81\textwidth}{../../PresTeX/images/strava}{Heatmap of fitness tracker data \cite{strava}}{img:strava}
\image{.72\textwidth}{../../PresTeX/images/space-time}{Space-time cube examples\cite{bach2014review}}{img:spacetime}
\image{\textwidth}{../../PresTeX/images/traj-pattern}{Flock and meet trajectory pattern\cite{jeung2011trajectory}}{img:traj-pattern}
\image{\textwidth}{../../PresTeX/images/generalization}{Trajectories and generalizations with varying radius parameter \cite{adrienko2011spatial}}{img:generalization}
\subsection{Analyzing games}
\begin{itemize}
\item there's more than heatmaps
\item combine position with game actions
\item identify patterns, balancing issues
\item manual processes %\citetitle{Drachen2013}\citetitle{AHLQVIST20181}
\end{itemize}
%\image{.5\textwidth}{game-an}{chat logs with players location \cite{Drachen2013}}{img:chatlogs}
%\image{.5\textwidth}{ac3-death}{identify critical sections \cite{Drachen2013}}{img:ac3death}
\twofigures{0.5}{../../PresTeX/images/game-an}{Chat logs with player locations}{img:chatlogs}{../../PresTeX/images/ac3-death}{Identifying critical sections}{img:ac3death}{Game analytics \cite{Drachen2013}}{fig:gameanal}
\subsection{Summary}
\begin{itemize}
\item Log processing: Powerful stacks
\item Movement analysis: Large field already explored (GPS influence, Patterns, Behavior recognition, …)
\item Track rendering: Track (with attributes), Space-time cube, Heatmap, …
\item Spatial analysis of digital games with GIS
\item Analysis of location based games: Laborious manual process
\end{itemize}