\section{State of research}
\subsection{Log processing}
System administrators and developers face a daily surge of log files from applications, systems, and servers.
A wide range of tools for extracting knowledge from these logs is under constant development.
Currently, an architectural approach with three main components is most frequently applied.
These components are aggregation \& creation, storage, and analysis \& frontend.
A popular example is the ELK stack consisting of Elasticsearch, Logstash, and Kibana \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}.
\autoref{tab:logs} lists some implementations of these components, grouped by their main focus.
For this list, cloud-based services were not taken into account.
A clear classification is not always possible, because some tools, such as the Graphite tool set, integrate virtually all necessary features.
\begin{longtable}[H]{cp{0.2\textwidth}p{0.2\textwidth}}
Collection & Database & Frontend\\
\hline
Logstash\footnote{\url{https://www.elastic.co/de/products/logstash}} & Elasticsearch\footnote{\url{https://www.elastic.co/de/products/elasticsearch}} & Kibana\footnote{\url{https://www.elastic.co/de/products/kibana}}\\
Collectd\footnote{\url{https://collectd.org/}} & InfluxDB\footnote{\url{https://www.influxdata.com/}} & Grafana\footnote{\url{https://grafana.com}}\\
Icinga\footnote{\url{https://www.icinga.com/products/icinga-2/}} & Whisper\footnote{\url{https://github.com/graphite-project/whisper}} & Graphite\footnote{\url{https://graphiteapp.org/}}\\
StatsD\footnote{\url{https://github.com/etsy/statsd}} & Prometheus\footnote{\url{https://prometheus.io/}} & \\
%\footnote{\url{}} & \footnote{\url{}} & \footnote{\url{}}\\
\caption{Log processing components}
\label{tab:logs}
\end{longtable}
\subsubsection{Collection}
Nearly all services designed for log collection offer multiple interfaces for submitting log data.
By way of illustration, Logstash features a long list of input plugins, ranging from file streaming and an HTTP API to proprietary vendor sources like Amazon Web Services (AWS)\footnote{\url{https://www.elastic.co/guide/en/logstash/current/input-plugins.html}}. \nomenclature{\m{A}mazon \m{W}eb \m{S}ervices}{AWS}
Aside from aggregation, log creation is covered by tools ranging from host-based monitoring solutions like Icinga to application-centric approaches, e.g.\ with StatsD embedded in the application source code.
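To illustrate the application-centric approach, the following minimal sketch emits a counter in the StatsD plain-text line format (\texttt{name:value|type}) over UDP, using only the Python standard library.
The metric name, the helper names \texttt{format\_metric} and \texttt{send\_metric}, and the target address are assumptions made for this example; port 8125 is the usual StatsD default.

```python
import socket


def format_metric(name: str, value: int, metric_type: str = "c") -> bytes:
    """Encode one metric in the StatsD plain-text line format."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send the payload as a single UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric name: count one finished game round.
payload = format_metric("game.rounds.finished", 1)
# send_metric(payload)  # uncomment when a StatsD daemon is listening
```

Because UDP is connectionless, the application is not blocked if no daemon is listening, which is one reason this style of instrumentation is cheap to embed.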
\subsubsection{Databases}
Storage is the key component of a log processing system.
With a focus on chronological events, Time Series Databases (TSDB) are commonly used in these scenarios. \nomenclature{\m{T}ime \m{S}eries \m{D}ata\m{b}ase}{TSDB}
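A typical operation on such chronological data is downsampling, i.e.\ grouping samples into fixed time windows and aggregating each window.
The sketch below shows the idea in plain Python; it is not the implementation of any particular TSDB, and the function name \texttt{downsample} and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Group (unix_timestamp, value) pairs into fixed windows and average them."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align every timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Illustrative samples: (seconds since start, measured value).
samples = [(0, 1.0), (20, 3.0), (60, 10.0), (90, 20.0), (130, 5.0)]
per_minute = downsample(samples, 60)
# per_minute == {0: 2.0, 60: 15.0, 120: 5.0}
```

A TSDB typically performs this kind of windowed aggregation inside the storage engine, so that frontends only receive the reduced series.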
%TODO: weather station screenshot
\subsubsection{Frontend}
Frontends utilize the powerful query languages of the TSDB systems backing them.
Grafana, for example, provides customizable dashboards with graphing and mapping support \cite{komarek2017metric}.
Additional functionality can be added with plugins.
%%%
\begin{itemize}
\item ELK (Elasticsearch, Logstash, Kibana) \cite{andreassen2015monitoring,yang2016aggregated,steinegger2016analyse,sanjappa2017analysis}
\item Collectd, InfluxDB, Grafana \cite{komarek2017metric}
\end{itemize}
\begin{itemize}
\item[+] widely deployed
\item[+] powerful query languages %TODO example
\item mainly web/container/hardware monitoring
\item[-] spatial analysis: heavily anonymized
\item[-] fast-paced environment
\end{itemize}
\subsection{Pedestrian traces}
Analyzing pedestrian movement … based on GPS logs
\begin{itemize}
\item GPS overestimates systematically \cite{Ranacher_2015}
\item GPS is a suitable instrument for spatio-temporal data \cite{van_der_Spek_2009}
\item Activity mining \cite{Gong_2014}
\begin{itemize}
\item Speed-based Clustering \cite{ren2015mining}
%\item \cite{Ferrante_2016} % closed access
\item Machine Learning \cite{pattern_recog} %TODO
\end{itemize}
\item E.g.: Improve tourist management \cite{tourist_analysis2012}
\end{itemize}
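As a rough illustration of speed-based activity mining on GPS logs, the following sketch labels each pair of consecutive fixes as \texttt{stop} or \texttt{move} based on the mean speed between them.
It is a simplified stand-in and not the method of the cited works; the threshold of 0.5\,m/s, the track format (time in seconds, latitude, longitude), and the function names are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def label_segments(track, stop_speed_mps=0.5):
    """Label each consecutive pair of fixes as 'stop' or 'move' by mean speed."""
    labels = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(track, track[1:]):
        elapsed = t2 - t1
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order fixes
        speed = haversine_m(lat1, lon1, lat2, lon2) / elapsed
        labels.append("stop" if speed < stop_speed_mps else "move")
    return labels


# Hypothetical track: (seconds, latitude, longitude).
track = [(0, 49.0, 11.0), (10, 49.0, 11.0), (20, 49.0001, 11.0)]
labels = label_segments(track)
```

Since GPS noise tends to inflate measured distances, a fixed threshold like this one would likely need smoothing or calibration on real traces.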
\image{.81\textwidth}{../../PresTeX/images/strava}{Heatmap: fitness tracker \cite{strava}}{img:strava}
\image{.72\textwidth}{../../PresTeX/images/space-time}{Space-time cube examples \cite{bach2014review}}{img:spacetime}
\image{\textwidth}{../../PresTeX/images/traj-pattern}{Flock and meet trajectory pattern \cite{jeung2011trajectory}}{img:traj-pattern}
\image{\textwidth}{../../PresTeX/images/generalization}{Trajectories and generalizations with varying radius parameter \cite{adrienko2011spatial}}{img:generalization}
\subsection{Analyzing games}
\begin{itemize}
\item there's more than heatmaps
\item combine position with game actions
\item identify patterns, balancing issues
\item manual processes %\citetitle{Drachen2013}\citetitle{AHLQVIST20181}
\end{itemize}
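Combining positions with game actions can be sketched as counting events per action type and grid cell, which goes one step beyond a plain position heatmap.
The event tuples, the cell size, and the function name \texttt{bin\_events} below are hypothetical and only serve as an illustration.

```python
from collections import Counter


def bin_events(events, cell_size):
    """Count events per (action, grid cell); positions are planar coordinates."""
    counts = Counter()
    for action, x, y in events:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[(action, cell)] += 1
    return counts


# Hypothetical log entries: (action, x, y) in map units.
events = [
    ("death", 12.0, 7.5),
    ("death", 14.9, 9.0),
    ("chat", 13.0, 8.0),
    ("death", 31.0, 2.0),
]
hotspots = bin_events(events, cell_size=10.0)
```

Cells with unusually high counts for a single action, such as many deaths in one place, are candidates for the critical sections or balancing issues mentioned above.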
%\image{.5\textwidth}{game-an}{chat logs with players location \cite{Drachen2013}}{img:chatlogs}
%\image{.5\textwidth}{ac3-death}{identify critical sections \cite{Drachen2013}}{img:ac3death}
\twofigures{0.5}{../../PresTeX/images/game-an}{Chat logs with players' locations}{img:chatlogs}{../../PresTeX/images/ac3-death}{Identify critical sections}{img:ac3death}{Game analytics \cite{Drachen2013}}{fig:gameanal}
\subsection{Summary}
\begin{itemize}
\item Log processing: Powerful stacks
\item Movement analysis: Large field already explored (GPS influence, Patterns, Behavior recognition, …)
\item Track rendering: Track (with attributes), Space-time cube, Heatmap, …
\item Spatial analysis of digital games with GIS
\item Analysis of location based games: Laborious manual process
\end{itemize}