
 
Rules outperformed our method. Outstanding 
performance in the “next action prediction” task and 
average results in anomaly detection mean that the 
proposed method guesses the most expected action 
very precisely, but estimates the set of all expected 
actions less accurately than Seq-EM and A-Rules do. 
This suggests that the probability estimation 
mechanism used in the decision tree algorithm (9) is 
not well suited to the anomaly detection task. In 
future research we will evaluate the anomaly 
detection ability of the proposed approach with other 
probabilistic multi-class classification algorithms, 
e.g. kernel methods (Hastie, 2001), and we hope to 
outperform the competing methods in this scenario 
as well. 
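The gap between the two tasks can be illustrated with a minimal Python sketch. It is not the implementation evaluated above: the function names, the action labels, the probability values and the 0.05 threshold are all hypothetical, chosen only to show how a classifier can pick the most probable action correctly while its coarse probability estimates misjudge rarer actions.

```python
from typing import Dict


def predict_next_action(probs: Dict[str, float]) -> str:
    """Next action prediction: return the single most probable action."""
    return max(probs, key=lambda action: probs[action])


def is_anomalous(probs: Dict[str, float], observed: str,
                 threshold: float = 0.05) -> bool:
    """Anomaly detection: flag the observed action when its estimated
    probability falls below the threshold (0.05 is an assumed value)."""
    return probs.get(observed, 0.0) < threshold


# Hypothetical estimates from one decision-tree leaf in which a single
# class dominates; the rare actions receive very coarse estimates.
leaf_probs = {"SELECT_orders": 0.97, "UPDATE_orders": 0.03, "INSERT_orders": 0.0}

top_action = predict_next_action(leaf_probs)
rare_flagged = is_anomalous(leaf_probs, "UPDATE_orders")
common_flagged = is_anomalous(leaf_probs, "SELECT_orders")
```

In this toy case the top prediction is correct, yet a legitimate but infrequent action is flagged as anomalous. Only the first task depends solely on the ranking of the top class, while the second depends on the whole estimated distribution.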
5 CONCLUSIONS 
The main contributions of this paper can be 
summarized as follows: 
1. A new type of data source for user behavior 
modeling has been considered: the database access 
log, consisting of traces of SQL queries executed by 
users. It is a promising information source because 
most modern software systems use relational 
databases for information storage, and usually all 
critical user actions leave a trace in database access 
logs. 
2. A simple but effective procedure for translating 
SQL trace structures into a finite alphabet of 
symbols has been proposed. It allows database 
access log data to be analyzed with traditional data 
mining techniques such as sequential mining and 
association rule mining methods. 
3. A novel method for mining probabilistic user 
behavior models has been formulated. Unlike other 
existing data mining methods, it incorporates a time 
feature in the user model. An empirical feature 
map, motivated by the theory of potential functions, 
has been proposed for this purpose. Combining this 
feature map with a decision tree algorithm, we 
obtain a new method with the following advantages: 
it is sufficiently precise; it takes into account the 
time intervals between user actions; and it yields 
behavior models that a human expert can interpret, 
in the form of “IF…THEN” rules. 
4. An experimental performance evaluation on 
real-world data has been conducted. It has 
demonstrated that database access logs can be 
successfully used for user behavior modeling and 
that reliable models can be constructed. In these 
experiments, the proposed method demonstrated 
outstanding results in the “next action prediction” 
scenario and competitive results in the “anomaly 
detection” scenario. 
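The translation of SQL traces into a finite alphabet of symbols (contribution 2) can be pictured with the following Python sketch. It is an assumption-laden illustration rather than the procedure proposed in this paper: it simply masks literal values to obtain a query template and assigns each distinct template an integer symbol. The names `query_template` and `encode_trace` and the sample queries are hypothetical.

```python
import re
from typing import Dict, List


def query_template(sql: str) -> str:
    """Reduce a SQL statement to a template by masking literal values."""
    t = sql.strip().lower()
    t = re.sub(r"'(?:[^']|'')*'", "?", t)      # mask string literals
    t = re.sub(r"\b\d+(?:\.\d+)?\b", "?", t)   # mask numeric literals
    return re.sub(r"\s+", " ", t)              # normalize whitespace


def encode_trace(queries: List[str], alphabet: Dict[str, int]) -> List[int]:
    """Map each query to an integer symbol, extending the alphabet
    whenever a previously unseen template appears."""
    symbols = []
    for q in queries:
        tpl = query_template(q)
        if tpl not in alphabet:
            alphabet[tpl] = len(alphabet)
        symbols.append(alphabet[tpl])
    return symbols


# Hypothetical trace of three queries issued by one user.
alphabet: Dict[str, int] = {}
trace = [
    "SELECT * FROM users WHERE id = 42",
    "select * from users  where id = 7",
    "UPDATE users SET name = 'Bob' WHERE id = 7",
]
encoded = encode_trace(trace, alphabet)
```

Queries that differ only in their literal values share a symbol, so the resulting integer sequence can be passed to standard sequential or association rule mining tools.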
ACKNOWLEDGEMENTS 
This research is supported by a grant from RFFI 
(Russian Foundation for Basic Research), 
# 05-01-00744, and by a grant from the President of 
the Russian Federation, MK-2111.2005.9. 
REFERENCES 
Aizerman, M.A., Braverman, E.M., & Rozonoer, L.I., 
(1970). Method of Potential Functions in the Theory of 
Learning Machines. Nauka, Moscow (in Russian). 
Dan, P., Yu, S. & Chung, J.-Y. (1995). Characterization 
of database access pattern for analytic prediction of 
buffer hit probability. VLDB J., 4(1):127-154. 
Debar, H., Becke, M. & Siboni, D. (1992). A neural 
network component for an intrusion detection system. 
In IEEE Symp. on Security and Privacy, pp. 240-250. 
Ghosh, A., Schwartzbard, A. & Schatz, M. (1999). 
Learning Program Behavior for Intrusion Detection. In 
1st USENIX Workshop on Intrusion Detection and 
Network Monitoring. Florida, CA. 
Hastie, T. (2001). The Elements of Statistical Learning, 
Springer, New York. 
Lee, W. & Stolfo, S. (1998). Data mining approaches for 
intrusion detection. In 7th USENIX Security 
Symposium (SECURITY'98). 
Liu, B., Hsu, W. & Ma, Y. (1998). Integrating 
classification and association rule mining. In 4th Int. 
Conf. on KDD and Data Mining, pp. 80-96. 
Manavoglu, E., Pavlov, D. & Giles, C. (2003). 
Probabilistic User Behavior Models. In IEEE Int. 
Conf. on Data Mining (ICDM-03). Melbourne, FL. 
Maxion, R. & Roberts, R. (2004). Proper Use of ROC 
Curves in Intrusion/Anomaly Detection, Tech. report 
CS-TR-871, University of Newcastle upon Tyne. 
Piatetsky-Shapiro, G., Fayyad, U., Smyth, P. & 
Uthurusamy, R. (1996). Advances in Knowledge 
Discovery and Data Mining, AAAI Press/MIT Press. 
Quinlan, J. (1987). Generating production rules from 
decision trees. In 10th International Joint Conference 
on Artificial Intelligence, pp. 304-307. 
Sarwar, B., Karypis, G., Konstan, J. & Riedl, J. (2001). 
Item-based Collaborative Filtering Recommendation 
Algorithms. In 10th International World Wide Web 
Conference, pp. 285-295. 
Tang, Z.-H. & MacLennan, J. (2005). Data Mining with 
SQL Server 2005, Wiley Publishing. 
Valeur, F., Mutz, D. & Vigna, G. (2005). A Learning-
Based Approach to the Detection of SQL Attacks. In 
IEEE Conf. on Detection of Intrusions and Malware & 
Vulnerability Assessment, pp. 123-140. 
 
ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES