Ask Question, Ask an Expert


Ask Computer Engineering Expert

Project Description:

In this project, you will implement a rudimentary intrusion detection system in the kernel. Attacks on computers are a very significant problem today, and likely to grow worse in the future. Intrusion Detection Systems (IDSs) seek to detect attacks. One approach to detecting attacks, inspired by biology, seeks to characterize the normal behavior of the process and then measure deviations from it, which would indicate an attack.

Homeostasis is the physiological process by which the internal systems of the human body are maintained within normal ranges despite variations in operating conditions. The body achieves homeostasis, for ex, through its reactions to temperature variations, physical and psychological stresses, and microbial invaders. In this project we will try to simulate the human behavior of homeostasis in the computer system called Process Homeostasis (pH) [1] [2] [3] [4] that will track abnormalities.

There are several different ways of doing this. One interesting approach keeps track of system calls. In particular, the idea is that when someone "rootkits" a computer or subverts an existing process using say a buffer overflow, this can be detected by looking at the system calls the process makes and comparing it to the calls that a "healthy" process would make. A healthy process will typically exhibit a small set of behaviors. Each of these behaviors can be characterized by a "usual" sequence of system calls. A naïve way to do this is to see if a system call was made by a process that it does not make under normal conditions.

More interestingly, we could build a "reference log" of normal sequences for the process. Given a stream of system calls made by the process, a sequence is a group of system calls in the stream as seen from a fixed sezed window. Consider a sample stream of system calls: open(O), read(R), mmap(M), prepare(W), close(C).


This stream as seen from window size 5 gives following sequences.


If we could construct a reference log for the particular environment, process behavior can be monitored for aberrations. In particular, a sequence that is not in the reference log indicates an aberration. Experimental evidence suggests that short sequences of system calls (of length six or more) provide a signature for normal behavior and that signature has a high probability of being perturbed during intrusions.

In this project, we will implement a simpler scheme. You will instrument the interrupt dispatcher so that each time a system call is made, you will log the pid of the calling process and the actual call made. prepare a separate user space process that will monitor this log and compare it with the reference logs of each process. (Construct a reference log by observing the process in healthy state. There are various ways to store the logs e.g. flat file, database etc.). The user space process will look at the last 'k' entries in the log. If a particular system call is made in this window, you will set the bit corresponding to that call to '1'. You must decide value of 'k' that you will use.

Remember that a healthy process will not exhibit exactly the same behavior as in your reference log i.e. the exact system calls made will vary slightly. So in order to avoid false alarms, we must decide if the current sequence is aberrant enough to be classified as intrusion. Thus we need to define "distance" between two sequences. If the distance between the observed sequence and sequence in the log crosses some predefined threshold, then we can flag an intrusion. One simple way to define distance between two sequences is Hamming Distance. The Hamming distance between two ordered sequences is the number of places where the sequences are not identical. e.g. distance between 1010 and 0101 is 4 while distance between 1011 and 1101 is 2. For our purpose, consider a reference log where open, read, prepare, mmap and close (1, 1, 1, 1, 1) and observed log with open, read and close (1, 1, 0, 0, 1). The hamming distance between these two logs is 2.

Rules for Collaboration:

You are allowed to discuss ideas your classmates. No code exchange is allowed. If you use any external sources (like internet sources) you must cite them. Again, academic dishonesty will be sternly dealt with. If in doubt, contact a TA or instructor for clarification.

Helpful Hints:

• You may decide to log the calls from all the processes and then let the user program to filter out the system calls made by other processes or you can only log the system calls made by some specific process. Remember that you are not allowed to "hard code" the pid. Do not hard code the name of the log file either.

• There are a few ways to log the system calls. There is a simple mechanism called ptrace which allows tracing the system calls. You are NOT allowed to use it. This project must be in the kernel. We list couple of ways to do so. If you want to do something else, you must verify with your instructor before proceeding.

• You can implement a loadable kernel module that replaces the function pointer for interrupt 0x80 in the Interrupt Descriptor Table with your own function pointer. (For linux on i386 architecture, system calls are invoked by a software instruction which generates interrupt 0x80 or through the sysenter mechanism) The normal system call handler function is defined in the file entry_32.S with the label ENTRY(system_call). The replacement interrupt handler will do the logging and then call the original function. To perform the logging, the module must be implemented as a character device that allocates a buffer (use kmalloc() and kfree()), and use ioctl() to send the stream to a user space program. The user program does the logging. With this approach, do NOT monitor the system calls made by all the processes. (Why?)

• Another way is to modify the system call handler itself. Whenever a system call is made through int 0x80, thesystem_call function defined in entry_32.S is called. The eax register has the system call number and it is used to index into the system call table and call the function that implements the system call. Modify the system_call function to call your logging function. Remember not to use fopen(), fread(), fprepare() functions (or open() and friends) to do the logging. An ex of reading a file in a kernel module or inside the kernel is provided under "Course Documents" on Blackboard. You have to figure out the exact details about "writing" the log file (including the way that you will format it).

• When plugging in your code inside assembly code, don't forget to save the registers before you make any changes and to restore them after you are done.

• Linux Cross-Referencing project is a very good place to browse the kernel sources.

• Do not use the mechanisms used to copy user space buffers to kernel buffers when copying data from kernel space buffer to another kernel space buffer.

• Regardless of the approach you take, you may wish to disable sysentersyscall support (or int 0x80syscall support) in your kernel to make your life a little bit easier. That way, you only have to instrument one syscall entry sequence instead of two. This can be done by modifying the Virtual Dynamic Shared Object (VDSO) code in the Linux kernel. If you do not do so, and only instrument the int 0x80 path, you will probably not get any results at all.

Theoretical References:

1) Operating System Stability and Security through Process Homeostasis, Anil B. Somayaji. Ph.D. thesis, University of New Mexico, July 2002., pH: process Homeostasis:

2) Hajime Inoue et al Anomaly intrusion detection in dynamic execution environments, Proceedings of the 2002 workshop on New security paradigm, 2002, ISBN:1-58113-598-X,  University of New Mexico, Albuquerque

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M91731

Have any Question? 

Related Questions in Computer Engineering

Create a sql server login named wpc-ch10-user with a

Create a SQL Server login named WPC-CH10-User, with a password of WPC-CH10- User+password. Create a WPC-CH10-PQ database user named WPC-CH10-DatabaseUser, which is linked to the WPC-CH10-User login. Assign WPC-CH10-Datab ...

Implementing network awareness explain exactly what happens

Implementing network awareness. Explain exactly what happens in the network (what messages are sent and when) during the execution of the distributed lexical scoping example given in section 11.4. Base your explanation o ...

An itemset x is called a generator on a data set d if there

An itemset X is called a generator on a data set D if there does not exist a proper sub-itemset Y ⊂ X such that support(X) = support(Y). A generator X is a frequent generator if support(X) passes the minimum support thre ...

Game timeyou have a little free time on your hands and

Game Time You have a little free time on your hands and decide to create a simple game. Utilize the design tools that you have learned this week to design and program a very simple game. Examples: Movie Trivia, Guess a n ...

Consider the traffic citation shown in figure 5-54 the

Consider the traffic citation shown in Figure 5-54. The rounded corners on this form provide graphical hints about the boundaries of the entities represented. A. Create a data model with five entities. Use the data items ...

Following the example of sensor data from this section

Following the example of sensor data from this section, suppose that the following temperature-time readings are generated by sensor 100, and each arrives at the DSMS at the time generated: (80,0), (70,50), (60,70), (65, ...

1 explain why the enrollment in colleges and universities

1. Explain why the enrollment in colleges and universities increases at times of economic recession.  Make a distinction between explicit and implicit costs in your explanation 2. Use the following production possibiliti ...

How does recorded data contribute to the problem-solving

How does recorded data contribute to the problem-solving process and to continuous improvement of products and service delivery?

What is the numeric range of a 16-bit twos complement value

What is the numeric range of a 16-bit two's complement value? A 16-bit excess notation value? A 16-bit unsigned binary value?

A compare the difficulty of writing sub queries and joins

a. Compare the difficulty of writing sub queries and joins with the difficulty of dealing with anomalies caused by multivalued dependencies. b. Describe three uses for a read-only database.

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

A cola-dispensing machine is set to dispense 9 ounces of

A cola-dispensing machine is set to dispense 9 ounces of cola per cup, with a standard deviation of 1.0 ounce. The manuf

What is marketingbullwhat is marketing think back to your

What is Marketing? • "What is marketing"? Think back to your impressions before you started this class versus how you

Question -your client david smith runs a small it

QUESTION - Your client, David Smith runs a small IT consulting business specialising in computer software and techno

Inspection of a random sample of 22 aircraft showed that 15

Inspection of a random sample of 22 aircraft showed that 15 needed repairs to fix a wiring problem that might compromise

Effective hrmquestionhow can an effective hrm system help

Effective HRM Question How can an effective HRM system help facilitate the achievement of an organization's strate