Shared Files, Temporary Files,
and other fun file manipulations

File Pointers in Linux

Every Linux process maintains a File Descriptor Table of the files it has open. Each entry in that table holds a few simple flags and a pointer to an external file table. That file table contains status flags, the current file offset, and a pointer to the i-node information.

The following picture roughly illustrates the File Descriptor Table for two processes. Both processes have one file open. The top process has outfile open for writing and the bottom process has infile open for reading. Of course, both processes also have stdin, stdout, and stderr open. The file tables for stdin, stdout, and stderr are not shown to save a little space in the illustration.

Everything about that picture is editable by you the programmer. You can make a process' file descriptors point to any of that process' other file tables. You can make the file tables of two different processes point to the same i-node.

Note that a process' File Descriptor Table does not really store the name of the file. I just added the file names for clarity.

DUP2

The dup2() function is sometimes handy. It is used to reset pointers in the File Descriptor Table. The format is:
    int dup2 ( int old, int new )
Note that both parameters are integer indexes into the File Descriptor Table. After the call, descriptor new points to the same file table as descriptor old.

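The following example program creates an output file named temp.out. Because descriptors 0, 1, and 2 are already taken by stdin, stdout, and stderr, the first file the program opens normally gets descriptor 3. After the command "dup2 (3, 1)", all output to file 1 now goes to file 3. In other words, all output to stdout now goes to the file temp.out. Output to outfile still goes to temp.out as well.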

    /**********************************************
      Steve Dannelly
      February 2011
      Demo the dup2 command.
    *********************************************/
    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    #include <unistd.h>     // dup2
    using namespace std;

    int main ()
    {
       ofstream outfile;   // output file
       int value;

       /***** open a file for writing *****/
       outfile.open ("temp.out", ios::out);
       if (!outfile)
       {
          cerr << "Error opening output file";
          exit (1);
       }

       /***** get some input *****/
       cout << "Enter an integer : ";
       cin >> value;

       /***** redirect stdout into the temp file *****/
       dup2 (3,1);   // file 1 output now goes to file 3
       cout << "User entered " << value << endl;
       outfile << "Again, user entered " << value << endl;
    }

This program creates the following output:

    > a.out
    Enter an integer : 47
    > cat temp.out
    User entered 47
    Again, user entered 47

Now for the fun part. If we change the program and remove the endl from the cout command, we get a different outcome. The cout command is a request to print some stuff, and all such requests are buffered. The buffer is flushed when an endl, the end of the program, or a similar event occurs. Without the endl, the cout text sits in the buffer until the program ends. So, this code:

    /***** redirect stdout into the temp file *****/
    dup2 (3,1);   // file 1 output now goes to file 3
    cout << " -- User entered " << value << " -- ";
    outfile << "Again, user entered " << value << endl;
    outfile << "Hi mom!!\n";

Yields this output:

    > a.out
    Enter an integer : 58
    > cat temp.out
    Again, user entered 58
    Hi mom!!
     -- User entered 58 --

Sharing Files between Processes

Programs frequently need to share information. Pipes and shared memory work well when both programs are running on the same machine. Sockets are a very common way to communicate when the two programs are running on different machines (the World Wide Web uses sockets). When two programs are running on the same machine, or on different machines that share a file system, another option for sharing data is Shared Files.

To share a file, two or more programs simply open the file and start reading or writing. To keep things simple, we will just consider a single Reader and a single Writer.

The big problem with shared files is synchronizing the two processes. How does the Reader know when it is safe to read? The Reader needs to know when there is a new message, AND the Reader should not be reading while the Writer is in the middle of writing a message. Where is the new message in the shared file? Yadda yadda. If you want to learn more about these types of problems and their solutions, then take CSCI 411 Operating Systems.

Frequently the information shared between two programs is a fixed length message. The Writer can just re-write the message when new data is available. To re-write the message, the Writer could close the file, then reopen it for writing. When a file is opened for writing, the file pointer starts at the top of the file. A better way to rewrite the message is to rewind the file pointer. (We will discuss resetting the file pointer later in the course.) The Reader can just read the shared message every few seconds and use an if statement to determine when the message has changed.

A more complex problem is when the Writer appends more and more information into the data file. The problem for the Reader is knowing when there is new data to read. If the Reader simply reads until end of file and then quits before the Writer is done writing, the Reader will quit without having read all the data. The Reader program must detect that the file has been updated and there is new data to read.

One way for the Reader to know that new data is available is to periodically check the file size. When the size goes up, then there is new data. The following two programs demonstrate such a Reader and Writer. The Reader **must** start first. The Reader creates an empty file that both programs will share. The Reader checks the file size to determine when there is new data. Note that the Reader is intentionally slowed down so that the CPU does not max out constantly checking the file size. The Writer just writes integers as it is fed values by the user. Note that this Reader program is not perfect - if the Writer goes too fast the Reader will get confused and miss some data.

    /**********************************************
      Steve Dannelly
      February 2011
      Shared File - Reader
    *********************************************/
    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    #include <unistd.h>      // sleep
    #include <sys/stat.h>
    using namespace std;

    int main ()
    {
       ofstream outfile;        // shared file
       ifstream infile;         // shared file
       struct stat stat_info;   // file size info
       int i, size=0;           // loop control
       int value;               // value from file

       /***** create an empty shared file *****/
       outfile.open ("temp.out", ios::out);
       outfile.close();

       /***** open the shared file for reading *****/
       infile.open ("temp.out", ios::in);
       if (!infile)
       {
          cerr << "Error opening shared file";
          exit (1);
       }

       /***** read 10 values from file *****/
       for (i=0; i<10; )
       {
          sleep(1);   // slow down
          stat ("temp.out", &stat_info);
          if (stat_info.st_size != size)   // has size changed
          {
             infile >> value;   // get next value
             cout << "Read value " << value << endl;
             size = stat_info.st_size;   // remember new size
             i++;   // increment read count
          }
       }
       cout << "*** Reader Program Finished \n";
    }

    /**********************************************
      Steve Dannelly
      February 2011
      Shared File - Writer
    *********************************************/
    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    using namespace std;

    int main ()
    {
       ofstream outfile;   // output file
       int i, value;

       /***** open the common file for writing *****/
       outfile.open ("temp.out", ios::out);
       if (!outfile)
       {
          cerr << "Error opening output file";
          exit (1);
       }

       /***** get ten values and write them to file *****/
       for (i=0; i < 10; i++)
       {
          cout << "Enter an integer : ";
          cin >> value;
          outfile << value << endl;
       }
       cout << "Writer Program Finished\n";
    }

Here is something interesting to try. Run the above Writer program on two machines at the same time. Notice in the diagram above that both programs have their own file offsets. That means the two Writer programs will be writing on top of each other's messages. However, if the file mode were "app" (ios::app) instead of "out" (ios::out), then the two programs would not write on top of each other. A running Reader could then process values from two Writers at once.

Suppose you want your Reader and Writer, or two Writers, to share a file offset pointer. Thus when any program writes or reads, the other programs' pointers automatically move forward in the file. The way to do this is to use fork(). After fork() creates a clone process, both processes share the same File Tables. The diagram shows the pointers of two processes after a fork() has occurred. After the fork, the new child process mutates into a different program using exec(). But, exec() does not reset the File Descriptor Table, so the two processes still share the same file offset pointer. As the two programs write, they update their shared file position. For more information on fork() and exec(), take CSCI 411 Operating Systems.

By now you should get the idea that almost any way that you can think of to share a file between two or more processes can be done in Linux.

Temporary Files

From the Linux man pages:
"When the file's link count becomes 0 and no process has the file open, the space occupied by the file shall be freed and the file shall no longer be accessible. If one or more processes have the file open when the last link is removed, the link shall be removed before unlink() returns, but the removal of the file contents shall be postponed until all references to the file are closed."

In other words, a file is only deleted once its directory entry is removed AND no process' File Descriptor Table is pointing to it. So, to make absolutely sure that your temporary files are temporary, unlink them right after creating them. Whether your program terminates normally or abnormally, your temporary file will be deleted by the operating system.

The following program creates a temporary file, but delays several seconds before it dies. In the time before it dies, if you run the ls program you will NOT see a file named "temp.out".

    /**********************************************
      Steve Dannelly
      February 2011
      Demo the unlink command.
    *********************************************/
    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    #include <unistd.h>     // unlink, sleep
    using namespace std;

    int main ()
    {
       ofstream outfile;
       int value;

       /***** open a file for writing *****/
       outfile.open ("temp.out", ios::out);
       if (!outfile)
       {
          cerr << "Error opening output file";
          exit (1);
       }
       unlink ("temp.out");

       /***** get some input *****/
       cout << "Enter an integer : ";
       cin >> value;

       /***** write to the (already unlinked) temp file *****/
       outfile << "User entered " << value << endl;
       sleep (15);
       cout << "*** unlink demo is done ***\n\n";
    }
