Submit your source code via email to dannellys@winthrop.edu with the subject line CSCI325 - HW 03 by 5:00pm on February 11.
Write a program that will sort a large data file. Your program should sort any size data file. You should assume that only half the file will fit into memory.
Your program should read and write data files with the following format:
N record 1 record 2 ... record N-1 record NIn other words, the first bytes of the data file are an integer that indicates the number of records in the file. Note, this header value makes it a lot easier process the file.
The format of each record is:
struct faculty_type { char fname[15],lname[15]; int phone, office; char dept[5]; };
A sample data file of 47 faculty records is available at "/home/ACC.dannellys2/faculty_unsorted.dat". Since that is a data file, not a text file, Dannelly cannot put this sample on the web. You need to copy that file from his linux home directory into your account.
Your program should sort a file of faculty records by last name.
Instead of hardcoding the names of the input and output files like in the first two assignments, your program should read these from the command line (see "code hints" section below). A couple of example runs are below. In the first example the user did not indicate the file to read or the file to write, so the program just output an error message and quit. The second run read the file "faculty_unsort.dat" and created the sorted file "faculty_sorted.dat".
> merge_sort Usage Error: merge_sort inputfile outputfile > merge_sort faculty_unsorted.dat faculty_sorted.dat
As discussed in class, the following is the general algorithm for a 2-way merge sort:
Create 2 sorted files Read 1st half of input file into an array in main memory sort the array write array to file temp1 Read 2nd half of input file into memory sort the array write to file temp2 Merge the 2 files Read record x from temp1 Read record y from temp2 While both temp1 and temp2 contain records if x < y write x to outfile read x from temp1 else write y to outfile read y from temp2 Copy the file that did not empty If temp1 is empty write remainder of temp2 to outfile else write remainder of temp1 to outfile |
To complete this assignment you really need at least two different programs: one that reads an unsorted file and writes a sorted file; and a second program to print the contents of a data file to the screen. Note, a third useful program would be one that can create new data files of this format.
I will share with you my C program to view the contents of files that have this
format. The program expects the name of the file to read as a command line
argument. In other words, run this program like this:
> viewer faculty_unsorted.dat
/*************************************** Program to display to stdout the contents of a file of faculty records. Dannelly - January 2011 ***************************************/ #include <stdio.h> #include <stdlib.h> #include <string.h> // a structure to read and write struct faculty_type { char fname[15],lname[15]; int phone, office; char dept[5]; }; /**************************************/ void main (int argc, char *argv[]) { FILE *infile; struct faculty_type input; int i, count; // error check command line if (argc != 2) { fprintf(stderr, "Error: Usage: %s recordsfile\n",argv[0]); exit (1); } // open data file for reading infile = fopen (argv[1],"r"); if (infile == NULL) { fprintf(stderr, "\nError opening %s for reading\n\n", argv[1]); exit (1); } // read record count from top of the file fread (&count, sizeof(int), 1, infile); // loop throught the file and print all records for (i=0; i<count; i++) { fread (&input, sizeof(struct faculty_type), 1, infile); printf ("%-15s", input.fname); printf ("%-15s", input.lname); printf (" %d ", input.phone); printf (" %d ", input.office); printf (" %s ", input.dept); printf ("\n"); } } |
Don't forget that the first thing in the data file is an integer, not a record. Also, be sure your output file starts with a count, not a record. |
Also, be sure you use read or fread (not >> or fscanf) to read that integer. Your input is a byte file, not a text file. |
To make your debugging easier, I highly recommend that your two temporary files use the same format as the input and output files. That way you can use the viewer program that I provided above. |
A major problem is where to put the file data inside main memory. I suggest an array
because it is easiest to sort arrays. The problem is how big does it need to be?
Array size of 500 will be way too big for the sample data I provide, yet too small for
the data I will use to test your program. Try this C code:
int run_size; // how many records to process struct faculty_type *records = NULL; // a variable-size array of records . . . free(records); records = (struct faculty_type *) malloc (sizeof (struct faculty_type) * run_size); fread (records, sizeof(struct faculty_type), run_size, infile);The free function returns the array back to the operating system for memory clean-up. If free is sent a NULL, then nothing happens. The malloc function does a memory allocation of run_size number of faculty records. The fread then reads run_size number of records into the array named "records". You can now do things like "records[20].phone = 1234;". |
When sorting an array of records, don't forget to swap each element of the structure.
This will not work:
temp = records[j]; records[j] = records[j+1]; records[j+1] = temp; |
Most importantly, consider that is a long and sometimes tricky program. My version was only 178 lines of code, but it took me a few hours. Homework 1 and 2 took me about 20 minutes each. |