Wednesday, June 29, 2022

Day 15/100 FileGlob


Data Slurping on the go


I've been putting off this function for a while, because it's not as easy as it looks. In fact, the code I have here is at an amateur hobbyist level rather than a professional one. However, it does the job just fine. Of course, you still have to verify data integrity, since the program is somewhat fragile.


The key point here is that I want to read the data all in one go. I realize that reading it one character at a time with fgetc() is slow, technically speaking, but it's still fast enough for my use. There are 2 ways for data to come in: 1. stdin, i.e. a Linux pipe (the program treats a file name of "-" as stdin). 2. A file name on the command line.


I want a program that can handle both, without having to worry too much about allocating enough memory. I still have to watch out for it, but like I said, it's good enough for me.
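Since reading one character at a time is the slow part, here is a minimal sketch of the same "read everything in one go" idea using fread() in blocks, with the buffer doubling whenever it fills up instead of growing by a fixed CHUNK. This is a swapped-in technique, not what the code in this post does; the function name slurp and the 4096-byte starting capacity are my own assumptions. It takes any FILE *, so it works the same for stdin and for a file opened with fopen().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of fp into one NUL-terminated heap buffer.
 * Returns NULL on allocation or read error; *len receives the byte count. */
char *slurp(FILE *fp, size_t *len) {
    size_t cap = 4096, used = 0;       /* starting capacity is an arbitrary choice */
    char *buf = malloc(cap + 1);       /* +1 reserves room for the final '\0' */
    if (buf == NULL) return NULL;

    size_t n;
    while ((n = fread(buf + used, 1, cap - used, fp)) > 0) {
        used += n;
        if (used == cap) {             /* buffer full: double the capacity */
            char *tmp = realloc(buf, cap * 2 + 1);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); return NULL; }

    buf[used] = '\0';                  /* used <= cap, so this byte exists */
    *len = used;
    return buf;
}
```

For example, slurp(stdin, &len) reads a whole pipe. Doubling keeps the number of realloc() calls proportional to the logarithm of the input size, whereas a fixed 64-byte CHUNK means one realloc() for every 64 bytes read.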


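Here is the data globbing section. I had some trouble because I thought sizeof would give me the size of the allocated memory. Unfortunately, it doesn't: applied to a pointer like Data, sizeof gives the size of the pointer itself (typically 8 bytes on a 64-bit system), not the size of the block it points to. So I have to keep track of the allocated size manually, in DataSize.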


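    while ((c=fgetc(fp))!=EOF) {
      Data[DataUsed++]=c;
      if (DataUsed>=DataSize) {
        // Grow by CHUNK; the extra +1 byte is spare room for the final '\0'.
        // realloc into a temporary so the old block isn't lost on failure,
        // and check it before updating DataSize.
        char *tmp=(char *)realloc(Data,(DataSize+CHUNK+1));
        if (tmp==NULL) exit(1);
        Data=tmp;
        DataSize+=CHUNK;
        printf("DataUsed DataSize %d %d\n",DataUsed,DataSize);
      }
    }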
    Data[DataUsed]='\0';
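To see the sizeof pitfall concretely, here is a small sketch comparing sizeof on a real array with sizeof on a malloc'd pointer. The function name compare_sizes is hypothetical, CHUNK just mirrors the 64 used in this post, and the exact pointer size depends on the platform.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 64

/* Hypothetical demo: returns sizeof applied to a malloc'd pointer. */
size_t compare_sizes(void) {
    char fixed[CHUNK];              /* array: its type includes the length */
    char *Data = malloc(CHUNK);     /* pointer: its type says nothing about the block */
    size_t ptr_size = sizeof Data;  /* size of the pointer variable itself */

    printf("sizeof fixed = %zu\n", sizeof fixed);  /* CHUNK, i.e. 64 */
    printf("sizeof Data  = %zu\n", ptr_size);      /* sizeof(char *), e.g. 8 on 64-bit */
    free(Data);                     /* free(NULL) is harmless if malloc failed */
    return ptr_size;
}
```

The array's length is part of its type, so sizeof can see it; the malloc'd block's size exists only at run time, which is why DataSize has to carry it.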


I then cap the data with a string terminator, the NUL character ('\0'). I do this because I'm going to treat the data as lines of text, i.e. C strings; otherwise, I would simply use the DataUsed variable to know the size of Data.
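The difference between relying on the terminator and relying on DataUsed shows up if the input ever contains a '\0' byte: string functions stop at the first one, while a loop bounded by the tracked length sees every byte. A minimal sketch with two hypothetical helpers that count newlines each way:

```c
#include <stddef.h>
#include <string.h>

/* Length-based: looks at every byte up to `used`, embedded '\0' included. */
size_t count_newlines(const char *data, size_t used) {
    size_t n = 0;
    for (size_t i = 0; i < used; i++)
        if (data[i] == '\n') n++;
    return n;
}

/* String-based: strchr() stops at the first '\0', so anything after it is missed. */
size_t count_newlines_str(const char *s) {
    size_t n = 0;
    while ((s = strchr(s, '\n')) != NULL) {
        n++;
        s++;                        /* continue just past the newline found */
    }
    return n;
}
```

With the 14 bytes of "one\ntwo\0three\n", count_newlines(data, 14) returns 2, but count_newlines_str(data) returns 1, because it never looks past the '\0' after "two".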


Here is the code that parses the data into lines and assigns them to an array of string pointers:


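  printf("ShowLines...");
  Lines[LineUsed++]=&Data[0];            // first line starts at the beginning
  for (i=0;i<DataUsed;i++) {
    if (Data[i]=='\n') {
      Data[i]='\0';                      // end the current line here
      Lines[LineUsed++]=&Data[i+1];      // next line starts after the newline
      if (LineUsed>=MAXLINE) LineUsed--; // array full: keep overwriting the last slot
    }
  }
  printf("%d\n",LineUsed);
  for (i=0;i<LineUsed;i++) {
    printf(">>%s\n",Lines[i]);
  }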


Since I took a shortcut and didn't resize the pointer array dynamically, once it fills up I simply keep overwriting its last slot, so any lines past the limit are ignored. I still print the line count, though it tops out at MAXLINE-1; to check the real number of lines, I can use the wc program (wc -l) instead.
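The shortcut can be avoided by growing the pointer array with realloc(), the same way Data grows. Here is a minimal sketch, assuming the buffer is NUL-terminated as above. The function name split_lines and its starting capacity of 16 are hypothetical, and unlike the loop in this post it doesn't add an empty entry after a trailing newline.

```c
#include <stdlib.h>

/* Hypothetical helper: split Data (DataUsed bytes, with Data[DataUsed] == '\0')
 * into lines in place. Returns a malloc'd array of line pointers and stores the
 * count in *count, or returns NULL if an allocation fails. */
char **split_lines(char *Data, size_t DataUsed, size_t *count) {
    size_t cap = 16, used = 0;
    char **Lines = malloc(cap * sizeof *Lines);
    if (Lines == NULL) return NULL;

    Lines[used++] = Data;                    /* first line starts at Data[0] */
    for (size_t i = 0; i < DataUsed; i++) {
        if (Data[i] != '\n') continue;
        Data[i] = '\0';                      /* terminate the current line */
        if (i + 1 == DataUsed) break;        /* trailing '\n': no empty last line */
        if (used == cap) {                   /* pointer array full: double it */
            char **tmp = realloc(Lines, cap * 2 * sizeof *Lines);
            if (tmp == NULL) { free(Lines); return NULL; }
            Lines = tmp;
            cap *= 2;
        }
        Lines[used++] = &Data[i + 1];        /* next line starts after the '\n' */
    }
    *count = used;
    return Lines;
}
```

The caller frees only the returned array; the strings themselves still live inside Data and go away when Data is freed.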


There's really not much to it, other than keeping track of everything, calling realloc() as needed, and not forgetting to free() the memory at the end of the program. The use of exit() indicates bad flow; maybe I should use atexit() to clean up the memory.
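Here is a minimal sketch of the atexit() idea, assuming a single global buffer like Data. InitData and CleanUpData are hypothetical, simplified stand-ins for this post's Init() and CleanUp(): the cleanup function is registered once and then runs on a normal return from main() as well as on any exit() call, so an early exit(1) no longer skips the free(). Handlers registered with atexit() do not run after abort() or _Exit(). On Linux the kernel reclaims a process's memory when it ends anyway, so this is mostly about tidiness and keeping leak checkers quiet.

```c
#include <stdio.h>
#include <stdlib.h>

char *Data = NULL;   /* global buffer, as in the post */

/* Hypothetical cleanup handler: runs on return from main() or any exit(). */
void CleanUpData(void) {
    free(Data);      /* free(NULL) is a no-op, so this is always safe */
    Data = NULL;
}

/* Hypothetical init: register the handler, then allocate. Returns 0 on success. */
int InitData(size_t size) {
    if (atexit(CleanUpData) != 0) return 1;   /* atexit() returns nonzero on failure */
    Data = malloc(size);
    if (Data == NULL) {
        fputs("out of memory\n", stderr);
        exit(1);                              /* CleanUpData still runs here */
    }
    return 0;
}
```

Registering the handler before the first allocation means every later exit() path, including the one after a failed realloc(), goes through the same cleanup.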


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//#define CHUNK 65535
#define CHUNK 64
#define MAXLINE 99

char *Data;
int DataSize=0;
int DataUsed=0;
FILE *fp;

char *Lines[MAXLINE];
int  LineUsed=0;

void Init(int argc, char *argv[]) {
  Data=(char *)malloc(CHUNK);
  if (Data==NULL) exit(1);
  DataSize=CHUNK;
}

int ReadData(int argc, char *argv[]) {
  int c;

  for (int i=1;i<argc;i++) {
    puts(argv[i]);
    if (!strcmp(argv[i],"-")) fp=stdin;
    else if ((fp=fopen(argv[i],"r"))==NULL) {
      puts("File open Error");
      return 1;
    }

    while ((c=fgetc(fp))!=EOF) {
      Data[DataUsed++]=c;
      if (DataUsed>=DataSize) {
        // realloc into a temporary so the old block isn't lost on failure
        char *tmp=(char *)realloc(Data,(DataSize+CHUNK+1));
        if (tmp==NULL) exit(1);
        Data=tmp;
        DataSize+=CHUNK;
        printf("DataUsed DataSize %d %d\n",DataUsed,DataSize);
      }
    }
    Data[DataUsed]='\0';

    if (fp!=stdin) fclose(fp);
  }
  return 0;
}

void ProcessData() {
  int i;

  puts(Data);

  printf("ShowLines...");
  Lines[LineUsed++]=&Data[0];
  for (i=0;i<DataUsed;i++) {
    if (Data[i]=='\n') {
      Data[i]='\0';
      Lines[LineUsed++]=&Data[i+1];
      if (LineUsed>=MAXLINE) LineUsed--;
    }
  }
  printf("%d\n",LineUsed);
  for (i=0;i<LineUsed;i++) {
    printf(">>%s\n",Lines[i]);
  }
}

void CleanUp() {
  free(Data);
}

int main (int argc, char *argv[] ) {
  int e=0;

  if (argc<2) {
    puts("fileglob [filename | -] ...");
    return 1;
  }

  Init(argc,argv);
  e=ReadData(argc,argv);
  if (e==0) ProcessData();   // skip processing if a file couldn't be opened
  CleanUp();

  return e;
}


One more thing: I did this in a rush and didn't clean it up, so think of it as a rough reference note rather than a polished version.
