Dr. Mark Humphrys

School of Computing. Dublin City University.





Using files

File - A named section of disk.

File implementation: A file is not necessarily a contiguous section of disk (but that fact may be hidden from users and programs).
Normally neither user nor programmer deals with the disk directly, but only refers to named files.

In some high-performance applications (e.g. writing a high-speed search engine), you may need to implement your own file system, but this is difficult and full of dangers.

File Types

File system divisions

Windows file system can spread over multiple pieces of hardware. Each given its own (single-letter) drive:

Can also partition a single piece of hardware into multiple drives.

UNIX file system can spread over multiple pieces of hardware too. But everything appears as sub-directories of a single file hierarchy.
Path may indicate hardware, something equivalent to:

or may hide hardware entirely:

Hierarchical file system

Can organise files in separate dirs (Many web authors seem not to have discovered sub-dirs!).
Crucial to keep user files separate from system files (Why?).
Windows C:\Users\me
Can reuse same file names in different sub-dirs (like index.html).

Long file names

All modern OS's allow long filenames:

Legacy systems:

Short file names are good for ..

Short file names are good, though, for:

  1. File names you type. e.g. If you are typing file names at command-line. All-lower-case is easiest to type.

  2. Program names at the command-line (i.e. the program you call has a short filename). sed, grep, ls, cut, etc. All-lower-case is easiest to type.

  3. Some people say also URLs?
    Maybe you should never type URLs. At most you type the host name that you saw somewhere. For everything else you cut and paste, or click.

    Maybe short URLs: http://en.wikipedia.org/wiki/Othello make the web a more pleasant experience than long URLs:

    It is nice to have short, "guessable" URLs.
    See "URL as UI"
    See URL shortening. (Used e.g. on Twitter.)

Short URLs should probably be used in posters and ads:
This health poster on campus caught my eye.
This probably should use a shorter, and lowercase, URL, like:

Q. Is there still a problem with that URL?

Some web server set-ups generate super-complex URLs, which can then get pasted into documents.
This is apparently a real ad.
From here.


Symbolic link (cross-link, breaking the hierarchy, "shortcut") in UNIX


Can selectively break the hierarchy with shortcuts.

 ln -s dir shortcut
or in Windows see "Create Shortcut"

e.g. on one system I used:

$ ls -l /bin
lrwxrwxrwx   1 root     root           9 Apr 14  1997 /bin -> ./usr/bin
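
A minimal sketch of the same idea in a scratch directory, so that nothing on a real system is touched. The names demo, dir and shortcut are placeholders chosen for this example only.

```shell
# Work in a throwaway directory (mktemp -d creates one and prints its path).
demo=$(mktemp -d)

mkdir "$demo/dir"
echo "hello" > "$demo/dir/file.txt"

# Create a symbolic link called "shortcut" that points at "dir".
# A relative target is resolved relative to the directory holding the link.
ln -s dir "$demo/shortcut"

# The file is now reachable under both paths.
via_link=$(cat "$demo/shortcut/file.txt")
echo "read through shortcut: $via_link"

# readlink prints where the link points.
target=$(readlink "$demo/shortcut")
echo "shortcut -> $target"

# ls -l on the link itself shows the same "name -> target" form as above.
ls -l "$demo/shortcut"
```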


Can also just give a file multiple names:
 ln -s file secondname
e.g. on DCU Linux:

$ ls -l /bin/ls
lrwxrwxrwx 1 root root 11 Apr  8 21:49 /bin/ls -> /usr/bin/ls

$ ls -l /usr/bin/ps
lrwxrwxrwx 1 root root 7 Feb 12 12:14 /usr/bin/ps -> /bin/ps

Q. Why do programs sometimes call a specific path to a program, e.g. they call /bin/ls rather than just ls ?

Can do this on Windows as well (have multiple shortcuts to a data file or program).

Problems with cross-links

With shortcuts, a recursive search of the disk can get infinite loop problems, or at least duplication. e.g. List all files on disk: if you follow symbolic links, you may list files twice.
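
The duplication can be seen with find in a scratch directory. This is only a sketch: the directory names a and b and the link name are made up for the example.

```shell
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
echo "data" > "$demo/a/file.txt"

# b/link is a symbolic link across to the sibling directory a.
ln -s ../a "$demo/b/link"

# By default find does not follow symbolic links, so the file is seen once.
plain=$(find "$demo" -type f | wc -l | tr -d ' ')

# -L tells find to follow symbolic links, so the same file is also
# reached through b/link and is listed a second time.
followed=$(find -L "$demo" -type f | wc -l | tr -d ' ')

echo "without following links: $plain"
echo "following links:         $followed"
```

If the link instead pointed back up at one of its own parent directories, following it naively would never terminate. POSIX asks find to detect such loops, but a home-made recursive script gets no such protection.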

Q. Also, if you delete a file, do you delete the symbolic links to it? If so, how do you find them - do you have a reverse directory of them? Also, say I make a symbolic link to another user's file. They delete the file. They can't delete my link.
A. If a link doesn't work, so what. You might even leave it dangling as a reminder.
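
A small sketch of what a dangling link looks like, and one way to hunt for them afterwards by scanning a directory tree (there is no reverse directory to consult in this sketch). The file and link names are invented for the example.

```shell
demo=$(mktemp -d)
echo "data" > "$demo/file"
ln -s file "$demo/link"

# Deleting the file does not delete the link; the link is left dangling.
rm "$demo/file"

# -L tests "is a symbolic link"; -e follows the link and tests "target exists".
if [ -L "$demo/link" ] && [ ! -e "$demo/link" ]; then
    state="dangling"
else
    state="ok"
fi
echo "link is $state"

# List every dangling link under a directory: symbolic links (-type l)
# whose target fails the existence test.
dangling=$(find "$demo" -type l ! -exec test -e {} \; -print)
echo "$dangling"
```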


If your directory is accessible by others on your local machine, someone on your machine can make it readable by the world on the Web (either maliciously or accidentally):

cd     /homes/your-userid/public_html
ln -s  /homes/other-userid/dir          shortcut
The world can then read other user's directory through:
Has valid uses too. Might want to make one of your own dirs visible without having to have it under public_html, e.g. public_html disk is full, dir is on another disk.

Another example - SAMBA or read-write ftp may only drop you in home directory rather than root directory and you may not be able to go upwards. What you do is put symbolic links in your home directory and you can access any directory through them:

  ln -s /var/mail  email
  ln -s /htdocs    ht

"Hierarchy with some cross-links" a very powerful model

The general conclusion is that a basic hierarchy, with some cross-links for difficult points, is an excellent way to structure complex data (e.g. Open Directory) - rather than a total cross-link free-for-all on one hand (e.g. the Web with just search engines and no directories), or a rigid hierarchy on the other (e.g. Dewey library system).

Interestingly, family trees are also basically hierarchical, with arbitrary cross-links, rather than strictly hierarchical as many people seem to think.

Recycle bin (Windows)

Windows Recycle bin visible through GUI, but also visible as directory through Windows command line:


If it's data (1's and 0's), there's no real excuse for losing it. You can make automated copies and store them all over the world. Disk space is big and cheap. Machines are often idle. The network is always on. Backups can be automated across the network by scripts.

In future, backup and long-term storage will be an increasingly important service, like a bank.

  1. Removable media - DVDs, CDs, tapes, USB keys, external hard disk.
  2. Backup to cloud / server. Distributed file system. Network read-write ftp, automated scripts, mirrors.

Other people back you up

Even if you back up nothing, your web pages are being backed up by other people:

Backup policy

  1. Periodically dump entire file system to backup.
  2. Keep a running "mirror", and only back up things that have changed since the last time they were synchronised.
Perhaps only back up user files.
OS, system and application files can be recovered from install CDs / tapes.
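
The "only back up what has changed" policy could be scripted roughly as follows. This is a sketch under assumptions: the directory names src and mirror and the marker file last-sync are hypothetical, and a real script would more likely use a dedicated tool and copy across the network.

```shell
work=$(mktemp -d)
src="$work/src"
mirror="$work/mirror"
stamp="$work/last-sync"
mkdir -p "$src/docs" "$mirror"

# An old file, and a marker file recording when the last sync happened.
# touch -t sets explicit timestamps (CCYYMMDDhhmm) so the demo is repeatable.
echo "old" > "$src/docs/old.txt"
touch -t 199901010000 "$src/docs/old.txt"
touch -t 200001010000 "$stamp"

# A file written after the last sync.
echo "new" > "$src/docs/new.txt"

# Copy only files modified more recently than the marker,
# recreating their sub-directories inside the mirror.
(
    cd "$src" || exit 1
    find . -type f -newer "$stamp" | while IFS= read -r f; do
        mkdir -p "$mirror/$(dirname "$f")"
        cp -p "$f" "$mirror/$f"
    done
)

# Record this sync for next time.
touch "$stamp"

copied=$(cd "$mirror" && find . -type f | sort)
echo "$copied"
```

Only new.txt is copied, because old.txt is older than the marker.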

Which of these is the most dangerous?

  1. Keep 1 synchronised copy of your files. Back up the changes every night.
  2. Keep 1 synchronised copy of your files. Back up the changes every hour.
  3. Take a copy of all of your files once a week. Keep all these old copies. Do no backups at all during the week.
  4. Take a copy of all of your files once a month. Keep all these old copies. Do no backups at all during the month.
Remember - it may take days or even months before an intrusion and destruction, or accidental damage, is noticed.
User may realise 2 years later that he has deleted some file and needs it back.
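
The point about damage that goes unnoticed can be played out in a scratch directory. This is a toy sketch: the names files, mirror, snapshots, week1 and week2 are invented, and a plain cp -R stands in for whatever backup tool is really used.

```shell
work=$(mktemp -d)
src="$work/files"
mirror="$work/mirror"
snaps="$work/snapshots"
mkdir -p "$src" "$snaps"
echo "thesis draft" > "$src/thesis.txt"

# Week 1: take a dated snapshot, and also refresh the single mirror.
cp -R "$src" "$snaps/week1"
rm -rf "$mirror"
cp -R "$src" "$mirror"

# The file is deleted by accident and nobody notices.
rm "$src/thesis.txt"

# Week 2: another snapshot, and the mirror is synchronised again,
# which faithfully copies the deletion.
cp -R "$src" "$snaps/week2"
rm -rf "$mirror"
cp -R "$src" "$mirror"

if [ -e "$mirror/thesis.txt" ]; then in_mirror=yes; else in_mirror=no; fi
if [ -e "$snaps/week1/thesis.txt" ]; then in_snapshot=yes; else in_snapshot=no; fi

echo "still in mirror:       $in_mirror"
echo "still in old snapshot: $in_snapshot"
```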


On the Internet since 1987.
