MetaSource_DATE_DataPoints for genealogy & other microhistory projects, a pathway, directory/folder and file naming system
2017/03/11
I am of an age to begin genealogical projects. When I was younger the interest was the same as it is now, but not enough to begin, even if there was more time then, or perhaps because there was more time.
Of course, in the course of time people die, data disappears. And contra, in modern times technology progresses, access to archives of data gets easier. Swings and roundabouts.
About seven years ago I started with some genealogy software and learnt about GEDCOM files. Played around a bit with various applications. Looked at some inherited documents. Scanned them. Saved them. Went online, searched some strings from 1980s typewritten family trees, and discovered ancestors on Monaropioneers.com. Learned some ancestors were actually born in New Zealand.
And then it was seven years later. Somehow.
At some time in those seven years I had discovered GRAMPS, opensource genealogy software, but had never actually used it. So a month ago I started using GRAMPS and looked at my mess of digital files, scans of old letters and documents, and downloads. Where to begin?
But I did begin.
Apparently some of my decisions were the usual ones for beginners. By family, by person, by official docs, by correspondence, by photo. Not mistakes, but when I went onto the GRAMPS email group and shared my directory/folder structure, someone went yeahbut.
They mentioned the French way of archiving. And so I went off and learned about Respect des fonds, Original order, Provenienzprinzip and so on. Now I already knew about Provenance and how to cite, so my file naming was okay, I thought, and that didn’t change much. What did change was my folder structure, and the big picture.
And how file names and their directory/folder structure could contain implicate order.
In citing it doesn’t matter which copy of the book you actually used. Most citation systems assume that there are libraries you can go to and find some copy or other of a book or article to check out. (There are many styles of citation. I’ll just note Harvard.)
In archiving, however, which actual physical copy you used becomes more important. History works with source material and so this information is vital to the work: not just what book, but which book, where and when. And whose.
Genealogy is a microhistory, so we should learn its lessons. The history of the source material needs to be tracked and made available. I have family trees from the 1980s, but for some I do not know who made them, and none have supporting material.
The folder structure I had originally designed was organised according to how I was researching, not how the information might be useful later to someone else. I was focussing on my own methods of hunting down the next link in the chain, the next clue, completing the picture in my head, and not how that picture would appear to someone else. So at this point the question became, as my project was relying on the work of others in creating the chain in the first place: was my work going to help that chain go on? And if so, could it be done better?
These days a lot of genealogy is online. There are the commercial suppliers, which now include DNA matching. As well, there are the digitised and more recently computerised actual sources of information, e.g. the NSW Registry of Births, Deaths & Marriages.
I was looking at how to make my file names and directory/folder names more helpful to someone else looking at the data. This is important because while GRAMPS has excellent support for citation, from source through to repository, for events and places in a lifespan, it only links to what you put on the hard drive. You have to think about how to organise it.
This is a good thing.
And even if there were no GRAMPS (software disappears over time: orphaned, abandoned, scrapped, turned into useless subscription models which dumb down the use through time [yes, I mean you, Adobe]), the question would become: would that bunch of files and pathway names be good enough for someone else to work out what was going on?
What is interesting at this point in writing this very blog post is that the time taken so far to write it is a gazillion times longer than the time it took to work out the following:
It is important to point out here that a moment before I decided on this structure for the entire pathway (directory structure & file name) I decided against an abbreviation system, i.e. coding, e.g. “B” for birth certificate: B1980-ClaudeManning.pdf or something. I’d try to be as obvious as possible and use the word Birth, not “B”. Abbreviations are not banned, but their dismissal led to a eureka moment. Yes, I was in the bath.
(Abbreviations which have a recognised life out in the real world are admitted, e.g. NSW for ‘New South Wales’ and BDM for ‘Births, Deaths and Marriages’ (Registry).)
My pathways were going to include as much non-abbreviated information as possible, but be as succinct as possible. I also decided that it did not matter where the info was placed, in a many layer pathway system with many folders, or a flatter system with fewer folders. Or more likely, both, so they could integrate depending on load.
Constraints on this are operating systems’ abilities with respect to pathway and file name length, as well as forbidden characters. This is a big issue as I want my system to be useful to other people across time and space and strange operating systems, across which information might be conveyed (without being looked at). But I’ll describe that detail at the end.
I’ll now break down the three elements of MetaSource_DATE_DataPoints.
The MetaSource is all the Respect des fonds, Provenienzprinzip and Provenance stuff, as informed by Original order. It is comprised of the elements:
① Repository (where the information/ material is located)
② Source (a registry, a letter, a book)
③ Authority (author)
Some MetaSources are one and the same, for example the NSW Registry of Births, Deaths and Marriages (NSWBDM) is ①, ② and ③ all rolled into one. Clarifying bits may go in the pathway/filename structure after the…
The DATE is the pivot where we move from the big world to the micro details. It uses the ISO 8601 standard, e.g. 2017-02-14.
Having the ③ Authority immediately before the DATE acknowledges the Harvard AUTHOR-DATE citation style often used in the humanities.
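A small sketch of why the ISO 8601 form earns its place in a file name, assuming Python is to hand: the year-month-day order means a plain alphabetical sort is also a chronological one. The dates are the ones used in this post; nothing else is implied about how anyone should generate them.

```python
from datetime import date

# Python's standard library writes ISO 8601 dates directly.
stamp = date(2017, 2, 14).isoformat()
print(stamp)

# Sorting the text sorts the dates, which is what a directory listing does.
dates = ["2017-02-14", "1963-08-25", "1980-06-22"]
print(sorted(dates))
```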
The DataPoints get the item-level stuff. The DataPoints is an arrangement of:
ⓐ event: birth, death, marriage, occupation, etc
ⓔ source pointer (page number, record ID)
⑤ Type: map, photo, scan, downloaded, screenshot, can include secondary DATE
⑥ MISC: original file name, original URI
Number ④ Subjects: one can list all or none.
Number ⑤ Type: one can list all or none of the possible types, for example for a photo of a scan or a printed screenshot.
If an element can appear twice it may do so, but usually just use the earlier placement.
Imagine a hard drive or some memory device or even somewhere in the cloud where there is an archive of digital and digitized documents. Here is the MetaSource_DATE_DataPoints schema applied to a birth certificate. The datapoint order need not be consistent, but consistency helps when one’s eyes scan down a directory’s listing.
SomeArchive/ NSWBDM/ NSWBDM_1980-06-22_Birth_ManningClaudeRobert_Sydney_1234-1980_scan.jpg
“_” underscore separates elements
“-” appends or supplements an element with some qualification or further detail and resolution, as it does in a DATE 2017-02-14.
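The two separator rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_name` and its arguments are hypothetical, and the values are those of the birth certificate example above.

```python
from datetime import date

def build_name(meta_source, when, data_points, extension):
    """Join MetaSource, DATE and DataPoints elements with underscores.

    `when` may be a date object or a string such as "1940" for a partial date.
    """
    stamp = when.isoformat() if isinstance(when, date) else str(when)
    parts = list(meta_source) + [stamp] + list(data_points)
    for part in parts:
        # An underscore inside an element would blur the element boundaries.
        if "_" in part:
            raise ValueError(f"element contains an underscore: {part!r}")
    return "_".join(parts) + "." + extension

name = build_name(
    ["NSWBDM"],
    date(1980, 6, 22),
    ["Birth", "ManningClaudeRobert", "Sydney", "1234-1980", "scan"],
    "jpg",
)
print(name)
```

Hyphens stay inside an element (the date, the record ID), so splitting the name on underscores later gives the elements back.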
One could introduce subfolders under NSWBDM each for Birth, Deaths and Marriages, but keep the file name as is, and don’t drop birth from the filename. Now just looking at the one example it looks very ugly, but with a dozen on screen in one repository folder an order does become apparent.
Also, I’ll admit that with digitisation and online resources the lines between a fonds and a provenance or even authority get real blurry, but the whole thing is in my metaFonds now so that’s the way I do it.
SomeArchive/ ManningFonds/ ManningFonds_LetterFrom_DaviesM_1963-08-25_toManningClaudeRobert_12AliceSt_scan_2017-03-01.pdf
SomeArchive/ Googlemaps/ Googlemaps_2017-02-14_1PoolSt-Otley_map_satellite_screenshot.jpg
SomeArchive/ ArchiwumMapZachodniejPolski/ ArchiwumMapZachodniejPolski_Messtischblatt_1940_Ostrowo-map_download_2017-02-21.jpg
A PROTIP or two
With the above MetaSource_DATE_DataPoints system in place, anything you wanted to do by naming or organising directories according to Family Branch, Family, Person, types of documents, or photos (as I did at the start) can be done with virtual folders (or saved searches, or what Apple's macOS calls smart folders), because all those datapoints are in the filename. Nothing is lost, a lot is gained.
Keep datapoint elements in the same order; this will help pseudo-folderise files on screen.
Thus, LastnameFirstname for organizing families.
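What a virtual folder or saved search does with these names can be imitated in a rough Python sketch. The helper names `elements` and `virtual_folder` are made up for illustration, and the file names are the examples from this post written without spaces; a real saved search in a file manager will have its own rules.

```python
from pathlib import PurePosixPath

def elements(file_name):
    """Split a file name (minus its extension) into underscore-separated elements."""
    return PurePosixPath(file_name).stem.split("_")

def virtual_folder(file_names, wanted):
    """Return the names with at least one element starting with `wanted`."""
    return [n for n in file_names
            if any(e.startswith(wanted) for e in elements(n))]

files = [
    "NSWBDM_1980-06-22_Birth_ManningClaudeRobert_Sydney_1234-1980_scan.jpg",
    "ManningFonds_LetterFrom_DaviesM_1963-08-25_toManningClaudeRobert_12AliceSt_scan_2017-03-01.pdf",
    "Googlemaps_2017-02-14_1PoolSt-Otley_map_satellite_screenshot.jpg",
]

# A "family" folder and a "maps" folder, without moving a single file.
print(virtual_folder(files, "Manning"))
print(virtual_folder(files, "map"))
```

Because the surname comes first in LastnameFirstname, a search on the start of an element gathers a whole family in one go.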
Separate files for scans of the same document? Number the type element with a padded suffix:
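One way the padding might look, sketched in Python. The exact form is an assumption on my part: a hyphen followed by a two-digit number (scan-01, scan-02), chosen because the hyphen is the character that supplements an element in this system. The function name `numbered` is hypothetical.

```python
def numbered(type_element, index, width=2):
    """Append a zero-padded number to a type element, e.g. scan-01."""
    return f"{type_element}-{index:0{width}d}"

pages = [numbered("scan", i) for i in (10, 2, 1)]

# Padding keeps the alphabetical order of a directory listing in page order.
print(sorted(pages))
```

Without the padding, scan-10 would sort ahead of scan-2.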
(Scan the backs of your photos, they can contain more information than you realise.)
I will also add GRAMPS ID numbers to various elements, particularly for clarity and consistency.
Devil in the Detail : Characters
Remove characters which some computer operating systems find difficult or special:
/ \ .
And for Windows in particular, from Naming Files, Paths, and Namespaces (Windows):
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Imagine you grab a USB drive and copy stuff to it and not look at it and then copy it elsewhere and realise data is lost because it was some old FAT formatted thing, and you don’t even have Windows in the house!
Windows also finds long file pathways (this includes both file name & nested folder/directory structure) very difficult. As these are going to be long file names, keep the folder nesting structure as simple as possible. (If burning to an optical disc, DVD or CD-ROM, it would be best to compress the entire structure first to .zip or similar, and burn that file. Optical discs do not support very deep pathways and lengthy filenames.)
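A rough way to spot trouble before copying anything, sketched in Python. The figure of 260 characters is the traditional Windows limit for a whole path (MAX_PATH); the function name `over_budget` and the sample paths are only for illustration, and other systems or media may have tighter limits.

```python
WINDOWS_MAX_PATH = 260  # traditional Windows limit for a whole path, in characters

def over_budget(paths, limit=WINDOWS_MAX_PATH):
    """Return the paths whose total length reaches or exceeds the limit."""
    return [p for p in paths if len(p) >= limit]

file_name = "NSWBDM_1980-06-22_Birth_ManningClaudeRobert_Sydney_1234-1980_scan.jpg"
flat = "SomeArchive/NSWBDM/" + file_name
deep = "SomeArchive/" + "Nested/" * 40 + file_name  # forty needless folder levels

print(len(flat), len(deep))
print(over_budget([flat, deep]) == [deep])
```

The long file name is the same in both cases; it is the nesting that pushes the second path over.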
Also remove spaces or hyphens from surnames or phrases and CamelCase them. And use CamelCase when removing spaces generally.
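The clean-up described here can be sketched in Python. The helper names are hypothetical, and the forbidden set is the list of characters Windows reserves in file names; this is one plausible reading of the rules, not a finished tool.

```python
import re

# Characters Windows reserves in file names.
FORBIDDEN = '<>:"/\\|?*'

def strip_forbidden(text):
    """Drop characters that some operating systems refuse in file names."""
    return "".join(ch for ch in text if ch not in FORBIDDEN)

def camel_case(phrase):
    """Remove spaces and hyphens, capitalising the start of each word."""
    words = re.split(r"[\s\-]+", phrase.strip())
    # Only the first letter is changed, so McDonald keeps its inner capital.
    return "".join(w[:1].upper() + w[1:] for w in words if w)

print(camel_case("12 Alice St"))
print(camel_case("Smith-Jones"))
print(camel_case(strip_forbidden('Who? "Claude"')))
```

Apply `camel_case` to names and phrases only, not to whole elements such as dates, where the hyphen carries meaning.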
I use “~” the tilde to indicate the original (format). For example
I’ve also reserved ‘ ` ‘ or tick to indicate primary date.
Maybe keep white space for original file names, but you will have to remove the dots from any original file names or URIs and leave only the one dot to separate the file extension.
nla.12345.D23-123.pdf to …_download_nla-12345-D23-123.pdf
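The dot rule is easy to sketch in Python, assuming the last dot in the original name is the one before the file extension. The function name `tame_dots` is made up; the sample name is the one above.

```python
def tame_dots(original_name):
    """Replace every dot except the last (the extension separator) with a hyphen."""
    stem, dot, extension = original_name.rpartition(".")
    if not dot:
        # No dot at all: nothing to change.
        return original_name
    return stem.replace(".", "-") + "." + extension

print(tame_dots("nla.12345.D23-123.pdf"))
```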
NOTE: File type extensions (e.g. .txt .jpg .odt) may not be included in the above descriptions. Obviously they are another type of DataPoint.
Thus we have all the:-