In our project we need to create and maintain a collection of ancient manuscripts (which were scanned and converted to text using OCR software). There are about 1000 manuscripts. Some of them were copied by hand and passed down through generations, so different versions of them appeared over time. The differences between versions are usually small, but the number of versions of one manuscript can be significant, about 5-7 on average. Manuscripts are grouped into Groups based on their content and other factors. Our project serves as a sort of "middleware", or pure data supply, for other projects which might present the information in more user-friendly ways, such as a desktop GUI, a website or mobile apps. Our infrastructure should enable collaboration (error corrections, etc.) for those daughter projects and for individuals, something like a wiki.
The initial idea was to keep manuscripts as plain text files (in org-mode, for lightweight markup and some metadata), with Groups represented by directories, like this:
Project/
├── Group1
│   ├── Group3
│   ├── manuscript_A
│   └── manuscript_B
└── Group2
    └── manuscript_C
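To show how a daughter project might read this layout, here is a minimal Python sketch that maps each Group directory to the manuscript files directly inside it. The function name manuscripts_by_group is hypothetical, the files are created empty in a throwaway directory purely for illustration, and the sketch assumes that every regular file inside a Group directory is a manuscript.

```python
import tempfile
from pathlib import Path


def manuscripts_by_group(root):
    """Map each group path (relative to root) to the manuscript files directly inside it."""
    root = Path(root)
    layout = {}
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        relative = directory.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip git's own bookkeeping directory
        # Assumption: every regular file in a group directory is a manuscript.
        files = sorted(f.name for f in directory.iterdir() if f.is_file())
        if files:
            layout[relative.as_posix()] = files
    return layout


# Rebuild the example tree in a throwaway directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "Project"
    (project / "Group1" / "Group3").mkdir(parents=True)
    (project / "Group2").mkdir()
    for rel in ("Group1/manuscript_A", "Group1/manuscript_B", "Group2/manuscript_C"):
        (project / rel).write_text("", encoding="utf-8")
    layout = manuscripts_by_group(project)

print(layout)
```

Empty groups such as Group3 do not appear in the result, and files placed directly in the project root are ignored.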
Different versions of a manuscript would be kept in separate permanent (i.e. never to be merged) git branches, for example a branch named manuscript_B-Athens_728.
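To make the naming scheme concrete, here is a minimal Python sketch that groups a flat list of branch names by manuscript. It assumes every version branch is named <manuscript>-<version>, with the first hyphen as the separator, as in manuscript_B-Athens_728. The function name group_branches and the branch names other than manuscript_B-Athens_728 are made up for illustration.

```python
from collections import defaultdict


def group_branches(branch_names, sep="-"):
    """Map each manuscript name to the sorted list of its version labels."""
    groups = defaultdict(list)
    for name in branch_names:
        # Split at the first separator only; version labels may contain more.
        manuscript, found, version = name.partition(sep)
        if not found:
            continue  # not a version branch (e.g. "main")
        groups[manuscript].append(version)
    return {manuscript: sorted(versions) for manuscript, versions in groups.items()}


# Hypothetical branch list; only the first version branch comes from the example above.
branches = [
    "main",
    "manuscript_B-Athens_728",
    "manuscript_B-Sinai_12",
    "manuscript_A-Paris_5",
]
print(group_branches(branches))
```

A tool built on top of the repository could use a mapping like this to show only the branches that belong to the manuscript being viewed, although it does not change how a hosting service lists the branches.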
Questions:
The problem with this approach is that if one uploads such a git repository to e.g. GitLab, all the branches of ALL manuscripts are displayed at once, rendering this versioning system unusable. Is there a way to group branches hierarchically, or to somehow "attach" a set of branches to one file (manuscript)?
Is it somehow possible for a reader who is in the middle of a certain file to get an indication that another version exists at that particular place in the text, and in which branch it can be found?
How well does git cope with the case where everything is in Unicode: (a) manuscript content, (b) project, directory and file names, (c) branch names?
Are there better approaches to organizing such a collection (in git)? I was thinking about creating a separate git repository per manuscript, like this:
Project/
├── Group1
│   ├── Group3
│   ├── Manuscript_A
│   │   └── manuscript_A
│   └── Manuscript_B
│       └── manuscript_B
└── Group2
    └── Manuscript_C
        └── manuscript_C
but this seems more difficult to maintain, and you get an unnecessary hierarchy level: the Manuscript_A-type directories. Or is it possible to have several git repos in one directory, each tracking its own specific file?
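Regarding the question about pointing a reader to places where another version differs: as far as I can tell this is not something git shows by itself while reading a file, but a rough sketch of what I have in mind is to compare two versions of the same manuscript line by line and record where they diverge. The example below uses Python's standard difflib module. The function name variant_spans and the sample lines are hypothetical, and it assumes the two versions have already been read from their branches into lists of lines.

```python
import difflib


def variant_spans(base_lines, other_lines):
    """Return (tag, start, end) for every region of base_lines that differs in other_lines.

    start and end are 0-based, half-open indices into base_lines; for a pure
    insertion start == end, marking the position where extra text appears.
    """
    matcher = difflib.SequenceMatcher(a=base_lines, b=other_lines, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((tag, i1, i2))
    return spans


# Hypothetical sample: two versions that differ only in their second line.
base = ["In the beginning", "was the word", "and the word was good"]
other = ["In the beginning", "was the Word", "and the word was good"]

for tag, start, end in variant_spans(base, other):
    print(f"{tag}: base lines {start}..{end - 1} differ in the other version")
```

The resulting spans could be stored as metadata next to the manuscript so that a front end can mark those places and name the branch that holds the variant reading.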