Week 0: Start of the coding period
Me and my mentor had a video call and discussed the possible approaches in which we could do this tree model refactoring. We identified three different ways, which I’ll explain next:
Approach 1 - Using the setParent() function to populate a children vector in parent object
The objects of class Element already have a function parent() and setParent(), which means that already the parent element exists for each element. So it means we already have the whole tree structure, but the links go only upwards and not downwards. So that means if we can take the existing structure and collect the downwards links as soon as they are created, we can get the whole tree “for free” without changing any of the derived classes (Measure, Chord, Note, etc.)
I actually tried to do it this way, I created a vector called _children
in the ScoreElement class and then added an Element to _children
as soon as setParent()
is called, and removed it from the vector in the destructor function.
Advantages:
- We can do it all by modifying one or two classes, no need to change any of the derived classes. In fact I already have a partially working implementation of this one on my branch, on which I can even do tree traversals. Kartikay26/MuseScore at tree-model-gsoc.
- If we ever decide to add a new class, (say MMRest, in Issac’s project, or a new type of symbol or anything), we would just have to call setParent() and it would automatically get added to the tree. We wouldn’t need to change the parent class or anything.
- Each and every element is guaranteed to be in the tree (if its parent is not NULL), so while doing a tree traversal we are sure to see each element at least once. We can ignore it if we want to, and do something if required.
Disadvantages:
- The return type of
treeChild(int idx)
is not guaranteed to be fixed. In fact an element may have completely different kinds of elements as its children. Now it’s not exactly a disadvantage, it might even be useful in some situations, (like a Note may have as its children a Tie, a NoteDot, and an Accidental), but it is definitely a bit disorderly. - Duplicates might be added to the list which I tried to prevent by first checking whether the element already exists in the parent element. (This takes some extra computation, it might be irrelevant for a small score but I’m not sure how much it would impact performance in a large score.)
- And now this is the biggest problem: The order of children elements is not fixed. They are added to the vector in the same order in which they are generated. It means we cannot assume that measures will appear in the same order of measure numbers, or notes in a chord will be from top to bottom, etc.
- This method is still leading to a lot of crashes (segfaults etc), while using the software and even in the tests, right now. It’s probably because there are a lot of pointers involved which are of a generic
ScoreElement*
type and maybe sometimes some pointer is accessed to the wrong type or to some deleted memory location and there’s also a possibility of a NULL pointer appearing in the wrong place. So I’ll have to debug these crashes too. Basically the problem is that once we start usingScoreElement*
pointers, type safety is no longer guaranteed at compile time, and there’s also the issue of managing lifetime of the objects correctly.
How to fix the ordering issue?
I tried to fix the ordering problem by adding a virtual function treeIndex()
to the elements which returns something like measure number for measures, page number for pages, etc. But still a problem occurs in the elements which are stored in a linked list, like segments. E.g. The bar line segments are still not showing up in the correct place in the tree even though I did try to order the segments.
Also if I do this I’ll still have to add a virtual function to each element, and it won’t be that efficient either because of the call to sort() each time a child element is added, and I might as well implement treeChild(n) for each element instead which would be better and easier.
Ideally, what we could do is, instead of just setParent
, we could have functions like appendChild(child)
and insertChild(position, child)
. This would mean the responsibility to insert the children in the proper place lies in the child classes. We would have to rewrite all the occurrences of setParent()
in the current code.
I’m not exactly sure how much code we will have to rewrite for this but I tried to check just now, it looks like there are about 350 calls to setParent() in libmscore+mscore folders, and about 425 in the whole repository.
What I've done yet
Now I’ll also show you what I have done using this method so far. I have added a vector _children
to ScoreElement and added the following Score hierarchy functions:
I modified the Element::setParent()
function to populate the children vector here.
I have also created a QAbstractItemModel based on these Tree functions and added a QDockWidget to the right side which shows the elements in a QTreeView. (I took some code here from my mentor's fork).
I added some code to make it select the element when I click on it, so it’s helpful to see which note is which and so on.
I took a bit of extra time to make this ‘tree debugger’ kind of thing because I imagine it would be useful later on even if we end up following the other approach. And we can also see what the current structure of all the elements is, so that’s pretty useful too.
Approach 2 - Implementing a treeChild(int n) function in each class
The other approach for creating a tree is, we could create a virtual function treeChild(int n)
which has to be reimplemented in each of the derived classes.
For example, in the class Measure it would look like:
Segment* Measure::treeChild(int n){ Segment* cur = first(); for (int i = 0; i < n; ++i) { cur = cur->next(); } return cur; }
And for a Note:
Element* Note::treeChild(int n){ vector<Element*> children; if (_accidental) children.push_back(_accidental); if (_tiefor) children.push_back(_tiefor); if (_dots) children.insert(children.end(), _dots.begin(), _dots.end()); if (_el) children.insert(children.end(), _el.begin(), _el.end()); return children[n]; }
Advantages:
- Simplest method to implement, we have to implement just this one virtual function in all the derived classes. In fact there’s already a scanElements function and we can mostly re-use the same code from there.
- Children are always in the correct order.
- The type of children will be in our control, it would not return anything unexpected. We could make sure that the child of Measure is Segment and the child of Segment is always ChordRest. If required we can also return a generic
Element*
if there are different kinds of children.
Disadvantages:
- It is possible that the parent() of an Element may become out of sync with the parent in this tree model, because we are basically creating a new tree which may be different from the old one.
Some elements may be left out of the tree. In some cases it might be useful, but in other places we might have needed those. - There’s some repeated work being done to access elements in linked lists and even in the above implementation of Note. If all we wanted to do was to iterate over the Measures, say, and we call treeChild(1), treeChild(2), ... treeChild(n), then it would become an O(n^2) operation instead of O(n). This will affect the performance of the application for large scores. There is a possible improvement we can do to fix this, as described ahead.
- We will also have to create a treeChildCount() function. Even this function would take O(n) time for linked list based classes.
Approach 3 - An Iterator based method
Most of the times the children in the tree will be accessed it will usually be in a loop. We generally wouldn’t want to access specifically the nth element in the children, but usually it will be because we’re writing a loop like this
for (int i = 0; i < el->treeChildCount(); i++) { Element* child = el->treeChild(n) do_stuff(child) }
It would be better if we could use an iterator and write the loop like this:
for (Element* child : el->children()) { do_stuff(child) }
If we use an iterator we could also make it iterate over linked-list based classes in O(n) time complexity.
A basic implementation of this method would require us to create a nested iterator class in each derived class and implement operator++
, operator!=
etc, and provide begin()
and end()
iterators for each of the classes, somewhat like this:
class ScoreElement { public: ... class iterator { QVariant _pos; public: iterator(QVariant position) : _el(parent) {} // constructor virtual iterator operator++() {} // returns next iterator virtual ScoreElement* operator*(){} // return associated ScoreElement* bool operator !=(const iterator& other) { return _pos != other._pos; } } virtual iterator begin() = 0; virtual iterator end() = 0; ... }
There is a bit of extra complexity associated with this approach because we’ll have to re-implement a nested class instead of a function in each of the derived classes. We can try to add a layer of indirection and make it use a virtual function from the original ScoreElement class, which might be called getNextChild(QVariant position)
and getChildAt(QVariant position)
, and implement it like this:
class ScoreElement { public: ... class iterator { QVariant _pos; ScoreElement* _el; public: iterator(ScoreElement* parent, QVariant position) : _el(parent), _pos(position) {} // constructor iterator operator++() { // returns next iterator return iterator(_el, _el->getNextChild(_pos)); } ScoreElement* operator*(){ // return associated ScoreElement* return _el->getChildAt(_pos); } bool operator !=(const iterator& other) { return _pos != other._pos; } } virtual iterator begin() = 0; virtual iterator end() = 0; virtual QVariant getNextChild(QVariant position) = 0; virtual ScoreElement* getChildAt(QVariant position)= 0; ... }
Now we only have to implement getNextChild() and getChildAt() in the derived classes. The reason “position” is declared as QVariant here is that for classes which store children in vectors, we can use an int, and getNextChild(x)
would return x+1
, and for the ones that store the children in linked lists, position can be a pointer, like Segment*, and getNextChild(x)
would return x->next()
. Similarly getChildAt(x)
would return either someElementList[x]
or x
, depending on what the type of x
is. If we want to return different kinds of elements (like in Note), the logic would become a bit more complicated there but I think we can still do it.
I have not tested this method yet so I’m still not sure what other problems may be associated with this one, but it is otherwise mostly the same as the previous method.
Some testing for writing MSCX files
Other than all this I was also trying to see how I can write MSCX files by doing DFS on the tree, just like I mentioned in my last blog post. I noticed that there is a Pid enum, which already contains a list of a lot of properties which each Element can have.
I tried traversing over the Pid enum and writing the elements. This is the code I tried:
void treeWrite(XmlWriter& xml, ScoreElement* element) { xml.stag(element->name()); for (int i = 0; i < int(Pid::END); i++) { element->writeProperty(xml, Pid(i)); } for (int i = 0; i < element->treeChildCount(); i++) { ScoreElement* child = element->treeChild(i); treeWrite(xml, child); } xml.etag(); }
This is some of the generated output:
I cleaned up some of the repeated parts and compared it against a standard MSCX file and this is the comparison (MSCX on left, my code’s output on the right):
So it’s generating most of the same data in some places, but it is quite different too in other places. One thing that is different is that a Measure is separated into voices in the MSCX file, but my code is writing vertical Segments instead. Also there’s the ordering issue which I mentioned above. It’s also writing the generated elements like stems and clefs.
Also my program crashes while trying to write some invalid properties, but that can probably be fixed.
While doing further research on this, I found that there’s a Element::writeProperties function which automatically writes some of the correct properties, checks whether the element is generated or not, etc.
Also I noticed most of the classes have in their write function the list of properties they want to write, e.g. like this in note.cpp:
for (Pid id : { Pid::PITCH, Pid::TPC1, Pid::TPC2, Pid::SMALL, Pid::MIRROR_HEAD, Pid::DOT_POSITION, Pid::HEAD_SCHEME, Pid::HEAD_GROUP, Pid::VELO_OFFSET, Pid::PLAY, Pid::TUNING, Pid::FRET, Pid::STRING, Pid::GHOST, Pid::HEAD_TYPE, Pid::VELO_TYPE, Pid::FIXED, Pid::FIXED_LINE }) { writeProperty(xml, id); }
So if we are going to refactor the write function we’ll have to figure out what we want to do with these Properties and the file format. These properties could be put in a vector directly in the classes, maybe in the header files. Other parts of code could also use this data then. We could simplify the file handling code, making it store and load the whole tree at once, but that would thoroughly change the file format. However, it would get rid of most of the complexity in the file handling code.
A Note on Method 1 vs Method 2
I want to point out what the difference between the two approaches are from a theoretical perspective. Basically there’s a principle called the Single Source of Truth (SSOT) principle which says that for any data in the code, there should be only one “source of truth”.
In approach 1, we are trying to make the tree the SSOT, somewhat like it is in the browser’s DOM. That means that let’s say I move a node (say a measure) in the tree from one place to another, then the change should actually show up in the score too.
If we are able to make the tree the SSOT for everything, some things like file handling and copy-paste would become really trivial. For copy-pasting, all we would have to do is to move/copy a tree node from one place to another. FIle handling too would become easier, we can simply serialize/deserialize the tree into a file and load it from a file. No custom code/file format change would be necessary if we introduce a new element or new property someday, for example. We could get rid of the linked lists in the code too someday in the future. Or they could just stay there, but the source of truth would still be the tree. Later on, the code can be made to follow a more “functional” style of programming rather than the object-oriented style. All the data would remain in the tree and there would be functions that can operate on it. From a theoretical point of view, I believe this is a better approach, and the code would become more maintainable.
In approach 2, we’re keeping the current code as it is, and the classes are still the main sources of truth. But we are just adding the code for a new tree model on top of it. This would be more practical to implement for now because I won’t have to rewrite a lot of code.
Next Blog: https://musescore.org/en/user/1743616/blog/2020/06/08/gsoc-2020-tree-mo…
Previous Blog: https://musescore.org/en/user/1743616/blog/2020/05/17/gsoc-2020-project…
Comments
Wow, lots to digest here. A few random comments:
For the most part setParent() is called only when new elements are added, not on each layout, so performance impact should not be very noticeable.
Some elements are created and destroyed on the fly during layout, that's probably leading to some of the crashes you get crashes if you are storing this info.
I like the tree debugger!
One of the valuable things I would hope to get from this is simplification of the code for the previous/next element commands (Alt+Left/Right). This code currently does a sort of similar thing algorithmically as opposed to creating a data structure to traverse, and it's a mess of special cases. The good thing about it is, it is well-thought out in terms of the order of traversal, so it will make a good model for that. Anyhow, to me, an important consideration should be figuring out a way to simplify this but also not change the actual navigation sequence without good reason.
The idea of treeChild has some appeal as it also mimics the Qt accessibility tree (not completely, though, things like "segment" have no relevance to accessibility), and as you observe, we do something similar with scanElements() already. There is probably other opportunity to leverage this, like in the layout(0 or shape() functions for various classes that often need to do this for each of their children.