modify usfm for chapter-level drafting to avoid import issues; move remarks to chapters#285
modify usfm for chapter-level drafting to avoid import issues; move remarks to chapters#285mshannon-sil wants to merge 6 commits intomainfrom
Conversation
…emarks to chapters
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Enkidu93 and mshannon-sil).
machine/corpora/update_usfm_parser_handler.py line 345 at r1 (raw file):
tokens = list(self._tokens) if chapters is not None: tokens = self._get_incremental_draft_tokens(tokens, chapters)
I think we can do something similar, but before we parse instead of after. Instead of calling parse_usfm in update_usfm, we can do something like this:
tokenizer = UsfmTokenizer(self._settings.stylesheet)
tokens = tokenizer.tokenize(usfm)
tokens = filter_tokens_by_chapter(tokens, chapters)
parser = UsfmParser(tokens, handler, self._settings.stylesheet, self._settings.versification)
parser.process_tokens()This would avoid updating the whole book.
|
Previously, ddaspit (Damien Daspit) wrote…
I updated it accordingly, how does it look now? If we change parse_usfm to accept a Sequence[UsfmToken] as well as str, we could call it here and let it instantiate the parser and process the tokens to avoid some code duplication. But not sure if there's a reason parse_usfm only accepts str. |
This PR addresses issue #284.
Mostly looking for high-level feedback about the approach at the moment. As we were discussing, is the right place for this functionality in the
get_usfm()method as essentially a post-processing step? Or should we look to implement this feature inprocess_tokens()(and maybe move the remark logic here as well)?Some initial thoughts:
Pros for putting it in
get_usfm():process_token().Pros for putting it in
process_token():This change is