XCALL XFOLD question?
#17505
04 Sep 09 04:46 PM
Joined: Nov 2006
Posts: 2,223
Stephen Funkhouser
OP
Member
What would be the /delimiter/exception// string to keep XFOLD from capitalizing an "s" after an apostrophe?
Stephen Funkhouser Diversified Data Solutions
Re: XCALL XFOLD question?
#17506
04 Sep 09 07:31 PM
Joined: Jun 2001
Posts: 11,786
Jack McGregor
Member
Looks like someone has been browsing through the documentation...
I must admit that it has been quite a while since I've used that routine, and after looking at the documentation and the routine itself, it appears that the documentation could benefit from some additional comments:
For sentence mode, the delimiters list must contain first all of the word delimiters, then the period, followed by all of the other sentence delimiters. For example, in the list "\ ,;:.!?", the period, exclamation and question mark are treated as sentence delimiters, while the space, comma, semicolon, and colon are treated as word delimiters.
Each exception must represent a complete word token, and may not begin with a word or sentence delimiter. However, it may end with one.
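The two rules above can be sketched in Python as a rough model (this is not XFOLD.SBR's code, and `parse_control` and `bad_exceptions` are hypothetical names). It assumes, based on the examples in this thread, that the first character of the control string (a / or \ here) acts as the field separator, that the first field is the delimiter list, and that a doubled separator ends the exception list. Everything before the period counts as a word delimiter; the period and everything after it count as sentence delimiters.

```python
def parse_control(ctrl):
    """Split an XFOLD-style control string into
    (word delimiters, sentence delimiters, exceptions).

    Hypothetical model of the format described in this thread:
    first char = field separator, first field = delimiter list,
    doubled separator = end of the exception list.
    """
    sep = ctrl[0]
    fields = ctrl[1:].split(sep)
    delims, exceptions = fields[0], []
    for field in fields[1:]:
        if field == "":          # empty field means a doubled separator: stop
            break
        exceptions.append(field)
    dot = delims.find(".")
    if dot < 0:                  # no period: everything separates words
        return delims, "", exceptions
    # before the period: word delimiters; the period and after: sentence delimiters
    return delims[:dot], delims[dot:], exceptions


def bad_exceptions(exceptions, delims):
    """Exceptions may end with a delimiter but must not begin with one."""
    return [e for e in exceptions if e[0] in delims]


word, sentence, exc = parse_control("/ ,;:.'!?/'s/s/i.e.//")
print(repr(word), repr(sentence), exc)        # ' ,;:' ".'!?" ["'s", 's', 'i.e.']
print(bad_exceptions(exc, word + sentence))   # ["'s"]
```

Under this model, 'i.e.' is a legal exception (it only ends with a delimiter), while 's is not.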
Now, the first question is whether to treat apostrophe as a word or sentence delimiter, or neither. I'm guessing that you've decided to treat apostrophe as a delimiter so that you don't end up converting O'Malley to O'malley. If you normally use sentence capitalization mode, then apostrophe would have to be listed as a sentence delimiter (after the period in the delimiter list), in order for the character following the apostrophe to be normally capitalized. If you use word mode, then I suppose it could be either kind of delimiter, but since there is no advantage to treating it as a word delimiter, it seems that the best approach is for apostrophe to be a sentence delimiter.
So the task then is to identify the exceptions in which the character following the apostrophe should not be capitalized. Aside from the possessive 's, I can think of several: \'s\'t\'m\'ve\'d\'ll\\ (i.e. to deal with words like Moe's, can't, I'm, we've, he'd, and we'll).
However, that list violates the rule that I added above, i.e. that exceptions cannot start with delimiters. So let's go with \s\t\m\ve\d\ll\\
At first this seems like a strange list of exceptions, but considering the other comment above (that each exception must be a complete word token, but not start with a delimiter), it only affects those letter combinations when they appear as complete words. (Since apostrophe is being treated as a delimiter, strings such as Moe's or we'll would be split into two word tokens by the apostrophe, allowing the resulting s and ll to be compared against the exception list.)
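The token split described above can be illustrated with a small Python sketch (a simplified model, not XFOLD's implementation; `tokens` is a hypothetical helper, and case-insensitive comparison against the exceptions is an assumption). With the apostrophe in the delimiter list, each contraction splits into a base word plus a short token that can be compared against the contraction endings listed above, while O'Malley yields no match.

```python
def tokens(text, delims):
    """Split text into word tokens at every delimiter character
    (simplified model; the delimiters themselves are dropped)."""
    out, current = [], ""
    for ch in text:
        if ch in delims:
            if current:
                out.append(current)
            current = ""
        else:
            current += ch
    if current:
        out.append(current)
    return out


# Contraction endings from the list above; case-insensitive matching is assumed.
exceptions = {"s", "t", "m", "ve", "d", "ll"}
for word in ["Moe's", "can't", "I'm", "we've", "he'd", "we'll", "O'Malley"]:
    parts = tokens(word, " ,;:.'!?")
    matched = [p for p in parts if p.lower() in exceptions]
    print(word, parts, matched)   # last line: O'Malley ['O', 'Malley'] []
```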
Admittedly the logic of this is a bit twisted, forcing us to consider natural words as if they were actually two words (one of them the start of the next sentence), but I think it will work. On the other hand, you can probably write a function using regular expressions now that would be more powerful and flexible (although it's doubtful that anything involving regular expressions will ever be considered "natural").
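As a rough illustration of the regular-expression idea, here is a minimal Python sketch (not A-Shell, and not how XFOLD works). It capitalizes every word, except that a word which follows a letter plus an apostrophe stays lowercase when it is one of the contraction endings from this thread. The names `fold_words` and `CONTRACTION_ENDINGS` are hypothetical, and this only covers word-style capitalization.

```python
import re

# Hypothetical list: the contraction endings discussed in this thread.
CONTRACTION_ENDINGS = {"s", "t", "m", "ve", "d", "ll"}
WORD = re.compile(r"[A-Za-z]+")


def fold_words(text):
    """Capitalize each word, but keep a contraction ending lowercase when it
    directly follows letter + apostrophe (so O'MALLEY'S -> O'Malley's)."""
    def fix(m):
        word, start = m.group(0), m.start()
        after_apostrophe = (
            start >= 2 and text[start - 1] == "'" and text[start - 2].isalpha()
        )
        if after_apostrophe and word.lower() in CONTRACTION_ENDINGS:
            return word.lower()
        return word[0].upper() + word[1:].lower()
    return WORD.sub(fix, text)


print(fold_words("O'MALLEY'S CAR CAN'T GO"))  # O'Malley's Car Can't Go
```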
Re: XCALL XFOLD question?
#17507
04 Sep 09 07:48 PM
Joined: Jun 2001
Posts: 11,786
Jack McGregor
Member
Here's a sample/test of the above theory:
MAP1 CONTRL$,S,100
MAP1 STRING$,S,100
CONTRL$ = "/ ,;:).'!?/II/III/Jr/Sr/Mr/Mrs/Phd/i.e./NASA/IRS/s/t/m/ve/ll"
STRING$ = "MR. O'MALLEY'S THE ONE! HE'LL DO IT! I'M SURE! CAN'T LOSE!"
PRINT STRING$
xcall XFOLD,STRING$,1,CONTRL$
PRINT STRING$
END

.RUN XFOLD2
MR. O'MALLEY'S THE ONE! HE'LL DO IT! I'M SURE! CAN'T LOSE!
Mr. O'Malley's the one! He'll do it! I'm sure! Can't lose!

Note that the M in O'Malley's gets capitalized, even though there is an exception for m (which prevents I'm from becoming I'M). This is because in O'Malley's the M is not a full token (it is only the start of the token Malley), so the exception list doesn't apply. (The apostrophe in the sentence delimiter list forces Malley to be capitalized, as if it were the first character in a new sentence, but the exceptions list prevents the other apostrophes from generating inappropriate capitalization, e.g. O'Malley'S, He'Ll, Can'T, etc.)
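To make these rules concrete, here is a simplified Python model of sentence and word mode. It is a sketch of the behavior as described in this thread, not the actual XFOLD.SBR code, and `xfold_model` and `parse_control` are hypothetical names. It assumes that exceptions are compared case-insensitively and written out in the case given in the list, that the first matching exception wins, and that an exception matches only when it is followed by a delimiter or the end of the text (unless it ends with a delimiter itself). With the control string and input above, it reproduces the output shown.

```python
def parse_control(ctrl):
    """Hypothetical parser: first char is the separator, first field the
    delimiters, then exceptions up to a doubled separator (or the end)."""
    sep = ctrl[0]
    fields = ctrl[1:].split(sep)
    delims, exceptions = fields[0], []
    for field in fields[1:]:
        if field == "":
            break
        exceptions.append(field)
    dot = delims.find(".")
    sentence_delims = delims[dot:] if dot >= 0 else ""
    return delims, sentence_delims, exceptions


def xfold_model(text, sentence_mode, ctrl):
    """Simplified model: sentence_mode=True capitalizes only the first token
    after a sentence delimiter; False capitalizes every token."""
    delims, sentence_delims, exceptions = parse_control(ctrl)
    out, pos, cap_next = [], 0, True
    while pos < len(text):
        ch = text[pos]
        if ch in delims:
            out.append(ch)
            if ch in sentence_delims:
                cap_next = True              # next token starts a "sentence"
            pos += 1
            continue
        # At a token start: the first matching exception wins.
        for exc in exceptions:
            end = pos + len(exc)
            if text[pos:end].upper() == exc.upper() and (
                end == len(text) or text[end] in delims or exc[-1] in delims
            ):
                out.append(exc)              # keep the exception's own case
                pos, cap_next = end, False
                break
        else:
            end = pos
            while end < len(text) and text[end] not in delims:
                end += 1
            token = text[pos:end].lower()
            if cap_next or not sentence_mode:
                token = token[0].upper() + token[1:]
            out.append(token)
            pos, cap_next = end, False
    return "".join(out)


ctrl = "/ ,;:).'!?/II/III/Jr/Sr/Mr/Mrs/Phd/i.e./NASA/IRS/s/t/m/ve/ll"
text = "MR. O'MALLEY'S THE ONE! HE'LL DO IT! I'M SURE! CAN'T LOSE!"
print(xfold_model(text, True, ctrl))
# Mr. O'Malley's the one! He'll do it! I'm sure! Can't lose!
```

Passing True here corresponds to the 1 passed to XFOLD in the sample, which this model assumes selects sentence mode.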
Re: XCALL XFOLD question?
#17508
08 Sep 09 09:50 AM
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
Thanks for the explanation, Jack. I never quite understood how the exception list worked and as a result ended up writing my own logic to duplicate what XFOLD already did.
Re: XCALL XFOLD question?
#17509
20 Dec 11 04:17 PM
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
How can I use XFOLD (in word mode) to force the state code "CO" (Colorado) to uppercase, while leaving the company abbreviation "Co." as shown?
Thus, "NORTHWEST INSURANCE CO. [CO]" should get folded to: "Northwest Insurance Co. [CO]".
Re: XCALL XFOLD question?
#17510
20 Dec 11 04:25 PM
Joined: Jun 2001
Posts: 11,786
Jack McGregor
Member
Just add "\Co.\CO" to your list of exceptions. (The "Co." needs to come before the "CO" so that it matches first.)
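Why the order matters can be sketched with a hypothetical Python model of the exception lookup (`match_exception` and its matching rules are assumptions, not XFOLD's code, and the delimiter set " .[]" is also assumed). In this model, the comparison is case-insensitive and the first match wins. An exception matches only if it is followed by a delimiter or the end of the text, unless it ends with a delimiter itself. Because the period is a delimiter, CO also matches the start of CO., so listing CO first hides Co.

```python
def match_exception(text, pos, exceptions, delims):
    """Return the first exception that matches text at pos, or None.
    Hypothetical model: case-insensitive; must be followed by a delimiter
    or the end of text unless the exception itself ends with a delimiter."""
    for exc in exceptions:
        end = pos + len(exc)
        if text[pos:end].upper() == exc.upper() and (
            end == len(text) or text[end] in delims or exc[-1] in delims
        ):
            return exc
    return None


text = "NORTHWEST INSURANCE CO. [CO]"
delims = " .[]"                       # assumed delimiter list
company = text.index("CO.")           # position of the company abbreviation
state = text.index("[CO]") + 1        # position of the state code
for order in (["Co.", "CO"], ["CO", "Co."]):
    print(order, match_exception(text, company, order, delims),
          match_exception(text, state, order, delims))
# ['Co.', 'CO'] Co. CO
# ['CO', 'Co.'] CO CO
```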
Re: XCALL XFOLD question?
#17511
20 Dec 11 04:28 PM
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
Thanks. I had them in reverse order.
Re: XCALL XFOLD question?
[Re: Stephen Funkhouser]
#32270
30 Jan 20 09:40 PM
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
I just encountered this problem:
map1 wrk$,s,0
wrk$ = "S. Smith's U.S.A"
xcall xfold,wrk$,0,"/ .'/S./s//"
? wrk$
The result: S. Smith's U.s.A.
Is there any way to fix this (i.e., get U.S.A)?
Re: XCALL XFOLD question?
[Re: Stephen Funkhouser]
#32271
30 Jan 20 10:22 PM
Joined: Jun 2001
Posts: 11,786
Jack McGregor
Member
The result here is a bit difficult to defend. Looking at the XFOLD.SBR spec, it does say that the exception list should consist of complete word exceptions, although it isn't entirely clear why S. by itself shouldn't be treated as a complete word. On the other hand, the workaround is simple enough: just add "U.S.A." to the exception list, e.g.
map1 wrk$,s,0
wrk$ = "S. Smith's U.S.A"
xcall xfold,wrk$,0,"/ .'/U.S.A./S.s//"
? wrk$
end
.run xfold3
S. Smith'S U.S.A
Re: XCALL XFOLD question?
[Re: Jack McGregor]
#32272
30 Jan 20 10:40 PM
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
1. Why did you add "S.s" to the exception list?
2. Did you mean to add "s" to handle possessives (ex: Smith's)?
3. U.S.A. was just an example. There are other acronyms that contain "S.". Is there a universal solution?
Re: XCALL XFOLD question?
[Re: Stephen Funkhouser]
#32275
31 Jan 20 06:30 AM
Joined: Jun 2001
Posts: 11,786
Jack McGregor
Member
1. That was a mistake on my part (mis-copying yours). In fact, the /s/ in your exceptions list was probably responsible for the small s in U.s.A.
2. No, and you probably don't want ' as a delimiter, since it would give you "Smith'S".
3. No, there isn't a good universal solution, but you don't actually need to resort to exceptions in order to force capitalization of acronyms of the form U.S.A., provided you don't mind capitalizing every word ...
xcall xfold,wrk$,0,"/ .//"
.run xfold3
S. Smith's U.S.A
But if you want a rule that will convert LIVING IN THE U.S.A. to Living in the U.S.A., you would need to remove the space from the delimiter list or switch to sentence mode, either of which would lead to u.S.A. instead of U.S.A., so you would still need an exception. Maybe the algorithm could be made smarter, perhaps using regex patterns, although I'm not quite sure what the full rule would be. Maybe in any sequence consisting of <delimiter><alpha><period><alpha>, the first alpha should be capitalized?
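That proposed rule is easy to prototype with a regular expression. The Python sketch below is not part of XFOLD, and the pattern and the name `fix_acronyms` are hypothetical. It treats the start of the string or whitespace as the <delimiter> (an assumption) and uppercases a lowercase letter that is followed by a period and another letter. That is enough to repair the u.S.A. output described above.

```python
import re

# <delimiter><alpha><period><alpha>: a lowercase letter at the start of the
# string or after whitespace, followed by ".<letter>". The delimiter set
# (start of string or whitespace) is an assumption.
ACRONYM_START = re.compile(r"(^|\s)([a-z])(?=\.[A-Za-z])")


def fix_acronyms(text):
    """Uppercase the first letter of dotted sequences like u.S.A."""
    return ACRONYM_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(fix_acronyms("Living in the u.S.A."))  # Living in the U.S.A.
print(fix_acronyms("see i.e. below"))        # see I.e. below
```

As the second call shows, the same rule also turns i.e. into I.e., so an exception list would probably still be needed alongside it.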
Re: XCALL XFOLD question?
[Re: Stephen Funkhouser]
#32285
03 Feb 20 03:05 PM
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
I guess I was overthinking this. I just needed proper capitalization of names and companies. Your simpler solution works. Thank you.