Beginning unicode support #531

mepeisen · 2018-03-17T14:34:33Z

No description provided.

mepeisen · 2018-03-17T14:39:28Z

To try it, first change the field FixedWidthFontRenderer.FONT to UNIFONT_PLAIN, which switches rendering to a GNU Unifont build.
Then enable UTF support, either through os.setUtf(true) or directly on the LuaThread.

I changed LuaString to work on Java chars internally, so strings are effectively Unicode inside the VM and can be manipulated as such through the string API.
As long as UTF support is not enabled in the current thread, every non-printable character is exported as '?'.
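
A minimal sketch of that fallback, for illustration only. The class and method names are hypothetical and not taken from the patch, and treating every char of 256 or above as "non-printable" is an assumption:

```java
import java.nio.charset.StandardCharsets;

public class LossyExport {
    // Hypothetical helper: converts a char-based string to the bytes handed out of the VM.
    static byte[] export(String s, boolean utfEnabled) {
        if (utfEnabled) {
            // UTF mode: encode every character as UTF-8.
            return s.getBytes(StandardCharsets.UTF_8);
        }
        // Legacy mode: one byte per char, '?' for anything that does not fit in a byte.
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            out[i] = (ch < 256) ? (byte) ch : (byte) '?';
        }
        return out;
    }

    public static void main(String[] args) {
        // "\u2500" is the box-drawing character '─'.
        byte[] legacy = export("a\u2500b", false);
        System.out.println(new String(legacy, StandardCharsets.ISO_8859_1)); // prints "a?b"
        System.out.println(export("a\u2500b", true).length);                 // prints 5
    }
}
```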

There may still be some limitations.

The CI error seems to be caused by the customized LuaJ library not being recompiled.

mepeisen · 2018-03-17T14:46:02Z

Samples:

```lua
os.setUtf(true)

print("╭─╮")
print("│ │")
print("╰─╯")

print("Greek word 'kosme': κόσμε ")
```

SquidDev · 2018-03-17T14:48:19Z

I do like the idea of having unicode support in CC, but I do foresee a few issues:

Instead of changing how Lua handles strings (which rather breaks Lua's semantics), I think it would be better to handle the encoding/decoding of strings inside CC's code (which is what is already done, see LuaJLuaMachine).

However, it's worth noting that this is what was done in previous versions of CC (pre-1.76 IIRC), and it was responsible for the infamous "binary string bug". I think it would be better if individual methods could "opt in" to unicode conversion. CCTweaks handled this using an IArguments class, which provided .getString() and .getBytes(), though introducing that now would break binary compatibility of the Java API.
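
To make the opt-in idea concrete, here is a rough sketch. The interface below is hypothetical and does not reproduce the actual CCTweaks signatures; it only assumes that each argument is kept as raw bytes and decoded on request:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OptInArguments {
    /** Hypothetical interface; not the real CCTweaks IArguments. */
    interface Arguments {
        String getString(int index); // opt in: decode the argument as UTF-8 text
        byte[] getBytes(int index);  // opt out: raw bytes, safe for binary data
    }

    static final class ByteArguments implements Arguments {
        private final byte[][] values;

        ByteArguments(byte[]... values) {
            this.values = values;
        }

        @Override
        public String getString(int index) {
            return new String(values[index], StandardCharsets.UTF_8);
        }

        @Override
        public byte[] getBytes(int index) {
            return values[index].clone();
        }
    }

    public static void main(String[] args) {
        byte[] text = "\u03BA".getBytes(StandardCharsets.UTF_8); // Greek kappa, 2 bytes
        byte[] binary = { (byte) 0xFF, 0x00, (byte) 0x80 };       // not valid UTF-8
        Arguments a = new ByteArguments(text, binary);
        System.out.println(a.getString(0).length());              // prints 1
        System.out.println(Arrays.equals(a.getBytes(1), binary)); // prints true
    }
}
```

A method that handles text would call getString, while one that handles file or network data would call getBytes, so binary strings are never run through a decoder.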

Edit: It's also worth noting that GNU Unifont is GPL licensed, which is incompatible with CC's own license. I'm not sure what the ideal solution would be, short of creating our own font (something which seems less than ideal).

mepeisen · 2018-03-17T14:57:05Z

I am sorry, but this won't work, for several reasons. First, Dan has already made custom hacks to Lua that hide unicode characters.
Second, most of the custom GUI code does not support strings whose characters vary in byte size. For example, the window API breaks if strings contain multi-byte characters.

I did not see the licensing problems :-(
Maybe there is some other monospace unicode font that could be used? Do you know one?

SquidDev · 2018-03-17T14:59:32Z

> First of all there are already custom hacks into lua made by dan. Hacks to hide unicode characters. For example the window API etc. breaks if strings are made of multiple byte characters.

I'm aware of this, but it doesn't require changes to Lua's semantics: you can just modify the encoding/decoding functions on CC's end. It's pretty trivial to convert a byte array to a UTF8 string and vice versa: this is how Lua 5.3 implements its unicode support.

mepeisen · 2018-03-17T15:06:42Z

> It's pretty trivial to convert a byte array to a UTF8 string and vice versa

Indeed.
But Dan's changes make it impossible, because you cannot simply use Lua scripts saved as UTF-8: they get corrupted because the Java strings are not loaded correctly into Lua strings. Maybe it would be better to revert Dan's customizations and then build unicode support on top of a cleaner LuaJ version?

SquidDev · 2018-03-17T15:14:25Z

> But the changes of dan makes it impossible because you cannot simply use lua scripts saved with utf8.

They don't really make it impossible, and they definitely don't necessitate as large an overhaul as this. I believe the only string encoding related change Dan has made is this one (and the corresponding decode function):

ComputerCraft/luaj-2.0.3/src/core/org/luaj/vm2/LuaString.java

Lines 187 to 199 in 914df8b

    
    public String tojstring() {
        /* DAN200 START */
        /*
        return decodeAsUtf8(m_bytes, m_offset, m_length);
        */
        char[] chars = new char[ m_length ];
        for( int i=0; i<chars.length; ++i )
        {
            chars[i] = (char)(m_bytes[ m_offset + i ] & 255);
        }
        return new String( chars );
        /* DAN200 END */
    }

You don't ever need to touch this: you can just replace the following line with a call to new String(str.m_bytes, str.m_offset, str.m_length, "UTF8").

ComputerCraft/src/main/java/dan200/computercraft/core/lua/LuaJLuaMachine.java

Line 621 in 914df8b

return str.tojstring();

Obviously you'd have to handle the CC → Lua conversion too, but it's much the same (just a call to LuaString.valueOf(str.getBytes("UTF8"))).
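
As a self-contained illustration of the difference, the sketch below uses plain byte arrays rather than LuaString, and its class and method names are made up for the example. It contrasts the byte-per-char decoding from the patched tojstring() with UTF-8 decoding at the boundary:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Boundary {
    // Same approach as the patched tojstring(): every byte becomes one char.
    static String decodeBytewise(byte[] bytes, int offset, int length) {
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = (char) (bytes[offset + i] & 255);
        }
        return new String(chars);
    }

    // Lua -> Java: interpret the bytes as UTF-8 at the boundary.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    // Java -> Lua: produce the UTF-8 bytes that would back the Lua string.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = encodeUtf8("\u2500"); // '─' is E2 94 80 in UTF-8
        System.out.println(raw.length);                                      // prints 3
        System.out.println(decodeUtf8(raw, 0, raw.length).length());        // prints 1
        System.out.println(decodeBytewise(raw, 0, raw.length).length());    // prints 3
    }
}
```

With this split, Lua keeps seeing plain byte strings and only the Java side decides when bytes are text.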

mepeisen · 2018-03-17T15:36:05Z

Yes. This was my first idea. There is another section:

ComputerCraft/luaj-2.0.3/src/core/org/luaj/vm2/Buffer.java

Line 181 in e85cdac

bytes[ offset + length + i ] = (ch < 256) ? (byte)ch : (byte)'?';

I changed both and ran into huge trouble.

Some things in StringLib need to be rewritten, or there needs to be a custom string library. If you do not rewrite it, Lua scripts will always operate on the byte length of strings, not the character count. As I already tested, this breaks the whole BIOS/API layer (namely terminals, monitors, windows etc.).
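
The mismatch is easy to reproduce outside LuaJ. The following sketch uses a hypothetical helper (not part of StringLib) and assumes well-formed UTF-8 input; it compares the byte length of the box-drawing sample with its character count:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    // Counts code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx).
    // Assumes the input is well-formed UTF-8.
    static int codePointCount(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "\u256D\u2500\u256E" is the top border "╭─╮" from the sample above.
        byte[] border = "\u256D\u2500\u256E".getBytes(StandardCharsets.UTF_8);
        System.out.println(border.length);          // prints 9 (what a byte-based length sees)
        System.out.println(codePointCount(border)); // prints 3 (what the terminal needs)
    }
}
```

A terminal that sizes its text, foreground, and background buffers from the byte length would allocate nine cells for a three-cell line, which matches the breakage described for the term and window APIs.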

It also requires that one does not switch between UTF and non-UTF mode, because this breaks the term/window API too. But dynamically switching from non-UTF to UTF and back at runtime may not be a good idea anyway ;-)

mepeisen · 2018-03-18T12:18:22Z

I reviewed the code. Here is my first commit: mepeisen@8876b27

Known limitations: although it is possible, one should be careful when switching UTF-8 support in Lua scripts via os.setUtf. Existing multishells etc. may break if they have already received UTF characters. But as long as you produce no output, or clear all terminals/windows before switching, this works.

Currently the string metatable works on plain bytes (the original variant), as does the length operator (#). Program developers should always use string.len instead of the length operator to be UTF-8 compatible.
But this is OK, since UTF support did not exist before.
I am looking for a way to wrap the string metatable without hacking into LuaJ so that s:len etc. work as expected :-)

The window object gives me some problems. I needed to detect whether a non-UTF-8 script tries to print UTF characters (however the script produced them):
https://github.com/mepeisen/ComputerCraft/blob/master/src/main/java/dan200/computercraft/core/apis/TermAPI.java#L260
In this case the arguments do not match in size.
Monitors need this check too, but I do not know which computer printed the text :-)
https://github.com/mepeisen/ComputerCraft/blob/master/src/main/java/dan200/computercraft/shared/peripheral/monitor/MonitorPeripheral.java#L206

Any ideas?

A nice side effect of the changes: the server computer now decides how to treat strings. Before, it could receive clipboard input containing UTF characters without knowing how to treat them.

To get around the font licensing problem I will add a feature to supply fonts via resource packs. I am also looking for a way to set individual fonts for each display, along with new API methods (getFonts / setFont). That way ComputerCraft need not ship GNU Unifont; anyone who wants to use it can simply install an additional resource pack.

Did I miss something? Or is everything OK to create a pull request?

mepeisen · 2018-03-18T20:49:26Z

New pull request :-)
#532

mepeisen added 2 commits March 17, 2018 15:31

beginning unicode support

098437e

Merge branch 'master' of https://github.com/dan200/ComputerCraft

dc6607f

mepeisen closed this Mar 18, 2018

SquidDev mentioned this pull request Jul 16, 2021

How to provide Unicode character set support for CCT? cc-tweaked/CC-Tweaked#860

Open
