
@mepeisen

No description provided.

@mepeisen
Author

mepeisen commented Mar 17, 2018

To use it, first change the field FixedWidthFontRenderer.FONT to UNIFONT_PLAIN.
This will use a GNU Unifont build.
Then activate UTF support either through os.setUtf(true) or directly on the LuaThread.

I changed LuaString to work internally on Java chars, so it technically uses "UTF" internally, and one can even manipulate UTF strings through the string API etc.
As long as UTF support is not enabled in the current thread, it exports '?' for all non-printable characters.
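A minimal Java sketch of the export rule just described, assuming that "non-printable" means any char outside the 0-255 range (the same `ch < 256` test as the encoder line quoted later in this thread) and that UTF mode means plain UTF-8 encoding. The class, the `export` method and the `utfEnabled` parameter are hypothetical stand-ins for the per-thread flag set by `os.setUtf(true)`; this is not the code from the patch.

```java
import java.nio.charset.StandardCharsets;

public class LuaExportSketch {

    // Hypothetical: utfEnabled stands in for the per-thread flag set by os.setUtf(true).
    static byte[] export(String s, boolean utfEnabled) {
        if (utfEnabled) {
            // UTF mode: encode every character as UTF-8.
            return s.getBytes(StandardCharsets.UTF_8);
        }
        // Legacy mode: one byte per char; anything outside 0-255 becomes '?'.
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            out[i] = (ch < 256) ? (byte) ch : (byte) '?';
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A\u00E9\u03BA"; // 'A', e-acute (fits in 0-255), Greek kappa (does not)
        byte[] legacy = export(s, false);
        System.out.println(legacy.length);          // 3
        System.out.println((char) legacy[2]);       // ?
        System.out.println(export(s, true).length); // 5
    }
}
```

In legacy mode the kappa collapses to a single '?', while UTF mode keeps it as two UTF-8 bytes.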

There may still be some limitations.

The CI error seems to be caused by the customized LuaJ library not being recompiled.

@mepeisen
Author

mepeisen commented Mar 17, 2018

Samples:

[Screenshot: unicode]

```lua
os.setUtf(true)

print("╭─╮")
print("│ │")
print("│ │")
print("╰─╯")
print("Greek word 'kosme': κόσμε ")
```

@SquidDev
Contributor

SquidDev commented Mar 17, 2018

I do like the idea of having Unicode support in CC, but I do foresee a few issues:

Instead of changing how Lua handles strings (which rather breaks Lua's semantics), I think it would be better to handle the encoding/decoding of strings inside CC's code (which is what is already done, see LuaJLuaMachine).

However, it's worth noting that this is what was done in previous versions of CC (pre-1.76 IIRC), and it was responsible for the infamous "binary string bug". I think it would be better if individual methods could "opt in" to Unicode conversion. CCTweaks handled this using an IArguments class, which provided .getString() and .getBytes(), though introducing that now would break binary compatibility of the Java API.
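To illustrate the opt-in idea, here is a hypothetical Java sketch in which a method receives raw bytes and decodes them only when it asks for a string. The `Arguments` interface and `ByteArrayArguments` class are invented for illustration and only loosely modelled on the `getString()`/`getBytes()` split mentioned above; they are not CCTweaks' or ComputerCraft's actual API. The last check shows why blanket decoding is risky: bytes that are not valid UTF-8 are replaced with U+FFFD, so binary data does not survive a decode/encode round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OptInSketch {

    // Hypothetical interface: a method picks raw bytes or decoded text per argument.
    interface Arguments {
        byte[] getBytes(int index);   // raw Lua string bytes, untouched
        String getString(int index);  // bytes decoded as UTF-8, only on request
    }

    // Hypothetical implementation backed by the raw byte arrays passed from Lua.
    static final class ByteArrayArguments implements Arguments {
        private final byte[][] args;

        ByteArrayArguments(byte[]... args) {
            this.args = args;
        }

        @Override
        public byte[] getBytes(int index) {
            return args[index].clone();
        }

        @Override
        public String getString(int index) {
            return new String(args[index], StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] argv) {
        byte[] binary = {(byte) 0xFF, 0x00, (byte) 0x80};  // not valid UTF-8
        byte[] text = "\u256D\u2500\u256E".getBytes(StandardCharsets.UTF_8);
        Arguments a = new ByteArrayArguments(binary, text);

        System.out.println(a.getBytes(0).length);      // 3: binary-safe path
        System.out.println(a.getString(1).length());   // 3: text path sees characters
        // Decoding binary data as text replaces invalid bytes with U+FFFD,
        // so re-encoding does not give the original bytes back.
        byte[] roundTrip = a.getString(0).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(roundTrip, binary)); // false
    }
}
```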

Edit: It's also worth noting that GNU Unifont is GPL-licensed, which is incompatible with CC's own license. I'm not sure what the ideal solution would be, short of creating our own font (which seems less than ideal).

@mepeisen
Author

I'm sorry, but this won't work, for several reasons. First of all, there are already custom hacks in LuaJ made by Dan: hacks to hide Unicode characters.
The second reason is that most of the custom GUI code does not support strings whose byte length differs from their character count. For example, the window API etc. breaks if strings contain multi-byte characters.
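The width problem can be seen with the first line of the box sample from this thread. This sketch assumes that a terminal should count one cell per character (code point), which does not hold for wide CJK characters, while byte-oriented code counts UTF-8 bytes; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class WidthSketch {
    // Width in characters (code points): what a terminal cell count should use.
    static int charWidth(String s) {
        return s.codePointCount(0, s.length());
    }

    // Width in UTF-8 bytes: what byte-oriented code such as Lua's # operator sees.
    static int byteWidth(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String top = "\u256D\u2500\u256E"; // first line of the box sample
        System.out.println(charWidth(top) + " characters, " + byteWidth(top) + " bytes");
        // prints "3 characters, 9 bytes"
    }
}
```

A window buffer sized from the byte count would treat this three-cell line as nine cells wide.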

I did not see the licensing problems :-(
Maybe there is some other monospace Unicode font that could be used for the characters? Do you know of one?

@SquidDev
Contributor

> First of all there are already custom hacks into lua made by dan. Hacks to hide unicode characters. For example the window API etc. breaks if strings are made of multiple byte characters.

I'm aware of this, but it doesn't require changes to Lua's semantics: you can just modify the encoding/decoding functions on CC's end. It's pretty trivial to convert a byte array to a UTF8 string and vice versa: this is how Lua 5.3 implements its unicode support.
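Keeping Lua strings as bytes does not prevent counting characters. Lua 5.3's `utf8.len` works on byte strings; the hypothetical Java helper below does something similar by skipping UTF-8 continuation bytes (those of the form `10xxxxxx`). It assumes valid UTF-8 input and, unlike `utf8.len`, does not report invalid sequences.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LenSketch {
    // Counts code points in a UTF-8 byte sequence by skipping continuation
    // bytes (10xxxxxx). Assumes valid UTF-8; invalid input is not detected.
    static int utf8Len(byte[] s) {
        int count = 0;
        for (byte b : s) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "│ │" kept as raw UTF-8 bytes, the way a Lua string would hold it.
        byte[] lua = "\u2502 \u2502".getBytes(StandardCharsets.UTF_8);
        System.out.println(lua.length);    // 7: the byte length Lua's # sees
        System.out.println(utf8Len(lua));  // 3: the character count
    }
}
```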

@mepeisen
Author

mepeisen commented Mar 17, 2018

> It's pretty trivial to convert a byte array to a UTF8 string and vice versa

Indeed.
But Dan's changes make it impossible, because you cannot simply use Lua scripts saved as UTF-8: they get corrupted because the Java strings are not correctly loaded into Lua strings. Maybe it would be better to revert Dan's customizations and then build Unicode support on top of a cleaner LuaJ version?

@SquidDev
Contributor

SquidDev commented Mar 17, 2018

> But the changes of dan makes it impossible because you cannot simply use lua scripts saved with utf8.

They don't really make it impossible, and they definitely don't necessitate as large an overhaul as this. I believe the only string-encoding-related change Dan has made is this one (and the corresponding decode function):

```java
public String tojstring() {
    /* DAN200 START */
    /*
    return decodeAsUtf8(m_bytes, m_offset, m_length);
    */
    // Latin-1 style decoding: each byte (0-255) becomes the char with the same value.
    char[] chars = new char[ m_length ];
    for( int i=0; i<chars.length; ++i )
    {
        chars[i] = (char)(m_bytes[ m_offset + i ] & 255);
    }
    return new String( chars );
    /* DAN200 END */
}
```

You don't ever need to touch this: you can just replace the following line with a call to `new String(str.m_bytes, str.m_offset, str.m_length, "UTF8")`.
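The difference between the DAN200 loop and the suggested UTF-8 `new String(...)` call shows up on a single Greek letter. This sketch reimplements both on a plain byte array (the method names are hypothetical) and uses `StandardCharsets.UTF_8` instead of the `"UTF8"` charset name, which avoids the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    // Mirrors the DAN200 tojstring loop: each byte becomes the char with the same value.
    static String byteToChar(byte[] bytes, int offset, int length) {
        char[] chars = new char[length];
        for (int i = 0; i < length; ++i) {
            chars[i] = (char) (bytes[offset + i] & 255);
        }
        return new String(chars);
    }

    // The suggested replacement: decode the same bytes as UTF-8.
    static String utf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "\u03BA".getBytes(StandardCharsets.UTF_8); // Greek kappa, 2 bytes
        System.out.println(byteToChar(bytes, 0, bytes.length).length()); // 2
        System.out.println(utf8(bytes, 0, bytes.length).length());       // 1
    }
}
```

With the byte-per-char loop, a UTF-8 script's kappa arrives in Java as two unrelated Latin-1 characters; decoding as UTF-8 recovers the single character.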

Obviously you'd have to handle the CC → Lua conversion too, but it's much the same (just a call to `LuaString.valueOf(str.getBytes("UTF8"))`).
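A sketch of both directions on plain byte arrays, assuming UTF-8 is used at the CC/Lua boundary; `toLua` and `fromLua` are hypothetical names, and real code would wrap the bytes with `LuaString.valueOf(...)`. A whole-string round trip is lossless, but because Lua strings stay byte strings, a byte-based `string.sub` can split a character, and the leftover bytes then decode to U+FFFD.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripSketch {
    // CC -> Lua: the bytes that would be wrapped by LuaString.valueOf(...).
    static byte[] toLua(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lua -> CC: decode a byte slice (array, offset, length) as UTF-8.
    static String fromLua(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u256D\u2500\u256E"; // 9 bytes in UTF-8
        byte[] lua = toLua(original);

        // Whole string: the round trip is lossless.
        System.out.println(fromLua(lua, 0, lua.length).equals(original)); // true

        // A byte-based slice such as string.sub(s, 1, 4) cuts the middle character;
        // the dangling byte decodes to a U+FFFD replacement character.
        String cut = fromLua(lua, 0, 4);
        System.out.println(cut.charAt(0) == '\u256D'); // true
        System.out.println(cut.charAt(1) == '\uFFFD'); // true
    }
}
```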

@mepeisen
Author

mepeisen commented Mar 17, 2018

Yes. This was my first idea. There is another section:

```java
bytes[ offset + length + i ] = (ch < 256) ? (byte)ch : (byte)'?';
```

I changed both and ran into huge trouble.

Some parts of StringLib need to be rewritten, or there needs to be a custom string library. If you do not rewrite it, Lua scripts will always operate on the byte length of strings, not the character length. As I already tested, this breaks the whole BIOS/API layer (namely terminals, monitors, windows etc.).

It also requires not switching between UTF and non-UTF mode, because this breaks the term/window API too. But dynamically switching from non-UTF to UTF and back at runtime may not be a good idea anyway ;-)

@mepeisen
Author

mepeisen commented Mar 18, 2018

I reviewed the code. Here is my first commit: mepeisen@8876b27

Known limitations: although it is possible, one should be careful when switching UTF-8 support in Lua scripts via os.setUtf. Existing multishells etc. may break if they have already received UTF characters. But as long as you produce no output, or clear all terminals/windows before switching, this works.

Currently the string metatable works on plain bytes (the original variant), as does the length operator (#). Program developers should always use string.len instead of the length operator to be UTF-8 compatible.
But this is OK, since UTF support did not exist before.
I am looking for a way to wrap the string metatable without hacking into LuaJ, so that s:len etc. work as expected :-)

The window object gives me some problems. I needed to detect whether a non-UTF-8 script tries to print UTF characters (however the script managed to produce them):
https://github.com/mepeisen/ComputerCraft/blob/master/src/main/java/dan200/computercraft/core/apis/TermAPI.java#L260
In this case the arguments do not match in size.
Monitors need this check too, but there I do not know which computer printed the text :-)
https://github.com/mepeisen/ComputerCraft/blob/master/src/main/java/dan200/computercraft/shared/peripheral/monitor/MonitorPeripheral.java#L206

Any ideas?

A nice side effect of the changes: the server-side computer now decides how to treat strings. Before, it might receive clipboard input with UTF characters without knowing how to treat them.

To get around the font licensing problem, I will add a feature for adding fonts via resource packs. I am also looking for a way to set individual fonts for each display, as well as new API methods (getFonts / setFont). That way CCraft need not ship GNU Unifont; anyone who wants to use it can simply install an additional resource pack.

Did I miss something? Or is everything OK for creating a pull request?

@mepeisen
Author

New pull request :-)
#532
