better UTF-8 handling (though we currently don't take into account what