I am Japanese.
While developing with FastAPI, which depends on Starlette, I found the following issue with decoding non-latin-1 strings (in my case, Japanese):
Because Starlette decodes string data POSTed in the request body as latin-1, non-latin-1 strings become garbled.
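A minimal sketch of the garbling, independent of Starlette: decoding the UTF-8 bytes of "あいうえお" as latin-1 yields mojibake instead of the original text.

```python
# UTF-8 bytes for "あいうえお", as a client would send them.
raw = "あいうえお".encode("utf-8")

garbled = raw.decode("latin-1")  # what a latin-1 decode yields: mojibake
correct = raw.decode("utf-8")    # what the request actually contained

print(garbled)
print(correct)  # あいうえお
```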
For example, when I run the following curl command, the saved name is garbled.
$ curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" --data-urlencode "name=あいうえお" "http://localhost/exampleapi/"
If the Content-Type is "multipart/form-data", it seems the following issue and Pull Request already fix a similar problem.
However, only the "MultiPartParser" class was modified there; the "FormParser" class was not. So when data is POSTed with Content-Type "application/x-www-form-urlencoded", the values are still decoded as latin-1 and the problem remains.
In my environment, I modified L108 to L109 of starlette/formparsers.py as follows and confirmed that the garbling no longer occurs.
elif message_type == FormMessage.FIELD_END:
    name = unquote_plus(field_name.decode("utf-8"))    # fix: latin-1 → utf-8
    value = unquote_plus(field_value.decode("utf-8"))  # fix: latin-1 → utf-8
elif message_type == FormMessage.END:
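To illustrate the effect of that change in isolation, here is a rough simulation of the decode step at FIELD_END, under the assumption that the field bytes reach it as raw (not percent-encoded) UTF-8:

```python
from urllib.parse import unquote_plus

# Simulated field bytes for "name=あいうえお", assuming raw UTF-8
# reaches FIELD_END (an assumption for this sketch).
field_value = "あいうえお".encode("utf-8")

before = unquote_plus(field_value.decode("latin-1"))  # current behavior: mojibake
after = unquote_plus(field_value.decode("utf-8"))     # patched behavior: correct

print(before)
print(after)  # あいうえお
```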
Therefore, as in Pull Request #562, when a charset is specified in the request's Content-Type header, I would like the names and values of the POSTed form data to be decoded with that charset.
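One possible shape for that change, as a hypothetical sketch (the helper name and fallback are my own, not Starlette code): read the charset parameter from the Content-Type header and fall back to latin-1 when none is given, preserving today's behavior.

```python
from urllib.parse import unquote_plus

def parse_charset(content_type: str, default: str = "latin-1") -> str:
    # e.g. "application/x-www-form-urlencoded; charset=utf-8" -> "utf-8"
    for part in content_type.split(";")[1:]:
        key, _, val = part.strip().partition("=")
        if key.lower() == "charset":
            return val.strip('"') or default
    return default

charset = parse_charset("application/x-www-form-urlencoded; charset=utf-8")
field_value = "あいうえお".encode("utf-8")
value = unquote_plus(field_value.decode(charset))
print(value)  # あいうえお
```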