Non-latin-1 character strings are garbled in POSTed data

I am Japanese.

While developing with FastAPI, which depends on Starlette, I found the following issue when decoding non-latin-1 strings (Japanese, in my case):

■Description
Starlette decodes string data sent in a POST request body as latin-1, so any non-latin-1 characters end up garbled.

For example, after running the following curl command, I confirmed that the name was saved in garbled form.

$ curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" --data-urlencode "name=あいうえお" "http://localhost/exampleapi/"
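
The garbling mechanism can be sketched in a few lines of Python. This is only an illustration, assuming the field value reaches the parser as raw, non-percent-encoded UTF-8 bytes; the variable names are hypothetical and not taken from Starlette.

```python
from urllib.parse import unquote_plus

# Assumption: the raw field bytes as they might arrive in the request body.
raw_value = "あいうえお".encode("utf-8")

# Decoding UTF-8 bytes as latin-1 maps every byte to one character,
# so each 3-byte Japanese character becomes 3 unrelated characters.
garbled = unquote_plus(raw_value.decode("latin-1"))

# Decoding the same bytes as UTF-8 restores the original text.
correct = unquote_plus(raw_value.decode("utf-8"))

print(repr(garbled))
print(correct)
```

Because latin-1 decoding is lossless at the byte level, the garbled string can still be repaired with garbled.encode("latin-1").decode("utf-8"), but that is only a workaround on the application side.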

■Cause
When the Content-Type is "multipart/form-data", it seems that the following issue and Pull Request have already fixed a similar problem.


However, at this time only the "MultiPartParser" class has been modified; the "FormParser" class has not. As a result, when data is posted with Content-Type "application/x-www-form-urlencoded", the value is still decoded as latin-1, so the problem remains.

■Solution
In my environment, I modified L108 to L109 of starlette/formparsers.py as follows and confirmed that the garbled characters no longer occur.


elif message_type == FormMessage.FIELD_END:
    name = unquote_plus(field_name.decode("utf-8"))  # fix latin-1 → utf-8
    value = unquote_plus(field_value.decode("utf-8"))  # fix latin-1 → utf-8
    items.append((name, value))
elif message_type == FormMessage.END:

Therefore, as with Pull Request #562, I would like to change this so that, when a charset is specified in the Content-Type request header, the names and values of the POSTed data are decoded with that charset.
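
A rough sketch of what charset-aware decoding could look like, using only the Python standard library. This is not Starlette's actual implementation: the helper names charset_from_content_type and decode_field are hypothetical, and the fallback to utf-8 when no valid charset is given is my assumption.

```python
import codecs
from email.message import Message
from urllib.parse import unquote_plus


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset", default)
    if not isinstance(charset, str):
        return default
    try:
        # Fall back to the default if Python does not know this codec.
        codecs.lookup(charset)
    except LookupError:
        return default
    return charset


def decode_field(raw: bytes, content_type: str) -> str:
    """Decode a form field name or value with the request's charset (hypothetical helper)."""
    charset = charset_from_content_type(content_type)
    return unquote_plus(raw.decode(charset), encoding=charset)


content_type = "application/x-www-form-urlencoded; charset=utf-8"
print(decode_field("あいうえお".encode("utf-8"), content_type))
print(decode_field(b"%E3%81%82", content_type))
```

With something like this, the two decode calls in the FIELD_END branch could use the charset from the request header instead of a hard-coded value.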

■Version
Starlette: 0.13.6