What does codecvt::max_length mean for a state-dependent encoding?

When looking at the description of max_length on cppreference, I found the following comment:

If the encoding is state-dependent (encoding() == -1), 
then more than max_length() external characters may be consumed to produce one internal character. 

I take this to mean that when converting an external char sequence to one internal char, the system might consume some external chars to change the shift state, and further external chars for the internal char itself.

When looking at the libstdc++ implementation I can see the following code:

int codecvt<wchar_t, char, mbstate_t>::
do_max_length() const throw()
{
  ...
  int __ret = MB_CUR_MAX;
  ...
  return __ret;
}

So at least for GCC, max_length should return MB_CUR_MAX. libc++ has a similar implementation.
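To check this on a given system, a minimal sketch along the following lines could collect the values in question. CodecvtInfo and describe_codecvt are hypothetical names made up for this illustration; only std::use_facet, encoding(), max_length() and always_noconv() are standard. The values depend on the implementation and on the locale passed in.

```cpp
#include <cwchar>
#include <locale>

// Hypothetical helper type (not part of any library): holds the values
// reported by the codecvt facet of a locale.
struct CodecvtInfo {
    int encoding;        // -1 means the encoding is state-dependent
    int max_length;      // result of codecvt::max_length()
    bool always_noconv;  // true if the facet performs no conversion
};

// Hypothetical helper: queries the wchar_t <-> char codecvt facet of loc.
CodecvtInfo describe_codecvt(const std::locale& loc) {
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(loc);
    return CodecvtInfo{facet.encoding(), facet.max_length(),
                       facet.always_noconv()};
}
```

Calling describe_codecvt(std::locale::classic()) and comparing max_length with MB_CUR_MAX shows whether the two agree for that locale. A result of encoding == -1 would indicate the state-dependent case that the cppreference note is about.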

But when looking at the reference of mbtowc, which is used in C to convert a char sequence to a wchar_t, I can see the following comment:

No more than MB_CUR_MAX characters are examined in any case.

The reference for wctomb has a similar comment.
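The documented bound can be observed with a small sketch like the one below. bytes_for_first_char is a hypothetical helper written for this illustration; it assumes the program runs in whatever C locale is currently active and only looks at the first character of the string.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: converts the first multibyte character of s and
// returns the number of bytes mbtowc consumed (0 for the null character,
// -1 for an invalid sequence).
int bytes_for_first_char(const char* s) {
    // A null source pointer resets mbtowc's internal conversion state.
    std::mbtowc(nullptr, nullptr, 0);

    wchar_t wc;
    // Offer the whole string, terminator included; mbtowc is documented to
    // examine no more than MB_CUR_MAX bytes regardless of this limit.
    return std::mbtowc(&wc, s, std::strlen(s) + 1);
}
```

For any valid input the return value should not exceed MB_CUR_MAX, even though more bytes were offered. Note that mbtowc keeps its shift state internally between calls, which is relevant when thinking about state-dependent encodings.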

So I am totally confused:
If codecvt::max_length equals MB_CUR_MAX, then MB_CUR_MAX might not be enough for an internal/external char conversion, because of the extra characters needed for the shift state. But then why do mbtowc/wctomb consume/produce at most MB_CUR_MAX characters?

  • So an implementation of some conversions might consume additional characters, and the conversions you are looking at happen not to. That does not contradict the general rule, which only says that consuming more is allowed.
