- •Contents
- •Preface
- •An Experiment
- •Acknowledgments
- •Versions of this Book
- •About This Book
- •What You Should Know Before Reading This Book
- •Overall Structure of the Book
- •The Way I Implement
- •Initializations
- •Error Terminology
- •The C++ Standards
- •Example Code and Additional Information
- •1 Comparisons and Operator <=>
- •1.1.1 Defining Comparison Operators Before C++20
- •1.1.2 Defining Comparison Operators Since C++20
- •1.2 Defining and Using Comparisons
- •1.2.2 Comparison Category Types
- •1.2.4 Calling Operator <=> Directly
- •1.2.5 Dealing with Multiple Ordering Criteria
- •1.3 Defining operator<=> and operator==
- •1.3.1 Defaulted operator== and operator<=>
- •1.3.2 Defaulted operator<=> Implies Defaulted operator==
- •1.3.3 Implementation of the Defaulted operator<=>
- •1.4 Overload Resolution with Rewritten Expressions
- •1.5 Using Operator <=> in Generic Code
- •1.5.1 compare_three_way
- •1.6 Compatibility Issues with the Comparison Operators
- •1.6.2 Inheritance with Protected Members
- •1.7 Afternotes
- •2 Placeholder Types for Function Parameters
- •2.1.1 auto for Parameters of Member Functions
- •2.2 Using auto for Parameters in Practice
- •2.3.1 Basic Constraints for auto Parameters
- •2.4 Afternotes
- •3.1 Motivating Example of Concepts and Requirements
- •3.1.1 Improving the Template Step by Step
- •3.1.2 A Complete Example with Concepts
- •3.2 Where Constraints and Concepts Can Be Used
- •3.2.1 Constraining Alias Templates
- •3.2.2 Constraining Variable Templates
- •3.2.3 Constraining Member Functions
- •3.3 Typical Applications of Concepts and Constraints in Practice
- •3.3.1 Using Concepts to Understand Code and Error Messages
- •3.3.2 Using Concepts to Disable Generic Code
- •3.3.3 Using Requirements to Call Different Functions
- •3.3.4 The Example as a Whole
- •3.3.5 Former Workarounds
- •3.4 Semantic Constraints
- •3.4.1 Examples of Semantic Constraints
- •3.5 Design Guidelines for Concepts
- •3.5.1 Concepts Should Group Requirements
- •3.5.2 Define Concepts with Care
- •3.5.3 Concepts versus Type Traits and Boolean Expressions
- •3.6 Afternotes
- •4.1 Constraints
- •4.2.1 Using && and || in requires Clauses
- •4.3 Ad-hoc Boolean Expressions
- •4.4.1 Simple Requirements
- •4.4.2 Type Requirements
- •4.4.3 Compound Requirements
- •4.4.4 Nested Requirements
- •4.5 Concepts in Detail
- •4.5.1 Defining Concepts
- •4.5.2 Special Abilities of Concepts
- •4.5.3 Concepts for Non-Type Template Parameters
- •4.6 Using Concepts as Type Constraints
- •4.7 Subsuming Constraints with Concepts
- •4.7.1 Indirect Subsumptions
- •4.7.2 Defining Commutative Concepts
- •5 Standard Concepts in Detail
- •5.1 Overview of All Standard Concepts
- •5.1.1 Header Files and Namespaces
- •5.1.2 Standard Concepts Subsume
- •5.2 Language-Related Concepts
- •5.2.1 Arithmetic Concepts
- •5.2.2 Object Concepts
- •5.2.3 Concepts for Relationships between Types
- •5.2.4 Comparison Concepts
- •5.3 Concepts for Iterators and Ranges
- •5.3.1 Concepts for Ranges and Views
- •5.3.2 Concepts for Pointer-Like Objects
- •5.3.3 Concepts for Iterators
- •5.3.4 Iterator Concepts for Algorithms
- •5.4 Concepts for Callables
- •5.4.1 Basic Concepts for Callables
- •5.4.2 Concepts for Callables Used by Iterators
- •5.5 Auxiliary Concepts
- •5.5.1 Concepts for Specific Type Attributes
- •5.5.2 Concepts for Incrementable Types
- •6 Ranges and Views
- •6.1 A Tour of Ranges and Views Using Examples
- •6.1.1 Passing Containers to Algorithms as Ranges
- •6.1.2 Constraints and Utilities for Ranges
- •6.1.3 Views
- •6.1.4 Sentinels
- •6.1.5 Range Definitions with Sentinels and Counts
- •6.1.6 Projections
- •6.1.7 Utilities for Implementing Code for Ranges
- •6.1.8 Limitations and Drawbacks of Ranges
- •6.2 Borrowed Iterators and Ranges
- •6.2.1 Borrowed Iterators
- •6.2.2 Borrowed Ranges
- •6.3 Using Views
- •6.3.1 Views on Ranges
- •6.3.2 Lazy Evaluation
- •6.3.3 Caching in Views
- •6.3.4 Performance Issues with Filters
- •6.4 Views on Ranges That Are Destroyed or Modified
- •6.4.1 Lifetime Dependencies Between Views and Their Ranges
- •6.4.2 Views with Write Access
- •6.4.3 Views on Ranges That Change
- •6.4.4 Copying Views Might Change Behavior
- •6.5 Views and const
- •6.5.1 Generic Code for Both Containers and Views
- •6.5.3 Bringing Back Deep Constness to Views
- •6.6 Summary of All Container Idioms Broken By Views
- •6.7 Afternotes
- •7 Utilities for Ranges and Views
- •7.1 Key Utilities for Using Ranges as Views
- •7.1.1 std::views::all()
- •7.1.2 std::views::counted()
- •7.1.3 std::views::common()
- •7.2 New Iterator Categories
- •7.3 New Iterator and Sentinel Types
- •7.3.1 std::counted_iterator
- •7.3.2 std::common_iterator
- •7.3.3 std::default_sentinel
- •7.3.4 std::unreachable_sentinel
- •7.3.5 std::move_sentinel
- •7.4 New Functions for Dealing with Ranges
- •7.4.1 Functions for Dealing with the Elements of Ranges (and Arrays)
- •7.4.2 Functions for Dealing with Iterators
- •7.4.3 Functions for Swapping and Moving Elements/Values
- •7.4.4 Functions for Comparisons of Values
- •7.5 New Type Functions/Utilities for Dealing with Ranges
- •7.5.1 Generic Types of Ranges
- •7.5.2 Generic Types of Iterators
- •7.5.3 New Functional Types
- •7.5.4 Other New Types for Dealing with Iterators
- •7.6 Range Algorithms
- •7.6.1 Benefits and Restrictions for Range Algorithms
- •7.6.2 Algorithm Overview
- •8 View Types in Detail
- •8.1 Overview of All Views
- •8.1.1 Overview of Wrapping and Generating Views
- •8.1.2 Overview of Adapting Views
- •8.2 Base Class and Namespace of Views
- •8.2.1 Base Class for Views
- •8.2.2 Why Range Adaptors/Factories Have Their Own Namespace
- •8.3 Source Views to External Elements
- •8.3.1 Subrange
- •8.3.2 Ref View
- •8.3.3 Owning View
- •8.3.4 Common View
- •8.4 Generating Views
- •8.4.1 Iota View
- •8.4.2 Single View
- •8.4.3 Empty View
- •8.4.4 IStream View
- •8.4.5 String View
- •8.4.6 Span
- •8.5 Filtering Views
- •8.5.1 Take View
- •8.5.2 Take-While View
- •8.5.3 Drop View
- •8.5.4 Drop-While View
- •8.5.5 Filter View
- •8.6 Transforming Views
- •8.6.1 Transform View
- •8.6.2 Elements View
- •8.6.3 Keys and Values View
- •8.7 Mutating Views
- •8.7.1 Reverse View
- •8.8 Views for Multiple Ranges
- •8.8.1 Split and Lazy-Split View
- •8.8.2 Join View
- •9 Spans
- •9.1 Using Spans
- •9.1.1 Fixed and Dynamic Extent
- •9.1.2 Example Using a Span with a Dynamic Extent
- •9.1.4 Example Using a Span with Fixed Extent
- •9.1.5 Fixed vs. Dynamic Extent
- •9.2 Spans Considered Harmful
- •9.3 Design Aspects of Spans
- •9.3.1 Lifetime Dependencies of Spans
- •9.3.2 Performance of Spans
- •9.3.3 const Correctness of Spans
- •9.3.4 Using Spans as Parameters in Generic Code
- •9.4 Span Operations
- •9.4.1 Span Operations and Member Types Overview
- •9.4.2 Constructors
- •9.5 Afternotes
- •10 Formatted Output
- •10.1 Formatted Output by Example
- •10.1.1 Using std::format()
- •10.1.2 Using std::format_to_n()
- •10.1.3 Using std::format_to()
- •10.1.4 Using std::formatted_size()
- •10.2 Performance of the Formatting Library
- •10.2.1 Using std::vformat() and vformat_to()
- •10.3 Formatted Output in Detail
- •10.3.1 General Format of Format Strings
- •10.3.2 Standard Format Specifiers
- •10.3.3 Width, Precision, and Fill Characters
- •10.3.4 Format/Type Specifiers
- •10.4 Internationalization
- •10.5 Error Handling
- •10.6 User-Defined Formatted Output
- •10.6.1 Basic Formatter API
- •10.6.2 Improved Parsing
- •10.6.3 Using Standard Formatters for User-Defined Formatters
- •10.6.4 Using Standard Formatters for Strings
- •10.7 Afternotes
- •11 Dates and Timezones for <chrono>
- •11.1 Overview by Example
- •11.1.1 Scheduling a Meeting on the 5th of Every Month
- •11.1.2 Scheduling a Meeting on the Last Day of Every Month
- •11.1.3 Scheduling a Meeting Every First Monday
- •11.1.4 Using Different Timezones
- •11.2 Basic Chrono Concepts and Terminology
- •11.3 Basic Chrono Extensions with C++20
- •11.3.1 Duration Types
- •11.3.2 Clocks
- •11.3.3 Timepoint Types
- •11.3.4 Calendrical Types
- •11.3.5 Time Type hh_mm_ss
- •11.3.6 Hours Utilities
- •11.4 I/O with Chrono Types
- •11.4.1 Default Output Formats
- •11.4.2 Formatted Output
- •11.4.3 Locale-Dependent Output
- •11.4.4 Formatted Input
- •11.5 Using the Chrono Extensions in Practice
- •11.5.1 Invalid Dates
- •11.5.2 Dealing with months and years
- •11.5.3 Parsing Timepoints and Durations
- •11.6 Timezones
- •11.6.1 Characteristics of Timezones
- •11.6.2 The IANA Timezone Database
- •11.6.3 Using Timezones
- •11.6.4 Dealing with Timezone Abbreviations
- •11.6.5 Custom Timezones
- •11.7 Clocks in Detail
- •11.7.1 Clocks with a Specified Epoch
- •11.7.2 The Pseudo Clock local_t
- •11.7.3 Dealing with Leap Seconds
- •11.7.4 Conversions between Clocks
- •11.7.5 Dealing with the File Clock
- •11.8 Other New Chrono Features
- •11.9 Afternotes
- •12 std::jthread and Stop Tokens
- •12.1 Motivation for std::jthread
- •12.1.1 The Problem of std::thread
- •12.1.2 Using std::jthread
- •12.1.3 Stop Tokens and Stop Callbacks
- •12.1.4 Stop Tokens and Condition Variables
- •12.2 Stop Sources and Stop Tokens
- •12.2.1 Stop Sources and Stop Tokens in Detail
- •12.2.2 Using Stop Callbacks
- •12.2.3 Constraints and Guarantees of Stop Tokens
- •12.3.1 Using Stop Tokens with std::jthread
- •12.4 Afternotes
- •13 Concurrency Features
- •13.1 Thread Synchronization with Latches and Barriers
- •13.1.1 Latches
- •13.1.2 Barriers
- •13.2 Semaphores
- •13.2.1 Example of Using Counting Semaphores
- •13.2.2 Example of Using Binary Semaphores
- •13.3 Extensions for Atomic Types
- •13.3.1 Atomic References with std::atomic_ref<>
- •13.3.2 Atomic Shared Pointers
- •13.3.3 Atomic Floating-Point Types
- •13.3.4 Thread Synchronization with Atomic Types
- •13.3.5 Extensions for std::atomic_flag
- •13.4 Synchronized Output Streams
- •13.4.1 Motivation for Synchronized Output Streams
- •13.4.2 Using Synchronized Output Streams
- •13.4.3 Using Synchronized Output Streams for Files
- •13.4.4 Using Synchronized Output Streams as Output Streams
- •13.4.5 Synchronized Output Streams in Practice
- •13.5 Afternotes
- •14 Coroutines
- •14.1 What Are Coroutines?
- •14.2 A First Coroutine Example
- •14.2.1 Defining the Coroutine
- •14.2.2 Using the Coroutine
- •14.2.3 Lifetime Issues with Call-by-Reference
- •14.2.4 Coroutines Calling Coroutines
- •14.2.5 Implementing the Coroutine Interface
- •14.2.6 Bootstrapping Interface, Handle, and Promise
- •14.2.7 Memory Management
- •14.3 Coroutines That Yield or Return Values
- •14.3.1 Using co_yield
- •14.3.2 Using co_return
- •14.4 Coroutine Awaitables and Awaiters
- •14.4.1 Awaiters
- •14.4.2 Standard Awaiters
- •14.4.3 Resuming Sub-Coroutines
- •14.4.4 Passing Values From Suspension Back to the Coroutine
- •14.5 Afternotes
- •15 Coroutines in Detail
- •15.1 Coroutine Constraints
- •15.1.1 Coroutine Lambdas
- •15.2 The Coroutine Frame and the Promises
- •15.2.1 How Coroutine Interfaces, Promises, and Awaitables Interact
- •15.3 Coroutine Promises in Detail
- •15.3.1 Mandatory Promise Operations
- •15.3.2 Promise Operations to Return or Yield Values
- •15.3.3 Optional Promise Operations
- •15.4 Coroutine Handles in Detail
- •15.4.1 std::coroutine_handle<void>
- •15.5 Exceptions in Coroutines
- •15.6 Allocating Memory for the Coroutine Frame
- •15.6.1 How Coroutines Allocate Memory
- •15.6.2 Avoiding Heap Memory Allocation
- •15.6.3 get_return_object_on_allocation_failure()
- •15.7 co_await and Awaiters in Detail
- •15.7.1 Details of the Awaiter Interface
- •15.7.2 Letting co_await Update Running Coroutines
- •15.7.3 Symmetric Transfer with Awaiters for Continuation
- •15.8.1 await_transform()
- •15.8.2 operator co_await()
- •15.9 Concurrent Use of Coroutines
- •15.9.1 co_await Coroutines
- •15.9.2 A Thread Pool for Coroutine Tasks
- •15.9.3 What C++ Libraries Will Provide After C++20
- •15.10 Coroutine Traits
- •16 Modules
- •16.1 Motivation for Modules Using a First Example
- •16.1.1 Implementing and Exporting a Module
- •16.1.2 Compiling Module Units
- •16.1.3 Importing and Using a Module
- •16.1.4 Reachable versus Visible
- •16.1.5 Modules and Namespaces
- •16.2 Modules with Multiple Files
- •16.2.1 Module Units
- •16.2.2 Using Implementation Units
- •16.2.3 Internal Partitions
- •16.2.4 Interface Partitions
- •16.2.5 Summary of Splitting Modules into Different Files
- •16.3 Dealing with Modules in Practice
- •16.3.1 Dealing with Module Files with Different Compilers
- •16.3.2 Dealing with Header Files
- •16.4 Modules in Detail
- •16.4.1 Private Module Fragments
- •16.4.2 Module Declaration and Export in Detail
- •16.4.3 Umbrella Modules
- •16.4.4 Module Import in Detail
- •16.4.5 Reachable versus Visible Symbols in Detail
- •16.5 Afternotes
- •17 Lambda Extensions
- •17.1 Generic Lambdas with Template Parameters
- •17.1.1 Using Template Parameters for Generic Lambdas in Practice
- •17.1.2 Explicit Specification of Lambda Template Parameters
- •17.2 Calling the Default Constructor of Lambdas
- •17.4 consteval Lambdas
- •17.5 Changes for Capturing
- •17.5.1 Capturing this and *this
- •17.5.2 Capturing Structured Bindings
- •17.5.3 Capturing Parameter Packs of Variadic Templates
- •17.5.4 Lambdas as Coroutines
- •17.6 Afternotes
- •18 Compile-Time Computing
- •18.1 Keyword constinit
- •18.1.1 Using constinit in Practice
- •18.1.2 How constinit Solves the Static Initialization Order Fiasco
- •18.2.1 A First consteval Example
- •18.2.2 constexpr versus consteval
- •18.2.3 Using consteval in Practice
- •18.2.4 Compile-Time Value versus Compile-Time Context
- •18.4 std::is_constant_evaluated()
- •18.4.1 std::is_constant_evaluated() in Detail
- •18.5 Using Heap Memory, Vectors, and Strings at Compile Time
- •18.5.1 Using Vectors at Compile Time
- •18.5.2 Returning a Collection at Compile Time
- •18.5.3 Using Strings at Compile Time
- •18.6.1 constexpr Language Extensions
- •18.6.2 constexpr Library Extensions
- •18.7 Afternotes
- •19.1 New Types for Non-Type Template Parameters
- •19.1.1 Floating-Point Values as Non-Type Template Parameters
- •19.1.2 Objects as Non-Type Template Parameters
- •19.2 Afternotes
- •20 New Type Traits
- •20.1 New Type Traits for Type Classification
- •20.1.1 is_bounded_array_v<> and is_unbounded_array_v
- •20.2 New Type Traits for Type Inspection
- •20.2.1 is_nothrow_convertible_v<>
- •20.3 New Type Traits for Type Conversion
- •20.3.1 remove_cvref_t<>
- •20.3.2 unwrap_reference<> and unwrap_ref_decay_t
- •20.3.3 common_reference<>_t
- •20.3.4 type_identity_t<>
- •20.4 New Type Traits for Iterators
- •20.4.1 iter_difference_t<>
- •20.4.2 iter_value_t<>
- •20.4.3 iter_reference_t<> and iter_rvalue_reference_t<>
- •20.5 Type Traits and Functions for Layout Compatibility
- •20.5.1 is_layout_compatible_v<>
- •20.5.2 is_pointer_interconvertible_base_of_v<>
- •20.5.3 is_corresponding_member()
- •20.5.4 is_pointer_interconvertible_with_class()
- •20.6 Afternotes
- •21 Small Improvements for the Core Language
- •21.1 Range-Based for Loop with Initialization
- •21.2 using for Enumeration Values
- •21.3 Delegating Enumeration Types to Different Scopes
- •21.4 New Character Type char8_t
- •21.4.2 Broken Backward Compatibility
- •21.5 Improvements for Aggregates
- •21.5.1 Designated Initializers
- •21.5.2 Aggregate Initialization with Parentheses
- •21.5.3 Definition of Aggregates
- •21.6 New Attributes and Attribute Features
- •21.6.1 Attributes [[likely]] and [[unlikely]]
- •21.6.2 Attribute [[no_unique_address]]
- •21.6.3 Attribute [[nodiscard]] with Parameter
- •21.7 Feature Test Macros
- •21.8 Afternotes
- •22 Small Improvements for Generic Programming
- •22.1 Implicit typename for Type Members of Template Parameters
- •22.1.1 Rules for Implicit typename in Detail
- •22.2 Improvements for Aggregates in Generic Code
- •22.2.1 Class Template Argument Deduction (CTAD) for Aggregates
- •22.3.1 Conditional explicit in the Standard Library
- •22.4 Afternotes
- •23 Small Improvements for the C++ Standard Library
- •23.1 Updates for String Types
- •23.1.1 String Members starts_with() and ends_with()
- •23.1.2 Restricted String Member reserve()
- •23.2 std::source_location
- •23.3 Safe Comparisons of Integral Values and Sizes
- •23.3.1 Safe Comparisons of Integral Values
- •23.3.2 ssize()
- •23.4 Mathematical Constants
- •23.5 Utilities for Dealing with Bits
- •23.5.1 Bit Operations
- •23.5.2 std::bit_cast<>()
- •23.5.3 std::endian
- •23.6 <version>
- •23.7 Extensions for Algorithms
- •23.7.1 Range Support
- •23.7.2 New Algorithms
- •23.7.3 unseq Execution Policy for Algorithms
- •23.8 Afternotes
- •24 Deprecated and Removed Features
- •24.1 Deprecated and Removed Core Language Features
- •24.2 Deprecated and Removed Library Features
- •24.2.1 Deprecated Library Features
- •24.2.2 Removed Library Features
- •24.3 Afternotes
- •Glossary
- •Index
21.4 New Character Type char8_t |
655 |
foo(MyScope::done); // OK: calls MyProject::foo() with MyProject::Task::Status::done
Note that a type alias is not generally used by ADL: |
|
namespace MyScope { |
|
void bar(MyProject::Task::Status) { |
|
} |
|
using MyProject::Task::Status; |
// expose enum type to MyScope |
using enum MyProject::Task::Status; |
// expose the enum values to MyScope |
} |
|
MyScope::Status s = MyScope::open; |
// OK |
bar(MyScope::done); |
// ERROR |
MyScope::bar(MyScope::done); |
// OK |
21.4 New Character Type char8_t
For better UTF-8 support, C++20 introduces the new character type char8_t and a new corresponding string type std::u8string.
Type char8_t is a new keyword. It serves as the type for holding UTF-8 characters and character sequences. For example:
char8_t c = u8'@'; |
|
// character with UTF-8 encoding for character @ |
const char8_t* s = |
u8"K\u00F6ln"; |
// character sequence with UTF-8 encoding for K¨oln |
The introduction of this type is a breaking change:
•u8 character literals now use char8_t instead of char
•For UTF-8 strings, the new types std::u8string and std::u8string_view are used now Consider the following example:
auto c = u8'@'; |
// character with UTF-8 encoding for @ |
|
auto s1 = u8"K\u00F6ln"; |
// character sequence with UTF-8 encoding for K¨oln |
|
using namespace std::literals; |
|
|
auto s2 = u8"K\u00F6ln"s; |
// string with UTF-8 encoding for K¨oln |
|
auto sv = u8"K\u00F6ln"sv; |
// string view with UTF-8 encoding for K¨oln |
|
The types of c and s have changed here: |
|
|
• Before C++20, this was equivalent to: |
|
|
char c = u8'@'; |
|
// UTF-8 character type before C++20 |
const char* s1 = u8"K\u00F6ln"; |
|
// UTF-8 character sequence type before C++20 |
using namespace std::literals; |
|
|
std::string s2 = u8"K\u00F6ln"s; |
|
// UTF-8 string type before C++20 |
std::string_view sv = u8"K\u00F6ln"s; |
// UTF-8 string view type before C++20 |
|
• Since C++20, this is equivalent to: |
|
|
char8_t c = u8'@'; |
|
// UTF-8 character type since C++20 |
const char8_t* s1 = u8"K\u00F6ln"; |
// UTF-8 character sequence type since C++20 |
656 |
Chapter 21: Small Improvements for the Core Language |
|
using namespace std::literals; |
|
|
std::u8string s2 = |
u8"K\u00F6ln"s; |
// UTF-8 string type since C++20 |
std::u8string_view |
sv = u8"K\u00F6ln"sv; // UTF-8 string view type since C++20 |
The reason for this change is simple: we can now implement special behavior for UTF-8 characters and strings:
• We can overload for UTF-8 character sequences:
void store(const char* s)
{
storeInFile(convertToUTF8(s)); |
// store after converting to UTF-8 encoding |
} |
|
void store(const char8_t* s) |
|
{ |
|
storeInFile(s); |
// store as is because it already has UTF-8 encoding |
}
• We can implement special behavior in generic code: void store(const CharT* s)
{
if constexpr(std::same_as<CharT, char8_t>) {
storeInFile(s); // store as is because it already has UTF-8 encoding
}
else {
storeInFile(convertToUTF8(s)); // store after converting to UTF-8 encoding
}
}
You can still only store UTF-8 characters of one byte in an object of type char8_t (remember that UTF-8 characters have a variable width):
•The character for the at sign, @, has the decimal value 64 (hexadecimal 0x40). Therefore, you can store its value in a char8_t and for this reason, the u8 character literal is well-defined:
char8_t c = u8'@'; |
// OK (c has value 64) |
•The character for the European currency Euro, e, consists of three code units: 226 130 172 (hexadecimal: 0xE2 0x82 0xAC). Therefore, you cannot store its value in a char8_t:
char8_t cEuro = u8'e'; |
// ERROR: invalid character literal |
Instead, you have to initialize a character sequence or UTF-8 string:
const char8_t* cEuro |
= |
u8"\u20AC"; |
// OK |
std::u8string sEuro |
= |
u8"\u20AC"; |
// OK |
Here, we use the Unicode notation to specify the value of the UTF-8 character, which creates an array of four const char8_t (including the trailing null character), which is then used to initialize cEuro and sEuro.
21.4 New Character Type char8_t |
657 |
If your compiler accepts the character e in your source code (which means it has to support a source file encoding with a character set such as UTF-8 or ISO-8859-15), you could even use this symbol in
literals directly:2 |
|
const char8_t* cEuro = u8"e"; |
// OK if valid character for the compiler |
std::u8string sEuro = u8"e"; |
// OK if valid character for the compiler |
However, because this source code is not portable, you should use Unicode characters (such as \u20AC for the e symbol).
Note that a few things in the C++ standard library have changed accordingly:
•Overloads for the new character type char8_t were added
•Functions that use or return UTF-8 strings use the new UTF-8 string types
In addition, note that char8_t is not guaranteed to have 8 bits. It is defined as using an unsigned char internally, which typically has 8 bits, but may have more. As usual, you can use std::numeric_limits<> to check for the number of bits:
std::cout << "char8_t has "
<<std::numeric_limits<char8_t>::digits << " bits\n";
21.4.1Changes in the C++ Standard Library for char8_t
The C++ standard library changed the following to support char8_t:
•u8string is now provided (defined as std::basic_string<char8_t>)
•u8string_view is now provided (defined as std::basic_string_view<char8_t>)
•std::numeric_limits<char8_t> is now defined
•std::char_traits<char8_t> is now defined
•Hash functions for char8_t strings and string views are now provided
•std::mbrtoc8() and c8rtomb() are now provided
•codecvt facets for conversions between char8_t and char16_t or char32_t are now provided
•For filesystem paths, u8string() now returns std::u8string instead of std::string
•std::atomic<char8_t> is now provided
Note that these changes might break existing code when it is being compiled with C++20, which is discussed next.
21.4.2 Broken Backward Compatibility
Because C++20 changes the type of UTF-8 literals and signatures that return UTF-8 strings, code that uses UTF-8 characters might no longer compile.
2 Thanks to Tom Honermann for pointing this out.
658 |
Chapter 21: Small Improvements for the Core Language |
Broken Code Using the Character Type
For example:
std::string s0 = u8"text";
auto s = u8"K\u00F6ln"; const char* s2 = s; std::cout << s << '\n';
auto c = u8'c'; char c2 = c; char* cp = &c; std::cout << c;
//OK in C++17, ERROR since C++20
//s is const char* until C++17, but const char8_t* since C++20
//OK in C++17, ERROR since C++20
//OK in C++17, ERROR since C++20
//c1 is char in C++17, but char8_t since C++20
//OK (even if char8_t)
//OK in C++17, ERROR since C++20
//OK in C++17, ERROR since C++20
Broken Code Using the New String Types
In particular, code that returns UTF-8 strings might now lead to problems.
For example, the following code no longer compiles because u8string() now returns a std::u8string instead of a std::string:
// iterate over directory entries:
for (const auto& entry : fs::directory_iterator(path)) {
std::string name = entry.path().u8string(); |
// OK in C++17, ERROR since C++20 |
... |
|
} |
|
You have to adjust the code by using a different type, auto, or support both types:
// iterate over directory entries (C++17 and C++20):
for (const auto& entry : fs::directory_iterator(path)) { #ifdef __cpp_char8_t
std::u8string name = entry.path().u8string(); |
// OK since C++20 |
#else |
|
std::string name = entry.path().u8string(); |
// OK in C++17 |
#endif |
|
... |
|
} |
|
Broken Code I/O for UTF-8 Strings
You can also no longer print UTF-8 characters or strings to std::cout (or any other standard output stream):
std::cout << u8"text"; // OK in C++17, ERROR since C++20 std::cout << u8'X'; // OK in C++17, ERROR since C++20
In fact, C++20 deleted the output operator for all extended character types (wchar_t, char8_t, char16_t, char32_t) unless the output stream supports the same character type:
21.4 New Character Type char8_t |
659 |
std::cout << "text"; std::cout << L"text"; std::cout << u8"text"; std::cout << u"text"; std::cout << U"text";
std::wcout << "text"; std::wcout << L"text"; std::wcout << u8"text"; std::wcout << u"text"; std::wcout << U"text";
//OK
//wchar_t string: OOPS in C++17, ERROR since C++20
//UTF-8 string: OK in C++17, ERROR since C++20
//UTF-16 string: OOPS in C++17, ERROR since C++20
//UTF-32 string: OOPS in C++17, ERROR since C++20
//OK
//OK
//UTF-8 string: OK in C++17, ERROR since C++20
//UTF-16 string: OOPS in C++17, ERROR since C++20
//UTF-32 string: OOPS in C++17, ERROR since C++20
Note the statements marked with OOPS: they all compiled in C++17, but they printed the addresses of the strings instead of their values. Therefore, C++20 disables not only the output of UTF-8 characters, but also output that did not work at all.
Dealing with Broken Code
You now be wondering how to deal with code that formerly worked with UTF-8 characters. The easiest way to deal with it is to use reinterpret_cast<>:
auto s = u8"text"; |
// s is const char8_t* since C++20 |
std::cout << s; |
// ERROR since C++20 |
std::cout << reinterpret_cast<const char*>(s); |
// OK |
For a single character, using a static_cast is enough: |
|
auto c = u8'x'; |
// c is char8_t since C++20 |
std::cout << c; |
// ERROR since C++20 |
std::cout << static_cast<char>(c); |
// OK |
You can bind its use or the use of other workarounds on the feature test macro for the char8_t characters feature:3
auto s = u8"text"; |
// s is const char8_t* since C++20 |
#ifdef __cpp_char8_t |
|
std::cout << reinterpret_cast<const char *>(s); |
// OK |
#else |
|
std::cout << s; |
// OK in C++17, ERROR since C++20 |
#endif |
|
If you are wondering why C++20 does not provide a working output operator for UTF-8 characters, please note that this is a pretty complex issue that we ha simply not enough time to solve yet. You can read more about this here: http://stackoverflow.com/a/58895428.
Because using reinterpret_cast<> might not scale when UTF-8 characters are used heavily, Tom Honermann wrote a guideline on how to deal with code for UTF-8 characters before and since C++20: “P1423R3 char8_t Backward Compatibility Remediation.” If you deal with UTF-8 characters and strings, you should definitely read it. You can download it from http://wg21.link/p1423.
3 For the final behavior specified with C++20, __cpp_char8_t should have (at least) the value 201907.