LLVM 23.0.0git
SampleProfReader.h
1//===- SampleProfReader.h - Read LLVM sample profile data -------*- C++ -*-===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains definitions needed for reading sample profiles.
10//
11// NOTE: If you are making changes to this file format, please remember
12// to document them in the Clang documentation at
13// tools/clang/docs/UsersManual.rst.
14//
15// Text format
16// -----------
17//
18// Sample profiles are written as ASCII text. The file is divided into
19// sections, which correspond to each of the functions executed at runtime.
20// Each section has the following format
21//
22// function1:total_samples:total_head_samples
23// offset1[.discriminator]: number_of_samples [fn1:num fn2:num ... ]
24// offset2[.discriminator]: number_of_samples [fn3:num fn4:num ... ]
25// ...
26// offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
27// offsetA[.discriminator]: fnA:num_of_total_samples
28// offsetA1[.discriminator]: number_of_samples [fn7:num fn8:num ... ]
29// ...
30// !CFGChecksum: num
31// !Attribute: flags
32//
33// This is a nested tree in which the indentation represents the nesting level
34// of the inline stack. There are no blank lines in the file, and the spacing
35// within a single line is fixed. Additional spaces will result in an error
36// while reading the file.
37//
38// Any line starting with the '#' character is completely ignored.
39//
40// Inlined calls are represented with indentation. The Inline stack is a
41// stack of source locations in which the top of the stack represents the
42// leaf function, and the bottom of the stack represents the actual
43// symbol to which the instruction belongs.
44//
45// Function names must be mangled in order for the profile loader to
46// match them in the current translation unit. The two numbers in the
47// function header specify how many total samples were accumulated in the
48// function (first number), and the total number of samples accumulated
49// in the prologue of the function (second number). This head sample
50// count provides an indicator of how frequently the function is invoked.
51//
52// There are three types of lines in the function body.
53//
54// * Sampled line represents the profile information of a source location.
55// * Callsite line represents the profile information of a callsite.
56// * Metadata line represents extra metadata of the function.
57//
58// Each sampled line may contain several items. Some are optional (marked
59// below):
60//
61// a. Source line offset. This number represents the line number
62// in the function where the sample was collected. The line number is
63// always relative to the line where the symbol of the function is
64// defined. So, if the function has its header at line 280, the offset
65// 13 is at line 293 in the file.
66//
67// Note that this offset should never be a negative number. This could
68// happen in cases like macros. The debug machinery will register the
69// line number at the point of macro expansion. So, if the macro was
70// expanded in a line before the start of the function, the profile
71// converter should emit a 0 as the offset (this means that the optimizers
72// will not be able to associate a meaningful weight to the instructions
73// in the macro).
74//
75// b. [OPTIONAL] Discriminator. This is used if the sampled program
76// was compiled with DWARF discriminator support
77// (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators).
78// DWARF discriminators are unsigned integer values that allow the
79// compiler to distinguish between multiple execution paths on the
80// same source line location.
81//
82// For example, consider the line of code ``if (cond) foo(); else bar();``.
83// If the predicate ``cond`` is true 80% of the time, then the edge
84// into function ``foo`` should be considered to be taken most of the
85// time. But both calls to ``foo`` and ``bar`` are at the same source
86// line, so a sample count at that line is not sufficient. The
87// compiler needs to know which part of that line is taken more
88// frequently.
89//
90// This is what discriminators provide. In this case, the calls to
91// ``foo`` and ``bar`` will be at the same line, but will have
92// different discriminator values. This allows the compiler to correctly
93// set edge weights into ``foo`` and ``bar``.
94//
95// c. Number of samples. This is an integer quantity representing the
96// number of samples collected by the profiler at this source
97// location.
98//
99// d. [OPTIONAL] Potential call targets and samples. If present, this
100// line contains a call instruction. This models both direct and indirect
101// calls; each target is listed with its number of samples. For example,
102//
103// 130: 7 foo:3 bar:2 baz:7
104//
105// The above means that at relative line offset 130 there is a call
106// instruction that calls one of ``foo()``, ``bar()`` and ``baz()``,
107// with ``baz()`` being the relatively more frequently called target.
108//
109// Each callsite line may contain several items. Some are optional.
110//
111// a. Source line offset. This number represents the line number of the
112// callsite that is inlined in the profiled binary.
113//
114// b. [OPTIONAL] Discriminator. Same as the discriminator for sampled line.
115//
116// c. Number of samples. This is an integer quantity representing the
117// total number of samples collected for the inlined instance at this
118// callsite.
119//
120// Metadata line can occur in lines with one indent only, containing extra
121// information for the top-level function. Furthermore, metadata can only
122// occur after all the body samples and callsite samples.
123// Each metadata line may contain a particular type of metadata, marked by
124// the starting characters annotated with !. We process each metadata line
125// independently, hence each metadata line has to form an independent piece
126// of information that does not require cross-line reference.
127// We support the following types of metadata:
128//
129// a. CFG Checksum (a.k.a. function hash):
130// !CFGChecksum: 12345
131// b. Function attributes (see ContextAttributeMask):
132// !Attribute: 1
133//
134//
135// Binary format
136// -------------
137//
138// This is a more compact encoding. Numbers are encoded as ULEB128 values
139// and all strings are encoded in a name table. The file is organized in
140// the following sections:
141//
142// MAGIC (uint64_t)
143// File identifier computed by function SPMagic() (0x5350524f463432ff)
144//
145// VERSION (uint32_t)
146// File format version number computed by SPVersion()
147//
148// SUMMARY
149// TOTAL_COUNT (uint64_t)
150// Total number of samples in the profile.
151// MAX_COUNT (uint64_t)
152// Maximum value of samples on a line.
153// MAX_FUNCTION_COUNT (uint64_t)
154// Maximum number of samples at function entry (head samples).
155// NUM_COUNTS (uint64_t)
156// Number of lines with samples.
157// NUM_FUNCTIONS (uint64_t)
158// Number of functions with samples.
159// NUM_DETAILED_SUMMARY_ENTRIES (size_t)
160// Number of entries in detailed summary
161// DETAILED_SUMMARY
162// A list of detailed summary entries. Each entry consists of
163// CUTOFF (uint32_t)
164// Required percentile of total sample count expressed as a fraction
165// multiplied by 1000000.
166// MIN_COUNT (uint64_t)
167// The minimum number of samples required to reach the target
168// CUTOFF.
169// NUM_COUNTS (uint64_t)
170// Number of samples to get to the desired percentile.
171//
172// NAME TABLE
173// SIZE (uint64_t)
174// Number of entries in the name table.
175// NAMES
176// A NUL-separated list of SIZE strings.
177//
178// FUNCTION BODY (one for each uninlined function body present in the profile)
179// HEAD_SAMPLES (uint64_t) [only for top-level functions]
180// Total number of samples collected at the head (prologue) of the
181// function.
182// NOTE: This field should only be present for top-level functions
183// (i.e., not inlined into any caller). Inlined function calls
184// have no prologue, so they don't need this.
185// NAME_IDX (uint64_t)
186// Index into the name table indicating the function name.
187// SAMPLES (uint64_t)
188// Total number of samples collected in this function.
189// NRECS (uint32_t)
190// Total number of sampling records in this function's profile.
191// BODY RECORDS
192// A list of NRECS entries. Each entry contains:
193// OFFSET (uint32_t)
194// Line offset from the start of the function.
195// DISCRIMINATOR (uint32_t)
196// Discriminator value (see description of discriminators
197// in the text format documentation above).
198// SAMPLES (uint64_t)
199// Number of samples collected at this location.
200// NUM_CALLS (uint32_t)
201// Number of non-inlined function calls made at this location. In the
202// case of direct calls, this number will always be 1. For indirect
203// calls (virtual functions and function pointers) this will
204// represent all the actual functions called at runtime.
205// CALL_TARGETS
206// A list of NUM_CALLS entries for each called function:
207// NAME_IDX (uint64_t)
208// Index into the name table with the callee name.
209// SAMPLES (uint64_t)
210// Number of samples collected at the call site.
211// NUM_INLINED_FUNCTIONS (uint32_t)
212// Number of callees inlined into this function.
213// INLINED FUNCTION RECORDS
214// A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
215// callees.
216// OFFSET (uint32_t)
217// Line offset from the start of the function.
218// DISCRIMINATOR (uint32_t)
219// Discriminator value (see description of discriminators
220// in the text format documentation above).
221// FUNCTION BODY
222// A FUNCTION BODY entry describing the inlined function.
223//===----------------------------------------------------------------------===//
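The sampled-line layout from the text format description, `offset[.discriminator]: number_of_samples [fn:num ...]`, can be illustrated with a small standalone parser. This is only a sketch, not the reader's actual implementation: `SampledLine`, `parseUnsigned` and `parseSampledLine` are hypothetical names, indentation is assumed to have been stripped by the caller, and the tokenizer is more lenient about spacing than the format description allows. Callsite lines (`offset: fn:num`) are deliberately rejected.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one sampled line; not an LLVM type.
struct SampledLine {
  uint32_t Offset = 0;        // line offset relative to the function header
  uint32_t Discriminator = 0; // 0 when the optional ".discriminator" is absent
  uint64_t NumSamples = 0;    // samples collected at this location
  std::vector<std::pair<std::string, uint64_t>> CallTargets; // callee, count
};

// Parse a non-empty string of decimal digits. Overflow is not checked here.
static bool parseUnsigned(const std::string &S, uint64_t &Out) {
  if (S.empty())
    return false;
  Out = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return false;
    Out = Out * 10 + static_cast<uint64_t>(C - '0');
  }
  return true;
}

// Parse "offset[.discriminator]: number_of_samples [fn:num ...]".
// Returns std::nullopt for anything that does not match that shape.
std::optional<SampledLine> parseSampledLine(const std::string &Line) {
  size_t Colon = Line.find(':');
  if (Colon == std::string::npos)
    return std::nullopt;

  SampledLine R;
  uint64_t V = 0;
  std::string Loc = Line.substr(0, Colon);
  size_t Dot = Loc.find('.');
  if (Dot != std::string::npos) {
    if (!parseUnsigned(Loc.substr(Dot + 1), V))
      return std::nullopt;
    R.Discriminator = static_cast<uint32_t>(V);
    Loc = Loc.substr(0, Dot);
  }
  if (!parseUnsigned(Loc, V))
    return std::nullopt;
  R.Offset = static_cast<uint32_t>(V);

  // The first token after the colon is the sample count; the remaining
  // tokens, if any, are "callee:count" pairs.
  std::istringstream Rest(Line.substr(Colon + 1));
  std::string Tok;
  if (!(Rest >> Tok) || !parseUnsigned(Tok, R.NumSamples))
    return std::nullopt;
  while (Rest >> Tok) {
    size_t Sep = Tok.rfind(':');
    if (Sep == std::string::npos || Sep == 0 ||
        !parseUnsigned(Tok.substr(Sep + 1), V))
      return std::nullopt;
    R.CallTargets.emplace_back(Tok.substr(0, Sep), V);
  }
  return R;
}
```

Applied to the example from the format description, `parseSampledLine("130: 7 foo:3 bar:2 baz:7")` yields offset 130, 7 samples and three call targets. If the function header sits at source line 280, that offset corresponds to absolute line 280 + 130 = 410.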
224
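The binary format description states that numbers are encoded as ULEB128 values. As a reminder of how that encoding works, here is a minimal standalone decoder. The name `decodeULEB128Sketch` and its error convention are assumptions made for this illustration; LLVM ships its own decoder in `llvm/Support/LEB128.h`, and the reader's `readNumber` is not reproduced here.

```cpp
#include <cstdint>
#include <optional>

// Decode one ULEB128 value from [Data, End).
// Each byte carries 7 payload bits (least significant group first); the high
// bit is set on every byte except the last one.
// On success, Data is advanced past the encoded bytes. On truncated input or
// a value that does not fit in 64 bits, Data is left unchanged and
// std::nullopt is returned.
std::optional<uint64_t> decodeULEB128Sketch(const uint8_t *&Data,
                                            const uint8_t *End) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  const uint8_t *P = Data;
  while (P != End) {
    uint8_t Byte = *P++;
    uint64_t Slice = Byte & 0x7f;
    // At shift 63 only one payload bit still fits; beyond that nothing fits.
    if (Shift >= 64 || (Shift == 63 && Slice > 1))
      return std::nullopt;
    Value |= Slice << Shift;
    if ((Byte & 0x80) == 0) {
      Data = P;
      return Value;
    }
    Shift += 7;
  }
  return std::nullopt; // ran out of bytes before the terminating byte
}
```

For instance, the byte sequence `0xE5 0x8E 0x26` decodes to 624485 (101 + 14·128 + 38·16384), and the cursor ends up three bytes further along.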
225#ifndef LLVM_PROFILEDATA_SAMPLEPROFREADER_H
226#define LLVM_PROFILEDATA_SAMPLEPROFREADER_H
227
228#include "llvm/ADT/SmallVector.h"
229#include "llvm/ADT/StringRef.h"
231#include "llvm/IR/LLVMContext.h"
237#include "llvm/Support/Debug.h"
239#include "llvm/Support/ErrorOr.h"
241#include <cstdint>
242#include <list>
243#include <memory>
244#include <optional>
245#include <string>
246#include <system_error>
247#include <vector>
248
249namespace llvm {
250
251class raw_ostream;
252class Twine;
253
254namespace vfs {
255class FileSystem;
256} // namespace vfs
257
258namespace sampleprof {
259
261
262/// SampleProfileReaderItaniumRemapper remaps the profile data from a
263/// sample profile data reader, by applying a provided set of equivalences
264/// between components of the symbol names in the profile.
266public:
267 SampleProfileReaderItaniumRemapper(std::unique_ptr<MemoryBuffer> B,
268 std::unique_ptr<SymbolRemappingReader> SRR,
270 : Buffer(std::move(B)), Remappings(std::move(SRR)), Reader(R) {
271 assert(Remappings && "Remappings cannot be nullptr");
272 }
273
274 /// Create a remapper from the given remapping file. The remapper will
275 /// be used for profile read in by Reader.
278 LLVMContext &C);
279
280 /// Create a remapper from the given Buffer. The remapper will
281 /// be used for profile read in by Reader.
283 create(std::unique_ptr<MemoryBuffer> &B, SampleProfileReader &Reader,
284 LLVMContext &C);
285
286 /// Apply remappings to the profile read by Reader.
288
289 bool hasApplied() { return RemappingApplied; }
290
291 /// Insert function name into remapper.
292 void insert(StringRef FunctionName) { Remappings->insert(FunctionName); }
293
294 /// Query whether the remapper holds an inserted name equivalent to
295 /// \p FunctionName.
296 bool exist(StringRef FunctionName) {
297 return Remappings->lookup(FunctionName);
298 }
299
300 /// Return the equivalent name in the profile for \p FunctionName if
301 /// it exists.
302 LLVM_ABI std::optional<StringRef> lookUpNameInProfile(StringRef FunctionName);
303
304private:
305 // The buffer holding the content read from remapping file.
306 std::unique_ptr<MemoryBuffer> Buffer;
307 std::unique_ptr<SymbolRemappingReader> Remappings;
308 // Map remapping key to the name in the profile. By looking up the
309 // key in the remapper, a given new name can be mapped to the
310 // canonical name using the NameMap.
312 // The Reader the remapper is servicing.
313 SampleProfileReader &Reader;
314 // Indicate whether remapping has been applied to the profile read
315 // by Reader -- by calling applyRemapping.
316 bool RemappingApplied = false;
317};
318
319/// Sample-based profile reader.
320///
321/// Each profile contains sample counts for all the functions
322/// executed. Inside each function, statements are annotated with the
323/// collected samples on all the instructions associated with that
324/// statement.
325///
326/// For this to produce meaningful data, the program needs to be
327/// compiled with some debug information (at minimum, line numbers:
328/// -gline-tables-only). Otherwise, it will be impossible to match IR
329/// instructions to the line numbers collected by the profiler.
330///
331/// From the profile file, we are interested in collecting the
332/// following information:
333///
334/// * A list of functions included in the profile (mangled names).
335///
336/// * For each function F:
337/// 1. The total number of samples collected in F.
338///
339/// 2. The samples collected at each line in F. To provide some
340/// protection against source code shuffling, line numbers should
341/// be relative to the start of the function.
342///
343/// The reader supports two file formats: text and binary. The text format
344/// is useful for debugging and testing, while the binary format is more
345/// compact and I/O efficient. They can both be used interchangeably.
346
347/// NameTableIterator is a lightweight, self-contained input iterator designed
348/// to stream FunctionId symbols from an eagerly populated contiguous buffer
349/// of FunctionId objects.
352 NameTableIterator, std::input_iterator_tag, FunctionId,
353 std::ptrdiff_t, const FunctionId *, FunctionId> {
354 const FunctionId *Ptr = nullptr;
355
356public:
357 NameTableIterator() = default;
358 NameTableIterator(const FunctionId *P) : Ptr(P) {}
359
360 bool operator==(const NameTableIterator &RHS) const { return Ptr == RHS.Ptr; }
361
363 ++Ptr;
364 return *this;
365 }
366
367 FunctionId operator*() const { return *Ptr; }
368};
369
371public:
372 SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
374 : Profiles(), Ctx(C), Buffer(std::move(B)), Format(Format) {}
375
376 virtual ~SampleProfileReader() = default;
377
378 /// Read and validate the file header.
379 virtual std::error_code readHeader() = 0;
380
381 /// Set the bits for FS discriminators. Parameter Pass specifies the sequence
382 /// number, Pass == i is for the i-th round of adding FS discriminators.
383 /// Pass == 0 is for using base discriminators.
387
388 /// Get the bitmask for the discriminators: For FS profiles, return the bit
389 /// mask for this pass. For non FS profiles, return (unsigned) -1.
391 if (!ProfileIsFS)
392 return 0xFFFFFFFF;
393 assert((MaskedBitFrom != 0) && "MaskedBitFrom is not set properly");
394 return getN1Bits(MaskedBitFrom);
395 }
396
397 /// The interface to read sample profiles from the associated file.
398 std::error_code read() {
399 if (std::error_code EC = readImpl())
400 return EC;
401 if (Remapper)
402 Remapper->applyRemapping(Ctx);
405 }
406
407 /// Read sample profiles for the given functions.
408 std::error_code read(const DenseSet<StringRef> &FuncsToUse) {
410 for (StringRef F : FuncsToUse)
411 if (Profiles.find(FunctionId(F)) == Profiles.end())
412 S.insert(F);
413 if (std::error_code EC = read(S, Profiles))
414 return EC;
416 }
417
418 /// The implementation to read sample profiles from the associated file.
419 virtual std::error_code readImpl() = 0;
420
421 /// Print the profile for \p FunctionSamples on stream \p OS.
423 raw_ostream &OS = dbgs());
424
425 /// Collect functions with definitions in Module M. For readers that
426 /// support loading function profiles on demand, return true when the
427 /// reader has been given a module. Always return false for readers
428 /// that don't support loading function profiles on demand.
429 virtual bool collectFuncsFromModule() { return false; }
430
431 /// Print all the profiles on stream \p OS.
432 LLVM_ABI void dump(raw_ostream &OS = dbgs());
433
434 /// Print all the profiles on stream \p OS in the JSON format.
435 LLVM_ABI void dumpJson(raw_ostream &OS = dbgs());
436
437 /// Return the samples collected for function \p F.
439 // The function name may have been updated by adding suffix. Call
440 // a helper to (optionally) strip off suffixes so that we can
441 // match against the original function name in the profile.
443 return getSamplesFor(CanonName);
444 }
445
446 /// Return the samples collected for the function named \p Fname.
448 auto It = Profiles.find(FunctionId(Fname));
449 if (It != Profiles.end())
450 return &It->second;
451
453 auto R = FuncNameToProfNameMap->find(FunctionId(Fname));
454 if (R != FuncNameToProfNameMap->end()) {
455 Fname = R->second.stringRef();
456 auto It = Profiles.find(FunctionId(Fname));
457 if (It != Profiles.end())
458 return &It->second;
459 }
460 }
461
462 if (Remapper) {
463 if (auto NameInProfile = Remapper->lookUpNameInProfile(Fname)) {
464 auto It = Profiles.find(FunctionId(*NameInProfile));
465 if (It != Profiles.end())
466 return &It->second;
467 }
468 }
469 return nullptr;
470 }
471
472 /// Return all the profiles.
474
475 /// Report a parse error message.
476 void reportError(int64_t LineNumber, const Twine &Msg) const {
477 Ctx.diagnose(DiagnosticInfoSampleProfile(Buffer->getBufferIdentifier(),
478 LineNumber, Msg));
479 }
480
481 /// Create a sample profile reader appropriate to the file format.
482 /// Also create an underlying remapper if RemapFilename is not empty.
483 /// Parameter P specifies the FSDiscriminatorPass.
487 StringRef RemapFilename = "");
488
489 /// Create a sample profile reader from the supplied memory buffer.
490 /// Also create an underlying remapper if RemapFilename is not empty.
491 /// Parameter P specifies the FSDiscriminatorPass.
493 create(std::unique_ptr<MemoryBuffer> &B, LLVMContext &C, vfs::FileSystem &FS,
495 StringRef RemapFilename = "");
496
497 /// Return the profile summary.
498 ProfileSummary &getSummary() const { return *Summary; }
499
500 MemoryBuffer *getBuffer() const { return Buffer.get(); }
501
502 /// \brief Return the profile format.
504
505 /// Whether input profile is based on pseudo probes.
507
508 /// Whether input profile is fully context-sensitive.
509 bool profileIsCS() const { return ProfileIsCS; }
510
511 /// Whether input profile contains ShouldBeInlined contexts.
513
514 /// Whether input profile is flow-sensitive.
515 bool profileIsFS() const { return ProfileIsFS; }
516
517 virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() {
518 return nullptr;
519 };
520
521 /// It includes all the names that have samples either in outline instance
522 /// or inline instance.
526 virtual bool dumpSectionInfo(raw_ostream &OS = dbgs()) { return false; };
527
528 /// Return whether names in the profile are all MD5 numbers.
529 bool useMD5() const { return ProfileIsMD5; }
530
531 /// Force the profile to use MD5 in Sample contexts, even if function names
532 /// are present.
533 virtual void setProfileUseMD5() { ProfileIsMD5 = true; }
534
535 /// Don't read profile without context if the flag is set.
536 void setSkipFlatProf(bool Skip) { SkipFlatProf = Skip; }
537
538 /// Return whether any name in the profile contains ".__uniq." suffix.
539 virtual bool hasUniqSuffix() { return false; }
540
542
543 void setModule(const Module *Mod) { M = Mod; }
544
549
550protected:
551 /// Map every function to its associated profile.
552 ///
553 /// The profile of every function executed at runtime is collected
554 /// in the structure FunctionSamples. This maps function objects
555 /// to their corresponding profiles.
557
558 /// LLVM context used to emit diagnostics.
560
561 /// Memory buffer holding the profile file.
562 std::unique_ptr<MemoryBuffer> Buffer;
563
564 /// Profile summary information.
565 std::unique_ptr<ProfileSummary> Summary;
566
567 /// Take ownership of the summary of this reader.
568 static std::unique_ptr<ProfileSummary>
570 return std::move(Reader.Summary);
571 }
572
573 /// Compute summary for this profile.
575
576 /// Read sample profiles for the given functions and write them to the given
577 /// profile map. Currently it's only used for extended binary format to load
578 /// the profiles on-demand.
579 virtual std::error_code read(const DenseSet<StringRef> &FuncsToUse,
582 }
583
584 std::unique_ptr<SampleProfileReaderItaniumRemapper> Remapper;
585
586 // A map pointer to the FuncNameToProfNameMap in SampleProfileLoader,
587 // which maps the function name to the matched profile name. This is used
588 // for sample loader to look up profile using the new name.
591
592 // A map from a function's context hash to its meta data section range, used
593 // for on-demand read function profile metadata.
594 std::unordered_map<uint64_t, std::pair<const uint8_t *, const uint8_t *>>
596
597 std::pair<const uint8_t *, const uint8_t *> ProfileSecRange;
598
599 /// Whether the profile has attribute metadata.
601
602 /// \brief Whether samples are collected based on pseudo probes.
604
605 /// Whether function profiles are context-sensitive flat profiles.
606 bool ProfileIsCS = false;
607
608 /// Whether function profile contains ShouldBeInlined contexts.
610
611 /// Number of context-sensitive profiles.
613
614 /// Whether the function profiles use FS discriminators.
615 bool ProfileIsFS = false;
616
617 /// If true, the profile has vtable profiles and reader should decode them
618 /// to parse profiles correctly.
619 bool ReadVTableProf = false;
620
621 /// \brief The format of sample.
623
624 /// \brief The current module being compiled if SampleProfileReader
625 /// is used by compiler. If SampleProfileReader is used by other
626 /// tools which are not compiler, M is usually nullptr.
627 const Module *M = nullptr;
628
629 /// Zero out the discriminator bits higher than bit MaskedBitFrom (0 based).
630 /// The default is to keep all the bits.
632
633 /// Whether the profile uses MD5 for Sample Contexts and function names. This
634 /// can be one-way overridden by the user to force the use of MD5.
635 bool ProfileIsMD5 = false;
636
637 /// If SkipFlatProf is true, skip functions marked with !Flat in text mode or
638 /// sections with SecFlagFlat flag in ExtBinary mode.
639 bool SkipFlatProf = false;
640};
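The name-based `getSamplesFor` in the class above tries three lookups in order: the profile map itself, then the function-name-to-profile-name map, then the remapper. The sketch below mirrors that fallback chain with standard containers. Everything in it is a stand-in chosen for illustration: `Profile` replaces `FunctionSamples`, plain strings replace `FunctionId`, a second map replaces the remapper, and both secondary tables are assumed to be optional because the guarding condition is not visible in the listing.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for FunctionSamples.
struct Profile {
  unsigned long long TotalSamples = 0;
};

using ProfileMap = std::unordered_map<std::string, Profile>;
using NameMap = std::unordered_map<std::string, std::string>;

// Look up Name directly, then via an optional function-name-to-profile-name
// map, then via an optional remapping table. Returns nullptr if every step
// misses.
const Profile *findSamples(const ProfileMap &Profiles,
                           const NameMap *FuncToProfName,
                           const NameMap *Remapped, const std::string &Name) {
  // Step 1: exact match in the profile map.
  auto It = Profiles.find(Name);
  if (It != Profiles.end())
    return &It->second;

  // Step 2: translate through the matched-name map. As in the listing, a hit
  // here replaces the name used by the later remapping step.
  std::string Key = Name;
  if (FuncToProfName) {
    auto R = FuncToProfName->find(Key);
    if (R != FuncToProfName->end()) {
      Key = R->second;
      auto P = Profiles.find(Key);
      if (P != Profiles.end())
        return &P->second;
    }
  }

  // Step 3: ask the remapping table for an equivalent name in the profile.
  if (Remapped) {
    auto R = Remapped->find(Key);
    if (R != Remapped->end()) {
      auto P = Profiles.find(R->second);
      if (P != Profiles.end())
        return &P->second;
    }
  }
  return nullptr;
}
```

Ordering the steps this way keeps the common case (an exact name match) to a single hash lookup, and only pays for the secondary tables when that lookup misses.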
641
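The reader's comments describe a discriminator mask that zeroes out the bits higher than bit `MaskedBitFrom` (0-based), while non-FS profiles keep all 32 bits. The sketch below shows one way such a mask could be computed. It rests on an assumption: that bit `MaskedBitFrom` itself is kept. The helper `getN1Bits` is not shown in the listing, so `maskKeepingBitsUpTo` and `discriminatorMask` are hypothetical names and not LLVM's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Build a mask with bits 0..HighestKeptBit (inclusive, 0-based) set and all
// higher bits clear.
uint32_t maskKeepingBitsUpTo(unsigned HighestKeptBit) {
  assert(HighestKeptBit < 32 && "bit index out of range");
  // Shifting a 32-bit value by 32 is undefined, so handle bit 31 separately.
  if (HighestKeptBit == 31)
    return 0xFFFFFFFFu;
  return (uint32_t{1} << (HighestKeptBit + 1)) - 1;
}

// Non-FS profiles keep every discriminator bit; FS profiles keep only the
// low bits selected for the current pass.
uint32_t discriminatorMask(bool ProfileIsFS, unsigned HighestKeptBit) {
  return ProfileIsFS ? maskKeepingBitsUpTo(HighestKeptBit) : 0xFFFFFFFFu;
}
```

With `HighestKeptBit == 7`, a discriminator of `0x1234` is reduced to `0x34` for an FS profile and left untouched otherwise.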
643public:
644 SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
646
647 /// Read and validate the file header.
648 std::error_code readHeader() override { return sampleprof_error::success; }
649
650 /// Read sample profiles from the associated file.
651 std::error_code readImpl() override;
652
653 /// Return true if \p Buffer is in the format supported by this class.
654 static bool hasFormat(const MemoryBuffer &Buffer);
655
656 /// Text format sample profile does not support MD5 for now.
657 void setProfileUseMD5() override {}
658
659private:
660 /// CSNameTable is used to save full context vectors. This serves as an
661 /// underlying immutable buffer for all clients.
662 std::list<SampleContextFrameVector> CSNameTable;
663};
664
666public:
670
671 /// Read and validate the file header.
672 std::error_code readHeader() override;
673
674 /// Read sample profiles from the associated file.
675 std::error_code readImpl() override;
676
677 /// It includes all the names that have samples either in outline instance
678 /// or inline instance.
683
684protected:
685 /// Read a numeric value of type T from the profile.
686 ///
687 /// If an error occurs during decoding, a diagnostic message is emitted and
688 /// EC is set.
689 ///
690 /// \returns the read value.
691 template <typename T> ErrorOr<T> readNumber();
692
693 /// Read a numeric value of type T from the profile. The value is saved
694 /// without encoding.
695 template <typename T> ErrorOr<T> readUnencodedNumber();
696
697 /// Read a string from the profile.
698 ///
699 /// If an error occurs during decoding, a diagnostic message is emitted and
700 /// EC is set.
701 ///
702 /// \returns the read value.
704
705 /// Read the string index and check whether it overflows the table.
706 template <typename T> inline ErrorOr<size_t> readStringIndex(T &Table);
707
708 /// Read the next function profile instance.
709 std::error_code readFuncProfile(const uint8_t *Start);
710 std::error_code readFuncProfile(const uint8_t *Start,
711 SampleProfileMap &Profiles);
712
713 /// Read the contents of the given profile instance.
714 std::error_code readProfile(FunctionSamples &FProfile);
715
716 /// Read the contents of Magic number and Version number.
717 std::error_code readMagicIdent();
718
719 /// Read profile summary.
720 std::error_code readSummary();
721
722 /// Read the whole name table.
723 std::error_code readNameTable();
724
725 /// Read a string indirectly via the name table. Optionally return the index.
726 ErrorOr<FunctionId> readStringFromTable(size_t *RetIdx = nullptr);
727
728 /// Read a context indirectly via the CSNameTable. Optionally return the
729 /// index.
730 ErrorOr<SampleContextFrames> readContextFromTable(size_t *RetIdx = nullptr);
731
732 /// Read a context indirectly via the CSNameTable if the profile has context,
733 /// otherwise same as readStringFromTable, also return its hash value.
734 ErrorOr<std::pair<SampleContext, uint64_t>> readSampleContextFromTable();
735
736 /// Read all virtual functions' vtable access counts for \p FProfile.
737 std::error_code readCallsiteVTableProf(FunctionSamples &FProfile);
738
739 /// Read bytes from the input buffer pointed to by `Data` and decode them into
740 /// \p M. `Data` will be advanced to the end of the read bytes when this
741 /// function returns. Returns error if any.
742 std::error_code readVTableTypeCountMap(TypeCountMap &M);
743
744 /// Points to the current location in the buffer.
745 const uint8_t *Data = nullptr;
746
747 /// Points to the end of the buffer.
748 const uint8_t *End = nullptr;
749
750 /// Function name table.
751 std::vector<FunctionId> NameTable;
752
753 /// CSNameTable is used to save full context vectors. It is the backing buffer
754 /// for SampleContextFrames.
755 std::vector<SampleContextFrameVector> CSNameTable;
756
757 /// Table to cache MD5 values of sample contexts corresponding to
758 /// readSampleContextFromTable(), used to index into Profiles or
759 /// FuncOffsetTable.
760 std::vector<uint64_t> MD5SampleContextTable;
761
762 /// The starting address of the table of MD5 values of sample contexts. For
763 /// fixed length MD5 non-CS profile it is same as MD5NameMemStart because
764 /// hashes of non-CS contexts are already in the profile. Otherwise it points
765 /// to the start of MD5SampleContextTable.
767
768private:
769 std::error_code readSummaryEntry(std::vector<ProfileSummaryEntry> &Entries);
770 virtual std::error_code verifySPMagic(uint64_t Magic) = 0;
771};
772
774private:
775 std::error_code verifySPMagic(uint64_t Magic) override;
776
777public:
781
782 /// \brief Return true if \p Buffer is in the format supported by this class.
783 static bool hasFormat(const MemoryBuffer &Buffer);
784};
785
786/// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase defines
787/// the basic structure of the extensible binary format.
788/// The format is organized in sections except the magic and version number
789/// at the beginning. There is a section table before all the sections, and
790/// each entry in the table describes the entry type, start, size and
791/// attributes. The format in each section is defined by the section itself.
792///
793/// It is easy to add a new section while maintaining the backward
794/// compatibility of the profile. Nothing extra needs to be done. If we want
795/// to extend an existing section, like add cache misses information in
796/// addition to the sample count in the profile body, we can add a new section
797/// with the extension and retire the existing section, and we could choose
798/// to keep the parser of the old section if we want the reader to be able
799/// to read both new and old format profile.
800///
801/// SampleProfileReaderExtBinary/SampleProfileWriterExtBinary define the
802/// commonly used sections of a profile in extensible binary format. It is
803/// possible to define other types of profile inherited from
804/// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase.
807private:
808 std::error_code decompressSection(const uint8_t *SecStart,
809 const uint64_t SecSize,
810 const uint8_t *&DecompressBuf,
811 uint64_t &DecompressBufSize);
812
813 BumpPtrAllocator Allocator;
814
815protected:
816 std::vector<SecHdrTableEntry> SecHdrTable;
817 std::error_code readSecHdrTableEntry(uint64_t Idx);
818 std::error_code readSecHdrTable();
819
820 std::error_code readFuncMetadata(bool ProfileHasAttribute,
822 std::error_code readFuncMetadata(bool ProfileHasAttribute);
823 std::error_code readFuncMetadata(bool ProfileHasAttribute,
824 FunctionSamples *FProfile);
825 std::error_code readFuncOffsetTable();
826 std::error_code readFuncProfiles();
827 std::error_code readFuncProfiles(const DenseSet<StringRef> &FuncsToUse,
829 std::error_code readNameTableSec(bool IsMD5, bool FixedLengthMD5);
830 std::error_code readCSNameTableSec();
831 std::error_code readProfileSymbolList();
832
833 std::error_code readHeader() override;
834 std::error_code verifySPMagic(uint64_t Magic) override = 0;
835 virtual std::error_code readOneSection(const uint8_t *Start, uint64_t Size,
836 const SecHdrTableEntry &Entry);
837 // placeholder for subclasses to dispatch their own section readers.
838 virtual std::error_code readCustomSection(const SecHdrTableEntry &Entry) = 0;
839
840 /// Determine which container readFuncOffsetTable() should populate, the list
841 /// FuncOffsetList or the map FuncOffsetTable.
842 bool useFuncOffsetList() const;
843
844 std::unique_ptr<ProfileSymbolList> ProfSymList;
845
846 /// The table mapping from a function context's MD5 to the offset of its
847 /// FunctionSample relative to the start of the file.
848 /// At most one of FuncOffsetTable and FuncOffsetList is populated.
850
851 /// The list version of FuncOffsetTable. This is used if every entry is
852 /// being accessed.
853 std::vector<std::pair<SampleContext, uint64_t>> FuncOffsetList;
854
855 /// The set containing the functions to use when compiling a module.
857
858public:
862
863 /// Read sample profiles in extensible format from the associated file.
864 std::error_code readImpl() override;
865
866 /// Get the total size of all \p Type sections.
867 uint64_t getSectionSize(SecType Type);
868 /// Get the total size of header and all sections.
869 uint64_t getFileSize();
870 bool dumpSectionInfo(raw_ostream &OS = dbgs()) override;
871
872 /// Collect functions with definitions in Module M. Return true if
873 /// the reader has been given a module.
874 bool collectFuncsFromModule() override;
875
876 std::unique_ptr<ProfileSymbolList> getProfileSymbolList() override {
877 return std::move(ProfSymList);
878 };
879
880private:
881 /// Read the profiles on-demand for the given functions. This is used after
882 /// stale call graph matching finds new functions whose profiles aren't loaded
883 /// at the beginning and we need to load the profiles explicitly for
884 /// potential matching.
885 std::error_code read(const DenseSet<StringRef> &FuncsToUse,
886 SampleProfileMap &Profiles) override;
887};
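The extensible binary format is described above as a section table whose entries record each section's type, start, size and attributes, which is what lets a reader tolerate sections it does not know about. The following sketch illustrates that idea in isolation. `SectionEntry`, `SectionHandler` and `readSections` are hypothetical and much simpler than `SecHdrTableEntry` and `readOneSection`; compression and section flags are ignored.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical section-table entry with the four fields named in the prose.
struct SectionEntry {
  uint32_t Type;   // section kind
  uint64_t Flags;  // attributes (unused in this sketch)
  uint64_t Offset; // start of the section within the file
  uint64_t Size;   // size of the section in bytes
};

// A handler consumes one section and reports success or failure.
using SectionHandler = std::function<bool(const uint8_t *Start, uint64_t Size)>;

// Walk the table, dispatch each known section to its handler, and skip
// unknown section types. Returns the number of sections handled, or -1 if an
// entry points outside the file or a handler fails.
int readSections(const std::vector<uint8_t> &File,
                 const std::vector<SectionEntry> &Table,
                 const std::map<uint32_t, SectionHandler> &Handlers) {
  int Handled = 0;
  for (const SectionEntry &E : Table) {
    // Bounds check written to avoid overflow in Offset + Size.
    if (E.Offset > File.size() || E.Size > File.size() - E.Offset)
      return -1;
    auto It = Handlers.find(E.Type);
    if (It == Handlers.end())
      continue; // unknown section: the table tells us how to step over it
    if (!It->second(File.data() + E.Offset, E.Size))
      return -1;
    ++Handled;
  }
  return Handled;
}
```

Because every entry carries its own offset and size, an older reader can step over a section type introduced later without understanding its contents, which is the backward-compatibility property the class comment points out.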
888
891private:
892 std::error_code verifySPMagic(uint64_t Magic) override;
893 std::error_code readCustomSection(const SecHdrTableEntry &Entry) override {
894 // Update the data reader pointer to the end of the section.
895 Data = End;
897 };
898
899public:
903
904 /// \brief Return true if \p Buffer is in the format supported by this class.
905 static bool hasFormat(const MemoryBuffer &Buffer);
906};
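
The readCustomSection() override above handles a section it does not interpret by moving Data to End. The sketch below illustrates that dispatch pattern in isolation, under stated assumptions: `SectionEntry`, `SectionReaderBase`, `SkippingReader`, and the `KnownType` tag are hypothetical, and the check that each section is consumed exactly is an assumption of the sketch, not a statement about the LLVM reader.

```cpp
#include <cassert>
#include <cstdint>
#include <system_error>
#include <vector>

// Hypothetical section header entry; the real SecHdrTableEntry differs.
struct SectionEntry {
  uint32_t Type;
  uint64_t Offset;
  uint64_t Size;
};

class SectionReaderBase {
public:
  static constexpr uint32_t KnownType = 1; // Assumed tag of a built-in section.
  uint64_t KnownBytes = 0;                 // Bytes consumed from known sections.

  virtual ~SectionReaderBase() = default;

  // Visit every section, dispatching unknown types to the subclass hook.
  std::error_code readAll(const std::vector<uint8_t> &Buf,
                          const std::vector<SectionEntry> &Entries) {
    for (const SectionEntry &E : Entries) {
      if (E.Offset > Buf.size() || E.Size > Buf.size() - E.Offset)
        return std::make_error_code(std::errc::invalid_argument);
      Data = Buf.data() + E.Offset;
      End = Data + E.Size;
      std::error_code EC =
          E.Type == KnownType ? readKnownSection() : readCustomSection(E);
      if (EC)
        return EC;
      // Assumption of this sketch: a section must be consumed entirely.
      if (Data != End)
        return std::make_error_code(std::errc::illegal_byte_sequence);
    }
    return std::error_code();
  }

protected:
  const uint8_t *Data = nullptr; // Current read position.
  const uint8_t *End = nullptr;  // End of the current section.

  virtual std::error_code readCustomSection(const SectionEntry &Entry) = 0;

private:
  std::error_code readKnownSection() {
    for (; Data != End; ++Data)
      ++KnownBytes;
    return std::error_code();
  }
};

// Skips any section it does not understand by jumping to the section end.
class SkippingReader : public SectionReaderBase {
protected:
  std::error_code readCustomSection(const SectionEntry &) override {
    Data = End;
    return std::error_code();
  }
};
```

Skipping rather than failing lets an older reader tolerate section types added by a newer writer.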
907
909
910// Supported histogram types in GCC. Currently, we only need support for
911// call target histograms.
922
924public:
925 SampleProfileReaderGCC(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
927 GcovBuffer(Buffer.get()) {}
928
929 /// Read and validate the file header.
930 std::error_code readHeader() override;
931
932 /// Read sample profiles from the associated file.
933 std::error_code readImpl() override;
934
935 /// Return true if \p Buffer is in the format supported by this class.
936 static bool hasFormat(const MemoryBuffer &Buffer);
937
938protected:
939 std::error_code readNameTable();
940 std::error_code readOneFunctionProfile(const InlineCallStack &InlineStack,
941 bool Update, uint32_t Offset);
942 std::error_code readFunctionProfiles();
943 std::error_code skipNextWord();
944 template <typename T> ErrorOr<T> readNumber();
946
947 /// Read the section tag and check that it's the same as \p Expected.
948 std::error_code readSectionTag(uint32_t Expected);
949
950 /// GCOV buffer containing the profile.
951 GCOVBuffer GcovBuffer;
952
953 /// Function names in this profile.
954 std::vector<std::string> Names;
955
956 /// GCOV tags used to separate sections in the profile file.
957 static const uint32_t GCOVTagAFDOFileNames = 0xaa000000;
958 static const uint32_t GCOVTagAFDOFunction = 0xac000000;
959};
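
readSectionTag() is documented as reading a section tag and checking it against an expected value, and the two GCOV tags above delimit sections of the profile. The following standalone sketch shows one way such a check could look. It is not the LLVM implementation: `expectSectionTag` is a hypothetical helper, the little-endian decoding is an assumption, and the error codes are generic `std::errc` values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <system_error>
#include <vector>

// Tag values copied from the class above.
constexpr uint32_t TagFileNames = 0xaa000000; // GCOVTagAFDOFileNames
constexpr uint32_t TagFunction = 0xac000000;  // GCOVTagAFDOFunction

// Read a 32-bit tag at Cursor and compare it with Expected. Cursor only
// advances on a match. Little-endian byte order is assumed in this sketch.
inline std::error_code expectSectionTag(const std::vector<uint8_t> &Buf,
                                        std::size_t &Cursor,
                                        uint32_t Expected) {
  if (Cursor > Buf.size() || Buf.size() - Cursor < 4)
    return std::make_error_code(std::errc::result_out_of_range); // Truncated.
  uint32_t Tag = 0;
  for (std::size_t I = 0; I < 4; ++I)
    Tag |= static_cast<uint32_t>(Buf[Cursor + I]) << (8 * I);
  if (Tag != Expected)
    return std::make_error_code(std::errc::invalid_argument); // Wrong section.
  Cursor += 4;
  return std::error_code();
}
```

Leaving the cursor untouched on a mismatch lets the caller report the error at the position of the offending tag.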
960
961} // end namespace sampleprof
962
963} // end namespace llvm
964
965#endif // LLVM_PROFILEDATA_SAMPLEPROFREADER_H