LLVM 23.0.0git
llvm::ir2vec::Vocabulary Class Reference

Class for storing and accessing the IR2Vec vocabulary.

#include "llvm/Analysis/IR2Vec.h"

Public Types

enum class  CanonicalTypeID : unsigned {
  FloatTy , VoidTy , LabelTy , MetadataTy ,
  VectorTy , TokenTy , IntegerTy , ByteTy ,
  FunctionTy , PointerTy , StructTy , ArrayTy ,
  UnknownTy , MaxCanonicalType
}
 Canonical type IDs supported by IR2Vec Vocabulary.
enum class  OperandKind : unsigned {
  FunctionID , PointerID , ConstantID , VariableID ,
  MaxOperandKind
}
 Operand kinds supported by IR2Vec Vocabulary.
using const_iterator = VocabStorage::const_iterator
 Const Iterator type aliases.

Public Member Functions

 Vocabulary ()=default
 Vocabulary (VocabStorage &&Storage)
 Vocabulary (const Vocabulary &)=delete
Vocabulary & operator= (const Vocabulary &)=delete
 Vocabulary (Vocabulary &&)=default
Vocabulary & operator= (Vocabulary &&Other)=delete
bool isValid () const
unsigned getDimension () const
const ir2vec::Embedding & operator[] (unsigned Opcode) const
 Accessors to get the embedding for a given entity.
const ir2vec::Embedding & operator[] (Type::TypeID TypeID) const
const ir2vec::Embedding & operator[] (const Value &Arg) const
const ir2vec::Embedding & operator[] (CmpInst::Predicate P) const
const_iterator begin () const
const_iterator cbegin () const
const_iterator end () const
const_iterator cend () const
LLVM_ABI bool invalidate (Module &M, const PreservedAnalyses &PA, ModuleAnalysisManager::Invalidator &Inv) const

Static Public Member Functions

static LLVM_ABI Expected< Vocabulary > fromFile (StringRef VocabFilePath, float OpcWeight=1.0, float TypeWeight=0.5, float ArgWeight=0.2)
 Create a Vocabulary by loading embeddings from a JSON file.
static constexpr size_t getCanonicalSize ()
 Total number of entries (opcodes + canonicalized types + operand kinds + predicates)
static LLVM_ABI StringRef getVocabKeyForOpcode (unsigned Opcode)
 Function to get vocabulary key for a given Opcode.
static StringRef getVocabKeyForTypeID (Type::TypeID TypeID)
 Function to get vocabulary key for a given TypeID.
static StringRef getVocabKeyForOperandKind (OperandKind Kind)
 Function to get vocabulary key for a given OperandKind.
static LLVM_ABI OperandKind getOperandKind (const Value *Op)
 Function to classify an operand into OperandKind.
static LLVM_ABI StringRef getVocabKeyForPredicate (CmpInst::Predicate P)
 Function to get vocabulary key for a given predicate.
static unsigned getIndex (unsigned Opcode)
 Functions to return flat index.
static unsigned getIndex (Type::TypeID TypeID)
static unsigned getIndex (const Value &Op)
static unsigned getIndex (CmpInst::Predicate P)
static LLVM_ABI StringRef getStringKey (unsigned Pos)
 Returns the string key for a given index position in the vocabulary.
static LLVM_ABI VocabStorage createDummyVocabForTest (unsigned Dim=1)
 Create a dummy vocabulary for testing purposes.

Static Public Attributes

static constexpr unsigned MaxTypeIDs = Type::TypeID::TargetExtTyID + 1
static constexpr unsigned MaxCanonicalTypeIDs
static constexpr unsigned MaxOperandKinds
static constexpr unsigned MaxPredicateKinds

Friends

class llvm::IR2VecVocabAnalysis

Detailed Description

Class for storing and accessing the IR2Vec vocabulary.

The Vocabulary class manages seed embeddings for LLVM IR entities. The seed embeddings are the initial learned representations of the entities of LLVM IR. The IR2Vec representation for a given IR is derived from these seed embeddings.

The vocabulary contains the seed embeddings for three types of entities: instruction opcodes, types, and operands. Types are grouped/canonicalized for better learning (e.g., all float variants map to FloatTy). The vocabulary abstracts away the canonicalization: the exposed APIs handle all known LLVM IR opcodes, types, and operands.
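The type canonicalization described above can be pictured with a small self-contained sketch. It does not use the LLVM headers: `ToyTypeID`, `ToyCanonicalTypeID`, and `toyCanonicalize` are hypothetical stand-ins with far fewer members than `Type::TypeID` and `CanonicalTypeID`, and the grouping shown is an assumption apart from the float case named in the text.

```cpp
#include <cassert>

// Hypothetical, reduced stand-ins for llvm::Type::TypeID and
// Vocabulary::CanonicalTypeID. The real enums have more members.
enum class ToyTypeID { Half, BFloat, Float, Double, Integer, Pointer, Void };
enum class ToyCanonicalTypeID { FloatTy, IntegerTy, PointerTy, VoidTy, UnknownTy };

// Collapse every floating-point variant into one canonical bucket so that
// all of them share a single seed embedding.
inline ToyCanonicalTypeID toyCanonicalize(ToyTypeID ID) {
  switch (ID) {
  case ToyTypeID::Half:
  case ToyTypeID::BFloat:
  case ToyTypeID::Float:
  case ToyTypeID::Double:
    return ToyCanonicalTypeID::FloatTy;
  case ToyTypeID::Integer:
    return ToyCanonicalTypeID::IntegerTy;
  case ToyTypeID::Pointer:
    return ToyCanonicalTypeID::PointerTy;
  case ToyTypeID::Void:
    return ToyCanonicalTypeID::VoidTy;
  }
  // Anything not listed above falls into the catch-all bucket.
  return ToyCanonicalTypeID::UnknownTy;
}
```

With such a mapping, a caller can pass any concrete type ID and never needs to know which canonical bucket backs it.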

This class helps populate the seed embeddings in an internal vector-based ADT. It provides logic to map every IR entity to a specific slot index in this vector, enabling O(1) embedding lookup and avoiding string-based lookups while generating the embeddings.
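A minimal sketch of the slot-index idea follows, assuming the sections are stored back to back in one vector. The section sizes, the `Toy*` names, and the zero-based opcode numbering are all hypothetical and chosen only for illustration; the real layout is defined in IR2Vec.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical section sizes, not the real LLVM counts.
constexpr unsigned ToyNumOpcodes = 4;
constexpr unsigned ToyNumCanonicalTypes = 3;
constexpr unsigned ToyNumOperandKinds = 2;

// Assumed layout: [ opcodes | canonical types | operand kinds ].
constexpr unsigned toyOpcodeSlot(unsigned Opcode) { return Opcode; }
constexpr unsigned toyTypeSlot(unsigned CanonicalType) {
  return ToyNumOpcodes + CanonicalType;
}
constexpr unsigned toyOperandSlot(unsigned Kind) {
  return ToyNumOpcodes + ToyNumCanonicalTypes + Kind;
}
constexpr unsigned toyTotalSlots() {
  return ToyNumOpcodes + ToyNumCanonicalTypes + ToyNumOperandKinds;
}

using ToyEmbedding = std::vector<float>;

// One flat vector of embeddings. Every lookup is a bounds check plus an
// index computation, with no string comparison or hashing involved.
class ToyVocab {
  std::vector<ToyEmbedding> Slots;

public:
  explicit ToyVocab(unsigned Dim)
      : Slots(toyTotalSlots(), ToyEmbedding(Dim, 0.0f)) {}

  std::size_t size() const { return Slots.size(); }

  const ToyEmbedding &forOpcode(unsigned Opcode) const {
    assert(Opcode < ToyNumOpcodes && "opcode out of range");
    return Slots[toyOpcodeSlot(Opcode)];
  }
  const ToyEmbedding &forType(unsigned CanonicalType) const {
    assert(CanonicalType < ToyNumCanonicalTypes && "type out of range");
    return Slots[toyTypeSlot(CanonicalType)];
  }
  const ToyEmbedding &forOperandKind(unsigned Kind) const {
    assert(Kind < ToyNumOperandKinds && "operand kind out of range");
    return Slots[toyOperandSlot(Kind)];
  }
};
```

Because each section starts at a compile-time offset, the index functions can be `constexpr`, which mirrors how `getCanonicalSize()` is declared in the listing above.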

Definition at line 248 of file IR2Vec.h.

Member Typedef Documentation

◆ const_iterator

Const Iterator type aliases.

Definition at line 424 of file IR2Vec.h.

Member Enumeration Documentation

◆ CanonicalTypeID

Canonical type IDs supported by IR2Vec Vocabulary.

Enumerator
FloatTy 
VoidTy 
LabelTy 
MetadataTy 
VectorTy 
TokenTy 
IntegerTy 
ByteTy 
FunctionTy 
PointerTy 
StructTy 
ArrayTy 
UnknownTy 
MaxCanonicalType 

Definition at line 287 of file IR2Vec.h.

◆ OperandKind

Operand kinds supported by IR2Vec Vocabulary.

Enumerator
FunctionID 
PointerID 
ConstantID 
VariableID 
MaxOperandKind 

Definition at line 305 of file IR2Vec.h.

Constructor & Destructor Documentation

◆ Vocabulary() [1/4]

llvm::ir2vec::Vocabulary::Vocabulary ( )
default

◆ Vocabulary() [2/4]

llvm::ir2vec::Vocabulary::Vocabulary ( VocabStorage && Storage)
inline

Definition at line 329 of file IR2Vec.h.

References llvm::move().

◆ Vocabulary() [3/4]

llvm::ir2vec::Vocabulary::Vocabulary ( const Vocabulary & )
delete

References Vocabulary().

◆ Vocabulary() [4/4]

llvm::ir2vec::Vocabulary::Vocabulary ( Vocabulary && )
default

References Vocabulary().
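Taken together, the constructors above and the two deleted assignment operators make the class movable at construction time only. The sketch below reproduces the same set of special-member declarations on a hypothetical `ToyVocabulary` and checks the resulting traits. The storage type and the `isValid()` rule (non-empty storage) are assumptions made for the example.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical class with the same special-member declarations as the
// ones documented for Vocabulary.
class ToyVocabulary {
  std::vector<float> Storage;

public:
  ToyVocabulary() = default;
  ToyVocabulary(std::vector<float> &&S) : Storage(std::move(S)) {}

  ToyVocabulary(const ToyVocabulary &) = delete;
  ToyVocabulary &operator=(const ToyVocabulary &) = delete;

  ToyVocabulary(ToyVocabulary &&) = default;
  ToyVocabulary &operator=(ToyVocabulary &&Other) = delete;

  // Assumption for this sketch: an object is valid once it owns data.
  bool isValid() const { return !Storage.empty(); }
};

static_assert(!std::is_copy_constructible<ToyVocabulary>::value,
              "copy construction is deleted");
static_assert(!std::is_copy_assignable<ToyVocabulary>::value,
              "copy assignment is deleted");
static_assert(std::is_move_constructible<ToyVocabulary>::value,
              "move construction is defaulted");
static_assert(!std::is_move_assignable<ToyVocabulary>::value,
              "move assignment is deleted");
```

A practical consequence is that an existing object cannot be re-seated with a new vocabulary; a fresh object has to be constructed instead.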

Member Function Documentation

◆ begin()

const_iterator llvm::ir2vec::Vocabulary::begin ( ) const
inline

Definition at line 426 of file IR2Vec.h.

References assert(), and isValid().

Referenced by cbegin().

◆ cbegin()

const_iterator llvm::ir2vec::Vocabulary::cbegin ( ) const
inline

Definition at line 431 of file IR2Vec.h.

References begin().

◆ cend()

const_iterator llvm::ir2vec::Vocabulary::cend ( ) const
inline

Definition at line 438 of file IR2Vec.h.

References end().

◆ createDummyVocabForTest()

VocabStorage Vocabulary::createDummyVocabForTest ( unsigned Dim = 1)
static

Create a dummy vocabulary for testing purposes.

Definition at line 433 of file IR2Vec.cpp.

References I, MaxCanonicalTypeIDs, MaxOperandKinds, MaxPredicateKinds, and llvm::ir2vec::VocabStorage::VocabStorage().

◆ end()

const_iterator llvm::ir2vec::Vocabulary::end ( ) const
inline

Definition at line 433 of file IR2Vec.h.

References assert(), and isValid().

Referenced by cend().

◆ fromFile()

Expected< Vocabulary > Vocabulary::fromFile ( StringRef VocabFilePath,
float OpcWeight = 1.0,
float TypeWeight = 0.5,
float ArgWeight = 0.2 )
static

Create a Vocabulary by loading embeddings from a JSON file.

This is the primary entry point for programmatic vocabulary creation, suitable for use in Python bindings or other contexts where command-line options are not available. Weights are applied to scale the embeddings for opcodes, types, and arguments respectively.

Definition at line 610 of file IR2Vec.cpp.

References llvm::ir2vec::ArgWeight, llvm::ir2vec::OpcWeight, llvm::ir2vec::TypeWeight, and Vocabulary().

Referenced by llvm::IR2VecVocabAnalysis::run().
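The weight handling described for fromFile() can be sketched as a per-section scaling pass. This is a simplified model, not the code in IR2Vec.cpp: `ToySections`, `toyScaleSection`, and `toyApplyWeights` are hypothetical names, and only the default weights (1.0, 0.5, 0.2) are taken from the signature above.

```cpp
#include <cassert>
#include <vector>

using ToyEmbedding = std::vector<float>;

// Hypothetical container holding the three weighted sections separately.
struct ToySections {
  std::vector<ToyEmbedding> Opcodes;
  std::vector<ToyEmbedding> Types;
  std::vector<ToyEmbedding> Args;
};

// Multiply every component of every embedding in one section by Weight.
inline void toyScaleSection(std::vector<ToyEmbedding> &Section, float Weight) {
  for (ToyEmbedding &E : Section)
    for (float &X : E)
      X *= Weight;
}

// Defaults mirror the fromFile() signature: 1.0, 0.5, 0.2.
inline void toyApplyWeights(ToySections &S, float OpcWeight = 1.0f,
                            float TypeWeight = 0.5f, float ArgWeight = 0.2f) {
  toyScaleSection(S.Opcodes, OpcWeight);
  toyScaleSection(S.Types, TypeWeight);
  toyScaleSection(S.Args, ArgWeight);
}
```

Scaling once at load time means later lookups return ready-to-use embeddings and no weight has to be applied per instruction.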

◆ getCanonicalSize()

constexpr size_t llvm::ir2vec::Vocabulary::getCanonicalSize ( )
inline static constexpr

Total number of entries (opcodes + canonicalized types + operand kinds + predicates)

Definition at line 356 of file IR2Vec.h.

◆ getDimension()

unsigned llvm::ir2vec::Vocabulary::getDimension ( ) const
inline

Definition at line 349 of file IR2Vec.h.

References assert(), and isValid().

Referenced by llvm::FunctionPropertiesInfo::getFunctionPropertiesInfo().

◆ getIndex() [1/4]

unsigned llvm::ir2vec::Vocabulary::getIndex ( CmpInst::Predicate P)
inline static

Definition at line 396 of file IR2Vec.h.

References P.

◆ getIndex() [2/4]

unsigned llvm::ir2vec::Vocabulary::getIndex ( const Value & Op)
inline static

Definition at line 390 of file IR2Vec.h.

References assert(), getOperandKind(), and MaxOperandKinds.

◆ getIndex() [3/4]

unsigned llvm::ir2vec::Vocabulary::getIndex ( Type::TypeID TypeID)
inline static

Definition at line 385 of file IR2Vec.h.

References assert(), and MaxTypeIDs.

◆ getIndex() [4/4]

unsigned llvm::ir2vec::Vocabulary::getIndex ( unsigned Opcode)
inline static

Functions to return flat index.

Definition at line 380 of file IR2Vec.h.

References assert().

◆ getOperandKind()

Vocabulary::OperandKind Vocabulary::getOperandKind ( const Value * Op)
static

Function to classify an operand into OperandKind.

Definition at line 370 of file IR2Vec.cpp.

References ConstantID, FunctionID, llvm::isa(), PointerID, and VariableID.

Referenced by getIndex(), and operator[]().
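As an illustration of how an operand might be sorted into the four kinds listed in the References line, the sketch below classifies a hypothetical `ToyValue` that carries three boolean properties. The order of the checks (function first, then pointer type, then constant, with variable as the fallback) is an assumption of this example; the authoritative logic is in IR2Vec.cpp and uses `llvm::isa()`.

```cpp
#include <cassert>

// Mirrors the enumerators documented for Vocabulary::OperandKind.
enum class ToyOperandKind : unsigned {
  FunctionID,
  PointerID,
  ConstantID,
  VariableID,
  MaxOperandKind
};

// Hypothetical stand-in for llvm::Value: three precomputed properties
// replace the isa<>() queries a real implementation would perform.
struct ToyValue {
  bool IsFunction;
  bool HasPointerType;
  bool IsConstant;
};

// Assumed priority order; the first matching property wins.
inline ToyOperandKind toyGetOperandKind(const ToyValue &V) {
  if (V.IsFunction)
    return ToyOperandKind::FunctionID;
  if (V.HasPointerType)
    return ToyOperandKind::PointerID;
  if (V.IsConstant)
    return ToyOperandKind::ConstantID;
  return ToyOperandKind::VariableID;
}
```

Every operand lands in exactly one bucket, so the result can be used directly as an offset into the operand section of the vocabulary.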

◆ getStringKey()

StringRef Vocabulary::getStringKey ( unsigned Pos)
static

Returns the string key for a given index position in the vocabulary.

This is useful for debugging or printing the vocabulary. Do not use this for embedding generation, as string-based lookups are inefficient.

Definition at line 409 of file IR2Vec.cpp.

References assert(), getVocabKeyForOpcode(), getVocabKeyForOperandKind(), and getVocabKeyForPredicate().
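The reverse mapping from a flat position back to a string key can be sketched as a walk over the sections, subtracting each section size until the position falls inside one. The key tables, section sizes, and the `toyStringKey` name below are hypothetical; only the idea of dispatching to a per-section key function is taken from the References line above.

```cpp
#include <cassert>
#include <string>

// Hypothetical key tables; the real keys come from the LLVM opcode,
// type, operand-kind, and predicate names.
constexpr unsigned ToyNumOpcodes = 2;
constexpr unsigned ToyNumTypes = 2;
constexpr unsigned ToyNumOperandKinds = 2;
const char *const ToyOpcodeKeys[] = {"Add", "Ret"};
const char *const ToyTypeKeys[] = {"FloatTy", "IntegerTy"};
const char *const ToyOperandKeys[] = {"Constant", "Variable"};

// Find the section that contains Pos, then index into its key table.
inline std::string toyStringKey(unsigned Pos) {
  assert(Pos < ToyNumOpcodes + ToyNumTypes + ToyNumOperandKinds &&
         "position out of range");
  if (Pos < ToyNumOpcodes)
    return ToyOpcodeKeys[Pos];
  Pos -= ToyNumOpcodes;
  if (Pos < ToyNumTypes)
    return ToyTypeKeys[Pos];
  Pos -= ToyNumTypes;
  return ToyOperandKeys[Pos];
}
```

This direction is only needed for printing or debugging; embedding generation goes from entity to index and never touches the strings.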

◆ getVocabKeyForOpcode()

StringRef Vocabulary::getVocabKeyForOpcode ( unsigned Opcode)
static

Function to get vocabulary key for a given Opcode.

Definition at line 358 of file IR2Vec.cpp.

References assert().

Referenced by getStringKey().

◆ getVocabKeyForOperandKind()

StringRef llvm::ir2vec::Vocabulary::getVocabKeyForOperandKind ( OperandKind Kind)
inline static

Function to get vocabulary key for a given OperandKind.

Definition at line 367 of file IR2Vec.h.

References assert(), and MaxOperandKinds.

Referenced by getStringKey().

◆ getVocabKeyForPredicate()

StringRef Vocabulary::getVocabKeyForPredicate ( CmpInst::Predicate P)
static

Function to get vocabulary key for a given predicate.

Definition at line 399 of file IR2Vec.cpp.

References llvm::CmpInst::FIRST_ICMP_PREDICATE, and llvm::CmpInst::getPredicateName().

Referenced by getStringKey().

◆ getVocabKeyForTypeID()

StringRef llvm::ir2vec::Vocabulary::getVocabKeyForTypeID ( Type::TypeID TypeID)
inline static

Function to get vocabulary key for a given TypeID.

Definition at line 362 of file IR2Vec.h.

◆ invalidate()

bool Vocabulary::invalidate ( Module & M,
const PreservedAnalyses & PA,
ModuleAnalysisManager::Invalidator & Inv ) const

Definition at line 427 of file IR2Vec.cpp.

References llvm::PreservedAnalyses::getChecker(), and llvm::IR2VecVocabAnalysis.

◆ isValid()

bool llvm::ir2vec::Vocabulary::isValid ( ) const
inline

◆ operator=() [1/2]

Vocabulary & llvm::ir2vec::Vocabulary::operator= ( const Vocabulary & )
delete

References Vocabulary().

◆ operator=() [2/2]

Vocabulary & llvm::ir2vec::Vocabulary::operator= ( Vocabulary && Other)
delete

◆ operator[]() [1/4]

const ir2vec::Embedding & llvm::ir2vec::Vocabulary::operator[] ( CmpInst::Predicate P) const
inline

Definition at line 418 of file IR2Vec.h.

References P.

◆ operator[]() [2/4]

const ir2vec::Embedding & llvm::ir2vec::Vocabulary::operator[] ( const Value & Arg) const
inline

Definition at line 412 of file IR2Vec.h.

References assert(), getOperandKind(), and MaxOperandKinds.

◆ operator[]() [3/4]

const ir2vec::Embedding & llvm::ir2vec::Vocabulary::operator[] ( Type::TypeID TypeID) const
inline

Definition at line 406 of file IR2Vec.h.

References assert(), and MaxTypeIDs.

◆ operator[]() [4/4]

const ir2vec::Embedding & llvm::ir2vec::Vocabulary::operator[] ( unsigned Opcode) const
inline

Accessors to get the embedding for a given entity.

Definition at line 401 of file IR2Vec.h.

References assert().

◆ llvm::IR2VecVocabAnalysis

friend class llvm::IR2VecVocabAnalysis
friend

Definition at line 249 of file IR2Vec.h.

Referenced by invalidate().

Member Data Documentation

◆ MaxCanonicalTypeIDs

unsigned llvm::ir2vec::Vocabulary::MaxCanonicalTypeIDs
static constexpr
Initial value:

Definition at line 319 of file IR2Vec.h.

Referenced by createDummyVocabForTest().

◆ MaxOperandKinds

unsigned llvm::ir2vec::Vocabulary::MaxOperandKinds
static constexpr
Initial value:

Definition at line 321 of file IR2Vec.h.

Referenced by createDummyVocabForTest(), getIndex(), getVocabKeyForOperandKind(), and operator[]().

◆ MaxPredicateKinds

unsigned llvm::ir2vec::Vocabulary::MaxPredicateKinds
static constexpr
Initial value:
= NumICmpPredicates + NumFCmpPredicates

Definition at line 325 of file IR2Vec.h.

Referenced by createDummyVocabForTest().
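To see why the predicate count is a sum of two terms, note that the numeric values of `CmpInst::Predicate` are not contiguous: the floating-point predicates occupy 0 through 15 and the integer predicates 32 through 41. The sketch below packs them into one dense range. The `Toy*` constants copy those enum bounds, while the packing order (FCmp first, then ICmp) is an assumption of this example and may differ from what IR2Vec.h does.

```cpp
#include <cassert>

// Bounds of the two predicate ranges in llvm::CmpInst::Predicate.
constexpr unsigned ToyFirstFCmp = 0, ToyLastFCmp = 15;
constexpr unsigned ToyFirstICmp = 32, ToyLastICmp = 41;

constexpr unsigned ToyNumFCmpPredicates = ToyLastFCmp - ToyFirstFCmp + 1;
constexpr unsigned ToyNumICmpPredicates = ToyLastICmp - ToyFirstICmp + 1;
constexpr unsigned ToyMaxPredicateKinds =
    ToyNumICmpPredicates + ToyNumFCmpPredicates;

// Map a sparse predicate value onto [0, ToyMaxPredicateKinds).
// Assumed order: FCmp predicates first, ICmp predicates after them.
// The caller must pass a value inside one of the two ranges.
constexpr unsigned toyDensePredicateIndex(unsigned P) {
  return P <= ToyLastFCmp ? P - ToyFirstFCmp
                          : P - ToyFirstICmp + ToyNumFCmpPredicates;
}

static_assert(ToyMaxPredicateKinds == 26, "16 FCmp + 10 ICmp predicates");
```

Closing the gap between 15 and 32 keeps the predicate section of the vocabulary free of unused slots.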

◆ MaxTypeIDs

unsigned llvm::ir2vec::Vocabulary::MaxTypeIDs = Type::TypeID::TargetExtTyID + 1
static constexpr

Definition at line 318 of file IR2Vec.h.

Referenced by getIndex(), and operator[]().


The documentation for this class was generated from the following files: