Python API

HUPO-PSI Spectral library format.

Spectrum Libraries

class mzspeclib.SpectrumLibrary(identifier=None, filename=None, format=None, index_type=None, create_index=True)

Bases: object

Read, write, and search through a spectrum library.

This type will attempt to infer the correct spectrum library reader from the input file, but may need to be explicitly prompted if there are ambiguities.

identifier

A unique identifier string assigned to this library by a spectral library host or provider.

Type:

str

filename

A location on the local file system where the spectral library is stored

Type:

str or Path

format

The name of the format for the current encoding of the library.

Type:

str

backend

The implementation used to parse the file

Type:

SpectralLibraryBackendBase

Parameters:

identifier (str, optional) – A universal identifier for a hosted spectral library to fetch.

filename (str, os.PathLike, or io.IOBase, optional) – A path-like or file-like object that holds a spectral library to read.

format (str, optional) – Name of the format for the current encoding of the library.

index_type (Type[IndexBase], optional) – The type of index to preferentially construct.

create_index (bool) – Whether to construct an index over the library if one does not exist already. Setting this to False limits the library to sequential iteration, but avoids an expensive end-to-end parsing step upon opening.

classmethod from_backend(backend)

Wrap a format-specific backend in a SpectrumLibrary.

This is useful because the individual backends do not support every operation that SpectrumLibrary provides.

Parameters:

backend (SpectralLibraryBackendBase)

Return type:

SpectrumLibrary

property spectrum_attribute_sets

The spectrum attribute sets of the spectral library

property analyte_attribute_sets

The analyte attribute sets of the spectral library

property interpretation_attribute_sets

The interpretation attribute sets of the spectral library

property cluster_attribute_sets

The spectrum cluster attribute sets of the spectral library

property identifier: str | None

The spectrum library’s identifier, such as an accession number provided by a host or repository.

This attribute may not exist for libraries loaded from files on disk. It relies on the presence of the MS:1003187|library identifier attribute in the library header.
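To illustrate where this value comes from, the sketch below scans the header of a library in the plain text serialization. It returns the value of the MS:1003187|library identifier attribute and stops at the first entry.

It makes several assumptions:

- Header attributes are `key=value` lines.
- The header opens with a `<mzSpecLib>` line.
- The first entry starts with a line beginning `<Spectrum=`.

The function `read_library_identifier` is hypothetical. It is not how mzspeclib parses headers internally.

```python
import io
from typing import Optional, TextIO

LIBRARY_IDENTIFIER = "MS:1003187|library identifier"


def read_library_identifier(stream: TextIO) -> Optional[str]:
    """Return the library identifier from a text-format header, or None.

    Hypothetical helper: assumes header attributes are `key=value` lines and
    that the first entry starts with a line beginning with `<Spectrum=`.
    """
    for line in stream:
        line = line.strip()
        if line.startswith("<Spectrum="):
            break  # reached the first entry, so the header is over
        key, sep, value = line.partition("=")
        if sep and key == LIBRARY_IDENTIFIER:
            return value
    return None


header = io.StringIO(
    "<mzSpecLib>\n"
    "MS:1003187|library identifier=EXAMPLE_LIB\n"
    "<Spectrum=1>\n"
)
print(read_library_identifier(header))  # EXAMPLE_LIB
```

Libraries loaded from disk without this header line simply yield None, matching the behaviour described above.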

property filename: str | None

The path to the library file on disk.

This may also be a file-like object.

property attributes: AttributeManager

The library level attributes

read_header()

Read just the header of the whole library

Returns:

Whether the operation was successful

Return type:

bool

read()

Create a sequential iterator over the spectrum library entries.

Yields:

Spectrum or SpectrumCluster

write(destination, format=None, **kwargs)

Write the library to disk.

Parameters:

destination (str, os.PathLike, or io.IOBase) – The path or stream to write the library to.

format (str, Type, or Callable) – The name of the format or a callable object that returns a SpectrumLibraryWriterBase.

**kwargs – Passed to implementation.

get_spectrum(spectrum_number=None, spectrum_name=None)

Retrieve a single spectrum from the library.

Parameters:

spectrum_number (int, optional) – The index of the spectrum in the library

spectrum_name (str, optional) – The name of the spectrum in the library

Return type:

Spectrum

get_cluster(cluster_number)

Retrieve a single spectrum cluster from the library.

Parameters:

cluster_number (int, optional) – The index of the cluster in the library

Return type:

SpectrumCluster

find_spectra(specification, **query_keys)

Return a list of spectra given query constraints

Return type:

List[Spectrum]

add_attribute(key, value, group_identifier=None)

Add an attribute to the library level attributes store.

Parameters:

key (str) – The name of the attribute to add

value (object) – The value of the attribute to add

group_identifier (str, optional) – The attribute group identifier to use, if any. If not provided, no group is assumed.

get_attribute(key, group_identifier=None, raw=False)

Get the value or values associated with a given attribute key.

Parameters:

key (str) – The name of the attribute to retrieve

group_identifier (str, optional) – The specific group identifier to return from.

raw (bool) – Whether to return the Attribute object or unwrap the value

Returns:

attribute_value – Returns single or multiple values for the requested attribute.

Return type:

object or list[object]

remove_attribute(key, group_identifier=None)

Remove the value or values associated with a given attribute key from the library level attribute store.

This rebuilds the entire store, which may be expensive.

Parameters:

key (str) – The name of the attribute to remove

group_identifier (str, optional) – The specific group identifier to remove from.

has_attribute(key)

Test for the presence of a given attribute in the library level store.

Parameters:

key (str) – The attribute to test for

Return type:

bool

summarize_parsing_errors()

Retrieve a free-form description of parsing errors

static supported_file_extensions()

Get the set of file extensions that are currently recognized as spectral library formats

Return type:

Set[str]

Library Spectra

class mzspeclib.Spectrum(attributes=None, peak_list=None, analytes=None, interpretations=None)

Bases: AttributeManager

Parameters:

attributes (list) – A list of attribute [key, value (, group)] sets to initialize to.

peak_list (list) – A list of tuples representing (annotated) peaks

analytes (dict[str, Analyte]) – A mapping from identifier to Analyte unique within this Spectrum.

interpretations (InterpretationCollection) – A mapping from identifier to Interpretation unique within this Spectrum.

property precursor_charge: int

Obtain the spectrum’s precursor ion charge or analyte charge

add_analyte(analyte)

Add an Analyte to the spectrum

Parameters:

analyte (Analyte)

get_analyte(analyte_id)

Get an Analyte by ID from the spectrum

Parameters:

analyte_id (str)

Return type:

Analyte

remove_analyte(analyte_id)

Remove an Analyte by ID from the spectrum

Parameters:

analyte_id (str)

write(format='text', **kwargs)

Write out the spectrum in any of the supported formats

Parameters:

format (str) – The name of the format to write in

**kwargs – Passed to implementation

Components of Spectra

class mzspeclib.Analyte(id, attributes=None)

Bases: IdentifiedAttributeManager

A molecule that is associated with an Interpretation

Parameters:

id (str)

attributes (List[Attribute])

property peptide: ProForma | None

Read out the peptide sequence of the analyte if it is present.

This probes the following attributes, in order:

MS:1003270|proforma peptidoform ion notation

MS:1000889|proforma peptidoform sequence

MS:1000888|stripped peptide sequence

Return type:

ProForma or None

property charge: int | None

Read the analyte’s charge state, if it is present.

This probes the following attributes in order:

MS:1000041|charge state

MS:1003270|proforma peptidoform ion notation

Return type:

int or None
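The probing order for charge can be sketched with a plain dictionary standing in for the analyte's attributes. The sketch falls back to a ProForma ion string whose charge is written as a trailing `/z` (for example `PEPTIDE/2`).

`analyte_charge` is a hypothetical helper, not the library's implementation. It deliberately ignores ProForma adduct charge notation such as `/[Na:z+1]`.

```python
from typing import Mapping, Optional

CHARGE_STATE = "MS:1000041|charge state"
PROFORMA_ION = "MS:1003270|proforma peptidoform ion notation"


def analyte_charge(attributes: Mapping[str, str]) -> Optional[int]:
    """Probe the charge attributes in the documented order (simplified sketch)."""
    if CHARGE_STATE in attributes:
        return int(attributes[CHARGE_STATE])
    ion = attributes.get(PROFORMA_ION)
    if ion is not None and "/" in ion:
        # A plain charge is written after the last "/", e.g. "PEPTIDE/2".
        # Adduct forms such as "/[Na:z+1]" are not handled and give None.
        charge_text = ion.rsplit("/", 1)[1]
        if charge_text.lstrip("+-").isdigit():
            return int(charge_text)
    return None
```

An explicit charge state attribute wins over the ion notation. For example, `analyte_charge({CHARGE_STATE: "3", PROFORMA_ION: "PEPTIDE/2"})` gives 3.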

class mzspeclib.InterpretationCollection(interpretations=None)

Bases: MutableMapping[str, Interpretation]

A mutable mapping for Interpretation objects that also exposes a shared pool of Analyte members.

Parameters:

interpretations (Dict[str, Interpretation])

get_interpretation(interpretation_id)

Get the interpretation given by the key interpretation_id

Return type:

Interpretation

add_interpretation(interpretation)

Add an Interpretation to the collection.

The key used will be Interpretation.id.

Parameters:

interpretation (Interpretation)

set_interpretation(key, interpretation)

Set the interpretation for key to interpretation

Parameters:

interpretation (Interpretation)

keys() → a set-like object providing a view on D's keys

Return type:

KeysView[str]

values() → an object providing a view on D's values

Return type:

ValuesView[Interpretation]

items() → a set-like object providing a view on D's items

Return type:

ItemsView[str, Interpretation]

property analytes: _AnalyteMappingProxy

A facade that exposes a Mapping[str, Analyte] interface
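The sketch below models the idea of a mapping of interpretations that also pools their analytes. It uses `collections.abc.MutableMapping`.

`ToyInterpretation` and `ToyInterpretationCollection` are hypothetical stand-ins. Unlike the documented `analytes` facade, this version builds a fresh merged dictionary on each access rather than a live proxy.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class ToyInterpretation:
    id: str
    analytes: Dict[str, str] = field(default_factory=dict)  # analyte id -> analyte


class ToyInterpretationCollection(MutableMapping):
    """A dict-backed mapping of interpretations with a merged analyte view."""

    def __init__(self, interpretations=None):
        self._interpretations: Dict[str, ToyInterpretation] = dict(interpretations or {})

    def __getitem__(self, key: str) -> ToyInterpretation:
        return self._interpretations[key]

    def __setitem__(self, key: str, value: ToyInterpretation) -> None:
        self._interpretations[key] = value

    def __delitem__(self, key: str) -> None:
        del self._interpretations[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._interpretations)

    def __len__(self) -> int:
        return len(self._interpretations)

    def add_interpretation(self, interpretation: ToyInterpretation) -> None:
        # As documented for add_interpretation, the key is the interpretation's id.
        self[interpretation.id] = interpretation

    @property
    def analytes(self) -> Dict[str, str]:
        # Pool analytes from every interpretation; a shared id maps to one entry.
        pool: Dict[str, str] = {}
        for interpretation in self.values():
            pool.update(interpretation.analytes)
        return pool
```

Because the class only implements the five abstract methods, `keys()`, `values()`, `items()` and friends come from the `MutableMapping` mixins.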

class mzspeclib.Interpretation(id, attributes=None, analytes=None, member_interpretations=None)

Bases: AttributedEntity, MutableMapping

An interpretation of a Spectrum with one or more Analyte members.

Parameters:

id (str)

attributes (AttributeManager)

analytes (Dict[str, Analyte])

member_interpretations (Dict[str, InterpretationMember])

id

The identifier for the interpretation

Type:

str

analytes

The analytes which are part of this interpretation.

Type:

dict[str, Analyte]

member_interpretations

The interpretation details which are associated with specific Analyte members.

Type:

dict[str, InterpretationMember]

get_analyte(analyte_id)

Retrieve an analyte by its identifier

Return type:

Analyte

add_analyte(analyte)

Add an analyte to the interpretation

Parameters:

analyte (Analyte)

set_analyte(key, analyte)

Set the analyte for key to analyte

Parameters:

analyte (Analyte)

remove_analyte(analyte_id)

Remove the analyte for analyte_id

has_analyte(analyte_id)

Check if this interpretation includes analyte_id

Return type:

bool

get_member_interpretation(member_id)

Retrieve the InterpretationMember for the member_id

Return type:

InterpretationMember

add_member_interpretation(interpretation_member)

Add an InterpretationMember to the interpretation

Parameters:

interpretation_member (InterpretationMember)

remove_member_interpretation(member_id)

Remove the InterpretationMember for member_id

validate()

Perform validation on each component to confirm this object is well formed.

Return type:

bool

class mzspeclib.InterpretationMember(id, attributes=None)

Bases: IdentifiedAttributeManager

A collection of attributes associated with a specific Analyte contained in an Interpretation

Parameters:

id (str)

attributes (List[Attribute])

Controlled Vocabulary Attributes

Like many PSI file formats, MzSpecLib uses controlled vocabulary terms to represent properties of entities. Many parts of mzspeclib work with Attribute objects directly to evaluate spectra, and attributes are also used indirectly when interacting with an AttributeManager as a mapping for key-value pair retrieval.

class mzspeclib.attributes.AttributeManager(attributes=None)

Bases: object

A key-value pair store with optional grouping for storing controlled vocabulary-backed attributes.

The various components of this object shouldn’t be modified directly, and instead rely on the interface this type provides to access its data.

Parameters:: attributes (Iterable[list], optional) – Attribute name-value pairs with an optional grouping value. If omitted, the attribute store will be empty.

get_next_group_identifier()

Retrieve the next un-used attribute group identifier and increment the internal counter.

Return type:: str

add_attribute(key, value, group_identifier=None, owner_id=None)

Add an attribute to the list and update the lookup tables

Parameters:

key (str) – The name of the attribute to add
value (object) – The value of the attribute to add
group_identifier (str, optional) – The attribute group identifier to use, if any. If not provided, no group is assumed.
owner_id (Any | None)

add_attribute_group(attributes, owner_id=None)

Add a collection of connected attributes that are part of a single group

Parameters:

attributes (List[Attribute | Tuple[str, Any]])
owner_id (Any | None)

get_attribute(key, group_identifier=None, raw=False)

Get the value or values associated with a given attribute key.

Parameters:

key (str) – The name of the attribute to retrieve
group_identifier (str, optional) – The specific group identifier to return from.
raw (bool) – Whether to return the Attribute object or unwrap the value

Returns:

attribute_value – Returns single or multiple values for the requested attribute.

Return type:

object or list[object]

get_attribute_group(group_identifier)

Get all the members of a specified attribute group

Parameters:: group_identifier (str)
Return type:: List[Any]

get_by_name(name)

Search for an attribute by human-readable name.

Parameters:: name (str) – The name to search for.
Returns:: The attribute value if found or None.
Return type:: object
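A name-based lookup can be sketched over `(key, value)` pairs whose keys follow the `ACCESSION|name` pattern used throughout this document, such as `MS:1000041|charge state`.

This `get_by_name` is a hypothetical free function. It returns the first match only, and how the real method treats repeated attributes is not shown here.

```python
from typing import Any, List, Optional, Tuple


def get_by_name(attributes: List[Tuple[str, Any]], name: str) -> Optional[Any]:
    """Return the first value whose key's human-readable part equals `name`."""
    for key, value in attributes:
        # Keys look like "ACCESSION|name"; keep the part after the first "|".
        _accession, _sep, term_name = key.partition("|")
        if term_name == name:
            return value
    return None
```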

clear(): Remove all content from the store.

remove_attribute(key, group_identifier=None)

Remove the value or values associated with a given attribute key from the store.

This rebuilds the entire store, which may be expensive.

Parameters:

key (str) – The name of the attribute to remove
group_identifier (str, optional) – The specific group identifier to remove from.
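The sketch below suggests why removal "rebuilds the entire store". The store keeps a flat list of `(key, value, group)` entries plus a lookup table of list positions. Deleting entries shifts those positions, so the simplest correct approach is to rebuild both structures from the surviving entries.

`ToyAttributeStore` is hypothetical. Raising `KeyError` for a missing key is an assumption of this sketch, not documented behaviour.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Entry = Tuple[str, Any, Optional[str]]  # (key, value, group identifier)


class ToyAttributeStore:
    """Simplified key-value store with optional groups and a lookup table."""

    def __init__(self) -> None:
        self.attributes: List[Entry] = []
        self.index: Dict[str, List[int]] = defaultdict(list)

    def add_attribute(self, key: str, value: Any, group_identifier: Optional[str] = None) -> None:
        self.index[key].append(len(self.attributes))
        self.attributes.append((key, value, group_identifier))

    def get_attribute(self, key: str, group_identifier: Optional[str] = None) -> Any:
        values = [
            self.attributes[i][1]
            for i in self.index.get(key, [])
            if group_identifier is None or self.attributes[i][2] == group_identifier
        ]
        if not values:
            raise KeyError(key)
        # A single match is unwrapped; repeated attributes come back as a list.
        return values[0] if len(values) == 1 else values

    def remove_attribute(self, key: str, group_identifier: Optional[str] = None) -> None:
        kept = [
            entry for entry in self.attributes
            if not (entry[0] == key and (group_identifier is None or entry[2] == group_identifier))
        ]
        # Rebuild everything so every stored position in the index stays valid.
        self.attributes = []
        self.index = defaultdict(list)
        for entry in kept:
            self.add_attribute(*entry)
```

The rebuild is linear in the size of the store, which is the cost the documentation warns about.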

has_attribute(key)

Test for the presence of a given attribute

Parameters:: key (str) – The attribute to test for
Return type:: bool

copy(): Make a deep copy of the object

class mzspeclib.attributes.IdentifiedAttributeManager(id, attributes=None)

Bases: AttributeManager

An AttributeManager with an id attribute.

This serves as the base type for most intermediate layers in mzSpecLib.

Parameters:

id (str)
attributes (List[Attribute])

class mzspeclib.attributes.AttributedEntity(attributes=None, **kwargs)

Bases: _ReadAttributes, _WriteAttributes

A base type for entities which contain an AttributeManager without being completely subsumed by it.

An AttributeManager represents a collection of attributes first and foremost, supplying a MutableMapping-like interface to them in addition to its other methods.

Parameters:: attributes (AttributeManager)

class mzspeclib.attributes.AttributeManagedProperty(attribute, multiple=False)

Bases: Generic[T]

A property-like object that stores its value as a controlled vocabulary backed attribute on an Attributed-like object.

Parameters:

attribute (str)
multiple (bool)

attribute

The attribute which is used to read/write the value of the descriptor.

Type:: str

multiple

Whether the default assumption is that the attribute will be repeated. When True, a list is always returned.

Type:: bool
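A property backed by an attribute is naturally written as a Python descriptor, sketched below. Here the owning object is assumed to keep a plain `attributes` dictionary mapping each key to a list of values. When `multiple` is true, a list is always returned.

`ToyManagedProperty` and `ToyAnalyte` are hypothetical. The real descriptor works against an AttributeManager rather than a dict.

```python
from typing import Any, Dict, List, Optional


class ToyManagedProperty:
    """Descriptor that reads and writes one attribute in `instance.attributes`."""

    def __init__(self, attribute: str, multiple: bool = False) -> None:
        self.attribute = attribute
        self.multiple = multiple

    def __get__(self, instance: Any, owner: Optional[type] = None) -> Any:
        if instance is None:
            return self  # accessed on the class itself
        values: List[Any] = instance.attributes.get(self.attribute, [])
        if self.multiple:
            return list(values)  # always a list, even for zero or one value
        return values[0] if values else None

    def __set__(self, instance: Any, value: Any) -> None:
        instance.attributes[self.attribute] = list(value) if self.multiple else [value]


class ToyAnalyte:
    charge = ToyManagedProperty("MS:1000041|charge state")

    def __init__(self) -> None:
        # Each key maps to the list of values recorded for that attribute.
        self.attributes: Dict[str, List[Any]] = {}
```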

class mzspeclib.attributes.AttributeListManagedProperty(attributes)

Bases: Generic[T]

Like AttributeManagedProperty, except a list of attributes is tried in succession until one is found.

Parameters:: attributes (List[str])

attributes

The attributes to probe, in order

Type:: list[str]
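The "tried in succession" behaviour can be sketched as a read-only descriptor. The example reuses the probe order documented for Analyte.peptide.

`ToyListManagedProperty` and `ToyAnalyte` are hypothetical, and the owner is assumed to hold a plain `attributes` dict. Unlike Analyte.peptide, this returns the raw string rather than a ProForma object.

```python
from typing import Any, Dict, List, Optional


class ToyListManagedProperty:
    """Read-only descriptor returning the first attribute present, in order."""

    def __init__(self, attributes: List[str]) -> None:
        self.attributes = attributes

    def __get__(self, instance: Any, owner: Optional[type] = None) -> Any:
        if instance is None:
            return self
        for attribute in self.attributes:
            if attribute in instance.attributes:
                return instance.attributes[attribute]
        return None


class ToyAnalyte:
    # Same probe order as documented for Analyte.peptide.
    peptide = ToyListManagedProperty([
        "MS:1003270|proforma peptidoform ion notation",
        "MS:1000889|proforma peptidoform sequence",
        "MS:1000888|stripped peptide sequence",
    ])

    def __init__(self, attributes: Dict[str, Any]) -> None:
        self.attributes = attributes
```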

class mzspeclib.attributes.AttributeProxyMeta(typename, bases, namespace)

Bases: type

A metaclass that manages controlled vocabulary-backed properties like AttributeManagedProperty or AttributeListManagedProperty.

This includes generating a meaningful __repr__() method for the instance types.

class mzspeclib.attributes.ROAttributeProxy(attributes)

Bases: _ReadAttributes

A read-only attribute proxy wraps another Attributed-like object and provides tailored interactions with all or a subset of those attributes.

Parameters:: attributes (AttributeManager | AttributedEntity)

mzspeclib.attributes.AttributeProxy: alias of ROAttributeProxy

class mzspeclib.attributes.AttributeSet(name, attributes=None, **kwargs)

Bases: AttributedEntity

Parameters:

name (str)
attributes (AttributeManager)

class mzspeclib.attributes.AttributeSetRef(attribute_set: mzspeclib.attributes.AttributeSet, group_id: str)

Bases: object

Parameters:

attribute_set (AttributeSet)
group_id (str)

class mzspeclib.attributes.AttributeFacet(facet_type)

Bases: Generic[T]

A descriptor that resembles an object, backed by one or more attributes from an AttributeManager-like type.

Parameters:: facet_type (Type[T])

class mzspeclib.attributes.AttributeGroupFacet(facet_type)

Bases: Generic[T]

Parameters:: facet_type (Type[T])

Indexing

Spectral Library Indexing

MzSpecLib recommends that each file format be indexed in a way that is appropriate to the format and the application, and recognizes that most formats will not have any kind of built-in index.

mzspeclib provides data structures for both in-memory and on-disk index storage.

The IndexBase type is an abstract base class for writing indices. It isn’t necessary unless you’re writing a new backend or index format.

The MemoryIndex type holds all offset information and metadata in memory, which can make it fast to access but problematic if several large libraries are open at once. The index is not saved, requiring a potentially costly scan of the entire library file when opening a file.

The SQLIndex type holds its information in a SQLite3 database on disk, and executes queries to bring only the required information into memory when requested. There is some I/O overhead involved in each look-up, but the index information persists between runs which can greatly improve start-up time.
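The persistence trade-off can be illustrated with the standard `sqlite3` module. Offsets are written once, committed, and then looked up by name after reopening the file, without rescanning the library.

The table layout and the helpers `open_index` and `offset_for` are hypothetical. They are not the schema or session handling that SQLIndex actually uses; only the `.splindex` extension comes from this document.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def open_index(path: str) -> sqlite3.Connection:
    """Open (or create) a toy on-disk offset index stored in a SQLite3 file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spectrum "
        "(number INTEGER PRIMARY KEY, byte_offset INTEGER NOT NULL, name TEXT)"
    )
    return conn


def offset_for(conn: sqlite3.Connection, name: str) -> Optional[int]:
    # The query brings only the matching row into memory.
    row = conn.execute(
        "SELECT byte_offset FROM spectrum WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.splindex")
    conn = open_index(path)
    conn.executemany(
        "INSERT INTO spectrum VALUES (?, ?, ?)",
        [(0, 0, "PEPTIDE/2"), (1, 1024, "PEPTIDR/3")],
    )
    conn.commit()  # write the entries to disk
    conn.close()

    # A later run reopens the same file instead of rescanning the library.
    conn = open_index(path)
    reopened_offset = offset_for(conn, "PEPTIDR/3")
    conn.close()
```

An in-memory index would instead keep every record in a Python structure and rebuild it on each start-up.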

class mzspeclib.index.IndexBase

Bases: Collection

A base type for spectral indices.

Retrieve information about entries’ identifiers and any associated metadata.

add(number, offset, name, analyte, attributes=None)

Add a new entry to the spectrum index.

Parameters:

number (int) – A numerical identifier for this spectrum.
offset (int) – The offset in the file to reach the spectrum (in bytes if appropriate)
name (str) – A text identifier for this spectrum.
analyte (str, optional) – A text representation of the analyte for that record
attributes (Dict[str, Any], optional) – A key-value pair collection of this record, currently not supported.

add_cluster(number, offset, attributes=None)

Add a new entry to the cluster index.

Parameters:

number (int) – A numerical identifier for this cluster.
offset (int) – The offset in the file to reach the cluster (in bytes if appropriate)
attributes (Dict[str, Any], optional) – A key-value pair collection of this record, currently not supported.

check_names_unique()

Check that all indexed spectra have unique spectrum name parameters.

Returns:: Whether the spectrum names in the index are unique.
Return type:: bool

commit()

Commit any index state to disk, if this index supports persistence.

Has no effect on index types that do not have a persistence functionality.

classmethod exists(filename)

Check if an index file exists

Parameters:: filename (str | Path | FileIO)
Return type:: bool

classmethod from_filename(filename, library=None)

Get a file path for an index file, given the library filename.

Parameters:: filename (str | Path | FileIO)
Return type:: str or None

iter_clusters()

Iterate over cluster records

Return type:: Iterator[IndexRecordBase]

iter_spectra()

Iterate over spectrum records

Return type:: Iterator[IndexRecordBase]

offset_for(record_label)

Retrieve the byte offset of a spectrum identifier

Return type:: int

offset_for_cluster(record_label)

Retrieve the byte offset of a cluster identifier

Return type:: int

record_for(record_label)

Retrieve an index record for a spectrum identifier

Parameters:: record_label (int | str)
Return type:: IndexRecordBase

record_for_cluster(record_label)

Retrieve an index record for a cluster identifier

Parameters:: record_label (int)
Return type:: IndexRecordBase

search(i, **kwargs)

Search for one or more spectrum records by index, slice or identifier

Parameters:: i (str | int | slice)
Return type:: IndexRecordBase | List[IndexRecordBase]

search_clusters(i=None, **kwargs)

Search for one or more cluster records by index, slice or identifier

Parameters:: i (int | slice | None)
Return type:: IndexRecordBase | List[IndexRecordBase]
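A lookup that accepts an index, a slice, or an identifier comes down to dispatching on the key's type, as sketched below.

`ToyRecord` and `search` are hypothetical. The sketch assumes records are stored in order of their `number` starting from 0, and it raises `KeyError` for an unknown name. Neither choice is documented for IndexBase.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ToyRecord:
    number: int
    offset: int
    name: str


def search(records: List[ToyRecord], i: Union[int, str, slice]) -> Union[ToyRecord, List[ToyRecord]]:
    """Dispatch on the key type: number, name, or slice of numbers (sketch)."""
    if isinstance(i, slice):
        return records[i]  # a list of records
    if isinstance(i, int):
        return records[i]  # assumes position == record number
    for record in records:  # otherwise treat the key as a spectrum name
        if record.name == i:
            return record
    raise KeyError(i)
```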

class mzspeclib.index.MemoryIndex(records=None, cluster_records=None, metadata=None)

Bases: IndexBase

An in-memory data structure for holding spectrum metadata and offsets.

Parameters:

records (List[IndexRecord])
cluster_records (List[ClusterIndexRecord])
metadata (Dict[str, Any])

records

The index entries for spectra

Type:: List[IndexRecord]

cluster_records

The index entries for clusters

Type:: List[ClusterIndexRecord]

metadata

Arbitrary metadata about the library or the index

Type:: Dict[str, Any]

add(number, offset, name, analyte, attributes=None)

Add a new entry to the spectrum index.

Parameters:

number (int) – A numerical identifier for this spectrum.
offset (int) – The offset in the file to reach the spectrum (in bytes if appropriate)
name (str) – A text identifier for this spectrum.
analyte (str, optional) – A text representation of the analyte for that record
attributes (Dict[str, Any], optional) – A key-value pair collection of this record, currently not supported.

add_cluster(number, offset, attributes=None)

Add a new entry to the cluster index.

Parameters:

number (int) – A numerical identifier for this cluster.
offset (int) – The offset in the file to reach the cluster (in bytes if appropriate)
attributes (Dict[str, Any], optional) – A key-value pair collection of this record, currently not supported.

commit()

Commit any index state to disk, if this index supports persistence.

Has no effect on index types that do not have a persistence functionality.

classmethod from_filename(filename, library=None)

Get a file path for an index file, given the library filename.

Return type:: str or None

iter_clusters()

Iterate over cluster entries in the index.

Return type:: Iterator[IndexRecordBase]

iter_spectra(): Iterate over spectrum entries in the index.

search(i=None, **kwargs): Search for one or more spectrum records by index, slice or identifier

search_clusters(i=None, **kwargs): Search for one or more cluster records by index, slice or identifier

class mzspeclib.index.SQLIndex(filename)

Bases: IndexBase

An on-disk data structure for holding spectrum metadata and offsets.

This uses a SQLite3 database with the file extension .splindex to hold information.

Parameters:: filename (str)

session

A thread-aware database session manager

Type:: scoped_session

add(number, offset, name, analyte=None, attributes=None)

Add a new entry to the spectrum index.

Parameters:

number (int) – A numerical identifier for this spectrum.
offset (int) – The offset in the file to reach the spectrum (in bytes if appropriate)
name (str) – A text identifier for this spectrum.
analyte (str, optional) – A text representation of the analyte for that record
attributes (Dict[str, Any], optional) – A key-value pair collection of this record, currently not supported.

add_cluster(number, offset, attributes=None)

Add a new entry to the cluster index.

Parameters:

number (int) – A numerical identifier for this cluster.
offset (int) – The offset in the file to reach the cluster (in bytes if appropriate)
attributes (Dict[str, Any], optional) – A key-value pair collection of this record, currently not supported.

commit(): Persist any new entries to disk.

classmethod exists(filename)

Check if an index file exists

Parameters:: filename (str | Path | FileIO)

classmethod from_filename(filename, library=None)

Get a file path for an index file, given the library filename.

Return type:: str or None

iter_clusters()

Iterate over cluster entries in the index.

Return type:: Iterator[IndexRecordBase]

iter_spectra(): Iterate over spectrum entries in the index.

search(i, **kwargs): Search for one or more spectrum records by index, slice or identifier

search_clusters(i, **kwargs): Search for one or more cluster records by index, slice or identifier

Backends

File Format Backends

class mzspeclib.backends.TextSpectralLibrary(filename, index_type=None, read_metadata=True, create_index=True)

Bases: _PlainTextSpectralLibraryBackendBase

A reader for the plain text serialization of the mzSpecLib spectral library format.

This implementation may operate on a stream opened in binary mode or a file path. If using a non-seekable stream, the random access or search methods may not be supported.

Parameters:

filename (str | Path | FileIO)
read_metadata (bool)
create_index (bool)

create_index()

Populate the spectrum index

Returns:: n_spectra – The number of entries read
Return type:: int
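This sketch shows why a binary, seekable stream matters for indexing. It records the byte offset of each entry header in one pass, then seeks straight to an entry and reads until the next one.

`create_index` and `read_entry` here are hypothetical free functions, not the backend's methods. They assume each entry starts with a line such as `<Spectrum=12>`. Positions are tracked by summing `len(line)`, which is only a valid byte offset when the stream is opened in binary mode.

```python
import io
from typing import BinaryIO, Dict

ENTRY_PREFIX = b"<Spectrum="


def create_index(stream: BinaryIO) -> Dict[int, int]:
    """Map each spectrum number to the byte offset of its header line."""
    offsets: Dict[int, int] = {}
    position = stream.tell()
    for line in iter(stream.readline, b""):
        if line.startswith(ENTRY_PREFIX):
            # e.g. b"<Spectrum=12>\n" -> 12
            number = int(line[len(ENTRY_PREFIX):].split(b">", 1)[0])
            offsets[number] = position
        position += len(line)
    return offsets


def read_entry(stream: BinaryIO, offset: int) -> bytes:
    """Seek to an indexed entry and read it up to the next entry (or EOF)."""
    stream.seek(offset)
    chunks = [stream.readline()]
    for line in iter(stream.readline, b""):
        if line.startswith(ENTRY_PREFIX):
            break
        chunks.append(line)
    return b"".join(chunks)
```

With a non-seekable stream, `read_entry` cannot work, which matches the caveat that random access may be unavailable in that case.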

get_cluster(cluster_number)

Retrieve a single spectrum cluster from the library.

Parameters:: cluster_number (int, optional) – The index of the cluster in the library
Return type:: SpectrumCluster

get_spectrum(spectrum_number=None, spectrum_name=None)

Retrieve a single spectrum from the library.

Parameters:

spectrum_number (int, optional) – The index of the spectrum in the library
spectrum_name (str, optional) – The name of the spectrum in the library

Return type:

Spectrum

classmethod guess_from_header(filename)

Guess if the file is of this type by inspecting the file’s header section

Parameters:: filename (str) – The path to the file to open.
Returns:: Whether this is an appropriate backend for that file.
Return type:: bool

read_header()

Read just the header of the whole library

Return type:: bool

class mzspeclib.backends.TextSpectralLibraryWriter(filename, version=None, compact_interpretations=True, **kwargs)

Bases: SpectralLibraryWriterBase

Write a spectral library to the plain text serialization of the mzSpecLib spectral library format.

Parameters:: compact_interpretations (bool)

version

The format version to write in semver-compatible notation

Type:: str

compact_interpretation

Whether to write a compact interpretation section when there is only one interpretation with only one interpretation member, inlining the interpretation member attributes into the interpretation. Both forms are valid; the compact form is just less verbose.

Type:: bool, default True

close()

Close the library writer, performing any necessary finalization.

This is called automatically when __exit__() is called.

write_cluster(cluster)

Write out a SpectrumCluster and all of its components.

Parameters:: cluster (SpectrumCluster) – The spectrum cluster to write.

write_header(library)

Write the library header and other global metadata

Parameters:: library (SpectralLibraryBackendBase)

write_spectrum(spectrum)

Write out a Spectrum and all of its components.

Parameters:: spectrum (Spectrum) – The spectrum to write.

class mzspeclib.backends.JSONSpectralLibrary(filename, index_type=None, read_metadata=True, create_index=None)

Bases: SpectralLibraryBackendBase

A reader for the JSON serialization of the mzSpecLib spectral library format.

Note

Unlike other format readers, this type does not parse incrementally; instead, it parses the entire JSON document in memory and stores the parsed object structure. The JSON objects are then converted into mzspeclib types upon request. This is because incremental JSON parsing is substantially more difficult to do in a byte-aware manner in Python, not to mention slow.

This may lead to large memory overhead when reading large libraries in JSON format.

create_index()

Populate the spectrum index.

This method may produce a large amount of file I/O.

Returns:: n_spectra – The number of entries read
Return type:: int

get_cluster(cluster_number)

Retrieve a single spectrum cluster from the library.

Parameters:: cluster_number (int, optional) – The index of the cluster in the library
Return type:: SpectrumCluster

get_spectrum(spectrum_number=None, spectrum_name=None)

Retrieve a single spectrum from the library.

Parameters:

spectrum_number (int, optional) – The index of the spectrum in the library
spectrum_name (str, optional) – The name of the spectrum in the library

Return type:

Spectrum

classmethod guess_from_filename(filename)

Guess if the file is of this type by inspecting the file’s name and extension.

Parameters:: filename (str) – The path to the file to inspect.
Returns:: Whether this is an appropriate backend for that file.
Return type:: bool

read()

Create a sequential iterator over the spectrum library.

Yields:: entry (Union[Spectrum, SpectrumCluster])

read_header()

Read just the header of the whole library

Return type:: bool

class mzspeclib.backends.JSONSpectralLibraryWriter(filename, version=None, pretty_print=True, format_annotations=True, simplify=True, **kwargs)

Bases: SpectralLibraryWriterBase

Write a spectral library to the JSON serialization of the mzSpecLib spectral library format.

Note

Unlike other format writers, this writer buffers the entire library in memory as JSON-compatible Python objects until the entire library is ready to be written out. This is because incrementally writing JSON is substantially more difficult to do correctly.

This may lead to large memory overhead when writing large libraries in JSON format.

close()

Close the library writer, performing any necessary finalization.

This is called automatically when __exit__() is called.
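The buffering behaviour described in the note above can be sketched as a writer that only appends to Python lists. It serializes everything in a single `json.dump` call when `close()` runs, which the context manager's `__exit__` triggers automatically.

`ToyJSONWriter` is hypothetical. The top-level `"attributes"` and `"spectra"` keys are placeholders, not the mzSpecLib JSON schema.

```python
import io
import json
from typing import Any, Dict, List, TextIO


class ToyJSONWriter:
    """Buffers entries as plain Python objects and serializes them on close()."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream
        self.header: Dict[str, Any] = {}
        self.spectra: List[Dict[str, Any]] = []
        self.closed = False

    def write_header(self, attributes: Dict[str, Any]) -> None:
        self.header = dict(attributes)

    def write_spectrum(self, spectrum: Dict[str, Any]) -> None:
        self.spectra.append(spectrum)  # nothing reaches the stream yet

    def close(self) -> None:
        if not self.closed:
            # The whole document is emitted at once, hence the memory cost.
            json.dump({"attributes": self.header, "spectra": self.spectra}, self.stream)
            self.closed = True

    def __enter__(self) -> "ToyJSONWriter":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()


buffer = io.StringIO()
with ToyJSONWriter(buffer) as writer:
    writer.write_header({"MS:1003187|library identifier": "EXAMPLE"})
    writer.write_spectrum({"id": 1})
    written_before_close = buffer.getvalue()  # still empty at this point
```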

write_cluster(cluster)

Write out a SpectrumCluster and all of its components.

Parameters:: cluster (SpectrumCluster) – The spectrum cluster to write.

write_header(library)

Write the library header and other global metadata

Parameters:: library (SpectralLibraryBackendBase)

write_library(library)

Write out the entire library.

Parameters:: library (SpectralLibraryBackendBase or SpectrumLibrary) – The library to write out.
Raises:: ValueError – If the writer has already started writing one library, an error will be raised.

write_spectrum(spectrum)

Write out a Spectrum and all of its components.

Parameters:: spectrum (Spectrum) – The spectrum to write.

class mzspeclib.backends.MSPSpectralLibrary(filename, index_type=None, read_metadata=True, create_index=True)

Bases: _PlainTextSpectralLibraryBackendBase

A reader for the plain text NIST MSP spectral library format.

The MSP format is only roughly defined and places few constraints on the meanings of spectrum attributes. This parser attempts to cover a variety of different ways that MSPs found “in the wild” have denoted different spectrum properties, but it is neither exhaustive nor nuanced enough to know from context exactly what those files’ authors intended. Instead, it makes a best guess at what they correspond to in the controlled vocabulary mapping for mzspeclib.

Parameters:: create_index (bool)

modification_parser

A parser for peptide modifications

Type:: ModificationParser

unknown_attributes

A tracker for unknown attributes. Used to tell how much information the reader is unable to map onto the controlled vocabulary.

Type:: _UnknownTermTracker
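As a rough picture of how an unknown-attribute tracker helps, the sketch below maps `Key: value` lines through a lookup table. It counts every key it cannot map, using `collections.Counter` in place of `_UnknownTermTracker`.

The one-entry `KNOWN_MSP_KEYS` table and `map_msp_attributes` are hypothetical. The real parser handles many more keys, as well as the key=value pairs packed into MSP comment fields, which this sketch ignores.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny mapping from MSP keys to CV-backed names.
KNOWN_MSP_KEYS: Dict[str, str] = {
    "Charge": "MS:1000041|charge state",
}


def map_msp_attributes(lines: List[str], unknown: Counter) -> List[Tuple[str, str]]:
    """Translate `Key: value` lines, counting the keys that cannot be mapped."""
    mapped: List[Tuple[str, str]] = []
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line (e.g. a peak line)
        key, value = key.strip(), value.strip()
        if key in KNOWN_MSP_KEYS:
            mapped.append((KNOWN_MSP_KEYS[key], value))
        else:
            unknown[key] += 1
    return mapped
```

After reading a file, the counter shows which keys went unmapped and how often, which is the kind of summary summarize_parsing_errors() is meant to expose.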

create_index()

Populate the spectrum index

Returns:: n_spectra – The number of entries read
Return type:: int

get_spectrum(spectrum_number=None, spectrum_name=None)

Retrieve a single spectrum from the library.

Parameters:

spectrum_number (int, optional) – The index of the spectrum in the library
spectrum_name (str, optional) – The name of the spectrum in the library

Return type:

Spectrum

classmethod guess_from_header(filename)

Guess if the file is of this type by inspecting the file’s header section

Parameters:: filename (str) – The path to the file to open.
Returns:: Whether this is an appropriate backend for that file.
Return type:: bool

read_header()

Read just the header of the whole library

Return type:: bool

summarize_parsing_errors()

Retrieve a free-form description of parsing errors

Return type:: Dict

class mzspeclib.backends.MSPSpectralLibraryWriter(filename, **kwargs)

Bases: SpectralLibraryWriterBase

close()

Close the library writer, performing any necessary finalization.

This is called automatically when __exit__() is called.

write_spectrum(spectrum)

Write out a Spectrum and all of its components.

Parameters:: spectrum (Spectrum) – The spectrum to write.

class mzspeclib.backends.BibliospecSpectralLibrary(filename, **kwargs)

Bases: BibliospecBase, SpectralLibraryBackendBase

Read BiblioSpec 2 SQLite3 spectral library files.

get_spectrum(spectrum_number=None, spectrum_name=None)

Read a spectrum from the spectrum library.

Bibliospec does not support alternative labeling of spectra with a plain text name so looking up by spectrum_name is not supported.

Parameters:

spectrum_number (int)
spectrum_name (str)

classmethod has_index_preference(filename)

Check if this backend prefers a particular index for this file.

The base implementation checks to see if there is a SQL index for the filename provided, and if so, prefers SQLIndex. Otherwise, prefers MemoryIndex.

Parameters:: filename (str) – The name of the file to open.
Returns:: index_type – Returns an IndexBase-derived type which this backend would prefer to use.
Return type:: type

read()

Create a sequential iterator over the spectrum library.

Yields:: entry (Union[Spectrum, SpectrumCluster])
Return type:: Iterator[Spectrum]

read_header()

Read just the header of the whole library

Return type:: bool

class mzspeclib.backends.SPTXTSpectralLibrary(filename, index_type=None, read_metadata=True, create_index=True)

Bases: MSPSpectralLibrary

Parameters:: create_index (bool)

classmethod guess_from_header(filename)

Guess if the file is of this type by inspecting the file’s header section

Parameters:: filename (str) – The path to the file to open.
Returns:: Whether this is an appropriate backend for that file.
Return type:: bool

read_header()

Read just the header of the whole library

Return type:: bool

mzspeclib.backends.DiaNNTSVSpectralLibrary: alias of DIANNTSVSpectralLibrary

class mzspeclib.backends.DIANNTSVSpectralLibrary(filename, index_type=None, **kwargs)

Bases: _CSVSpectralLibraryBackendBase

Reader for DIA-NN TSV spectral libraries.

Parameters:: filename (str)

create_index()

Populate the spectrum index.

This method may produce a large amount of file I/O.

Returns:: n_spectra – The number of entries read
Return type:: int
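DIA-NN and Spectronaut TSV libraries store one row per fragment ion. Indexing them therefore amounts to grouping rows into precursor-level entries. A minimal sketch with the standard `csv` module follows. The column names (`ModifiedPeptide`, `PrecursorCharge`, `ProductMz`, `LibraryIntensity`) and the choice of (peptide, charge) as the grouping key are assumptions, not mzspeclib's internals.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def index_tsv_by_precursor(text: str) -> Dict[Tuple[str, str], List[int]]:
    """Group fragment rows into precursor entries (hypothetical column names)."""
    index: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row_number, row in enumerate(reader):
        key = (row["ModifiedPeptide"], row["PrecursorCharge"])
        index[key].append(row_number)
    return dict(index)


tsv = (
    "ModifiedPeptide\tPrecursorCharge\tProductMz\tLibraryIntensity\n"
    "PEPTIDE\t2\t175.119\t1.0\n"
    "PEPTIDE\t2\t276.166\t0.5\n"
    "SAMPLER\t3\t175.119\t1.0\n"
)
index = index_tsv_by_precursor(tsv)
print(len(index))  # number of spectra, analogous to n_spectra
```

Every row must be read before the grouping is complete. This fits the note that create_index() may produce a large amount of file I/O.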

read_header()

Read just the header of the whole library

Return type:: bool

class mzspeclib.backends.SpectronautTSVSpectralLibrary(filename, index_type=None, **kwargs)

Bases: _CSVSpectralLibraryBackendBase

Read Spectronaut TSV spectral libraries.

Parameters:: filename (str)

create_index()

Populate the spectrum index.

This method may produce a large amount of file I/O.

Returns:: n_spectra – The number of entries read
Return type:: int

read_header()

Read just the header of the whole library

Return type:: bool

class mzspeclib.backends.EncyclopediaSpectralLibrary(filename, **kwargs)

Bases: SpectralLibraryBackendBase

Read EncyclopeDIA SQLite3 spectral library files.

Parameters:: filename (str)

get_spectrum(spectrum_number=None, spectrum_name=None)

Read a spectrum from the spectrum library.

EncyclopeDIA does not support alternative labeling of spectra with a plain text name, so looking up by spectrum_name is not supported.

Parameters:

spectrum_number (int)
spectrum_name (str)

classmethod has_index_preference(filename)

Check if this backend prefers a particular index for this file.

The base implementation checks to see if there is a SQL index for the filename provided, and if so, prefers SQLIndex. Otherwise, prefers MemoryIndex.

Parameters:: filename (str) – The name of the file to open.
Returns:: index_type – Returns an IndexBase-derived type which this backend would prefer to use.
Return type:: type

read()

Create a sequential iterator over the spectrum library.

Yields:: entry (Union[Spectrum, SpectrumCluster])
Return type:: Iterator[Spectrum]

read_header()

Read just the header of the whole library

Return type:: bool

class mzspeclib.backends.InMemorySpectrumLibrary(spectra=None, clusters=None, create_index=True, **kwargs)

Bases: SpectralLibraryBackendBase

A spectrum library backend that holds all of its entries in memory.

This can be used when generating a library in silico.

Parameters:

spectra (List[Spectrum])
clusters (List[SpectrumCluster])
create_index (bool)

append(spectrum)

Append a Spectrum to the in-memory buffer

Parameters:: spectrum (Spectrum)

append_cluster(cluster)

Append a SpectrumCluster to the in-memory buffer

Parameters:: cluster (SpectrumCluster)

create_index()

Populate the spectrum index.

This method may produce a large amount of file I/O.

Returns:: n_spectra – The number of entries read
Return type:: int

get_cluster(cluster_number)

Retrieve a single spectrum cluster from the library.

Parameters:: cluster_number (int, optional) – The index of the cluster in the library
Return type:: SpectrumCluster

get_spectrum(spectrum_number=None, spectrum_name=None)

Retrieve a single spectrum from the library.

Parameters:

spectrum_number (int, optional) – The index of the spectrum in the library
spectrum_name (str, optional) – The name of the spectrum in the library

Return type:

Spectrum

classmethod guess_from_filename(filename)

Guess if the file is of this type by inspecting the file’s name and extension.

Parameters:: filename (str) – The path to the file to inspect.
Returns:: Whether this is an appropriate backend for that file.
Return type:: bool

read()

Create a sequential iterator over the spectrum library.

Yields:: entry (Union[Spectrum, SpectrumCluster])
Return type:: Iterator[Spectrum | SpectrumCluster]

read_header()

Read just the header of the whole library

Return type:: bool
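A hypothetical stand-in for the in-memory backend shows how the documented methods fit together. append() and append_cluster() add to lists, get_spectrum() and get_cluster() index into them, and read() walks everything in order. `ToySpectrum`, `ToyCluster`, and the spectra-then-clusters order of read() are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union


@dataclass
class ToySpectrum:
    name: str
    peaks: List[tuple] = field(default_factory=list)


@dataclass
class ToyCluster:
    members: List[str]


class ToyInMemoryLibrary:
    """Hypothetical in-memory buffer shaped like the documented methods."""

    def __init__(self) -> None:
        self.spectra: List[ToySpectrum] = []
        self.clusters: List[ToyCluster] = []

    def append(self, spectrum: ToySpectrum) -> None:
        self.spectra.append(spectrum)

    def append_cluster(self, cluster: ToyCluster) -> None:
        self.clusters.append(cluster)

    def get_spectrum(self, spectrum_number: Optional[int] = None,
                     spectrum_name: Optional[str] = None) -> ToySpectrum:
        if spectrum_number is not None:
            return self.spectra[spectrum_number]
        # Linear scan by name; a real index would avoid this.
        for spectrum in self.spectra:
            if spectrum.name == spectrum_name:
                return spectrum
        raise KeyError(spectrum_name)

    def get_cluster(self, cluster_number: int) -> ToyCluster:
        return self.clusters[cluster_number]

    def read(self) -> Iterator[Union[ToySpectrum, ToyCluster]]:
        # Assumed order: all spectra first, then all clusters.
        yield from self.spectra
        yield from self.clusters


library = ToyInMemoryLibrary()
library.append(ToySpectrum("PEPTIDE/2", [(175.119, 1000.0)]))
library.append_cluster(ToyCluster(members=["PEPTIDE/2"]))
print([type(entry).__name__ for entry in library.read()])
```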

mzspeclib.backends.guess_implementation(filename, index_type=None, **kwargs)

Guess the backend implementation to use with this file format.

Parameters:

filename (str) – The path to the spectral library file to open.
index_type (type, optional) – The IndexBase derived type to use for this file. If None is provided, the instance will decide based upon has_index_preference().
**kwargs – Passed through to the implementation

Return type:

SpectralLibraryBackendBase
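guess_implementation() picks a backend from the candidates that claim a file. The sketch below assumes one plausible strategy. It tries each backend's filename check first and falls back to header sniffing. When nothing matches, it raises a local stand-in for FormatInferenceFailure, which is documented as deriving from ValueError. `ToyBackend`, `ToyMSP`, `ToyTSV`, and the check order are hypothetical.

```python
import os
import tempfile
from typing import List, Optional, Type


class FormatInferenceFailure(ValueError):
    """Local stand-in for mzspeclib.backends.FormatInferenceFailure."""


class ToyBackend:
    extensions: tuple = ()
    header_prefix: Optional[str] = None

    @classmethod
    def guess_from_filename(cls, filename: str) -> bool:
        return filename.lower().endswith(cls.extensions)

    @classmethod
    def guess_from_header(cls, filename: str) -> bool:
        if cls.header_prefix is None or not os.path.exists(filename):
            return False
        with open(filename, encoding="utf-8", errors="replace") as handle:
            return handle.readline().startswith(cls.header_prefix)


class ToyMSP(ToyBackend):
    extensions = (".msp",)
    header_prefix = "Name:"


class ToyTSV(ToyBackend):
    extensions = (".tsv",)


def guess_implementation(filename: str,
                         candidates: List[Type[ToyBackend]]) -> Type[ToyBackend]:
    # Assumed order: cheap filename checks first, then header sniffing.
    for backend in candidates:
        if backend.guess_from_filename(filename):
            return backend
    for backend in candidates:
        if backend.guess_from_header(filename):
            return backend
    raise FormatInferenceFailure(f"Could not infer the format of {filename!r}")


print(guess_implementation("library.msp", [ToyMSP, ToyTSV]).__name__)
with tempfile.TemporaryDirectory() as tmp:
    unnamed = os.path.join(tmp, "library.txt")
    with open(unnamed, "w", encoding="utf-8") as handle:
        handle.write("Name: PEPTIDE/2\n")
    print(guess_implementation(unnamed, [ToyMSP, ToyTSV]).__name__)  # matched by header
```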

class mzspeclib.backends.SpectralLibraryBackendBase(filename)

Bases: AttributedEntity, _VocabularyResolverMixin, _LibraryViewMixin

A base class for all spectral library formats.

Parameters:: filename (str | Path | FileIO)

create_index()

Populate the spectrum index.

This method may produce a large amount of file I/O.

Returns:: n_spectra – The number of entries read
Return type:: int

get_cluster(cluster_number)

Retrieve a single spectrum cluster from the library.

Parameters:: cluster_number (int, optional) – The index of the cluster in the library
Return type:: SpectrumCluster

get_spectrum(spectrum_number=None, spectrum_name=None)

Retrieve a single spectrum from the library.

Parameters:

spectrum_number (int, optional) – The index of the spectrum in the library
spectrum_name (str, optional) – The name of the spectrum in the library

Return type:

Spectrum

classmethod guess_from_filename(filename)

Guess if the file is of this type by inspecting the file’s name and extension.

Parameters:: filename (str) – The path to the file to inspect.
Returns:: Whether this is an appropriate backend for that file.
Return type:: bool

classmethod guess_from_header(filename)

Guess if the file is of this type by inspecting the file’s header section

Parameters:: filename (str) – The path to the file to open.
Returns:: Whether this is an appropriate backend for that file.
Return type:: bool

classmethod guess_implementation(filename, index_type=None, **kwargs)

Guess the backend implementation to use with this file format.

Parameters:

filename (str) – The path to the spectral library file to open.
index_type (type, optional) – The IndexBase derived type to use for this file. If None is provided, the instance will decide based upon has_index_preference().
**kwargs – Passed through to the implementation

Return type:

SpectralLibraryBackendBase

classmethod has_index_preference(filename)

Check if this backend prefers a particular index for this file.

The base implementation checks to see if there is a SQL index for the filename provided, and if so, prefers SQLIndex. Otherwise, prefers MemoryIndex.

Parameters:: filename (str) – The name of the file to open.
Returns:: index_type – Returns an IndexBase-derived type which this backend would prefer to use.
Return type:: type
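The index preference described for the base implementation reduces to a file-existence check. In this sketch, `MemoryIndex` and `SQLIndex` are empty placeholder classes. `SQL_INDEX_SUFFIX` is a hypothetical naming convention for the on-disk index, and mzspeclib's actual file naming is not specified here.

```python
import os
import tempfile


class MemoryIndex:
    """Placeholder for an in-memory index type."""


class SQLIndex:
    """Placeholder for an on-disk SQL index type."""


# Hypothetical naming convention for an on-disk index stored next to the library.
SQL_INDEX_SUFFIX = ".index.sqlite"


def has_index_preference(filename: str) -> type:
    # Prefer the persistent SQL index only if one has already been built.
    if os.path.exists(filename + SQL_INDEX_SUFFIX):
        return SQLIndex
    return MemoryIndex


with tempfile.TemporaryDirectory() as tmp:
    library_path = os.path.join(tmp, "library.msp")
    print(has_index_preference(library_path).__name__)  # MemoryIndex: no index file yet
    open(library_path + SQL_INDEX_SUFFIX, "w").close()
    print(has_index_preference(library_path).__name__)  # SQLIndex: index file present
```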

read()

Create a sequential iterator over the spectrum library.

Yields:: entry (Union[Spectrum, SpectrumCluster])
Return type:: Iterator[Spectrum | SpectrumCluster]

read_header()

Read just the header of the whole library

Return type:: bool

summarize_parsing_errors()

Retrieve a free-form description of parsing errors

Return type:: Dict

class mzspeclib.backends.SpectralLibraryWriterBase(filename, **kwargs)

Bases: _VocabularyResolverMixin

A base type for spectral library writers.

This type implements the context manager protocol, controlling the closing of the enclosed IO stream.

filename

Type:: str, pathlib.Path, or io.IOBase

close()

Close the library writer, performing any necessary finalization.

This is called automatically by __exit__() when the writer is used as a context manager.

write_cluster(cluster)

Write out a SpectrumCluster and all of its components.

Parameters:: cluster (SpectrumCluster) – The spectrum cluster to write.

write_library(library)

Write out the entire library.

Parameters:: library (SpectralLibraryBackendBase or SpectrumLibrary) – The library to write out.
Raises:: ValueError – Raised if the writer has already started writing a library.
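The writer contract documented here has two parts. close() runs when the `with` block exits, and write_library() refuses to start a second library. A hypothetical `ToyLibraryWriter` shows both rules. Its output format is a placeholder. Its close() only records that it ran, where a real writer would also finalize and close the stream.

```python
import io
from typing import Iterable


class ToyLibraryWriter:
    """Hypothetical writer illustrating the context-manager and single-library rules."""

    def __init__(self, stream: io.StringIO) -> None:
        self.stream = stream
        self._started = False
        self.closed = False

    def __enter__(self) -> "ToyLibraryWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs even if the body raised, so finalization always happens.
        self.close()

    def write_spectrum(self, name: str) -> None:
        self.stream.write(f"Spectrum: {name}\n")  # placeholder output format

    def write_library(self, names: Iterable[str]) -> None:
        if self._started:
            raise ValueError("This writer has already started writing a library")
        self._started = True
        for name in names:
            self.write_spectrum(name)

    def close(self) -> None:
        # A real writer would write any trailer and close the stream here.
        self.closed = True


buffer = io.StringIO()
with ToyLibraryWriter(buffer) as writer:
    writer.write_library(["PEPTIDE/2", "SAMPLER/3"])
print(writer.closed, buffer.getvalue().count("Spectrum:"))
```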

write_spectrum(spectrum)

Write out a Spectrum and all of its components.

Parameters:: spectrum (Spectrum) – The spectrum to write.

exception mzspeclib.backends.FormatInferenceFailure

Bases: ValueError

Indicates that the format of a spectral library could not be inferred.

class mzspeclib.backends.AttributeSetTypes(value)

Bases: Enum

Attribute set type tags used as keys and section constants

class mzspeclib.backends.LibraryIterator(backend)

Bases: AttributedEntity, _LibraryViewMixin, Iterator[Spectrum]

An iterator wrapper for a library source that doesn’t permit random access

Parameters:: backend (SpectralLibraryBackendBase)
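At its simplest, a forward-only wrapper like LibraryIterator delegates `__next__` to an underlying iterator. In the sketch below, `ToyLibraryIterator` is hypothetical. It shows that entries are consumed once and in order, with no random access.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class ToyLibraryIterator(Iterator[T]):
    """Hypothetical forward-only wrapper: entries are consumed once, in order."""

    def __init__(self, entries: Iterable[T]) -> None:
        self._source = iter(entries)

    def __iter__(self) -> "ToyLibraryIterator[T]":
        return self

    def __next__(self) -> T:
        return next(self._source)  # StopIteration propagates when exhausted


stream = ToyLibraryIterator(["PEPTIDE/2", "SAMPLER/3"])
print(next(stream), list(stream))
```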

Validation

The components of the semantic validator for MzSpecLib

To obtain the base validator, use load_default_validator(). Additional validation rule sets can be obtained with get_validator_for() and combined with the base rules using the chain() method or the |= operator.

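import logging
from collections import defaultdict
from typing import DefaultDict, List

from mzspeclib import SpectrumLibrary
from mzspeclib.validate import *

logger = logging.getLogger(__name__)

# "example.mzlb.txt" is a placeholder path for the library to check
library = SpectrumLibrary(filename="example.mzlb.txt")

validator = load_default_validator()
validator.validate_library(library)

# Group the logged violations by how strictly each rule must be obeyed
by_level: DefaultDict[RequirementLevel, List[ValidationError]] = defaultdict(list)
for message in validator.error_log:
    by_level[message.requirement_level].append(message)

for level, bucket in sorted(by_level.items()):
    # MAY-level findings are informational, so log them at DEBUG
    log_level = logging.DEBUG if level == RequirementLevel.may else logging.WARNING
    logger.log(log_level, f"Found {len(bucket)} violations for {level.name.upper()} rules")
    for err in bucket:
        logger.log(log_level, f"... {err.message}")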
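The validator in the example above can be combined with other rule sets using chain() or `|=`. That composition can be illustrated without mzspeclib. In this sketch, `ToyValidator` holds a list of rule functions and an error log, and `require` is a hypothetical rule builder. This section does not say whether mzspeclib's chain() returns a new validator or mutates the existing one. The choice below, a new object for chain() and an in-place update for `|=`, is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A rule takes an attribute mapping and returns zero or more error messages.
Rule = Callable[[Dict[str, str]], List[str]]


@dataclass
class ToyValidator:
    """Hypothetical validator with a rule list and an error log (not mzspeclib's class)."""

    rules: List[Rule] = field(default_factory=list)
    error_log: List[str] = field(default_factory=list)

    def chain(self, other: "ToyValidator") -> "ToyValidator":
        # Return a new validator that applies both rule lists.
        return ToyValidator(rules=self.rules + other.rules)

    def __ior__(self, other: "ToyValidator") -> "ToyValidator":
        # In-place variant used by `validator |= other`.
        self.rules.extend(other.rules)
        return self

    def validate(self, attributes: Dict[str, str]) -> bool:
        for rule in self.rules:
            self.error_log.extend(rule(attributes))
        return not self.error_log


def require(key: str) -> Rule:
    # Build a rule that reports a missing key.
    return lambda attributes: [] if key in attributes else [f"missing {key}"]


base = ToyValidator(rules=[require("format_version")])
base |= ToyValidator(rules=[require("title")])
print(base.validate({"format_version": "1.0"}), base.error_log)
```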

class mzspeclib.validate.CombinationLogic(value): Bases: Enum

class mzspeclib.validate.RequirementLevel(value): Bases: IntEnum

class mzspeclib.validate.RuleSet(name, rules)

Bases: Sequence[ScopedSemanticRule]

Parameters:

name (str)
rules (List[ScopedSemanticRule])

class mzspeclib.validate.ScopedSemanticRule(id, path, attributes, requirement_level, combination_logic, condition=None, notes=None)

Bases: object

A semantic attribute rule that applies to a specific scope or context in a spectral library.

Parameters:

id (str)
path (str)
attributes (List[AttributeSemanticRule])
requirement_level (RequirementLevel)
combination_logic (CombinationLogic)
condition (AttributeSemanticRule | None)
notes (str | None)

id

A unique identifier for this rule

Type:: str

path

The validation path this rule applies to, e.g. /Library or /Library/Spectrum/Analyte

Type:: str

requirement_level

How strongly this rule must be obeyed

Type:: RequirementLevel

combination_logic

How this rule’s attribute rules interact

Type:: CombinationLogic

attributes

The attribute rules applied by this semantic rule

Type:: list[AttributeSemanticRule]

condition

A pre-condition rule that must be met for this rule to be applied

Type:: AttributeSemanticRule, optional

notes

A human-readable description of what this rule enforces or of its intent

Type:: str, optional
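The fields of ScopedSemanticRule suggest how such a rule is evaluated. It applies only at its path and only when its condition holds, and it combines its attribute checks according to its combination logic. The sketch below is a simplified, hypothetical model. `Level` and `Combine` stand in for RequirementLevel and CombinationLogic with invented members and values. Attributes are plain accession strings, and the rule id is made up. MS:1000744 is the PSI-MS accession for selected ion m/z.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Dict, List, Optional


class Level(IntEnum):
    """Hypothetical stand-in for RequirementLevel; real members and values may differ."""
    MAY = 1
    SHOULD = 2
    MUST = 3


class Combine(Enum):
    """Hypothetical stand-in for CombinationLogic."""
    ALL = "all"
    ANY = "any"


@dataclass
class ToyScopedRule:
    id: str
    path: str
    accessions: List[str]
    requirement_level: Level
    combination_logic: Combine
    condition: Optional[str] = None  # accession that must be present for the rule to apply

    def check(self, path: str, attributes: Dict[str, object]) -> List[str]:
        # The rule is skipped outside its scope or when its pre-condition is unmet.
        if path != self.path:
            return []
        if self.condition is not None and self.condition not in attributes:
            return []
        present = [accession in attributes for accession in self.accessions]
        if self.combination_logic is Combine.ALL:
            satisfied = all(present)
        else:
            satisfied = any(present)
        if satisfied:
            return []
        return [f"{self.requirement_level.name} rule {self.id} failed at {path}"]


rule = ToyScopedRule("spectrum_precursor_mz", "/Library/Spectrum", ["MS:1000744"],
                     Level.MUST, Combine.ALL)
print(rule.check("/Library/Spectrum", {}))
print(rule.check("/Library/Spectrum", {"MS:1000744": 500.25}))
```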

class mzspeclib.validate.AttributeSemanticPredicate

Bases: object

A predicate rule that applies to an Attribute value.

This is a base class with no specific validation behavior of its own. See its children for specific kinds of predicates.

class mzspeclib.validate.AttributeSemanticRule(accession, name, repeatable, allow_children, value=None, condition=None, notes=None, default_unit=None)

Bases: object

A semantic validation rule that enforces constraints or requirements on a specific Attribute, defined by a controlled vocabulary term, in an entity.

This may include enforcing an AttributeSemanticPredicate on the value of the attribute, controlling how often the attribute may be used, or specifying which units are assumed for it.

Parameters:

accession (str)
name (str)
repeatable (bool)
allow_children (bool)
value (AttributeSemanticPredicate | None)
condition (AttributeSemanticRule | None)
notes (str | None)
default_unit (str | None)
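The sketch below covers two of the constraints an AttributeSemanticRule can carry. One is whether an attribute may repeat. The other is whether its value satisfies a predicate. `ToyAttributeRule` is hypothetical, and its value predicate is a plain callable standing in for AttributeSemanticPredicate. The constraint that charge state (MS:1000041) is non-repeatable and a nonzero integer is an illustrative choice, not a rule quoted from the mzSpecLib specification. allow_children, condition, and default_unit are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Attribute = Tuple[str, object]  # (accession, value)


@dataclass
class ToyAttributeRule:
    accession: str
    name: str
    repeatable: bool
    value: Optional[Callable[[object], bool]] = None  # stands in for a value predicate

    def check(self, attributes: List[Attribute]) -> List[str]:
        matches = [value for accession, value in attributes if accession == self.accession]
        errors: List[str] = []
        # Enforce the repetition limit first, then test each value.
        if not self.repeatable and len(matches) > 1:
            errors.append(f"{self.name} may appear only once, found {len(matches)}")
        if self.value is not None:
            errors.extend(f"{self.name} has invalid value {v!r}"
                          for v in matches if not self.value(v))
        return errors


charge_rule = ToyAttributeRule("MS:1000041", "charge state", repeatable=False,
                               value=lambda v: isinstance(v, int) and v != 0)
print(charge_rule.check([("MS:1000041", 2), ("MS:1000041", 0)]))
```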

class mzspeclib.validate.LibraryFormatVersionFirstRule(requirement_level=RequirementLevel.must)

Bases: ScopedObjectRuleBase

Parameters:: requirement_level (RequirementLevel)

class mzspeclib.validate.SpectrumPeakAnnotationRule(requirement_level=RequirementLevel.should)

Bases: ScopedObjectRuleBase

Parameters:: requirement_level (RequirementLevel)

class mzspeclib.validate.ScopedObjectRuleBase(id, path, requirement_level=RequirementLevel.should)

Bases: object

A validation rule that cannot be expressed in terms of a semantic attribute constraint.

Parameters:

id (str)
path (str)
requirement_level (RequirementLevel)

id

A unique identifier for this rule

Type:: str

path

The validation path this rule applies to, e.g. /Library or /Library/Spectrum/Analyte

Type:: str

requirement_level

How strongly this rule must be obeyed

Type:: RequirementLevel

class mzspeclib.validate.ValidationError(path: str, identifier_path: Tuple, attribute: Any, value: Any, requirement_level: mzspeclib.validate.level.RequirementLevel, message: str, source: str = None)

Bases: object

Parameters:

path (str)
identifier_path (Tuple)
attribute (Any)
value (Any)
requirement_level (RequirementLevel)
message (str)
source (str)

exception mzspeclib.validate.ValidationWarning

Bases: UserWarning

Indicates that the parser encountered something that did not halt parsing but that violates the parser's expectations.

The parser makes a best-effort attempt to interpret the value correctly, but during validation this counts as a violation.
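ValidationWarning is a UserWarning subclass, so the standard `warnings` machinery applies. A parser can warn and keep going, and a caller can record the warnings for later counting. In this sketch, `ToyValidationWarning` stands in for the real class. The hypothetical `parse_charge` treats a trailing `+` as a tolerated but non-conforming notation.

```python
import warnings


class ToyValidationWarning(UserWarning):
    """Stand-in for mzspeclib.validate.ValidationWarning."""


def parse_charge(text: str) -> int:
    # Best-effort parse: accept "2+" but warn that the notation is non-conforming.
    stripped = text.strip()
    if stripped.endswith("+"):
        warnings.warn(f"Non-standard charge notation {text!r}", ToyValidationWarning)
        stripped = stripped[:-1]
    return int(stripped)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure repeated warnings are not suppressed
    value = parse_charge("2+")
print(value, len(caught), caught[0].category.__name__)
```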

class mzspeclib.validate.ControlledVocabularyAttributeValidator(error_log=None, current_context=None, *args, **kwargs)

Bases: ValidatorBase

Parameters:

error_log (List)
current_context (ValidationContext)

check_attributes(obj, path, identifier_path)

Stub implementation for any attribute rule checking

Parameters:

obj (AttributeManager | AttributedEntity)
path (str)
identifier_path (Tuple)

Return type:

bool

class mzspeclib.validate.ValidatorBase(error_log=None, current_context=None, *args, **kwargs)

Bases: _VocabularyResolverMixin

Parameters:

error_log (List)
current_context (ValidationContext)

chain(validator)

Combine this validator with another validator, applying both rulesets.