.. _usersGuide_11_corpusSearching:

.. WARNING: DO NOT EDIT THIS FILE:
   AUTOMATICALLY GENERATED.
   PLEASE EDIT THE .py FILE DIRECTLY.


User's Guide, Chapter 11: Corpus Searching
==========================================

One of music21's important features is its capability to help users
examine large bodies of musical works, or *corpora*.

Music21 comes with a substantial corpus called the *core* corpus. When
you download music21 you can immediately start working with the files in
the corpus directory, including the complete chorales of Bach, many
Haydn and Beethoven string quartets, three books of madrigals by
Monteverdi, thousands of folk songs from the Essen and various ABC
databases, and many more.

To load a file from the corpus, simply call ``corpus.parse`` and assign
the result to a variable:

.. code:: python

    from music21 import *
    bach = corpus.parse('bach/bwv66.6')

The ``music21`` local corpus comes with many thousands of works. All of
them (or at least all the collections) are listed on the
:ref:`Corpus Reference <referenceCorpus>`.

Users can also build their own corpora to index and quickly search their
own collections on disk. Several local corpora can be kept side by side,
for different projects, and accessed individually.

This user's guide will cover more about the corpus's basic features
soon. This chapter focuses on music21's tools for extracting useful
metadata: titles, locations, composers' names, the key signatures used
in each piece, total durations, ambitus (range), and so forth.

This metadata is collected in *metadata bundles* for each corpus. The
*corpus* module has tools to search these bundles and persist them to
disk for later research.

Types of corpora
----------------

Music21 works with three categories of *corpora*, made explicit via the
``corpus.Corpus`` abstract class.

The first category is the *core* corpus, a large collection of musical
works packaged with most music21 installations, including many works
from the common practice era and innumerable folk songs, in a variety of
formats:

.. code:: python

    coreCorpus = corpus.corpora.CoreCorpus()
    len(coreCorpus.getPaths())




.. parsed-literal::
   :class: ipython-result

    2569



..  note::

    If you've installed a "no corpus" version of music21, you can still access
    the *core* corpus with a little work.  Download the *core* corpus from
    music21's website, and install it on your system somewhere. Then, teach
    music21 where you installed it like this:    

    >>> coreCorpus = corpus.corpora.CoreCorpus()
    >>> coreCorpus.manualCoreCorpusPath = 'path/to/core/corpus'

Music21 also has the notion of a *virtual* corpus: a collection of
musical works to be found at various locations online which, for reasons
of licensing, haven't been included in the *core* corpus. There are not
too many files in there, but it is something we hope to expand. Here's
one such path:

.. code:: python

    virtualCorpus = corpus.corpora.VirtualCorpus()
    virtualCorpus.getPaths()[0]




.. parsed-literal::
   :class: ipython-result

    'http://kern.ccarh.org/cgi-bin/ksdata?l=cc/bach/cello&file=bwv1007-01.krn&f=xml'



Finally, music21 allows for *local* corpora: bodies of works provided
and configured by individual music21 users for their own research.
*Local* corpora behave identically to the *core* and *virtual* corpora,
and can be searched and cached in the same manner:

.. code:: python

    localCorpus = corpus.corpora.LocalCorpus()

You can add and remove paths from a *local* corpus with the
``addPath()`` and ``removePath()`` methods:

.. code:: python

    localCorpus.addPath('~/Desktop')
    localCorpus.directoryPaths




.. parsed-literal::
   :class: ipython-result

    ('/Users/josiah/Desktop',)



.. code:: python

    localCorpus.removePath('~/Desktop')

By default, a call to ``corpus.parse`` or ``corpus.search`` will look
for files in every corpus: core, local, and virtual.

Simple searches of the corpus
-----------------------------

When you search the corpus, music21 examines each metadata object in the
metadata bundle for the whole corpus and attempts to match your search
string against the contents of the various search fields saved in that
metadata object.

You can use ``corpus.search()`` to search the metadata associated with
all known corpora, *core*, *virtual* and even each *local* corpus:

.. code:: python

    sixEight = corpus.search('6/8')
    sixEight




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {2162 entries}>



To work with all those pieces, you can treat the MetadataBundle like a
list and call ``.parse()`` on any element:

.. code:: python

    myPiece = sixEight[0].parse()
    myPiece.metadata.title




.. parsed-literal::
   :class: ipython-result

    'Lango Lee.'



The ``.parse()`` call returns a ``music21.stream.Score`` object which you
can work with like any other stream. Or if you just want to see it, there's
a convenience ``.show()`` method you can call directly on a MetadataEntry.

You can also search against a single ``Corpus`` instance, like this one
which ignores anything in your local corpus:

.. code:: python

    corpus.corpora.CoreCorpus().search('6/8')




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {2162 entries}>



Because the result of every metadata search is also a metadata bundle,
you can search your search results to do more complex searches. Below,
``bachBundle`` is a collection of all works whose composer is listed as
Bach; we then narrow it to those pieces in 3/4 time:

.. code:: python

    bachBundle = corpus.search('bach', 'composer')
    bachBundle




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {21 entries}>



.. code:: python

    bachBundle.search('3/4')




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {4 entries}>



..  note::

    There are actually many more pieces by Bach in the music21 corpus,
    but many of them are without the metadata specifying him as a
    composer; his name is only in the filename. To get all the pieces
    by Bach use:
    
    >>> allBach = corpus.search('bach')
        
    This will search filenames as well.  We will aim to get more complete
    metadata in the core corpus in the near future, and would appreciate
    community help to achieve this goal.

Metadata search fields
----------------------

When you search metadata bundles, you can search either through every
search field in every metadata instance, or through a single, specific
search field. As we mentioned above, searching for "bach" as a composer
yields different results from searching for the word "bach" in general:

.. code:: python

    corpus.search('bach', 'composer')




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {21 entries}>



.. code:: python

    corpus.search('bach', 'title')




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {20 entries}>



.. code:: python

    corpus.search('bach')




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {150 entries}>

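The difference between these three searches can be pictured with a toy
model. The sketch below is not music21's implementation: ``toyBundle`` and
``toySearch`` are hypothetical names, the entries are invented, and matching
is assumed to be a case-insensitive substring test. It only illustrates why
a search restricted to one field returns a subset of an unrestricted search,
and why search results can themselves be searched again:

.. code:: python

    # Simplified stand-in for a metadata bundle: each entry is a plain dict
    # of search fields. Field names mirror music21's; the data are invented.
    toyBundle = [
        {'title': 'Lango Lee.', 'composer': None, 'timeSignatureFirst': '6/8'},
        {'title': 'Chorale', 'composer': 'J.S. Bach', 'timeSignatureFirst': '3/4'},
        {'title': 'Bach Fantasy', 'composer': 'Anonymous', 'timeSignatureFirst': '4/4'},
    ]

    def toySearch(entries, query, field=None):
        '''Return the entries whose field(s) contain query, ignoring case.'''
        query = query.lower()
        hits = []
        for entry in entries:
            # Look at one named field, or at every field when none is given.
            names = [field] if field is not None else list(entry)
            if any(query in str(entry.get(name) or '').lower() for name in names):
                hits.append(entry)
        return hits

    anyField = toySearch(toyBundle, 'bach')                  # title or composer
    composerOnly = toySearch(toyBundle, 'bach', 'composer')  # composer only
    chained = toySearch(composerOnly, '3/4')                 # search the results

Here ``anyField`` holds two entries, ``composerOnly`` holds one, and
``chained`` narrows ``composerOnly`` further by time signature.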


So what fields can we actually search through? You can find out like
this:

.. code:: python

    for field in corpus.corpora.Corpus.listSearchFields():
        print(field)


.. parsed-literal::
   :class: ipython-result

    alternativeTitle
    ambitus
    composer
    date
    keySignatureFirst
    keySignatures
    localeOfComposition
    movementName
    movementNumber
    noteCount
    number
    opusNumber
    pitchHighest
    pitchLowest
    quarterLength
    tempoFirst
    tempos
    timeSignatureFirst
    timeSignatures
    title


This list will grow in the future now that the development team is
seeing how useful this searching method can be! Now that we know what
all the search fields are, we can search through some of the more
obscure corners of the *core* corpus:

.. code:: python

    corpus.search('taiwan', 'locale')




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {27 entries}>



What if you are not searching for an exact match? If you're searching
for short pieces, you probably don't want to find pieces with exactly 1
note, then union that set with pieces with exactly 2 notes, etc. Or for
pieces from the 19th century, you won't want to search for 1801, 1802,
etc. What you can do is set up a "predicate callable", which is a
function (either a full Python ``def`` statement or a short ``lambda``
function) that filters the results. Each piece is checked against your
predicate, and only those for which it returns true are kept. Here we'll
search for pieces with between 400 and 500 notes, only in the *core* corpus:

.. code:: python

    predicate = lambda x: 400 < x < 500
    corpus.corpora.CoreCorpus().search(predicate, 'noteCount')




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {49 entries}>

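A ``lambda`` is convenient for one-liners, but a named function does the
same job and leaves room for a docstring. The sketch below uses a
hypothetical ``isMediumLength`` predicate and tries it on invented note
counts in plain Python, so that its behavior can be checked before it is
handed to a search:

.. code:: python

    def isMediumLength(noteCount):
        '''Return True for note counts strictly between 400 and 500.'''
        return 400 < noteCount < 500

    # Invented note counts, standing in for the values stored in the metadata.
    sampleCounts = [12, 400, 401, 450, 499, 500, 2300]
    kept = [count for count in sampleCounts if isMediumLength(count)]
    print(kept)

Since the ``lambda`` above is passed as the first argument of ``search()``,
``isMediumLength`` could presumably take its place, as in
``corpus.corpora.CoreCorpus().search(isMediumLength, 'noteCount')``. Note
that both bounds are exclusive, so pieces with exactly 400 or 500 notes are
left out.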


You can also pass compiled regular expressions into the search. In
this case we will use a regular expression likely to find Handel and
Haydn and perhaps not much else:

.. code:: python

    import re
    haydnOrHandel = re.compile(r'ha.d.*', re.IGNORECASE)
    corpus.search(haydnOrHandel)




.. parsed-literal::
   :class: ipython-result

    <music21.metadata.bundles.MetadataBundle {176 entries}>



Unfortunately this really wasn't a good search, since we mostly got folk
songs with the title "Shandy". Next time it would be better to anchor the
pattern, using ``^`` to match only at the beginning of the text being
searched or ``\b`` to match only at the beginning of a word.
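
The over-matching is easy to reproduce with the ``re`` module alone. In the
sketch below the sample strings are invented, and the pattern is assumed to
be applied with ``search`` semantics (a match anywhere in the text), which is
what the "Shandy" results suggest:

.. code:: python

    import re

    # Invented sample values standing in for metadata fields.
    samples = ['Haydn', 'Handel', 'Shandy', 'Joseph Haydn']

    loose = re.compile(r'ha.d.*', re.IGNORECASE)     # matches anywhere
    anchored = re.compile(r'^ha.d', re.IGNORECASE)   # start of the text only
    wordStart = re.compile(r'\bha.d', re.IGNORECASE) # start of any word

    looseHits = [s for s in samples if loose.search(s)]
    anchoredHits = [s for s in samples if anchored.search(s)]
    wordStartHits = [s for s in samples if wordStart.search(s)]

The unanchored pattern accepts all four strings, including "Shandy". With
``^`` only "Haydn" and "Handel" remain; with ``\b`` "Joseph Haydn" is kept as
well, because the match may start at any word boundary.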

We've now gone fairly high level in our searching. We will return to the
lowest level in
:ref:`Chapter 12: The Music21Object <usersGuide_12_music21Object>`.