embeddings package
embeddings.embedding module
-
class embeddings.embedding.Embedding
Bases: object
static download_file(url, local_filename)
Downloads a file from a URL to a local file.
Returns: file name of the downloaded file.
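As an illustration of what a download helper of this shape might do, here is a minimal sketch built only on the standard library. The name `download_file_sketch` and the `chunk_size` parameter are hypothetical; the package's own implementation may use a different HTTP client and may report progress.

```python
import shutil
import urllib.request


def download_file_sketch(url, local_filename, chunk_size=1 << 20):
    """Stream `url` into `local_filename` and return the local file name.

    Hypothetical stand-in for illustration, not the package's implementation.
    """
    # Copy in chunks so a large embedding archive is never held fully in memory.
    with urllib.request.urlopen(url) as response, open(local_filename, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return local_filename
```

Copying in fixed-size chunks keeps memory use bounded, which matters for multi-gigabyte embedding archives.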
-
static ensure_file(name, url=None, force=False, logger=<RootLogger root (WARNING)>, postprocess=None)
Ensures that the requested file exists in the cache, downloading it if it does not.
Parameters: - name (str) – name of the file.
- url (str) – url to download the file from, if it doesn’t exist.
- force (bool) – whether to force the download, regardless of the existence of the file.
- logger (logging.Logger) – logger to log results.
- postprocess (function) – a function that, if given, will be applied after the file is downloaded. The function has the signature f(fname).
Returns: file name of the downloaded file.
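The check-then-download behaviour described above can be sketched as follows. This is an assumption-laden illustration, not the package's code: `ensure_file_sketch` and its `fetch` callable are hypothetical, `fetch` stands in for the real download step so the example runs offline, and raising `FileNotFoundError` when nothing can be fetched is a guess.

```python
import logging
import os


def ensure_file_sketch(path, fetch=None, force=False,
                       logger=logging.getLogger(), postprocess=None):
    """Return `path`, calling `fetch(path)` first if the file is missing or `force` is set.

    `fetch` is a hypothetical callable that writes the file to `path`.
    """
    if force or not os.path.isfile(path):
        if fetch is None:
            # Assumption: with no way to obtain the file, fail loudly.
            raise FileNotFoundError(path)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        logger.info("fetching %s", path)
        fetch(path)
        if postprocess is not None:
            postprocess(path)  # called with the file name, i.e. f(fname)
    return path
```

A second call with the same path returns immediately, while `force=True` repeats the fetch even though the file is already present.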
-
static initialize_db(fname)
Parameters: fname (str) – location of the database.
Returns: a SQLite3 database with an embeddings table.
Return type: db (sqlite3.Connection)
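A minimal sketch of opening a SQLite database with an embeddings table is shown below. The function name `initialize_db_sketch` and the two-column schema (`word`, `emb`) are assumptions made for illustration; the documentation above only states that the table is named embeddings.

```python
import sqlite3


def initialize_db_sketch(fname):
    """Open (or create) a SQLite database that has an `embeddings` table."""
    db = sqlite3.connect(fname)
    # Assumed schema: one row per word, with the vector stored as a binary blob.
    db.execute(
        "CREATE TABLE IF NOT EXISTS embeddings (word TEXT PRIMARY KEY, emb BLOB)"
    )
    db.commit()
    return db
```

Using `CREATE TABLE IF NOT EXISTS` makes the call safe to repeat against an existing database file.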
-
insert_batch(batch)
Parameters: batch (list) – a list of embeddings to insert, each of which is a tuple (word, embeddings).
Example:
    e = Embedding()
    e.db = e.initialize_db(e.path('mydb.db'))
    e.insert_batch([
        ('hello', [1, 2, 3]),
        ('world', [2, 3, 4]),
        ('!', [3, 4, 5]),
    ])
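To make the batch format concrete, the following sketch stores (word, vector) tuples in a SQLite table and reads one back. Everything here is an assumption for illustration: the helper names, the table schema, and the choice to serialize vectors as 32-bit floats with the `array` module are not taken from the documentation above.

```python
import sqlite3
from array import array


def insert_batch_sketch(db, batch):
    """Insert (word, vector) pairs into an `embeddings` table in one transaction."""
    rows = [(word, array("f", vec).tobytes()) for word, vec in batch]
    with db:  # commits on success, rolls back if an insert fails
        db.executemany(
            "INSERT OR REPLACE INTO embeddings (word, emb) VALUES (?, ?)", rows
        )


def lookup_sketch(db, word):
    """Return the stored vector as a list of floats, or None if the word is absent."""
    row = db.execute(
        "SELECT emb FROM embeddings WHERE word = ?", (word,)
    ).fetchone()
    if row is None:
        return None
    vec = array("f")
    vec.frombytes(row[0])
    return vec.tolist()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE embeddings (word TEXT PRIMARY KEY, emb BLOB)")
insert_batch_sketch(db, [("hello", [1, 2, 3]), ("world", [2, 3, 4]), ("!", [3, 4, 5])])
```

Sending the whole batch through `executemany` inside a single transaction is typically much faster than committing one row at a time.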
-
embeddings.fasttext module
-
class embeddings.fasttext.FastTextEmbedding(lang='en', show_progress=True, default='none')
Bases: embeddings.embedding.Embedding
Reference: https://arxiv.org/abs/1607.04606
-
__init__(lang='en', show_progress=True, default='none')
Note: default can use zeros, return None, or generate random values in [-0.1, 0.1].
-
d_emb = 300
-
sizes = {'en': 1}
-
url = 'https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.{}.zip'
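The `{}` in the url attribute reads as a `str.format` placeholder. A small sketch, assuming the placeholder is filled with the `lang` argument (the helper `fasttext_url` is hypothetical, and this says nothing about whether the resulting address still resolves):

```python
# URL template copied from the class attribute above.
URL_TEMPLATE = "https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.{}.zip"


def fasttext_url(lang="en"):
    """Fill the template's placeholder with a language code (assumed usage)."""
    return URL_TEMPLATE.format(lang)
```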
-
embeddings.glove module
-
class embeddings.glove.GloveEmbedding(name='common_crawl_840', d_emb=300, show_progress=True, default='none')
Bases: embeddings.embedding.Embedding
Reference: http://nlp.stanford.edu/projects/glove
-
class GloveSetting(url, d_embs, size, description)
Bases: tuple
-
url – Alias for field number 0
-
d_embs – Alias for field number 1
-
size – Alias for field number 2
-
description – Alias for field number 3
-
-
__init__(name='common_crawl_840', d_emb=300, show_progress=True, default='none')
Parameters: - name – name of the embedding to retrieve.
- d_emb – embedding dimensions.
- show_progress – whether to print progress.
- default – how to embed words that are out of vocabulary. Can use zeros, return None, or generate random values in [-0.1, 0.1].
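The three out-of-vocabulary behaviours listed for default can be sketched as below. Only 'none' appears in the signature above; the option strings 'zero' and 'random' and the helper name `oov_vector` are assumptions made for this illustration.

```python
import random


def oov_vector(d_emb, default="none", rng=random):
    """Return a fallback vector for an out-of-vocabulary word.

    The option names other than 'none' are assumed, not taken from the docs.
    """
    if default == "none":
        return None
    if default == "zero":
        return [0.0] * d_emb
    if default == "random":
        # Each component is drawn uniformly from [-0.1, 0.1].
        return [rng.uniform(-0.1, 0.1) for _ in range(d_emb)]
    raise ValueError("unknown default: {!r}".format(default))
```

Returning None lets the caller detect a missing word explicitly, whereas zeros or small random values keep downstream tensor shapes uniform.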
-
settings = {
    'common_crawl_48': GloveSetting(url='http://nlp.stanford.edu/data/glove.42B.300d.zip', d_embs=[300], size=1917494, description='48B token common crawl'),
    'common_crawl_840': GloveSetting(url='http://nlp.stanford.edu/data/glove.840B.300d.zip', d_embs=[300], size=2195895, description='840B token common crawl'),
    'twitter': GloveSetting(url='http://nlp.stanford.edu/data/glove.twitter.27B.zip', d_embs=[25, 50, 100, 200], size=1193514, description='27B token twitter'),
    'wikipedia_gigaword': GloveSetting(url='http://nlp.stanford.edu/data/glove.6B.zip', d_embs=[50, 100, 200, 300], size=400000, description='6B token wikipedia 2014 + gigaword 5'),
}
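Since GloveSetting is a named tuple, each field can be read by name or by position, and the d_embs field lists the dimensions available for a setting. The sketch below rebuilds two of the entries above and adds a hypothetical `check_setting` helper; whether the package validates name and d_emb in this way is an assumption.

```python
from collections import namedtuple

GloveSetting = namedtuple("GloveSetting", ["url", "d_embs", "size", "description"])

# Two entries copied from the settings attribute above.
settings = {
    "twitter": GloveSetting(
        "http://nlp.stanford.edu/data/glove.twitter.27B.zip",
        [25, 50, 100, 200], 1193514, "27B token twitter"),
    "wikipedia_gigaword": GloveSetting(
        "http://nlp.stanford.edu/data/glove.6B.zip",
        [50, 100, 200, 300], 400000, "6B token wikipedia 2014 + gigaword 5"),
}


def check_setting(name, d_emb):
    """Return the setting for `name`, rejecting unsupported names or dimensions."""
    if name not in settings:
        raise ValueError("unknown setting: {!r}".format(name))
    setting = settings[name]
    if d_emb not in setting.d_embs:
        raise ValueError(
            "{!r} offers dimensions {}, not {}".format(name, setting.d_embs, d_emb))
    return setting
```

For example, `check_setting("twitter", 50).size` reads the vocabulary size by field name, and index 2 of the same tuple gives the same value.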
-
embeddings.kazuma module
-
class embeddings.kazuma.KazumaCharEmbedding(show_progress=True)
Bases: embeddings.embedding.Embedding
Reference: https://www.logos.t.u-tokyo.ac.jp/~hassy/publications/arxiv2016jmt/
-
d_emb = 100
-
size = 874474
-
url = 'https://www.logos.t.u-tokyo.ac.jp/~hassy/publications/arxiv2016jmt/jmt_pre-trained_embeddings.tar.gz'
-