fg-selective-brazilian.bin
sentence = Sentence("O presidente Lula participou da reunião do G20 em Brasília.")
Platforms such as TikTok BR or Twitter/X need to detect hate speech or "fake news" in comments. The selective gate naturally ignores common filler tokens like "kkkk" or "aff" while focusing on potentially toxic content. Combined with the binary format’s fast loading, a new moderation instance can be spun up in under 50 ms.
The non-skipped tokens then flow through a 2-layer BiLSTM (bidirectional long short-term memory) instead of a full transformer. This choice is deliberate: Brazilian Portuguese, while complex, benefits from linear sequence modeling for POS tagging and NER, and the BiLSTM is much faster for inference. A final CRF (Conditional Random Field) layer ensures label consistency.
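To make the role of the CRF layer concrete, here is a minimal sketch of Viterbi decoding, the standard way to pick the best label sequence from a linear-chain CRF. The label set, emission scores, and transition penalty below are hypothetical values chosen for illustration; they are not taken from the model.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions:   one dict per token, mapping label -> score
    transitions: dict mapping (prev_label, label) -> score; missing pairs score 0.0
    """
    # best[i][label] = best score of any path that ends in `label` at token i
    best = [{label: emissions[0][label] for label in labels}]
    backpointers = []
    for emission in emissions[1:]:
        scores, pointers = {}, {}
        for label in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, label), 0.0))
            scores[label] = best[-1][prev] + transitions.get((prev, label), 0.0) + emission[label]
            pointers[label] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda l: best[-1][l])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


labels = ["O", "B-PER", "I-PER"]
# In BIO tagging, I-PER may not follow O, so that transition gets a large penalty.
transitions = {("O", "I-PER"): -10000.0}
# Hypothetical per-token scores for the two tokens "presidente Lula".
emissions = [
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 1.0, "I-PER": 1.5},
]

greedy = [max(labels, key=lambda l: e[l]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, labels)
print(greedy)   # per-token argmax, ignores transitions
print(decoded)  # CRF-style decoding, respects transitions
```

With these toy scores, the per-token argmax yields the invalid sequence `O, I-PER`, while Viterbi decoding returns `O, B-PER`. This is the kind of label consistency the CRF layer is meant to enforce.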
The entire model is saved as a single binary file using pickle with protocol 5, enabling fast mmap loading.
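As a rough illustration of that storage approach, the sketch below writes a small dictionary with pickle protocol 5 and reads it back through a read-only memory map. The `weights` dictionary and the `demo-weights.bin` file name are hypothetical stand-ins; the actual layout of the model file is not documented here.

```python
import mmap
import os
import pickle
import tempfile

# Hypothetical stand-in for model weights (not the real file contents).
weights = {"gate.weight": [0.12, -0.40, 0.88], "gate.bias": 0.05}

path = os.path.join(tempfile.mkdtemp(), "demo-weights.bin")

# Protocol 5 requires Python 3.8 or newer.
with open(path, "wb") as f:
    pickle.dump(weights, f, protocol=5)

# Map the file read-only and unpickle directly from the mapped bytes,
# without first copying the whole file into a separate bytes object.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        restored = pickle.loads(mapped)

print(restored == weights)
```

Because unpickling can execute arbitrary code, a file in this format should only be loaded when it comes from a trusted source.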
## Model Weights: fg-selective-brazilian.bin

After embedding a sentence (e.g., "O gato preto correu rapidamente"), each token passes through a linear gate. The gate outputs a probability between 0 and 1. If the probability is below a threshold (typically 0.3), that token’s embedding is replaced with a learnable [SKIP] vector. The gating function is trained via a combination of:
for entity in sentence.get_spans('ner'):
    print(f"Entity {entity.tag}: {entity.text}")
Expected output:
Entity PER: Lula
Entity LOC: G20
Entity LOC: Brasília
Notice how common words like "O", "da", "em" were skipped silently—they never passed through the heavy BiLSTM layers.
The model emerges from the intersection of two influential NLP libraries: Flair (developed by Zalando Research) and spaCy’s v3 custom model training pipeline. The selective component is inspired by the 2021 paper "Selective Token Generation for Efficient NLP" (Liu et al.), which proposed that up to 40% of tokens in a standard Portuguese news article can be skipped without harming entity recognition or part-of-speech (POS) tagging accuracy.
The brazilian dataset was compiled from multiple sources:
Training involved selectively masking tokens using a lightweight predictor: a small binary classifier attached to the embedding layer. Tokens predicted as "low-information" (e.g., the prepositions "de", "para", "com" or the conjunctions "e", "ou", "mas") are assigned a null vector and bypass the heavier downstream layers. This reduces FLOPs by roughly 30% while maintaining >98% of the full model’s F1 score on benchmarks such as LeNER-Br (legal named entity recognition) and MiniHateBR (hate speech detection).
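The gating step can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the model's actual code: the 2-dimensional embeddings, the gate weights, and the all-zero skip vector are hypothetical values, and the function names are invented for this example. Only the 0.3 threshold comes from the text.

```python
import math

SKIP_THRESHOLD = 0.3  # threshold quoted in the text; treat it as tunable


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_probability(embedding, weights, bias):
    """Linear gate: dot(weights, embedding) + bias, squashed into (0, 1)."""
    score = sum(w * e for w, e in zip(weights, embedding)) + bias
    return sigmoid(score)


def apply_selective_gate(embeddings, weights, bias, skip_vector, threshold=SKIP_THRESHOLD):
    """Replace embeddings whose gate probability is below the threshold."""
    gated, keep_mask = [], []
    for emb in embeddings:
        keep = gate_probability(emb, weights, bias) >= threshold
        keep_mask.append(keep)
        gated.append(emb if keep else skip_vector)
    return gated, keep_mask


# Toy inputs (hypothetical values, not taken from the model).
tokens = ["O", "gato", "preto", "correu", "rapidamente"]
embeddings = [[-2.0, 0.1], [1.5, 0.3], [0.8, -0.2], [1.2, 0.9], [0.1, 0.4]]
weights, bias = [1.0, 0.5], 0.0
skip_vector = [0.0, 0.0]  # stands in for the null / [SKIP] vector

gated, keep_mask = apply_selective_gate(embeddings, weights, bias, skip_vector)
kept = [tok for tok, keep in zip(tokens, keep_mask) if keep]
print(kept)
```

With these toy numbers, only the article "O" falls below the threshold (gate probability of about 0.12), so its embedding is swapped for the skip vector while the four content words keep theirs.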
