A post and a prank
parent
93ed5a42d6
commit
e1a5fe42b3
91
_posts/2019-10-05-BillWurtz.md
Normal file
@ -0,0 +1,91 @@
|
||||
---
|
||||
layout: post
|
||||
title: "Using an RNN to generate Bill Wurtz notes"
|
||||
description: "Textgenrnn is fun"
|
||||
date: 2019-10-05 18:54:00
|
||||
categories: project
|
||||
redirect_from:
|
||||
- /post/99g9j2r90/
|
||||
- /99g9j2r90/
|
||||
---
|
||||
|
||||
[Bill Wurtz](https://billwurtz.com/) is an American musician who became [reasonably famous](https://socialblade.com/youtube/user/billwurtz/realtime) through short musical videos posted to Vine and YouTube. I was searching through his website the other day, and stumbled upon a page labeled [*notebook*](https://billwurtz.com/notebook.html), and thought I should check it out.
|
||||
|
||||
Bill's notebook is a large collection (about 580 posts) of random thoughts, ideas, and sometimes just collections of words. It is a prime source of entertainment, and of neural network inputs.
|
||||
|
||||
> *"If you are looking to burn something, fire may be just the ticket"* - Bill Wurtz
|
||||
|
||||
## Choosing the right tool for the job
|
||||
If you haven't noticed yet, I'm building a neural net to generate notes based on his writing style and content. Anyone who has read [my first post](/blog/2018/06/27/becomeranter) will know that I have already done a similar project in the past. This means *time to reuse some code*!
|
||||
|
||||
For this project, I decided to use an amazing library by @minimaxir called [textgenrnn](https://github.com/minimaxir/textgenrnn). This Python library will handle all of the heavy (and light) work of training an RNN on a text dataset, then generating new text.
|
||||
|
||||
## Building a dataset
|
||||
This project was a joke, so I didn't bother with properly grabbing each post, categorizing them, and parsing them. Instead, I built a little script to pull every HTML file from Bill's website, and regex out the body. This ended up leaving some artifacts in the output, but I don't really mind.
|
||||
|
||||
```python
import re
import requests


def loadAllUrls():
    # Fetch the notebook index and pull out every linked note page
    page = requests.get("https://billwurtz.com/notebook.html").text

    links = re.findall(r"HREF=\"(.*)\"style", page)

    return links


def dumpEach(urls):
    for url in urls:
        # Download the note, dropping line-break tags and flattening newlines
        page = requests.get(f"https://billwurtz.com/{url}").text.strip().replace(
            "</br>", "").replace("<br>", "").replace("\n", " ")

        # Everything after the closing </head> tag is treated as the note body
        data = re.findall(r"</head>(.*)", page, re.MULTILINE)

        # ensure data
        if len(data) == 0:
            continue

        print(data[0])


urls = loadAllUrls()
print(f"Loaded {len(urls)} pages")
dumpEach(urls)

```
|
||||
|
||||
This script will print each of Bill's notes to the console (each on its own line). I used a simple redirect to write this to a file.
|
||||
|
||||
```sh
|
||||
python3 scrape.py > posts.txt
|
||||
```
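
The artifacts left in the output are mostly stray tags and HTML entities. A minimal sketch of a cleanup pass, assuming `posts.txt` holds one note per line as produced by the redirect; the `clean_line` and `clean_file` helpers and the `posts_clean.txt` filename are hypothetical and not part of the original script:

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <body> or </font>.
TAG_RE = re.compile(r"<[^>]+>")
# Matches runs of whitespace so they can be collapsed to a single space.
SPACE_RE = re.compile(r"\s+")


def clean_line(line):
    """Strip leftover tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", line)
    text = html.unescape(text)
    return SPACE_RE.sub(" ", text).strip()


def clean_file(src, dst):
    """Write the cleaned, non-empty lines of src to dst; return how many were kept."""
    kept = 0
    with open(src, encoding="utf-8") as infile, \
            open(dst, "w", encoding="utf-8") as outfile:
        for line in infile:
            cleaned = clean_line(line)
            if cleaned:
                outfile.write(cleaned + "\n")
                kept += 1
    return kept
```

Something like `clean_file("posts.txt", "posts_clean.txt")` would then produce a tidier training file. Note that the scraper also prints its `Loaded ... pages` message to stdout, so that line ends up at the top of `posts.txt` and may be worth deleting by hand.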
|
||||
|
||||
## Training
|
||||
To train the RNN, I just used some of textgenrnn's example code to read the posts file, and build an [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) file to store the trained model's weights.
|
||||
|
||||
```python
|
||||
from textgenrnn import textgenrnn
|
||||
|
||||
generator = textgenrnn()
|
||||
generator.train_from_file("/path/to/posts.txt", num_epochs=100)
|
||||
```
|
||||
|
||||
This takes quite a while to run, so I offloaded it to a [Droplet](https://www.digitalocean.com/products/droplets/), and left it running overnight.
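
Once training finishes, sampling from the model might look like the sketch below. It assumes textgenrnn wrote its weights to the default `textgenrnn_weights.hdf5` in the working directory, and that the installed version's `generate` accepts `return_as_list` and `temperature`. The `tidy` and `generate_notes` helpers are hypothetical names for this example, not something from the be-bill repo.

```python
def tidy(samples, max_len=140):
    """Drop empty or overly long samples and remove duplicates, keeping order."""
    seen = set()
    kept = []
    for sample in samples:
        # Collapse internal whitespace so near-identical samples compare equal
        note = " ".join(sample.split())
        if note and len(note) <= max_len and note not in seen:
            seen.add(note)
            kept.append(note)
    return kept


def generate_notes(weights_path="textgenrnn_weights.hdf5", count=10, temperature=0.5):
    """Load trained weights and return a list of generated notes."""
    # Imported here so tidy() can be used without TensorFlow installed
    from textgenrnn import textgenrnn

    generator = textgenrnn(weights_path=weights_path)
    samples = generator.generate(
        n=count, temperature=temperature, return_as_list=True)
    return tidy(samples)
```

A lower `temperature` tends to stay closer to the training text, while a higher one gives stranger output, so it is worth trying a few values.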
|
||||
|
||||
## The results
|
||||
Here are some of my favorite generated notes:
|
||||
|
||||
> *"note: do not feel better"*
|
||||
|
||||
> *"hi I am a car."*
|
||||
|
||||
> *"i am stuff and think about this before . this is it, the pond. how do they make me feel better?"*
|
||||
|
||||
> *"i am still about the floor"*
|
||||
|
||||
Not perfect, but it is readable English, so I call it a win!
|
||||
|
||||
## Play with the code
|
||||
I have uploaded the basic code, the scraped posts, and a partial HDF5 file [to GitHub](https://github.com/Ewpratten/be-bill) for anyone to play with. Maybe make a Twitter bot out of this?
|
@ -204,7 +204,7 @@ sub rsa4096/0xA61A2F1676E35144 2019-08-11 [] [expires: 2025-08-09]
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -184,3 +184,22 @@ a h5 {
|
||||
top: 0;
|
||||
left: 0
|
||||
}
|
||||
|
||||
blockquote {
|
||||
background: #f9f9f9;
|
||||
border-left: 10px solid #ccc;
|
||||
margin: 1.5em 10px;
|
||||
padding: 0.5em 10px;
|
||||
/* quotes: "\201C""\201D""\2018""\2019"; */
|
||||
}
|
||||
blockquote:before {
|
||||
color: #ccc;
|
||||
/* content: open-quote; */
|
||||
font-size: 4em;
|
||||
line-height: 0.1em;
|
||||
margin-right: 0.25em;
|
||||
vertical-align: -0.4em;
|
||||
}
|
||||
blockquote p {
|
||||
display: inline;
|
||||
}
|
@ -123,7 +123,7 @@ pip3 install tensorflow-gpu #for gpu processing
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -87,7 +87,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -100,7 +100,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -111,7 +111,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -125,7 +125,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -84,7 +84,7 @@ Your browser does not support audio players
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -124,7 +124,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -82,7 +82,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -82,7 +82,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -191,7 +191,7 @@ __<span class="o">()</span> <span class="o">{</span>/???/???/???n?f <span class=
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -112,7 +112,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -177,7 +177,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -101,7 +101,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -174,7 +174,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -95,7 +95,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -189,7 +189,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -107,7 +107,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -152,7 +152,7 @@ ibus-daemon <span class="nt">-drx</span>
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -137,7 +137,7 @@ shift2 <span class="nt">-h</span>
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -108,7 +108,7 @@ Starting from the top, scroll through, and middle click on anything you want to
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -157,7 +157,7 @@ fn printMyNumber(MyClass* self){
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -64,22 +64,22 @@
|
||||
Featured Post
|
||||
</div>
|
||||
<div class="card-body">
|
||||
<h5 class="card-title">5024's first offseason event of 2019
|
||||
<h5 class="card-title">Using an RNN to generate Bill Wurtz notes
|
||||
|
||||
</h5>
|
||||
<p class="card-text">A 120lb headless chicken that doesn't understand the word "stop"</p>
|
||||
<a href="/blog/2019/09/28/offseason1" class="btn btn-primary">View</a>
|
||||
<p class="card-text">Textgenrnn is fun</p>
|
||||
<a href="/blog/2019/10/05/billwurtz" class="btn btn-primary">View</a>
|
||||
</div>
|
||||
</div>
|
||||
</div> -->
|
||||
|
||||
<a href="/blog/2019/09/28/offseason1" class="list-group-item list-group-item-action">
|
||||
<a href="/blog/2019/10/05/billwurtz" class="list-group-item list-group-item-action">
|
||||
<div class="d-flex w-100 justify-content-between">
|
||||
<div class="card-body">
|
||||
<h5 class="mb-1">5024's first offseason event of 2019
|
||||
<h5 class="mb-1">Using an RNN to generate Bill Wurtz notes
|
||||
|
||||
</h5>
|
||||
<p class="card-text">A 120lb headless chicken that doesn't understand the word "stop"</p>
|
||||
<p class="card-text">Textgenrnn is fun</p>
|
||||
</div>
|
||||
</div>
|
||||
</a>
|
||||
@ -92,6 +92,21 @@
|
||||
|
||||
|
||||
|
||||
<a href="/blog/2019/09/28/offseason1" class="list-group-item list-group-item-action">
|
||||
<div class="d-flex w-100 justify-content-between">
|
||||
<h5 class="mb-1">5024's first offseason event of 2019</h5>
|
||||
<!-- <small>2019-09-28 20:00:00 -0400</small> -->
|
||||
</div>
|
||||
<p class="card-text">A 120lb headless chicken that doesn't understand the word "stop"</p>
|
||||
</a>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<a href="/blog/2019/09/19/i-want-to-build-a-sat" class="list-group-item list-group-item-action">
|
||||
<div class="d-flex w-100 justify-content-between">
|
||||
<h5 class="mb-1">I want to build a satellite</h5>
|
||||
@ -535,7 +550,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -52,7 +52,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
119
_site/feed.xml
@ -1,4 +1,90 @@
|
||||
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.6">Jekyll</generator><link href="http://0.0.0.0:4000/feed.xml" rel="self" type="application/atom+xml" /><link href="http://0.0.0.0:4000/" rel="alternate" type="text/html" /><updated>2019-09-30T09:35:47-04:00</updated><id>http://0.0.0.0:4000/feed.xml</id><title type="html">Evan Pratten</title><subtitle>Computer wizard, student, <a href="https://frc5024.github.io">@frc5024</a> programming team lead, and radio enthusiast.</subtitle><entry><title type="html">Building images from binary data</title><link href="http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin" rel="alternate" type="text/html" title="Building images from binary data" /><published>2019-09-11T08:41:00-04:00</published><updated>2019-09-11T08:41:00-04:00</updated><id>http://0.0.0.0:4000/blog/2019/09/11/Buildingimgfrombin</id><content type="html" xml:base="http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin"><p>During a computer science class today, we were talking about embedding code and metadata in <em>jpg</em> and <em>bmp</em> files. @exvacuum was showing off a program he wrote that watched a directory for new image files, and would display them on a canvas. He then showed us a special image. In this image, he had injected some metadata into the last few pixels, which were not rendered, but told his program where to position the image on the canvas, and it’s size.</p>
|
||||
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.6">Jekyll</generator><link href="http://0.0.0.0:4000/feed.xml" rel="self" type="application/atom+xml" /><link href="http://0.0.0.0:4000/" rel="alternate" type="text/html" /><updated>2019-10-09T08:34:03-04:00</updated><id>http://0.0.0.0:4000/feed.xml</id><title type="html">Evan Pratten</title><subtitle>Computer wizard, student, <a href="https://frc5024.github.io">@frc5024</a> programming team lead, and radio enthusiast.</subtitle><entry><title type="html">Using an RNN to generate Bill Wurtz notes</title><link href="http://0.0.0.0:4000/blog/2019/10/05/billwurtz" rel="alternate" type="text/html" title="Using an RNN to generate Bill Wurtz notes" /><published>2019-10-05T14:54:00-04:00</published><updated>2019-10-05T14:54:00-04:00</updated><id>http://0.0.0.0:4000/blog/2019/10/05/BillWurtz</id><content type="html" xml:base="http://0.0.0.0:4000/blog/2019/10/05/billwurtz"><p><a href="https://billwurtz.com/">Bill Wurtz</a> is an American musician who became <a href="https://socialblade.com/youtube/user/billwurtz/realtime">reasonably famous</a> through short musical videos posted to Vine and YouTube. I was searching through his website the other day, and stumbled upon a page labeled <a href="https://billwurtz.com/notebook.html"><em>notebook</em></a>, and thought I should check it out.</p>
|
||||
|
||||
<p>Bill’s notebook is a large collection (about 580 posts) of random thoughts, ideas, and sometimes just collections of words. It is a prime source of entertainment, and of neural network inputs.</p>
|
||||
|
||||
<blockquote>
|
||||
<p><em>“If you are looking to burn something, fire may be just the ticket”</em> - Bill Wurtz</p>
|
||||
</blockquote>
|
||||
|
||||
<h2 id="choosing-the-right-tool-for-the-job">Choosing the right tool for the job</h2>
|
||||
<p>If you haven’t noticed yet, I’m building a neural net to generate notes based on his writing style and content. Anyone who has read <a href="/blog/2018/06/27/becomeranter">my first post</a> will know that I have already done a similar project in the past. This means <em>time to reuse some code</em>!</p>
|
||||
|
||||
<p>For this project, I decided to use an amazing library by @minimaxir called <a href="https://github.com/minimaxir/textgenrnn">textgenrnn</a>. This Python library will handle all of the heavy (and light) work of training an RNN on a text dataset, then generating new text.</p>
|
||||
|
||||
<h2 id="building-a-dataset">Building a dataset</h2>
|
||||
<p>This project was a joke, so I didn’t bother with properly grabbing each post, categorizing them, and parsing them. Instead, I built a little script to pull every HTML file from Bill’s website, and regex out the body. This ended up leaving some artifacts in the output, but I don’t really mind.</p>
|
||||
|
||||
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">re</span>
|
||||
<span class="kn">import</span> <span class="nn">requests</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">loadAllUrls</span><span class="p">():</span>
|
||||
<span class="n">page</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"https://billwurtz.com/notebook.html"</span><span class="p">)</span><span class="o">.</span><span class="n">text</span>
|
||||
|
||||
<span class="n">links</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r"HREF=\"(.*)\"style"</span><span class="p">,</span> <span class="n">page</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">links</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">dumpEach</span><span class="p">(</span><span class="n">urls</span><span class="p">):</span>
|
||||
<span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">:</span>
|
||||
<span class="n">page</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">f</span><span class="s">"https://billwurtz.com/{url}"</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span>
|
||||
<span class="s">"&lt;/br&gt;"</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"&lt;br&gt;"</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="s">" "</span><span class="p">)</span>
|
||||
|
||||
<span class="n">data</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r"&lt;/head&gt;(.*)"</span><span class="p">,</span> <span class="n">page</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">MULTILINE</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># ensure data
|
||||
</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
|
||||
|
||||
|
||||
<span class="n">urls</span> <span class="o">=</span> <span class="n">loadAllUrls</span><span class="p">()</span>
|
||||
<span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s">"Loaded {len(urls)} pages"</span><span class="p">)</span>
|
||||
<span class="n">dumpEach</span><span class="p">(</span><span class="n">urls</span><span class="p">)</span>
|
||||
|
||||
</code></pre></div></div>
|
||||
|
||||
<p>This script will print each of Bill’s notes to the console (each on its own line). I used a simple redirect to write this to a file.</p>
|
||||
|
||||
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python3 scrape.py <span class="o">&gt;</span> posts.txt
|
||||
</code></pre></div></div>
|
||||
|
||||
<h2 id="training">Training</h2>
|
||||
<p>To train the RNN, I just used some of textgenrnn’s example code to read the posts file, and build an <a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format">HDF5</a> file to store the trained model’s weights.</p>
|
||||
|
||||
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">textgenrnn</span> <span class="kn">import</span> <span class="n">textgenrnn</span>
|
||||
|
||||
<span class="n">generator</span> <span class="o">=</span> <span class="n">textgenrnn</span><span class="p">()</span>
|
||||
<span class="n">generator</span><span class="o">.</span><span class="n">train_from_file</span><span class="p">(</span><span class="s">"/path/to/posts.txt"</span><span class="p">,</span> <span class="n">num_epochs</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
|
||||
</code></pre></div></div>
|
||||
|
||||
<p>This takes quite a while to run, so I offloaded it to a <a href="https://www.digitalocean.com/products/droplets/">Droplet</a>, and left it running overnight.</p>
|
||||
|
||||
<h2 id="the-results">The results</h2>
|
||||
<p>Here are some of my favorite generated notes:</p>
|
||||
|
||||
<blockquote>
|
||||
<p><em>“note: do not feel better”</em></p>
|
||||
</blockquote>
|
||||
|
||||
<blockquote>
|
||||
<p><em>“hi I am a car.”</em></p>
|
||||
</blockquote>
|
||||
|
||||
<blockquote>
|
||||
<p><em>“i am stuff and think about this before . this is it, the pond. how do they make me feel better?”</em></p>
|
||||
</blockquote>
|
||||
|
||||
<blockquote>
|
||||
<p><em>“i am still about the floor”</em></p>
|
||||
</blockquote>
|
||||
|
||||
<p>Not perfect, but it is readable English, so I call it a win!</p>
|
||||
|
||||
<h2 id="play-with-the-code">Play with the code</h2>
|
||||
<p>I have uploaded the basic code, the scraped posts, and a partial hdf5 file <a href="https://github.com/Ewpratten/be-bill">to GitHub</a> for anyone to play with. Maybe make a twitter bot out of this?</p></content><author><name></name></author><summary type="html">Bill Wurtz is an American musician who became reasonably famous through short musical videos posted to Vine and YouTube. I was searching through his website the other day, and stumbled upon a page labeled notebook, and thought I should check it out.</summary></entry><entry><title type="html">Building images from binary data</title><link href="http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin" rel="alternate" type="text/html" title="Building images from binary data" /><published>2019-09-11T08:41:00-04:00</published><updated>2019-09-11T08:41:00-04:00</updated><id>http://0.0.0.0:4000/blog/2019/09/11/Buildingimgfrombin</id><content type="html" xml:base="http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin"><p>During a computer science class today, we were talking about embedding code and metadata in <em>jpg</em> and <em>bmp</em> files. @exvacuum was showing off a program he wrote that watched a directory for new image files, and would display them on a canvas. He then showed us a special image. In this image, he had injected some metadata into the last few pixels, which were not rendered, but told his program where to position the image on the canvas, and it’s size.</p>
|
||||
|
||||
<p>This demo got @hyperliskdev and I thinking about what else we can do with image data. After some talk, the idea of converting application binaries to images came up. I had seen a blog post about visually decoding <a href="https://en.wikipedia.org/wiki/On%E2%80%93off_keying">OOK data</a> by converting an <a href="http://www.ni.com/tutorial/4805/en/">IQ capture</a> to an image. With a little adaptation, I did the same for a few binaries on my laptop.</p>
|
||||
|
||||
@ -583,33 +669,4 @@ ibus-daemon <span class="nt">-drx</span>
|
||||
<h2 id="using-the-script">Using the script</h2>
|
||||
<p>This script is not on PYPI this time. You can obtain a copy from my GitHub repo: <a href="https://github.com/Ewpratten/frc-code-stats">https://github.com/Ewpratten/frc-code-stats</a></p>
|
||||
|
||||
<p>First, make sure both <code class="highlighter-rouge">python3.7</code> and <code class="highlighter-rouge">python3-pip</code> are installed on your computer. Next, delete the <code class="highlighter-rouge">data.json</code> file. Then, install the requirements with <code class="highlighter-rouge">pip3 install -r requirements.txt</code>. Finally, run with <code class="highlighter-rouge">python3 main.py</code> to start the script. Now, go outside and enjoy nature for about an hour, and your data should be loaded!.</p></content><author><name></name></author><summary type="html">I was curious about the most used languages for FRC, so I build a Python script to find out what they where.</summary></entry><entry><title type="html">devDNS</title><link href="http://0.0.0.0:4000/blog/2019/07/01/devdns" rel="alternate" type="text/html" title="devDNS" /><published>2019-07-01T18:13:00-04:00</published><updated>2019-07-01T18:13:00-04:00</updated><id>http://0.0.0.0:4000/blog/2019/07/01/devDNS</id><content type="html" xml:base="http://0.0.0.0:4000/blog/2019/07/01/devdns"><p>Over the past year and a half, I have been hacking my way around the undocumented <a href="https://devrant.com">devRant</a> auth/write API. At the request of devRant’s creators, this API must not be documented due to the way logins work on the platform. That is besides the point. I have been working on a little project called <a href="https://devrant.com/collabs/2163502">devDNS</a> over the past few days that uses this undocumented API. Why must I be so bad at writing intros?</p>
|
||||
|
||||
<h2 id="what-is-devdns">What is devDNS</h2>
|
||||
<p>devDNS is a devRant bot written in python. It will serve any valid DNS query from any user on the platform. A query is just a comment in one of the following forms:</p>
|
||||
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@devDNS example.com
|
||||
</code></pre></div></div>
|
||||
<p>or</p>
|
||||
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@devDNS MX example.com
|
||||
</code></pre></div></div>
|
||||
<p>Of course, <code class="highlighter-rouge">MX</code> and <code class="highlighter-rouge">example.com</code> are to be replaced with the domain and entry of your choosing.</p>
|
||||
|
||||
<p>devDNS was inspired by <a href="https://twitter.com/1111resolver">@1111Resolver</a>, and the source is available on <a href="https://github.com/Ewpratten/devDNS">GitHub</a>.</p>
|
||||
|
||||
<h2 id="how-it-works">How it works</h2>
|
||||
<p>The Python script behind devDNS is very simple. devDNS does the following every 10 seconds:</p>
|
||||
<ul>
|
||||
<li>Fetch all new notifs</li>
|
||||
<li>Find only mentions</li>
|
||||
<li>Spin off a thread for each mention that passes a basic parser (Is the message 2 or 3 words long)</li>
|
||||
<li>In the thread, check if the message is a control message (allows me to view the status of the bot via devRant)</li>
|
||||
<li>Check if the request matches a required pattern</li>
|
||||
<li>Call <code class="highlighter-rouge">dnspython</code> with requested record and domain</li>
|
||||
<li>Receive answer from a custom <a href="https://pi-hole.net/">PIHole</a> server with caching and super low latency</li>
|
||||
<li>Send a comment with the results to the requester</li>
|
||||
</ul>
|
||||
|
||||
<p>Thats it! Super simple, and only two days from concept to reality.</p>
|
||||
|
||||
<h2 id="where-is-this-hosted">Where is this hosted?</h2>
|
||||
<p>This program is hosted on a raspberry pi laying in my room running docker. I also have <a href="https://www.portainer.io/">Portainer</a> set up so I can easily monitor the bot from my phone over my VPN.</p></content><author><name></name></author><summary type="html">Over the past year and a half, I have been hacking my way around the undocumented devRant auth/write API. At the request of devRant’s creators, this API must not be documented due to the way logins work on the platform. That is besides the point. I have been working on a little project called devDNS over the past few days that uses this undocumented API. Why must I be so bad at writing intros?</summary></entry></feed>
|
||||
<p>First, make sure both <code class="highlighter-rouge">python3.7</code> and <code class="highlighter-rouge">python3-pip</code> are installed on your computer. Next, delete the <code class="highlighter-rouge">data.json</code> file. Then, install the requirements with <code class="highlighter-rouge">pip3 install -r requirements.txt</code>. Finally, run with <code class="highlighter-rouge">python3 main.py</code> to start the script. Now, go outside and enjoy nature for about an hour, and your data should be loaded!.</p></content><author><name></name></author><summary type="html">I was curious about the most used languages for FRC, so I build a Python script to find out what they where.</summary></entry></feed>
|
@ -88,7 +88,7 @@ https://blog.mrtnrdl.de/feed.xml
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -101,7 +101,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -256,7 +256,7 @@
|
||||
<span class="site-info">
|
||||
Site design by: <a href="https://retrylife.ca">Evan Pratten</a> |
|
||||
|
||||
This site was last updated at: 2019-09-30 09:35:47 -0400
|
||||
This site was last updated at: 2019-10-09 08:34:03 -0400
|
||||
</span>
|
||||
</div>
|
||||
|
||||
|
@ -1 +1 @@
|
||||
{"/post/ef7b3166/":"http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin","/ef7b3166/":"http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin","/post/3a588993/":"http://0.0.0.0:4000/blog/2019/09/12/dronelicense","/3a588993/":"http://0.0.0.0:4000/blog/2019/09/12/dronelicense","/post/e9gb3490/":"http://0.0.0.0:4000/blog/2019/09/19/i-want-to-build-a-sat","/e9gb3490/":"http://0.0.0.0:4000/blog/2019/09/19/i-want-to-build-a-sat","/post/e9g9d6s90/":"http://0.0.0.0:4000/blog/2019/09/28/offseason1","/e9g9d6s90/":"http://0.0.0.0:4000/blog/2019/09/28/offseason1","/r/5kcomm":"https://imgur.com/a/77bnlZN"}
|
||||
{"/post/ef7b3166/":"http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin","/ef7b3166/":"http://0.0.0.0:4000/blog/2019/09/11/buildingimgfrombin","/post/3a588993/":"http://0.0.0.0:4000/blog/2019/09/12/dronelicense","/3a588993/":"http://0.0.0.0:4000/blog/2019/09/12/dronelicense","/post/e9gb3490/":"http://0.0.0.0:4000/blog/2019/09/19/i-want-to-build-a-sat","/e9gb3490/":"http://0.0.0.0:4000/blog/2019/09/19/i-want-to-build-a-sat","/post/e9g9d6s90/":"http://0.0.0.0:4000/blog/2019/09/28/offseason1","/e9g9d6s90/":"http://0.0.0.0:4000/blog/2019/09/28/offseason1","/post/99g9j2r90/":"http://0.0.0.0:4000/blog/2019/10/05/billwurtz","/99g9j2r90/":"http://0.0.0.0:4000/blog/2019/10/05/billwurtz","/r/5kcomm":"https://imgur.com/a/77bnlZN"}
|
@ -184,3 +184,22 @@ a h5 {
|
||||
top: 0;
|
||||
left: 0
|
||||
}
|
||||
|
||||
blockquote {
|
||||
background: #f9f9f9;
|
||||
border-left: 10px solid #ccc;
|
||||
margin: 1.5em 10px;
|
||||
padding: 0.5em 10px;
|
||||
/* quotes: "\201C""\201D""\2018""\2019"; */
|
||||
}
|
||||
blockquote:before {
|
||||
color: #ccc;
|
||||
/* content: open-quote; */
|
||||
font-size: 4em;
|
||||
line-height: 0.1em;
|
||||
margin-right: 0.25em;
|
||||
vertical-align: -0.4em;
|
||||
}
|
||||
blockquote p {
|
||||
display: inline;
|
||||
}
|