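from math import log

# The def line was missing from this snippet; the name "entropy" is assumed.
def entropy(l):
    log2 = lambda x: log(x) / log(2)
    total = len(l)
    # count how many times each distinct item occurs
    counts = {}
    for item in l:
        counts.setdefault(item, 0)
        counts[item] += 1
    # Shannon entropy in bits: -sum(p * log2(p)) over the distinct items
    ent = 0
    for i in counts:
        p = float(counts[i]) / total
        ent -= p * log2(p)
    return ent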
Takes a list and returns its entropy.
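For comparison, here is a minimal sketch of the same calculation using the standard library's `Counter` and `math.log2`. The name `entropy_bits` and the choice to return 0.0 for an empty list are assumptions made for this example, not part of the original snippet.

```python
from collections import Counter
from math import log2


def entropy_bits(items):
    """Shannon entropy, in bits, of the values in `items` (hypothetical helper)."""
    total = len(items)
    if total == 0:
        return 0.0  # assumption: an empty list carries no information
    counts = Counter(items)
    # each distinct value contributes -p * log2(p), where p is its relative frequency
    return sum(-(n / total) * log2(n / total) for n in counts.values())


print(entropy_bits(["a", "a", "b", "b"]))  # two equally likely values → 1.0
```

A list with a single repeated value has zero entropy, and a list of n distinct values has entropy log2(n).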
I couldn’t find anything like this at the time, so I fashioned this nifty function to print all combinations without repetition of the items in a list. For example:
if the input is an array that looks like: {"0","1","2","3"}
it will print out the following:
01
02
03
12
13
23
The typical way a noob programmer would do this is to take the cross product of the array with a duplicate of itself, i.e. a full loop within a loop. While the approach here is still O(n^2) in the long run, it saves some CPU cycles in the short run. Imagine trying to compare the elements of a 1000-member array. Comparing each element with every element is 1000*1000 iterations. So the naive approach does 1,000,000 iterations, while this approach does (1000*999)/2 = 499,500 iterations, i.e. 1000 choose 2.
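The function itself is not shown in this post, so here is a minimal sketch of the idea in Python. The name `pairs` is hypothetical, and it returns the pairs as strings rather than printing them directly; the inner loop starts one position past the outer loop so that each pair is visited exactly once.

```python
def pairs(items):
    """Return every combination of two items (no repetition) as joined strings."""
    out = []
    n = len(items)
    for i in range(n):
        # start j just past i: no element is paired with itself,
        # and (a, b) is never revisited as (b, a)
        for j in range(i + 1, n):
            out.append(str(items[i]) + str(items[j]))
    return out


for p in pairs(["0", "1", "2", "3"]):
    print(p)
```

For a 1000-element input this produces 499,500 pairs. Python's standard library offers the same enumeration through `itertools.combinations(items, 2)`.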
// Requires the PHP Simple HTML DOM Parser for file_get_html(); the include path below is an assumption.
require_once 'simple_html_dom.php';
//NYT_query("json","javascript","[article search key]",false);
// With the 4th parameter set to false, it will only print out the total number of articles matching your query.
NYT_query("json","jay-z","[article search key]",true,80);
NYT_query("json","eminem","[article search key]",true,80);
//NYT_query("json","javascript","[article search key]",true,10);
// With the 4th parameter set to true, it will print out the total number and also write each article to file.
// The txt files will be in the current working directory under archive/{your query}, in this case archive/javascript.
// Some will come out at zero KB. I'm not sure what the hell is going on with that (NYT server error, status 500), but you can get most articles this way.
// The 5th parameter specifies the maximum number of articles to write to file. This is pretty helpful when queries like "computer" have over 120000 results.
// In the future I'll probably refine the search parameters a bit to get more accurate results, such as the ability to search for all articles containing "computer" in the "technology" section of the NYTimes.
function NYT_query($format,$query,$apikey,$writetoFile,$max=null){
    // $max is optional, so the four-argument calls above also work
    $q = urlencode($query);
    $URL = "http://api.nytimes.com/svc/search/v1/article?format=$format&query=$q&api-key=$apikey";
    // the API answers with JSON, so fetch it as a plain string instead of parsing it as HTML
    $result = json_decode(file_get_contents($URL));
    $total = $result->total;
    echo $total."\n\n";
    if($writetoFile){
        // never try to fetch more articles than the query matched
        if(isset($max) && $max<$total){
            $limit = $max;
        }else{
            $limit = $total;
        }
        // the loop assumes 10 results per page, with offset as the page number
        $maximum = floor($limit/10);
        $dir = "archive/".$query;
        if(!is_dir($dir)){
            mkdir($dir,0777,true); // recursive, so archive/ is created too if missing
        }
        for($i=0;$i<$maximum;$i++){
            $pageURL = "http://api.nytimes.com/svc/search/v1/article?format=$format&query=$q&offset=$i&api-key=$apikey";
            $result = json_decode(file_get_contents($pageURL));
            foreach($result->results as $t){
                $title = urldecode($t->title);
                $artBody = html_entity_decode(extractArticle($t->url));
                writeToFile($dir."/".$title.".txt",$artBody);
            }
        }
    }
}
function extractArticle($url){
    $html = file_get_html($url);
    $body="";
    // collect the plain text of every element with class articleBody
    foreach($html->find('.articleBody') as $element){
        $body.= $element->plaintext."\n";
    }
    $html->clear();
    unset($html);
    return $body;
}
function writeToFile($filename,$body){
    $myFile = $filename;
    $fh = fopen($myFile, 'w') or die("can't open file");
    $stringData = $body;
    fwrite($fh, $stringData);
    fclose($fh);
}
A very nice little script I made in September of 2010. It makes it easy to query the NYTimes API to find relevant news articles in bulk.
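The paging logic is the part of the script that is easiest to get wrong, so here is a small Python sketch of just that step. It only builds the request URLs and makes no network calls. The function name `page_urls` and the `page_size` parameter are assumptions for this example; the endpoint and query parameters are copied from the script above, and that v1 endpoint may no longer be live.

```python
from urllib.parse import urlencode

# endpoint as it appears in the PHP script; it may no longer be served
BASE = "http://api.nytimes.com/svc/search/v1/article"


def page_urls(query, api_key, total, max_articles=None, fmt="json", page_size=10):
    """Build one URL per result page (hypothetical helper mirroring NYT_query)."""
    # never ask for more articles than the query matched
    limit = total if max_articles is None else min(max_articles, total)
    # like the PHP floor($max/10): only whole pages are requested
    pages = limit // page_size
    return [
        BASE + "?" + urlencode(
            {"format": fmt, "query": query, "offset": i, "api-key": api_key}
        )
        for i in range(pages)
    ]


for url in page_urls("jay-z", "[article search key]", total=500, max_articles=80):
    print(url)
```

With a cap of 80 articles this yields eight URLs, with offsets 0 through 7. Because the division rounds down, a cap of 85 would also give eight pages; rounding up instead would fetch the partial last page.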
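-- Assumed: the original snippet does not show where artist_name and track_name come from.
-- This reads them from the track currently playing in iTunes.
tell application "iTunes"
    set artist_name to artist of current track
    set track_name to name of current track
end tell
set the artist_name to replace_chars(artist_name, " ", "+")
set the track_name to replace_chars(track_name, " ", "+")
tell application "Safari"
    activate
    set the URL of the front document to ¬
        "http://search.lyrics.astraweb.com/?word=" & artist_name & "+" & track_name
end tell
-- replace every occurrence of search_string in this_text with replacement_string
on replace_chars(this_text, search_string, replacement_string)
    set AppleScript's text item delimiters to the search_string
    set the item_list to every text item of this_text
    set AppleScript's text item delimiters to the replacement_string
    set this_text to the item_list as string
    set AppleScript's text item delimiters to ""
    return this_text
end replace_chars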
Another AppleScript I fashioned a while back, along with the guitar tabs finder. It’s been a real time-saver: no more clicking or typing more than a hotkey when I want to find lyrics or guitar tabs. Yesssss.
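-- Assumed: the original snippet does not show where artist_name and track_name come from.
-- This reads them from the track currently playing in iTunes.
tell application "iTunes"
    set artist_name to artist of current track
    set track_name to name of current track
end tell
set the artist_name to replace_chars(artist_name, " ", "+")
set the track_name to replace_chars(track_name, " ", "+")
tell application "Safari"
    activate
    set the URL of the front document to ¬
        "http://www.ultimate-guitar.com/search.php?bn=" & artist_name & "&sn=" & track_name & "&type%5B%5D=1&type%5B%5D=3"
end tell
-- replace every occurrence of search_string in this_text with replacement_string
on replace_chars(this_text, search_string, replacement_string)
    set AppleScript's text item delimiters to the search_string
    set the item_list to every text item of this_text
    set AppleScript's text item delimiters to the replacement_string
    set this_text to the item_list as string
    set AppleScript's text item delimiters to ""
    return this_text
end replace_chars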
Cool AppleScript I made a while back that takes the currently playing song in iTunes and searches ultimate-guitar.com for guitar tabs. I set it up with Quicksilver to put the script on a hotkey combination for quick guitar tabs hunting.
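The script only swaps spaces for "+", so an artist or track name containing "&" or "/" would likely produce a broken query string. As a rough illustration of the same URL built with full percent-encoding, here is a Python sketch; the function name `tabs_search_url` is hypothetical, and the `bn`, `sn` and `type[]` parameters are taken from the URL in the script above.

```python
from urllib.parse import urlencode


def tabs_search_url(artist, track):
    """Build the search URL used by the script, escaping every special character."""
    params = [
        ("bn", artist),   # band name
        ("sn", track),    # song name
        ("type[]", 1),    # the two type filters from the original URL
        ("type[]", 3),
    ]
    # urlencode turns spaces into "+" and percent-encodes the rest, brackets included
    return "http://www.ultimate-guitar.com/search.php?" + urlencode(params)


print(tabs_search_url("Simon & Garfunkel", "The Boxer"))
```

The encoded `type%5B%5D` keys come out the same as the hand-written ones in the AppleScript, while the "&" in the artist name becomes `%26` instead of splitting the query.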
I don’t remember when I figured this out; I just remember that I was doing a programming challenge of some sort and didn’t want to generate prime numbers.
This is a bash one-liner, for those of you who didn’t get that right off the bat. wget is included on most Unix/Linux systems by default.